Objective: Determine whether agreement among annotators improves after being trained to use an annotation schema that specifies: what types of clinical conditions to annotate, the linguistic form of the annotations, and which modifiers to include.
Methods: Three physicians and 3 lay people individually annotated all clinical conditions in 23 emergency department reports. For annotations made using a Baseline Schema and annotations made after training on a detailed annotation schema, we compared: (1) variability of annotation length and number and (2) annotator agreement, using the F-measure.
Results: Physicians showed higher agreement and lower variability after training on the detailed annotation schema than when applying the Baseline Schema. Lay people agreed with physicians almost as well as other physicians did but showed a slower learning curve.
Conclusion: Training annotators on the annotation schema we developed increased agreement among annotators and should be useful in generating reference standard sets for natural language processing studies. The methodology we used to evaluate the schema could be applied to other types of annotation or classification tasks in biomedical informatics.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1016/j.ijmedinf.2007.01.002 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!