AI Article Synopsis

  • The paper describes MedCoref, the coreference resolution system Mayo Clinic developed for the 2011 i2b2/VA/Cincinnati shared task. The system links markables (mentions) that refer to the same entity in clinical documents.
  • It uses a multi-pass sieve that applies deterministic rules in order of decreasing precision, with a machine learning framework available as an alternative to the final, rule-based pronoun resolution sieve. The markables it links are treatments, problems, tests, persons, and pronouns in progress notes and discharge summaries.
  • The best system achieved overall scores of 0.836 on the training set and 0.843 on the test set. This suggests that relatively simple rules, combined with part-of-speech and semantic type information, can yield an effective coreference resolution system.

Article Abstract

Objective: This paper describes the coreference resolution system submitted by Mayo Clinic for the 2011 i2b2/VA/Cincinnati shared task Track 1C. The goal of the task was to construct a system that links the markables corresponding to the same entity.

Materials and Methods: The task organizers provided progress notes and discharge summaries annotated with markables of five types: treatment, problem, test, person, and pronoun. We used a multi-pass sieve algorithm that applies deterministic rules in order of decreasing precision while gathering information about the entities in each document. Our system, MedCoref, also uses a state-of-the-art machine learning framework as an alternative to the final, rule-based pronoun resolution sieve.
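The multi-pass sieve idea can be sketched as follows. This is a minimal illustration, not MedCoref's code (which is available at the SourceForge link in the Conclusion). The three rules `exact_match`, `head_match`, and `pronoun_match`, the `Mention` type, and the toy document are all hypothetical. The sketch shows the two properties the abstract describes: passes run from most to least precise, and each pass sees whole entities built by earlier passes rather than isolated mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Mention:
    text: str
    mtype: str  # markable type: treatment, problem, test, person, or pronoun


class UnionFind:
    """Tracks which mentions have been merged into the same entity."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        self.parent[self.find(i)] = self.find(j)


# A sieve sees the anaphor plus every mention already in the candidate's entity,
# so later, less precise passes can use what earlier passes have gathered.
Sieve = Callable[[Mention, List[Mention]], bool]


def exact_match(m: Mention, entity: List[Mention]) -> bool:
    # Hypothetical rule: same markable type and identical (case-folded) text.
    return m.mtype != "pronoun" and any(
        e.mtype == m.mtype and e.text.lower() == m.text.lower() for e in entity)


def head_match(m: Mention, entity: List[Mention]) -> bool:
    # Hypothetical rule: treat the last token as the head word.
    head = m.text.lower().split()[-1]
    return m.mtype != "pronoun" and any(
        e.mtype == m.mtype and e.text.lower().split()[-1] == head for e in entity)


PERSONAL_PRONOUNS = {"he", "she", "him", "her", "his", "hers"}


def pronoun_match(m: Mention, entity: List[Mention]) -> bool:
    # Hypothetical rule: a personal pronoun links to the nearest person entity.
    return m.text.lower() in PERSONAL_PRONOUNS and any(
        e.mtype == "person" for e in entity)


def run_sieves(mentions: List[Mention], sieves: List[Sieve]) -> List[List[int]]:
    uf = UnionFind(len(mentions))
    for sieve in sieves:  # ordered from most to least precise
        for i, m in enumerate(mentions):
            # Only the first mention of each entity searches for an antecedent.
            if any(uf.find(k) == uf.find(i) for k in range(i)):
                continue
            for j in range(i - 1, -1, -1):  # nearest candidate first
                entity = [mentions[k] for k in range(len(mentions))
                          if uf.find(k) == uf.find(j)]
                if sieve(m, entity):
                    uf.union(i, j)
                    break
    groups: Dict[int, List[int]] = {}
    for k in range(len(mentions)):
        groups.setdefault(uf.find(k), []).append(k)
    return sorted(groups.values())


doc = [
    Mention("the patient", "person"),
    Mention("chest pain", "problem"),
    Mention("aspirin", "treatment"),
    Mention("The patient", "person"),
    Mention("pain", "problem"),
    Mention("he", "pronoun"),
    Mention("Chest pain", "problem"),
]
print(run_sieves(doc, [exact_match, head_match, pronoun_match]))
# → [[0, 3, 5], [1, 4, 6], [2]]
```

Because each sieve is only a callable, the last pass could be swapped for a learned classifier's decision function. This loosely mirrors the abstract's description of a machine learning alternative to the final pronoun sieve.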

Results: The best multi-pass sieve system achieved an overall score (the average of the B³, MUC, BLANC, and CEAF F scores) of 0.836 on the training set and 0.843 on the test set.
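The overall score above is an unweighted mean of per-metric F scores. The sketch below implements two of those metrics, MUC and B³, using their standard definitions, and assumes the key and response cluster exactly the same mentions, as they do in this task where markables were given. It is not the official i2b2 scorer. BLANC and CEAF are omitted, and the toy clusters are invented, so the averaged value illustrates the arithmetic only.

```python
from typing import Dict, List, Set, Tuple

Clustering = List[Set[str]]


def _f1(p: float, r: float) -> float:
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def _index(clusters: Clustering) -> Dict[str, int]:
    """Map each mention to the id of the cluster that contains it."""
    return {m: cid for cid, c in enumerate(clusters) for m in c}


def _muc_recall(key: Clustering, response: Clustering) -> float:
    resp = _index(response)
    num = den = 0
    for k in key:
        # Partitions of k induced by the response; unmatched mentions are singletons.
        parts = {resp.get(m, ("unmatched", m)) for m in k}
        num += len(k) - len(parts)
        den += len(k) - 1
    return num / den if den else 0.0  # convention assumed for all-singleton keys


def muc(key: Clustering, response: Clustering) -> Tuple[float, float, float]:
    r = _muc_recall(key, response)
    p = _muc_recall(response, key)  # MUC precision swaps key and response
    return p, r, _f1(p, r)


def b_cubed(key: Clustering, response: Clustering) -> Tuple[float, float, float]:
    # Assumes key and response cluster exactly the same mentions.
    k_idx, r_idx = _index(key), _index(response)
    p = r = 0.0
    for m in k_idx:
        kc, rc = key[k_idx[m]], response[r_idx[m]]
        overlap = len(kc & rc)
        p += overlap / len(rc)
        r += overlap / len(kc)
    n = len(k_idx)
    p, r = p / n, r / n
    return p, r, _f1(p, r)


def overall(f_scores: List[float]) -> float:
    """Unweighted mean of per-metric F scores."""
    return sum(f_scores) / len(f_scores)


key = [{"a", "b", "c"}, {"d", "e"}]
response = [{"a", "b"}, {"c", "d", "e"}]
muc_f = muc(key, response)[2]      # 2/3
b3_f = b_cubed(key, response)[2]   # 11/15
print(round(overall([muc_f, b3_f]), 3))  # partial mean of MUC and B-cubed only
# → 0.7
```

MUC counts only the links needed to connect each cluster, so it ignores singletons. B³ averages over every mention, which is one reason the shared task averaged several complementary metrics.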

Discussion: A supervised machine learning system, which typically uses a single function to find coreferents, cannot accommodate irregularities in the data, especially when training examples are scarce. Conversely, a fully deterministic system can lose recall (sensitivity) when its rules are not exhaustive. The sieve-based framework allows reliable machine learning components to be combined with rules designed by experts.

Conclusion: An effective coreference resolution system can be built using relatively simple rules, part-of-speech information, and semantic type properties. The source code of the system described is available at https://sourceforge.net/projects/ohnlp/files/MedCoref.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3422831 (PMC)
http://dx.doi.org/10.1136/amiajnl-2011-000766 (DOI)
