Objective: We present the Berlin-Tübingen-Oncology corpus (BRONCO), a large and freely available corpus of shuffled sentences from German oncological discharge summaries annotated with diagnosis, treatments, medications, and further attributes including negation and speculation. The aim of BRONCO is to foster reproducible and openly available research on Information Extraction from German medical texts.
Materials And Methods: BRONCO consists of 200 manually deidentified discharge summaries of cancer patients.
Motivation: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Further, annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the one that best fits particular needs is a demanding task that requires searching the scientific literature followed by installing and trying various tools.
View Article and Find Full Text PDFBackground: Diagnosis and treatment decisions in cancer increasingly depend on a detailed analysis of the mutational status of a patient's genome. This analysis relies on previously published information regarding the association of variations to disease progression and possible interventions. Clinicians to a large degree use biomedical search engines to obtain such information; however, the vast majority of scientific publications focus on basic science and have no direct clinical impact.
View Article and Find Full Text PDFPurpose: Precision oncology depends on the availability of up-to-date, comprehensive, and accurate information about associations between genetic variants and therapeutic options. Recently, a number of knowledge bases (KBs) have been developed that gather such information on the basis of expert curation of the scientific literature. We performed a quantitative and qualitative comparison of Clinical Interpretations of Variants in Cancer, OncoKB, Cancer Gene Census, Database of Curated Mutations, CGI Biomarkers (the cancer genome interpreter biomarker database), Tumor Alterations Relevant for Genomics-Driven Therapy, and the Precision Medicine Knowledge Base.
View Article and Find Full Text PDFBackground: The decreasing cost of obtaining high-quality calls of genomic variants and the increasing availability of clinically relevant data on such variants are important drivers for personalized oncology. To allow rational genome-based decisions in diagnosis and treatment, clinicians need intuitive access to up-to-date and comprehensive variant information, encompassing, for instance, prevalence in populations and diseases, functional impact at the molecular level, associations to druggable targets, or results from clinical trials. In practice, collecting such comprehensive information on genomic variants is difficult since the underlying data is dispersed over a multitude of distributed, heterogeneous, sometimes conflicting, and quickly evolving data sources.
View Article and Find Full Text PDF