Purpose: To compare the accuracy and reliability of a natural language processing (NLP) algorithm with manual coding by radiologists, and with the combination of the two methods, for identifying patients whose computed tomography (CT) reports raised concern for lung cancer.
Methods: An NLP algorithm was developed using the Clinical Text Analysis and Knowledge Extraction System (cTAKES) with the Yale cTAKES Extensions and trained to distinguish language describing benign lesions from language describing lesions concerning for lung cancer. A random sample of 450 chest CT reports performed at the Veterans Affairs Connecticut Healthcare System between January 2014 and July 2015 was selected.
Arsenic has a dual role as a causative and a curative agent of human disease. There is therefore considerable interest in elucidating the mechanisms of arsenic toxicity and detoxification. Using an ensemble modelling approach, we identified the best-performing parsimonious mathematical model, which recapitulates and predicts intracellular arsenic dynamics under different conditions and in different mutants. The model provides novel insights into arsenic toxicity and detoxification mechanisms in yeast, some of which were confirmed by dedicated experiments.
Objective: To compare linear and Laplacian support vector machines (SVMs) on a clinical text classification task, and to evaluate the effect of unlabeled training data on Laplacian SVM performance.
Background: The development of machine-learning based clinical text classifiers requires the creation of labeled training data, obtained via manual review by clinicians. Due to the effort and expense involved in labeling data, training data sets in the clinical domain are of limited size.
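For orientation, the standard manifold-regularization formulation of the Laplacian SVM (Belkin, Niyogi and Sindhwani) is shown below; whether this study used exactly this variant is an assumption. With $l$ labeled documents, $u$ unlabeled documents, $\mathbf{f} = (f(x_1), \dots, f(x_{l+u}))^{\top}$, and $L = D - W$ the graph Laplacian of a similarity graph built over all $l+u$ documents, the classifier solves:

```latex
f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l}\sum_{i=1}^{l} \max\bigl(0,\; 1 - y_i f(x_i)\bigr)
  + \gamma_A \lVert f \rVert_K^{2}
  + \frac{\gamma_I}{(l+u)^{2}}\, \mathbf{f}^{\top} L\, \mathbf{f}
```

Unlabeled documents enter only through the last term, which penalizes classifiers whose outputs differ between similar documents. Setting $\gamma_I = 0$ recovers the ordinary SVM, and choosing a linear kernel $K$ gives the linear SVM used as the comparison.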
Background: Timeliness of care improves patient satisfaction and might improve outcomes. The CCCP was established in November 2007 to improve the timeliness of care for non-small cell lung cancer (NSCLC) at the Veterans Affairs Connecticut Healthcare System (VACHS).
Patients And Methods: We performed a retrospective cohort analysis of patients diagnosed with NSCLC at VACHS between 2005 and 2010.
J Am Med Inform Assoc
December 2013
Background: Word sense disambiguation (WSD) methods automatically assign an unambiguous concept to an ambiguous term based on context, and are important to many text-processing tasks. In this study we developed and evaluated a knowledge-based WSD method that uses semantic similarity measures derived from the Unified Medical Language System (UMLS), and we assessed the contribution of WSD to clinical text classification.
Methods: We evaluated our system on biomedical WSD datasets and determined the contribution of our WSD system to clinical document classification on the 2007 Computational Medicine Challenge corpus.
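One common way to build a knowledge-based WSD method on top of a semantic similarity measure is to pick the candidate sense whose total similarity to the concepts found in the surrounding context is highest. The sketch below illustrates that idea only; it is not necessarily the authors' scoring rule, and the sense names, context concepts, and similarity values are hypothetical stand-ins for UMLS concepts and a UMLS-derived measure.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def disambiguate(
    candidate_senses: List[str],
    context_concepts: Iterable[str],
    similarity: Callable[[str, str], float],
) -> str:
    """Return the candidate sense with the highest summed similarity to the context."""
    context = list(context_concepts)

    def score(sense: str) -> float:
        # Sum of pairwise similarities between this sense and every context concept.
        return sum(similarity(sense, c) for c in context)

    # Ties resolve to the first candidate listed.
    return max(candidate_senses, key=score)


# Hypothetical similarity scores standing in for a UMLS-derived measure.
TOY_SIM: Dict[Tuple[str, str], float] = {
    ("common_cold", "cough"): 0.8,
    ("common_cold", "fever"): 0.7,
    ("cold_temperature", "cough"): 0.1,
    ("cold_temperature", "fever"): 0.2,
}


def toy_similarity(a: str, b: str) -> float:
    # Symmetric lookup; pairs not in the table get similarity 0.
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))


sense = disambiguate(["common_cold", "cold_temperature"], ["cough", "fever"], toy_similarity)
print(sense)  # common_cold
```

Because the similarity function is passed in as a parameter, any of the path-based or information-content measures could be swapped in without changing the disambiguation step.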
BMC Bioinformatics
October 2012
Background: Semantic similarity measures estimate the similarity between concepts and play an important role in many text-processing tasks. Approaches to semantic similarity in the biomedical domain can be roughly divided into knowledge-based and distributional methods. Knowledge-based approaches use knowledge sources such as dictionaries, taxonomies, and semantic networks, and include path-finding measures and intrinsic information content (IC) measures.
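The two families of knowledge-based measures named above can be sketched on a toy taxonomy. The example assumes a hypothetical is-a hierarchy (not real UMLS content) and uses the Seco et al. intrinsic IC formula with the Lin measure as one well-known combination; the intrinsic IC variant evaluated in the study may differ.

```python
import math
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical toy is-a taxonomy (child -> parents); not real UMLS content.
PARENTS: Dict[str, List[str]] = {
    "entity": [],
    "disorder": ["entity"],
    "neoplasm": ["disorder"],
    "lung_neoplasm": ["neoplasm"],
    "breast_neoplasm": ["neoplasm"],
    "infection": ["disorder"],
    "pneumonia": ["infection"],
}


def ancestors(concept: str) -> Set[str]:
    """The concept itself plus every concept reachable via is-a links."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def path_length(c1: str, c2: str) -> Optional[int]:
    """Path-finding measure: fewest is-a edges between c1 and c2 (edges treated as undirected)."""
    neighbors: Dict[str, Set[str]] = {c: set(ps) for c, ps in PARENTS.items()}
    for child, parents in PARENTS.items():
        for parent in parents:
            neighbors[parent].add(child)
    dist = {c1: 0}
    queue = deque([c1])
    while queue:
        current = queue.popleft()
        if current == c2:
            return dist[current]
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return None


def intrinsic_ic(concept: str) -> float:
    """Seco et al. intrinsic IC: 1 - log(hyponyms + 1) / log(total concepts)."""
    hyponyms = sum(1 for c in PARENTS if c != concept and concept in ancestors(c))
    return 1.0 - math.log(hyponyms + 1) / math.log(len(PARENTS))


def lin_similarity(c1: str, c2: str) -> float:
    """Lin measure: 2 * IC(most informative common subsumer) / (IC(c1) + IC(c2))."""
    common = ancestors(c1) & ancestors(c2)
    mics_ic = max(intrinsic_ic(c) for c in common)
    denom = intrinsic_ic(c1) + intrinsic_ic(c2)
    return 2.0 * mics_ic / denom if denom > 0 else 1.0
```

On this toy hierarchy, `path_length("lung_neoplasm", "breast_neoplasm")` is 2 and `path_length("lung_neoplasm", "pneumonia")` is 4, while the Lin scores are roughly 0.44 and 0.08, respectively. The IC-based measure needs no corpus counts, which is the practical appeal of intrinsic IC.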
In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning-based clinical text classification. Critical steps in clinical text classification include identifying the features and passages relevant to the classification task, and representing clinical text so that documents of different classes can be discriminated. We developed novel information-theoretic techniques that use the taxonomic structure of the UMLS to improve feature ranking, and we developed a semantic similarity measure that projects clinical text into a feature space that improves classification.
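As a baseline for the information-theoretic feature ranking mentioned above, the sketch ranks binary "concept present" features by plain information gain. It does not reproduce the taxonomy-aware techniques described in the abstract; the concept identifiers and labels are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs: List[Set[str]], labels: List[str], feature: str) -> float:
    """IG of a binary 'concept present' feature: H(Y) - H(Y | feature)."""
    present = [y for d, y in zip(docs, labels) if feature in d]
    absent = [y for d, y in zip(docs, labels) if feature not in d]
    conditional = sum(
        len(part) / len(labels) * entropy(part) for part in (present, absent) if part
    )
    return entropy(labels) - conditional


def rank_features(docs: List[Set[str]], labels: List[str]) -> List[str]:
    """All features seen in docs, highest information gain first (ties broken by name)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda f: (-information_gain(docs, labels, f), f))


# Hypothetical concept identifiers and labels, for illustration only.
docs = [{"mass", "smoker"}, {"mass"}, {"granuloma", "smoker"}, {"granuloma"}]
labels = ["concern", "concern", "benign", "benign"]
print(rank_features(docs, labels))  # ['granuloma', 'mass', 'smoker']
```

Here "smoker" appears equally often in both classes and so carries no information about the label, while "mass" and "granuloma" each separate the classes perfectly.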
J Am Med Inform Assoc
January 2012
Background: Open-source clinical natural language processing (NLP) systems have lowered the barrier to developing effective clinical document classification systems. Clinical NLP systems annotate the syntax and semantics of clinical text; however, feature extraction and representation for document classification pose technical challenges.
Methods: The authors developed extensions to the clinical Text Analysis and Knowledge Extraction System (cTAKES) that simplify feature extraction, experimentation with various feature representations, and the development of both rule and machine-learning based document classifiers.
Next-generation sequencing technologies enable the identification of sequence variation in the genome and transcriptome. Differences between the reference genome and transcript libraries complicate the determination of the effect of genomic sequence variants on protein products; similarly, these differences complicate the mapping of sequence variants found in transcripts to their respective genomic position. We have developed MU2A, a publicly available web service for variant annotation that reconciles differences between the genome and transcriptome, enabling the rapid and accurate determination of the effects of genomic variants on protein products, and the mapping of variants detected in transcripts to genomic coordinates.
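The core coordinate translation that a variant-annotation service must perform can be sketched as walking a transcript's exons in transcription order. This is a minimal illustration, not MU2A's implementation or API: it assumes the transcript matches the reference genome exactly across each exon, so it ignores the genome/transcript discrepancies that MU2A is designed to reconcile, and the exon coordinates are hypothetical.

```python
from typing import List, Tuple


def transcript_to_genome(pos: int, exons: List[Tuple[int, int]], strand: str) -> int:
    """Map a 1-based transcript position to a 1-based genomic coordinate.

    `exons` are (start, end) genomic intervals, 1-based and inclusive, listed in
    ascending genomic order; on the minus strand the transcript begins at the
    highest coordinate of the last exon.
    """
    if pos < 1:
        raise ValueError("transcript positions are 1-based")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    ordered = exons if strand == "+" else list(reversed(exons))
    remaining = pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            # Offset into this exon, counted from its 5' end on the given strand.
            return start + remaining - 1 if strand == "+" else end - remaining + 1
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Hypothetical two-exon transcript with 10-bp exons.
exons = [(100, 109), (200, 209)]
print(transcript_to_genome(11, exons, "+"))  # 200
print(transcript_to_genome(11, exons, "-"))  # 109
```

Position 11 is the first base of the second exon in transcription order, which is genomic 200 on the plus strand but genomic 109 on the minus strand; handling indels between transcript and reference would require an alignment-aware offset on top of this walk.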