Objective: To annotate a corpus of randomized controlled trial (RCT) publications with the checklist items of the CONSORT reporting guidelines and to use the corpus to develop text mining methods for RCT appraisal.
Methods: We annotated a corpus of 50 RCT articles at the sentence level using 37 fine-grained CONSORT checklist items. A subset (31 articles) was double-annotated and adjudicated, while 19 were annotated by a single annotator and reconciled by another.
Background: In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets.
Background: With the substantial growth in the biomedical research literature, a larger number of claims are published daily, some of which seemingly disagree with or contradict prior claims on the same topics. Resolving such contradictions is critical to advancing our understanding of human disease and developing effective treatments. Automated text analysis techniques can facilitate such analysis by extracting claims from the literature, flagging those that are potentially contradictory, and identifying any study characteristics that may explain such contradictions.
Quantifying scientific impact of researchers and journals relies largely on citation counts, despite the acknowledged limitations of this approach. The need for more suitable alternatives has prompted research into developing advanced metrics, such as h-index and Relative Citation Ratio (RCR), as well as better citation categorization schemes to capture the various functions that citations serve in a publication. One such scheme involves citation sentiment: whether a reference paper is cited positively (agreement with the findings of the reference paper), negatively (disagreement), or neutrally.
Objective: To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency.
Methods: To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated.
Informatics methodologies exploit computer-assisted techniques to help biomedical researchers manage large amounts of information. In this paper, we focus on the biomedical research literature (MEDLINE). We first provide an overview of some text mining techniques that offer assistance in research by identifying biomedical entities (e.
Biomedical knowledge claims are often expressed as hypotheses, speculations, or opinions, rather than explicit facts (propositions). Much biomedical text mining has focused on extracting propositions from biomedical literature. One such system is SemRep, which extracts propositional content in the form of subject-predicate-object triples called predications.
Background: Entity coreference is common in biomedical literature and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature.
We describe the use of a domain-independent methodology to extend a natural language processing (NLP) application, SemRep (Rindflesch, Fiszman, & Libbus, 2005), based on the knowledge sources afforded by the Unified Medical Language System (UMLS®) (Humphreys, Lindberg, Schoolman, & Barnett, 1998) to support the area of health promotion within the public health domain. Public health professionals require good information about successful health promotion policies and programs that might be considered for application within their own communities. Our effort seeks to improve access to relevant information for the public health profession, to help those in the field remain an information-savvy workforce.
The semantic relatedness between two concepts, according to human perception, is domain-rooted and reflects prior knowledge. We developed a new method for semantic relatedness assessment that reflects human judgment, utilizing semantic predications extracted from PubMed citations by SemRep. We compared the new method to other approaches utilizing path-based, statistical, and context vector methods, using a gold standard for evaluation.
In this study we report on potential drug-drug interactions between drugs occurring in patient clinical data. Results are based on relationships in SemMedDB, a database of structured knowledge extracted from all MEDLINE citations (titles and abstracts) using SemRep. The core of our methodology is to construct two potential drug-drug interaction schemas, based on relationships extracted from SemMedDB.
We describe a domain-independent methodology to extend SemRep coverage beyond the biomedical domain. SemRep, a natural language processing application originally designed for biomedical texts, uses the knowledge sources provided by the Unified Medical Language System (UMLS©). Ontological and terminological extensions to the system are needed in order to support other areas of knowledge.
Summary: Effective access to the vast biomedical knowledge present in the scientific literature is challenging. Semantic relations are increasingly used in knowledge management applications supporting biomedical research to help address this challenge. We describe SemMedDB, a repository of semantic predications (subject-predicate-object triples) extracted from the entire set of PubMed citations.
Study Objectives: Sleep quality commonly diminishes with age, and, further, aging men often exhibit a wider range of sleep pathologies than women. We used a freely available, web-based discovery technique (Semantic MEDLINE) supported by semantic relationships to automatically extract information from MEDLINE titles and abstracts.
Design: We assumed that testosterone is associated with sleep (the A-C relationship in the paradigm) and looked for a mechanism to explain this association (B explanatory link) as a potential or partial mechanism underpinning the etiology of eroded sleep quality in aging men.
We present an extension to literature-based discovery that goes beyond making discoveries to a principled way of navigating through selected aspects of some biomedical domain. The method is a type of "discovery browsing" that guides the user through the research literature on a specified phenomenon. Poorly understood relationships may be explored through novel points of view, and potentially interesting relationships need not be known ahead of time.
Background: Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems.
Automatic summarization has been proposed to help manage the results of biomedical information retrieval systems. Semantic MEDLINE, for example, summarizes semantic predications representing assertions in MEDLINE citations. Results are presented as a graph which maintains links to the original citations.
The explosion of disaster health information results in information overload among response professionals. The objective of this project was to determine the feasibility of applying semantic natural language processing (NLP) technology to address this overload. The project characterizes concepts and relationships commonly used in disaster health-related documents on influenza pandemics, as the basis for adapting an existing semantic summarizer to the domain.
Stud Health Technol Inform
December 2010
With the development of electronic personal health records, more patients are gaining access to their own medical records. However, comprehension of medical record content remains difficult for many patients. Because each record is unique, it is also prohibitively costly to employ human translators to solve this problem.
We are developing a freely available Spanish medical syntactic lexicon, initially populated with medical terms from a bilingual list, and then from corpus-based term discovery. The lexical records are a simplification of the SPECIALIST English lexicon. Lexical variant generation and normalization tools will be provided along with the lexicon.
Accurate readability assessment of health-related materials is a critical first step in producing easily understandable consumer health information resources and personal health records. Existing general readability formulas may not always be appropriate for the medical/consumer health domain. We developed a new health-specific readability pilot measure, based on the differences in semantic and syntactic features as well as text unit length.
AMIA Annu Symp Proc
October 2007
Identifying risk factors and biomarkers for diseases is an important aspect of biomedical research. However, much of the underlying information resides in the research literature and is not available in executable form. We propose a methodology based on automatic semantic interpretation (using SemRep) to capture risk factors and biomarkers for diseases asserted in MEDLINE citations.
AMIA Annu Symp Proc
September 2007
We conducted a user study of monolingual and bilingual Spanish-speaking consumers (n=36) to evaluate a Spanish-language ClinicalTrials.gov prototype. The prototype leverages an existing English-only consumer health resource by combining (1) Spanish-English cross-language information retrieval (CLIR) and (2) English-Spanish document display techniques.
Stud Health Technol Inform
June 2005
Researchers and practitioners frequently use readability formulas to predict the suitability of health-related texts for consumers (e.g., patient instructions, informed consent documents).