Publications by authors named "Sonya E Shooshan"

Radiology reports contain a radiologist's interpretations of images, and these images frequently describe spatial relations. Important radiographic findings are mostly described in reference to an anatomical location through spatial prepositions. Such spatial relationships are also linked to various differential diagnoses and often described through uncertainty phrases.

View Article and Find Full Text PDF

This paper addresses the task of answering consumer health questions about medications. To better understand the challenge and needs in terms of methods and resources, we first introduce a gold standard corpus for Medication Question Answering created using real consumer questions. The gold standard (https://github.

View Article and Find Full Text PDF

Medication doses, one of the determining factors in medication safety and effectiveness, are present in the literature, but only in free-text form. We set out to determine if the systems developed for extracting drug prescription information from clinical text would yield comparable results on scientific literature and if sequence-to-sequence learning with neural networks could improve over the current state-of-the-art. We developed a collection of 694 PubMed Central documents annotated with drug dose information using the i2b2 schema.

View Article and Find Full Text PDF

Objective: Automated understanding of consumer health inquiries might be hindered by misspellings. To detect and correct various types of spelling errors in consumer health questions, we developed a distributable spell-checking tool, CSpell, that handles nonword errors, real-word errors, word boundary infractions, punctuation errors, and combinations of the above.

Methods: We developed a novel approach of using dual embedding within Word2vec for context-dependent corrections.

View Article and Find Full Text PDF

Background: Consumers increasingly use online resources for their health information needs. While current search engines can address these needs to some extent, they generally do not take into account that most health information needs are complex and can only fully be expressed in natural language. Consumer health question answering (QA) systems aim to fill this gap.

View Article and Find Full Text PDF

Adverse drug reactions (ADRs), unintended and sometimes dangerous effects that a drug may have, are one of the leading causes of morbidity and mortality during medical care. To date, there is no structured machine-readable authoritative source of known ADRs. The United States Food and Drug Administration (FDA) partnered with the National Library of Medicine to create a pilot dataset containing standardised information about known adverse reactions for 200 FDA-approved drugs.

View Article and Find Full Text PDF

We present an approach for manually and automatically classifying the resource type of medical questions. Three types of resources are considered: patient-specific, general knowledge, and research. Using this approach, an automatic question answering system could select the best type of resource from which to consider answers.

View Article and Find Full Text PDF

Objective: Clinical documents made available for secondary use play an increasingly important role in discovery of clinical knowledge, development of research methods, and education. An important step in facilitating secondary use of clinical document collections is easy access to descriptions and samples that represent the content of the collections. This paper presents an approach to developing a collection of radiology examinations, including both the images and radiologist narrative reports, and making them publicly available in a searchable database.

View Article and Find Full Text PDF

This paper describes a supervised machine learning approach for identifying heart disease risk factors in clinical text, and assessing the impact of annotation granularity and quality on the system's ability to recognize these risk factors. We utilize a series of support vector machine models in conjunction with manually built lexicons to classify triggers specific to each risk factor. The features used for classification were quite simple, utilizing only lexical information and ignoring higher-level linguistic information such as syntax and semantics.

View Article and Find Full Text PDF

To incorporate ontological concepts in natural language processing (NLP) it is often necessary to combine simple concepts into complex concepts (post-coordination). This is especially true in consumer language, where a more limited vocabulary forces consumers to utilize highly productive language that is almost impossible to pre-coordinate in an ontology. Our work focuses on recognizing an important case for post-coordination in natural language: spatial relations between disorders and anatomical structures.

View Article and Find Full Text PDF

Objective: The authors used the i2b2 Medication Extraction Challenge to evaluate their entity extraction methods, contribute to the generation of a publicly available collection of annotated clinical notes, and start developing methods for ontology-based reasoning using structured information generated from the unstructured clinical narrative.

Design: Extraction of salient features of medication orders from the text of de-identified hospital discharge summaries was addressed with a knowledge-based approach using simple rules and lookup lists. The entity recognition tool, MetaMap, was combined with dose, frequency, and duration modules specifically developed for the Challenge as well as a prototype module for reason identification.

View Article and Find Full Text PDF

Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text.

View Article and Find Full Text PDF

The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload.

View Article and Find Full Text PDF

Background: Indexing is a crucial step in any information retrieval system. In MEDLINE, a widely used database of the biomedical literature, the indexing process involves the selection of Medical Subject Headings in order to describe the subject matter of articles. The need for automatic tools to assist MEDLINE indexers in this task is growing with the increasing number of publications being added to MEDLINE.

View Article and Find Full Text PDF

Given the growth in UMLS Metathesaurus content and the consequent growth in language complexity, it is not surprising that NLP applications that depend on the UMLS are experiencing increased difficulty in maintaining adequate levels of performance. This phenomenon underscores the need for UMLS content views which can support NLP processing of both the biomedical literature and clinical text. We report on experiments designed to provide guidance as to whether to adopt a conservative vs.

View Article and Find Full Text PDF

Objective: This paper reports on the latest results of an Indexing Initiative effort addressing the automatic attachment of subheadings to MeSH main headings recommended by the NLM's Medical Text Indexer.

Material And Methods: Several linguistic and statistical approaches are used to retrieve and attach the subheadings. Continuing collaboration with NLM indexers also provided insight on how automatic methods can better enhance indexing practice.

View Article and Find Full Text PDF

The number of articles in the MEDLINE database is expected to increase tremendously in the coming years. To ensure that all these documents are indexed with continuing high quality, it is necessary to develop tools and methods that help the indexers in their daily task. We present three methods addressing a novel aspect of automatic indexing of the biomedical literature, namely producing MeSH main heading/subheading pair recommendations.

View Article and Find Full Text PDF

The U.S. National Library of Medicine (NLM) has created a metasearch engine called the NLM Gateway at the URL "gateway.

View Article and Find Full Text PDF

A PHP Error was encountered

Severity: Notice

Message: fwrite(): Write of 34 bytes failed with errno=28 No space left on device

Filename: drivers/Session_files_driver.php

Line Number: 272

Backtrace:

A PHP Error was encountered

Severity: Warning

Message: session_write_close(): Failed to write session data using user defined save handler. (session.save_path: /var/lib/php/sessions)

Filename: Unknown

Line Number: 0

Backtrace: