Publications by authors named "James G Mork"

Objectives: The National Library of Medicine (NLM) currently indexes close to a million articles each year pertaining to more than 5300 medicine and life sciences journals. Of these, a significant number of articles contain critical information about the structure, genetics, and function of genes and proteins in normal and disease states. These articles are identified by the NLM curators, and a manual link is created between these articles and the corresponding gene records at the NCBI Gene database.

View Article and Find Full Text PDF

Biomedical experts are facing challenges in keeping up with the vast amount of biomedical knowledge published daily. With millions of citations added to databases like MEDLINE/PubMed each year, efficiently accessing relevant information becomes crucial. Traditional term-based searches may lead to irrelevant or missed documents due to homonyms, synonyms, abbreviations, or term mismatch.

View Article and Find Full Text PDF

MEDLINE is the National Library of Medicine's (NLM) journal citation database. It contains over 28 million references to biomedical and life science journal articles, and a key feature of the database is that all articles are indexed with NLM Medical Subject Headings (MeSH). The library employs a team of MeSH indexers, and in recent years they have been asked to index close to 1 million articles per year in order to keep MEDLINE up to date.

View Article and Find Full Text PDF

Sentence boundary detection (SBD) is a fundamental building block in the Natural Language Processing (NLP) pipeline. Incorrect SBD may impact subsequent processing stages resulting in decreased performance. In well-behaved corpora, a few simple rules based on punctuation and capitalization are sufficient for successfully detecting sentence boundaries.

View Article and Find Full Text PDF

This year less than 200 National Library of Medicine indexers expect to index 1 million articles, and this would not be possible without the assistance of the Medical Text Indexer (MTI) system. MTI is an automated indexing system that provides MeSH main heading/subheading pair recommendations to assist indexers with their heavy workload. Over the years, a lot of research effort has focused on improving main heading prediction performance, but automated fine-grained indexing with main heading/subheading pairs has received much less attention.

View Article and Find Full Text PDF

Chemical entity recognition is essential for indexing scientific literature in the MEDLINE database at the National Library of Medicine. However, the tool currently used to suggest terms for indexing, the Medical Text Indexer, was not originally conceived as a chemical recognition tool. It has instead been adapted to the task via its use of MetaMap and the addition of in-house patterns and rules.

View Article and Find Full Text PDF

Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is the foundation to the success of NLP applications and an essential component at the beginning of the NLP pipeline.

View Article and Find Full Text PDF

MEDLINE is the National Library of Medicine's premier bibliographic database for biomedical literature. A highly valuable feature of the database is that each record is manually indexed with a controlled vocabulary called MeSH. Most MEDLINE journals are indexed cover-to-cover, but there are about 200 selectively indexed journals for which only articles related to biomedicine and life sciences are indexed.

View Article and Find Full Text PDF

Medication doses, one of the determining factors in medication safety and effectiveness, are present in the literature, but only in free-text form. We set out to determine if the systems developed for extracting drug prescription information from clinical text would yield comparable results on scientific literature and if sequence-to-sequence learning with neural networks could improve over the current state-of-the-art. We developed a collection of 694 PubMed Central documents annotated with drug dose information using the i2b2 schema.

View Article and Find Full Text PDF

Objective: This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH).

View Article and Find Full Text PDF

Background: Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings.

View Article and Find Full Text PDF

Characteristics of the subjects of biomedical research are important in determining if a publication describing the research is relevant to a search. To facilitate finding relevant publications, MEDLINE citations provide Medical Subject Headings that describe the subjects' characteristics, such as their species, gender, and age. We seek to improve the recommendation of these headings by the Medical Text Indexer (MTI) that supports manual indexing of MEDLINE.

View Article and Find Full Text PDF

MeSH indexing of MEDLINE is becoming a more difficult task for the group of highly qualified indexing staff at the US National Library of Medicine, due to the large yearly growth of MEDLINE and the increasing size of MeSH. Since 2002, this task has been assisted by the Medical Text Indexer or MTI program. We extend previous machine learning analysis by adding a more diverse set of MeSH headings targeting examples where MTI has been shown to perform poorly.

View Article and Find Full Text PDF

This paper presents a two-step approach to generating comprehensive abstractive overviews for biomedical topics. It starts with a sensitivity-maximizing search of MEDLINE/PubMed and MeSH-based filtering of the results that are then processed using NLP methods to extract relations between entities of interest. We evaluate this approach in a case study based on the IOM report on the role of vitamin D in human health.

View Article and Find Full Text PDF

Background: MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary.

View Article and Find Full Text PDF

Background: A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries.

View Article and Find Full Text PDF

MEDLINE indexing performed by the US National Library of Medicine staff describes the essence of a biomedical publication in about 14 Medical Subject Headings (MeSH). Since 2002, this task is assisted by the Medical Text Indexer (MTI) program. We present a bottom-up approach to MEDLINE indexing in which the abstract is searched for indicators for a specific MeSH recommendation in a two-step process.

View Article and Find Full Text PDF

Objective: The authors used the i2b2 Medication Extraction Challenge to evaluate their entity extraction methods, contribute to the generation of a publicly available collection of annotated clinical notes, and start developing methods for ontology-based reasoning using structured information generated from the unstructured clinical narrative.

Design: Extraction of salient features of medication orders from the text of de-identified hospital discharge summaries was addressed with a knowledge-based approach using simple rules and lookup lists. The entity recognition tool, MetaMap, was combined with dose, frequency, and duration modules specifically developed for the Challenge as well as a prototype module for reason identification.

View Article and Find Full Text PDF

Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text.

View Article and Find Full Text PDF

The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload.

View Article and Find Full Text PDF

Given the growth in UMLS Metathesaurus content and the consequent growth in language complexity, it is not surprising that NLP applications that depend on the UMLS are experiencing increased difficulty in maintaining adequate levels of performance. This phenomenon underscores the need for UMLS content views which can support NLP processing of both the biomedical literature and clinical text. We report on experiments designed to provide guidance as to whether to adopt a conservative vs.

View Article and Find Full Text PDF

Objective: This paper reports on the latest results of an Indexing Initiative effort addressing the automatic attachment of subheadings to MeSH main headings recommended by the NLM's Medical Text Indexer.

Material And Methods: Several linguistic and statistical approaches are used to retrieve and attach the subheadings. Continuing collaboration with NLM indexers also provided insight on how automatic methods can better enhance indexing practice.

View Article and Find Full Text PDF