Publications by James G Mork | LitMetric

Publications by authors named "James G Mork"


The road from manual to automatic semantic indexing of biomedical literature: a 10 years journey.

Anastasia Krithara James G Mork Anastasios Nentidis Georgios Paliouras

Front Res Metr Anal

September 2023

Biomedical experts are facing challenges in keeping up with the vast amount of biomedical knowledge published daily. With millions of citations added to databases like MEDLINE/PubMed each year, efficiently accessing relevant information becomes crucial. Traditional term-based searches may lead to irrelevant or missed documents due to homonyms, synonyms, abbreviations, or term mismatch.


The NLM indexer assignment dataset: a new large-scale dataset for reviewer assignment research.

Alastair R Rae James G Mork Dina Demner-Fushman

J Assoc Inf Sci Technol

February 2023

MEDLINE is the National Library of Medicine's (NLM) journal citation database. It contains over 28 million references to biomedical and life science journal articles, and a key feature of the database is that all articles are indexed with NLM Medical Subject Headings (MeSH). The library employs a team of MeSH indexers, and in recent years they have been asked to index close to 1 million articles per year in order to keep MEDLINE up to date.


Hybrid Ensemble-Rule Algorithm for Improved MEDLINE® Sentence Boundary Detection.

Daniel X Le James G Mork Sameer Antani

AMIA Annu Symp Proc

April 2022

Sentence boundary detection (SBD) is a fundamental building block in the Natural Language Processing (NLP) pipeline. Incorrect SBD may impact subsequent processing stages resulting in decreased performance. In well-behaved corpora, a few simple rules based on punctuation and capitalization are sufficient for successfully detecting sentence boundaries.


Automatic MeSH Indexing: Revisiting the Subheading Attachment Problem.

Alastair R Rae David O Pritchard James G Mork Dina Demner-Fushman

AMIA Annu Symp Proc

June 2021

This year less than 200 National Library of Medicine indexers expect to index 1 million articles, and this would not be possible without the assistance of the Medical Text Indexer (MTI) system. MTI is an automated indexing system that provides MeSH main heading/subheading pair recommendations to assist indexers with their heavy workload. Over the years, a lot of research effort has focused on improving main heading prediction performance, but automated fine-grained indexing with main heading/subheading pairs has received much less attention.


Proper filter usage to retrieve multiwords from the MEDLINE n-gram set: Reply to the Turki et al commentary "Enhancing filter-based parenthetic abbreviation extraction methods".

Chris J Lu Amanda Payne James G Mork

J Am Med Inform Assoc

March 2021


Chemical Entity Recognition for MEDLINE Indexing.

Max E Savery Willie J Rogers Malvika Pillai James G Mork Dina Demner-Fushman

AMIA Jt Summits Transl Sci Proc

May 2020

Chemical entity recognition is essential for indexing scientific literature in the MEDLINE database at the National Library of Medicine. However, the tool currently used to suggest terms for indexing, the Medical Text Indexer, was not originally conceived as a chemical recognition tool. It has instead been adapted to the task via its use of MetaMap and the addition of in-house patterns and rules.


The Unified Medical Language System SPECIALIST Lexicon and Lexical Tools: Development and applications.

Chris J Lu Amanda Payne James G Mork

J Am Med Inform Assoc

May 2020

Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is the foundation to the success of NLP applications and an essential component at the beginning of the NLP pipeline.


A High Recall Classifier for Selecting Articles for MEDLINE Indexing.

Alastair R Rae Max E Savery James G Mork Dina Demner-Fushman

AMIA Annu Symp Proc

August 2020

MEDLINE is the National Library of Medicine's premier bibliographic database for biomedical literature. A highly valuable feature of the database is that each record is manually indexed with a controlled vocabulary called MeSH. Most MEDLINE journals are indexed cover-to-cover, but there are about 200 selectively indexed journals for which only articles related to biomedicine and life sciences are indexed.


Finding medication doses in the literature.

Dina Demner-Fushman James G Mork Willie J Rogers Sonya E Shooshan Laritza Rodriguez

AMIA Annu Symp Proc

October 2019

Medication doses, one of the determining factors in medication safety and effectiveness, are present in the literature, but only in free-text form. We set out to determine if the systems developed for extracting drug prescription information from clinical text would yield comparable results on scientific literature and if sequence-to-sequence learning with neural networks could improve over the current state-of-the-art. We developed a collection of 694 PubMed Central documents annotated with drug dose information using the i2b2 schema.


Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study.

Kevin B Read Jerry R Sheehan Michael F Huerta Lou S Knecht James G Mork

PLoS One

May 2016

Objective: This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH).


Feature engineering for MEDLINE citation categorization with MeSH.

Antonio Jose Jimeno Yepes Laura Plaza Jorge Carrillo-de-Albornoz James G Mork Alan R Aronson

BMC Bioinformatics

April 2015

Background: Research in biomedical text categorization has mostly used the bag-of-words representation. Other more sophisticated representations of text based on syntactic, semantic and argumentative properties have been less studied. In this paper, we evaluate the impact of different text representations of biomedical texts as features for reproducing the MeSH annotations of some of the most frequent MeSH headings.


Extracting Characteristics of the Study Subjects from Full-Text Articles.

Dina Demner-Fushman James G Mork

AMIA Annu Symp Proc

February 2017

Characteristics of the subjects of biomedical research are important in determining if a publication describing the research is relevant to a search. To facilitate finding relevant publications, MEDLINE citations provide Medical Subject Headings that describe the subjects' characteristics, such as their species, gender, and age. We seek to improve the recommendation of these headings by the Medical Text Indexer (MTI) that supports manual indexing of MEDLINE.


Comparison and combination of several MeSH indexing approaches.

Antonio Jose Jimeno Yepes James G Mork Dina Demner-Fushman Alan R Aronson

AMIA Annu Symp Proc

May 2014

MeSH indexing of MEDLINE is becoming a more difficult task for the group of highly qualified indexing staff at the US National Library of Medicine, due to the large yearly growth of MEDLINE and the increasing size of MeSH. Since 2002, this task has been assisted by the Medical Text Indexer or MTI program. We extend previous machine learning analysis by adding a more diverse set of MeSH headings targeting examples where MTI has been shown to perform poorly.


Mining MEDLINE for problems associated with vitamin D.

Dina Demner-Fushman James G Mork Alan R Aronson

AMIA Annu Symp Proc

May 2014

This paper presents a two-step approach to generating comprehensive abstractive overviews for biomedical topics. It starts with a sensitivity-maximizing search of MEDLINE/PubMed and MeSH-based filtering of the results that are then processed using NLP methods to extract relations between entities of interest. We evaluate this approach in a case study based on the IOM report on the role of vitamin D in human health.


MeSH indexing based on automatically generated summaries.

Antonio J Jimeno-Yepes Laura Plaza James G Mork Alan R Aronson Alberto Díaz

BMC Bioinformatics

June 2013

Background: MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using as reference the Medical Subject Headings (MeSH) controlled vocabulary.


GeneRIF indexing: sentence selection based on machine learning.

Antonio J Jimeno-Yepes J Caitlin Sticco James G Mork Alan R Aronson

BMC Bioinformatics

May 2013

Background: A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries.


A bottom-up approach to MEDLINE indexing recommendations.

Antonio Jimeno-Yepes Bartłomiej Wilkowski James G Mork Elizabeth Van Lenten Dina Demner Fushman

AMIA Annu Symp Proc

February 2013

MEDLINE indexing performed by the US National Library of Medicine staff describes the essence of a biomedical publication in about 14 Medical Subject Headings (MeSH). Since 2002, this task is assisted by the Medical Text Indexer (MTI) program. We present a bottom-up approach to MEDLINE indexing in which the abstract is searched for indicators for a specific MeSH recommendation in a two-step process.


A retrospective cohort study of structured abstracts in MEDLINE, 1992-2006.

Anna M Ripple James G Mork Lou S Knecht Betsy L Humphreys

J Med Libr Assoc

April 2011


Extracting Rx information from clinical narrative.

James G Mork Olivier Bodenreider Dina Demner-Fushman Rezarta Islamaj Dogan François-Michel Lang

J Am Med Inform Assoc

November 2010

Objective: The authors used the i2b2 Medication Extraction Challenge to evaluate their entity extraction methods, contribute to the generation of a publicly available collection of annotated clinical notes, and start developing methods for ontology-based reasoning using structured information generated from the unstructured clinical narrative.

Design: Extraction of salient features of medication orders from the text of de-identified hospital discharge summaries was addressed with a knowledge-based approach using simple rules and lookup lists. The entity recognition tool, MetaMap, was combined with dose, frequency, and duration modules specifically developed for the Challenge as well as a prototype module for reason identification.


UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text.

Dina Demner-Fushman James G Mork Sonya E Shooshan Alan R Aronson

J Biomed Inform

August 2010

Identification of medical terms in free text is a first step in such Natural Language Processing (NLP) tasks as automatic indexing of biomedical literature and extraction of patients' problem lists from the text of clinical notes. Many tools developed to perform these tasks use biomedical knowledge encoded in the Unified Medical Language System (UMLS) Metathesaurus. We continue our exploration of automatic approaches to creation of subsets (UMLS content views) which can support NLP processing of either the biomedical literature or clinical text.


Comment on 'MeSH-up: effective MeSH text classification for improved document retrieval'.

Aurélie Névéol James G Mork Alan R Aronson

Bioinformatics

October 2009

neveola@ncbi.nlm.nih.


A recent advance in the automatic indexing of the biomedical literature.

Aurélie Névéol Sonya E Shooshan Susanne M Humphrey James G Mork Alan R Aronson

J Biomed Inform

October 2009

The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload.


Methodology for creating UMLS content views appropriate for biomedical natural language processing.

Alan R Aronson James G Mork Aurélie Névéol Sonya E Shooshan Dina Demner-Fushman

AMIA Annu Symp Proc

November 2008

Given the growth in UMLS Metathesaurus content and the consequent growth in language complexity, it is not surprising that NLP applications that depend on the UMLS are experiencing increased difficulty in maintaining adequate levels of performance. This phenomenon underscores the need for UMLS content views which can support NLP processing of both the biomedical literature and clinical text. We report on experiments designed to provide guidance as to whether to adopt a conservative vs.


Fine-grained indexing of the biomedical literature: MeSH subheading attachment for a MEDLINE indexing tool.

Aurélie Névéol Sonya E Shooshan James G Mork Alan R Aronson

AMIA Annu Symp Proc

October 2007

Objective: This paper reports on the latest results of an Indexing Initiative effort addressing the automatic attachment of subheadings to MeSH main headings recommended by the NLM's Medical Text Indexer.

Material And Methods: Several linguistic and statistical approaches are used to retrieve and attach the subheadings. Continuing collaboration with NLM indexers also provided insight on how automatic methods can better enhance indexing practice.


Evaluation of French and English MeSH indexing systems with a parallel corpus.

Aurélie Névéol James G Mork Alan R Aronson Stéfan J Darmoni

AMIA Annu Symp Proc

February 2007

Objective: This paper presents the evaluation of two MeSH indexing systems for French and English on a parallel corpus.

Material And Methods: We describe two automatic MeSH indexing systems - MTI for English, and MAIF for French. The French version of the evaluation resources has been manually indexed with MeSH keyword/qualifier pairs.

