AI Article Synopsis

  • Natural language processing (NLP) methods can significantly streamline information extraction from the literature, a need highlighted by the COVID-19 pandemic, when genomic sequence records often lacked essential data such as demographics and clinical outcomes.
  • Identifying the right articles is a first step toward automated machine learning and NLP pipelines that extract key patient characteristics from them, enriching the data available for genomic epidemiology studies.

Article Abstract

Many studies require researchers to extract specific information from the published literature, such as details about sequence records or about a randomized controlled trial. While manual extraction is cost-efficient for small studies, it is much more costly and time-consuming for larger studies such as systematic reviews. To avoid exhaustive manual searches and extraction, with their related cost and effort, natural language processing (NLP) methods can be tailored to the more subtle extraction and decision tasks that typically only humans have performed. The need for studies that use the published literature as a data source became even more evident as the COVID-19 pandemic raged through the world. Millions of sequenced samples were deposited in public repositories such as GISAID and GenBank, promising large genomic epidemiology studies, but more often than not the sequence records lacked important details, which prevented such large-scale studies. For example, granular geographic location and even the most basic patient-relevant data, such as demographic information or clinical outcomes, were often not noted in the sequence record. However, some of these data were indeed published, in the text, tables, or supplementary material of a corresponding journal article. We present here methods to identify relevant journal articles that report having produced new SARS-CoV-2 sequences and made them available in GenBank or GISAID, as the articles that originally produced and released the sequences are the most likely to include high-level details about the patients from whom the sequences were obtained. Human annotators validated the approach, creating a gold-standard set for training and validating a machine learning classifier.
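
The abstract does not say which classifier was trained on the gold-standard set. As a minimal sketch only, assuming a bag-of-words multinomial naive Bayes model (a substitution for illustration, not necessarily the authors' method), the Python below shows how annotated article text could train a relevant/irrelevant classifier. The `ACCESSION_RE` pattern, the `__accession__` marker token, the `NaiveBayes` class, and the toy training sentences are all hypothetical; the regex covers only simplified GISAID `EPI_ISL_` identifiers and two-letter, six-digit GenBank-style accessions.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified accession patterns: GISAID "EPI_ISL_<digits>" IDs and
# GenBank-style nucleotide accessions of two uppercase letters plus six digits
# (optionally versioned, e.g. MN908947.3). Real accession formats are broader.
ACCESSION_RE = re.compile(r"\bEPI_ISL_\d+\b|\b[A-Z]{2}\d{6}(?:\.\d+)?\b")


def tokenize(text):
    """Lowercase alphabetic tokens, plus one marker token per accession-like ID."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens += ["__accession__"] * len(ACCESSION_RE.findall(text))
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log prior + sum of smoothed log likelihoods of known tokens
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy gold-standard examples, invented for illustration only.
train_texts = [
    "genomes deposited in GenBank MN908947",
    "sequences submitted to GISAID EPI_ISL_402124",
    "review of published studies",
    "model of case counts",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("new genomes deposited in GISAID EPI_ISL_1234"))  # relevant
print(model.predict("a review of case counts"))  # irrelevant
```

Citing an accession does not by itself show that the authors produced the sequences (a reanalysis paper may cite them too), so the accession marker is best treated as one feature among many rather than as a rule on its own.
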
Identifying these articles is a crucial step toward future automated informatics pipelines that will apply machine learning and natural language processing to identify patient characteristics such as comorbidities, outcomes, age, gender, and race, enriching SARS-CoV-2 sequence databases with actionable information for defining large genomic epidemiology studies. Such enriched patient metadata can enable secondary data analysis at scale to uncover associations between the viral genome (including variants of concern and their sublineages), transmission risk, and health outcomes. For such enrichment to happen, however, the right papers must be found and very detailed data must be extracted from them. Finding the specific articles needed for inclusion also facilitates scoping and systematic reviews, greatly reducing the time needed for full-text analysis and extraction.
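
The abstract leaves the extraction of patient characteristics to future pipelines. As a deliberately naive sketch, assuming plain regular expressions as a stand-in for the planned machine learning and NLP methods, the Python below pulls ages and sexes out of a sentence. The `AGE_RE`, `SEX_TERMS`, `SEX_RE`, and `extract_patient_metadata` names are hypothetical.

```python
import re

# Hypothetical patterns for two of the characteristics named above (age and
# sex). Case reports phrase these in many more ways than covered here.
AGE_RE = re.compile(r"\b(\d{1,3})[- ]year[- ]old\b", re.IGNORECASE)
SEX_TERMS = {
    "female": "female", "woman": "female", "women": "female", "girl": "female",
    "male": "male", "man": "male", "men": "male", "boy": "male",
}
# Word boundaries keep "man" from matching inside "woman" (and "male" inside "female").
SEX_RE = re.compile(r"\b(" + "|".join(SEX_TERMS) + r")\b", re.IGNORECASE)


def extract_patient_metadata(text):
    """Return ages and normalized sexes mentioned in a passage, in order."""
    ages = [int(m.group(1)) for m in AGE_RE.finditer(text)]
    sexes = [SEX_TERMS[m.group(1).lower()] for m in SEX_RE.finditer(text)]
    return {"ages": ages, "sexes": sexes}


record = extract_patient_metadata(
    "A 45-year-old woman and a 62-year-old man tested positive for SARS-CoV-2."
)
print(record)  # {'ages': [45, 62], 'sexes': ['female', 'male']}
```

Patterns like these miss age ranges, tables, supplementary files, and negations, which is where the machine learning and NLP components described above would be needed.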

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418574
DOI: http://dx.doi.org/10.1101/2023.07.29.23293370

Publication Analysis

Top Keywords

systematic reviews (12)
genomic epidemiology (12)
published literature (8)
extraction cost (8)
natural language (8)
language processing (8)
large genomic (8)
epidemiology studies (8)
machine learning (8)
studies (7)

Similar Publications

Background: Knee injuries resulting in purely cartilaginous defects are rare, and controversy remains regarding the reliability of chondral-only fixation.

Purpose: To systematically review the literature for fixation methods and outcomes after primary fixation of chondral-only defects within the knee.

Study Design: Systematic review; Level of evidence, 5.

Background: Selective androgen receptor modulators (SARMs) are small-molecule compounds that exert agonist and antagonist effects on androgen receptors in a tissue-specific fashion. Because of their performance-enhancing implications, SARMs are increasingly abused by athletes. To date, SARMs have no Food and Drug Administration approved use, and recent case reports associate the use of SARMs with deleterious effects such as drug-induced liver injury, myocarditis, and tendon rupture.

Background: To summarize the statistical performance of machine learning in predicting revision, secondary knee injury, or reoperations following anterior cruciate ligament reconstruction (ACLR), and to provide a general overview of the statistical performance of these models.

Methods: Three online databases (PubMed, MEDLINE, EMBASE) were searched from database inception to February 6, 2024, to identify literature on the use of machine learning to predict revision, secondary knee injury (e.g.

Ethiopia hygiene practice during complementary feeding and associated factors; systematic review and meta-analysis.

BMC Pediatr

January 2025

Health Promotion and Health Behavior Department, Institute of Public Health, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia.

Background: Complementary feeding is crucial for infant growth, but poor hygiene during this period increases the risk of malnutrition and illness. In Ethiopia, national data on hygiene practices during complementary feeding, particularly among mothers of children aged 6-24 months, is limited. This study aims to synthesize existing data through a systematic review and meta-analysis to evaluate the status of hygiene practices and identify key influencing factors, informing public health strategies to improve child health outcomes.

Background: There is evidence that exercise may reduce the risk of gestational diabetes mellitus (GDM) and improve other obstetric outcomes in overweight or obese pregnant women. However, the available evidence is of low quality and inconclusive. The purpose of this study is to assess the effects of exercise, compared with usual care, in reducing GDM and other obstetric risks, in overweight and obese pregnant women.
