Wheat varieties show a large diversity of traits and phenotypes. Linking them to genetic variability is essential for shorter and more efficient wheat breeding programs. A growing number of plant molecular information networks provide interlinked, interoperable data to support the discovery of gene-phenotype interactions. A large body of scientific literature and observational data obtained in the field and under controlled conditions document wheat breeding experiments. Cross-referencing this complementary information is essential. Text from databases and scientific publications was identified early on as a relevant source of information. However, the wide variety of terms used to refer to traits and phenotype values makes it difficult to find and cross-reference the textual information; for example, simple dictionary lookup methods miss relevant terms. Corpora with manually annotated examples are thus needed to evaluate and train textual information extraction methods. While several corpora contain annotations of human and animal phenotypes, no corpus is available for plant traits. This hinders the evaluation of text mining-based crop knowledge graphs (e.g. AgroLD, KnetMiner, WheatIS-FAIDARE) and limits the ability to train machine learning methods and improve the quality of information. The Triticum aestivum trait Corpus is a new gold standard for traits and phenotypes of wheat. It consists of 528 PubMed references that are fully annotated by trait, phenotype, and species. We address the interoperability challenge of cross-referencing sparse assay data and publications by using the Wheat Trait and Phenotype Ontology to normalize trait mentions and the species taxonomy of the National Center for Biotechnology Information to normalize species. The paper describes the construction of the corpus. A study of the performance of state-of-the-art language models trained on the corpus, for both named entity recognition and entity linking, shows that it is suitable for training and evaluation. This corpus is currently the most comprehensive manually annotated corpus for natural language processing studies on crop phenotype information from the literature.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11175518 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0305475 (PLOS)
