FoodAtlas: Automated knowledge extraction of food and chemicals from literature.

Jason Youn, Fangzhou Li, Gabriel Simmons, Shanghyeon Kim, Ilias Tagkopoulos

Comput Biol Med

Department of Computer Science, University of California, Davis, Davis, CA, 95616, USA; Genome Center, University of California, Davis, Davis, CA, 95616, USA; USDA/NSF AI Institute for Next Generation Food Systems, Davis, CA, 95616, USA.

Published: October 2024

Automated generation of knowledge graphs that accurately capture published information can help with knowledge organization and access, which have the potential to accelerate discovery and innovation. Here, we present an integrated pipeline to construct a large-scale knowledge graph using large language models in an active learning setting. We apply our pipeline to the association of raw food, ingredients, and chemicals, a domain that lacks such knowledge resources. Trained with an iterative active learning approach, using 4120 manually curated premise-hypothesis pairs over ten consecutive cycles, the entailment model extracted 230,848 food-chemical composition relationships from 155,260 scientific papers, of which 106,082 (46.0%) had never been reported in any published database. To augment the knowledge graph, we further incorporated information from 5 external databases and ontology sources. We then applied a link prediction model to identify putative food-chemical relationships that were not part of the constructed knowledge graph. Validation of the 443 hypotheses generated by the link prediction model yielded 355 new food-chemical relationships, and the model score correlates well (R = 0.70) with the probability of a novel finding. This work demonstrates how automated learning from literature at scale can accelerate discovery and support practical applications through reproducible, evidence-based capture of latent interactions among diverse entities, such as food and chemicals.
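The iterative active learning setting mentioned in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline. The abstract does not state how pairs are selected in each cycle, so uncertainty sampling is used here purely as an assumption. A toy one-dimensional threshold classifier stands in for the entailment model, a numeric score stands in for a premise-hypothesis pair, and the function names (`fit_threshold`, `run_active_learning`), seed size, and batch size are all hypothetical.

```python
import random
from typing import Callable, List, Tuple


def fit_threshold(labeled: List[Tuple[float, int]]) -> float:
    """Toy stand-in for model training: threshold halfway between class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    if not pos or not neg:
        return 0.5  # fallback when only one class has been labeled so far
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def run_active_learning(
    pool: List[float],
    oracle: Callable[[float], int],
    seed_size: int,
    batch_size: int,
    cycles: int,
    rng: random.Random,
) -> Tuple[float, List[Tuple[float, int]]]:
    """Label a random seed set, then repeat: select, annotate, retrain."""
    unlabeled = list(pool)
    rng.shuffle(unlabeled)

    # Seed round: annotate a random sample (the oracle plays the human curator).
    labeled = [(x, oracle(x)) for x in unlabeled[:seed_size]]
    unlabeled = unlabeled[seed_size:]
    threshold = fit_threshold(labeled)

    for _ in range(cycles):
        # Assumed strategy: uncertainty sampling, i.e. pick the items
        # closest to the current decision threshold.
        unlabeled.sort(key=lambda x: abs(x - threshold))
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        labeled.extend((x, oracle(x)) for x in batch)
        threshold = fit_threshold(labeled)  # retrain on the enlarged set

    return threshold, labeled


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [rng.random() for _ in range(1000)]  # hypothetical unlabeled pool
    threshold, labeled = run_active_learning(
        pool, lambda x: int(x >= 0.6), seed_size=20, batch_size=10, cycles=10, rng=rng
    )
    print(f"labeled {len(labeled)} items, final threshold {threshold:.3f}")
```

With ten cycles, the loop mirrors the structure described in the abstract: each cycle adds a newly annotated batch to the training data and retrains the model before the next selection round.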

DOI: http://dx.doi.org/10.1016/j.compbiomed.2024.109072

