Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts where enzymes and the chemical reactions they catalyze are annotated using identifiers from the protein knowledgebase UniProtKB and the chemical ontology ChEBI.
View Article and Find Full Text PDFExpert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI).
View Article and Find Full Text PDFIntroduction: Lipids are key compounds in the study of metabolism and are increasingly studied in biology projects. It is a very broad family that encompasses many compounds, and the name of the same compound may vary depending on the community where they are studied.
Objectives: In addition, their structures are varied and complex, which complicates their analysis.
ViralZone (http://viralzone.expasy.org) is a knowledge repository for viruses that links biological knowledge and databases.
View Article and Find Full Text PDF