Chemical curation to improve data accuracy: recent development of the 3DMET database.

Biophys Physicobiol

Advanced Analysis Center, National Agriculture and Food Research Organization, Tsukuba, Ibaraki 305-8602, Japan.

Published: April 2018

We have developed a three-dimensional structure database of natural metabolites (3DMET). Early development of the 3DMET database relied on content auto-generated from the 2D structures in other chemical databases. In 2009, we began manual curation, obtaining new compounds from published works. In the course of curation, we accumulated a number of problems in digitizing 3D structures from the structure drawings in documents. As with auto-generation, structure drawings require careful attention to stereochemistry. Our experiences in manual curation of 3DMET, as described herein, may be useful to others in this field of research and for the development of systems that support chemical structure databases. Manual curation is still necessary for proper database entry of the 3D configurations of chiral atoms, a problem encountered frequently among natural products.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5992871 (PMC)
http://dx.doi.org/10.2142/biophysico.15.0_87 (DOI)

Publication Analysis

Top Keywords

manual curation: 12
development 3dmet: 8
3dmet database: 8
structure database: 8
structure drawings: 8
database: 5
chemical curation: 4
curation improve: 4
improve data: 4
data accuracy: 4

Similar Publications

Large Language Models in Worldwide Medical Exams: Platform Development and Comprehensive Analysis.

J Med Internet Res

December 2024

Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China.

Background: Large language models (LLMs) are increasingly integrated into medical education, with transformative potential for learning and assessment. However, their performance across diverse medical exams globally has remained underexplored.

Objective: This study aims to introduce MedExamLLM, a comprehensive platform designed to systematically evaluate the performance of LLMs on medical exams worldwide.

Purpose: While public databases like Transfermarkt provide valuable data for assessing the impact of anterior cruciate ligament (ACL) injuries in professional footballers, they require robust verification methods due to accuracy concerns. We hypothesised that an artificial intelligence (AI)-powered framework could cross-check ACL tear-related information from large publicly available data sets with high specificity.

Methods: The AI-powered framework uses Google Programmable Search Engine to search a curated, multilingual list of websites and OpenAI's GPT to translate search queries, appraise search results and analyse injury-related information in search result items (SRIs).

Background And Purpose: Segmentation imperfections (noise) in radiotherapy organ-at-risk segmentation naturally arise from specialist experience and image quality. Using clinical contours can result in sub-optimal convolutional neural network (CNN) training and performance, but manual curation is costly. We address the impact of simulated and clinical segmentation noise on CNN parotid gland (PG) segmentation performance and provide proof-of-concept for an easily implemented auto-curation countermeasure.

Accurate variant classification is critical for genetic diagnosis. Variants without clear classification, known as "variants of uncertain significance" (VUS), pose a significant diagnostic challenge. This study examines AlphaMissense performance in variant classification, specifically for VUS.

Motivation: Thousands of genomes are publicly available; however, most genes in those genomes have poorly defined functions. This is partly due to a gap between previously published, experimentally characterized protein activities and the activities deposited in databases. Deposition of these activities is bottlenecked by the time-consuming biocuration process.
