A general procedure for finding potentially erroneous entries in the database of retention indices.

Anal Chim Acta

Chemistry Department, Lomonosov Moscow State University, Leninskie Gory 1-3, 119991, Moscow, Russia.

Published: April 2024

Background: The NIST retention index database is one of the most widely used sources of retention indices. In both untargeted analysis and machine learning studies, filtering for potential errors is limited or nonexistent. According to our estimates, about 80% of the compounds in both the NIST 17 and NIST 20 retention index databases have only one RI value per stationary phase, which makes searching for erroneous values with statistical methods impossible. Manual inspection is also impractical because the database contains more than 300 000 entries.

Results: We suggest a two-step, machine-learning-based procedure for finding potentially erroneous retention indices. The first step is to use five predictive models to obtain predicted retention index values for the whole database. The second is to compare these predicted values against the experimental ones. We consider a retention index potentially erroneous if its prediction accuracy (the difference between the predicted and experimental values) falls in the worst 5% for each of the five models simultaneously. Using this method, we detected 2093 outlier entries for standard and semi-standard non-polar stationary phases in the NIST 17 retention index database; 566 of those were corrected or removed by the developers in NIST 20.
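
The consensus rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the use of the absolute difference as the error measure, and the per-model quantile cutoff are assumptions made for illustration, and the five predictive models themselves are represented only by an array of their outputs.

```python
import numpy as np

def flag_consensus_outliers(experimental, predictions, worst_fraction=0.05):
    """Flag entries whose error is among the worst for every model at once.

    experimental: experimental retention indices, shape (n_entries,)
    predictions:  model outputs, shape (n_models, n_entries)
    Assumption: "accuracy" is taken as the absolute prediction error.
    """
    experimental = np.asarray(experimental, dtype=float)
    predictions = np.asarray(predictions, dtype=float)

    # Absolute error of each model on each entry.
    abs_err = np.abs(predictions - experimental)

    # One cutoff per model: the (1 - worst_fraction) quantile of its errors.
    cutoffs = np.quantile(abs_err, 1.0 - worst_fraction, axis=1, keepdims=True)

    # An entry is flagged only if it exceeds the cutoff for all models.
    return np.all(abs_err > cutoffs, axis=0)
```

Requiring agreement across all models keeps the flagged set small: an entry that a single model predicts poorly is not reported unless every other model also places it among its worst predictions.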

Significance: This is a novel approach for finding potentially erroneous entries in a large-scale database with mostly unique entries, and it can be applied to data other than retention indices. The procedure can help filter and report mishandled data to improve the quality of the dataset for machine learning applications and experimental use.

Source
http://dx.doi.org/10.1016/j.aca.2024.342375
