Correct identification of a peptide sequence from MS/MS data remains a challenging research problem, particularly in proteomic analyses of higher eukaryotes where protein databases are large. The scoring methods of search programs often generate cases where incorrect peptide sequences score higher than correct peptide sequences (referred to as distraction). Because smaller databases yield less distraction and better discrimination between correct and incorrect assignments, we developed a method for editing a peptide-centric database (PC-DB) to remove unlikely sequences, along with strategies that enable search programs to use this peptide database. Rules for unlikely missed cleavage and nontryptic proteolysis products were identified by data mining 11,849 high-confidence peptide assignments. We also evaluated ion exchange chromatographic behavior as an editing criterion to generate subset databases. When PC-DBs were used to search a well-annotated test data set of MS/MS spectra, we found no loss of critical information, validating the methods for generating and searching against the databases. In addition, confidence in peptide assignments improved for tryptic peptides, as measured by changes in DeltaCN and RSP. Distraction also decreased, consistent with the 3- to 9-fold decrease in database size. Data mining identified a major class of common nonspecific proteolytic products corresponding to leucine aminopeptidase (LAP) cleavages. Identification of LAP products improved substantially with the PC-DB approach compared with conventional searches against protein databases. These results demonstrate that peptide properties can be used to reduce database size, yielding improved accuracy and information capture due to reduced distraction, with little loss of information compared to conventional protein database searches.
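
As a rough illustration of the peptide-centric idea, the sketch below digests protein sequences in silico, pools the unique peptides, and applies an editing filter. It is not the authors' implementation. The cleavage rule (cut after K or R unless the next residue is P) is the commonly used trypsin convention rather than a rule taken from this work, and the function names, the `max_missed` and `min_len` parameters, and the `keep` predicate are all hypothetical stand-ins for editing rules that the abstract does not specify.

```python
import re
from typing import Callable, Dict, List, Set


def tryptic_sites(protein: str) -> List[int]:
    """Cut positions after K or R, except when the next residue is P.

    Assumption: this is the conventional trypsin rule, used for illustration.
    """
    return [m.end() for m in re.finditer(r"[KR](?!P)", protein)]


def digest(protein: str, max_missed: int = 1, min_len: int = 6) -> List[str]:
    """Enumerate tryptic peptides with up to `max_missed` missed cleavages."""
    internal = [c for c in tryptic_sites(protein) if c < len(protein)]
    cuts = [0] + internal + [len(protein)]
    peptides = []
    for i in range(len(cuts) - 1):
        # j - i - 1 is the number of missed cleavage sites inside the peptide.
        for j in range(i + 1, min(i + 2 + max_missed, len(cuts))):
            pep = protein[cuts[i]:cuts[j]]
            if len(pep) >= min_len:
                peptides.append(pep)
    return peptides


def build_peptide_db(
    proteins: Dict[str, str],
    max_missed: int = 1,
    min_len: int = 6,
    keep: Callable[[str], bool] = lambda pep: True,
) -> Set[str]:
    """Pool unique peptides from all proteins, dropping those `keep` rejects.

    `keep` is a hypothetical hook where editing rules (for example, rules
    for unlikely missed cleavages) would be plugged in.
    """
    db: Set[str] = set()
    for seq in proteins.values():
        for pep in digest(seq, max_missed, min_len):
            if keep(pep):
                db.add(pep)
    return db
```

Because the database is a set of peptides rather than proteins, a sequence shared by several proteins is stored once, and any peptide-level property can serve as an editing criterion by passing a different `keep` function.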

Source
http://dx.doi.org/10.1021/ac051127f
