Large databases (>10^6 sequences) used in metaproteomic and proteogenomic studies present challenges when matching peptide sequences to MS/MS data with database-search programs. Most notably, the strict filtering needed to avoid false-positive matches produces more false negatives, which limits the number of peptide matches. To address this challenge, we developed a two-step method in which matches from a primary search against a large database are used to create a smaller subset database. The second search is performed against a target-decoy version of this subset database merged with a host database. High-confidence peptide sequence matches are then used to infer protein identities. Applying our two-step method to both metaproteomic and proteogenomic analyses yielded twice the number of high-confidence peptide sequence matches in each case, compared with the conventional one-step method. The two-step method captured almost all of the peptides matched by the one-step method, and most of the additional matches were false negatives of the one-step method. Furthermore, the two-step method improved results regardless of the database-search program used. Our results show that the two-step method maximizes peptide-matching sensitivity for applications requiring large databases, which is especially valuable for proteogenomic and metaproteomic studies.
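The database construction for the second search can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy accessions and sequences, the use of reversed sequences as decoys, and the choice to generate decoys after merging the subset with the host database are all assumptions made for the example. The search itself is left out; the sketch takes as given the set of accessions that received a match in the primary search.

```python
def reverse_decoys(proteins, prefix="DECOY_"):
    """Build decoy entries by reversing each target sequence.

    Reversal and the "DECOY_" prefix are assumptions for this sketch;
    the abstract only says a target-decoy database was used.
    """
    return {prefix + acc: seq[::-1] for acc, seq in proteins.items()}


def build_second_step_db(large_db, primary_hit_accessions, host_db):
    """Assemble the database for the second search.

    large_db: accession -> sequence for the full (large) database
    primary_hit_accessions: accessions matched in the primary search
    host_db: accession -> sequence for the host proteome
    """
    # Keep only the large-database proteins matched in the primary search.
    subset = {
        acc: seq
        for acc, seq in large_db.items()
        if acc in primary_hit_accessions
    }
    # Merge the subset with the host database to form the target set.
    targets = {**host_db, **subset}
    # Append one decoy per target to get a target-decoy database.
    return {**targets, **reverse_decoys(targets)}


# Toy data (hypothetical accessions and sequences).
large_db = {"M1": "MKTAYIAK", "M2": "GLSDGEWQ", "M3": "PEPTIDEK"}
host_db = {"H1": "MVLSPADK"}
primary_hits = {"M1", "M3"}  # assumed output of the primary search

second_db = build_second_step_db(large_db, primary_hits, host_db)
print(sorted(second_db))
```

In this toy case the second-step database holds three targets (one host protein and the two matched proteins) plus their three decoys, while the unmatched entry `M2` is dropped. Searching this much smaller space is what the abstract credits for recovering matches that strict filtering rejects in the one-step search.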


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3633484 (PMC)
http://dx.doi.org/10.1002/pmic.201200352 (DOI)

