Optimizing metaproteomics database construction: lessons from a study of the vaginal microbiome.

Elliot M Lee, Sujatha Srinivasan, Samuel O Purvine, Tina L Fiedler, Owen P Leiser, Sean C Proll, Samuel S Minot, Brooke L Deatherage Kaiser, David N Fredricks

mSystems

Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.

Published: August 2023

Metaproteomics, a method for untargeted, high-throughput identification of proteins in complex samples, provides functional information about microbial communities and can tie functions to specific taxa. Metaproteomics often generates less data than other omics techniques, but analytical workflows can be improved to increase usable data in metaproteomic outputs. Identification of peptides in the metaproteomic analysis is performed by comparing mass spectra of sample peptides to a reference database of protein sequences. Although these protein databases are an integral part of the metaproteomic analysis, few studies have explored how database composition impacts peptide identification. Here, we used cervicovaginal lavage (CVL) samples from a study of bacterial vaginosis (BV) to compare the performance of databases built using six different strategies. We evaluated broad versus sample-matched databases, as well as databases populated with proteins translated from metagenomic sequencing of the same samples versus sequences from public repositories. Smaller sample-matched databases performed significantly better, driven by the statistical constraints on large databases. Additionally, large databases attributed up to 34% of significant bacterial hits to taxa absent from the sample, as determined orthogonally by 16S rRNA gene sequencing. We also tested a set of hybrid databases which included bacterial proteins from NCBI RefSeq and translated bacterial genes from the samples. These hybrid databases had the best overall performance, identifying 1,068 unique human and 1,418 unique bacterial proteins, ~30% more than a database populated with proteins from typical vaginal bacteria and fungi. Our findings can help guide the optimal identification of proteins while maintaining statistical power for reaching biological conclusions. 
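The hybrid strategy described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `build_hybrid_database`, the record layout, and the toy sequences are all assumptions made for the example. It keeps public proteins only from taxa detected in the samples (for instance by 16S rRNA gene sequencing), adds proteins translated from the sample metagenomes, and collapses identical sequences.

```python
from typing import Dict, Iterable, Set, Tuple


def build_hybrid_database(
    public_proteins: Iterable[Tuple[str, str, str]],   # (protein_id, taxon, sequence)
    metagenome_proteins: Iterable[Tuple[str, str]],    # (protein_id, sequence)
    detected_taxa: Set[str],
) -> Dict[str, str]:
    """Map each unique protein sequence to the first identifier seen for it."""
    database: Dict[str, str] = {}
    # Keep public proteins only from taxa detected in the samples.
    for protein_id, taxon, sequence in public_proteins:
        if taxon in detected_taxa:
            database.setdefault(sequence, protein_id)
    # Add metagenome-translated proteins; identical sequences are not duplicated.
    for protein_id, sequence in metagenome_proteins:
        database.setdefault(sequence, protein_id)
    return database


# Toy, made-up inputs for illustration only (hypothetical IDs and sequences).
public = [
    ("ref|A", "Lactobacillus crispatus", "MKTAYIAK"),
    ("ref|B", "Gardnerella vaginalis", "MSLLNQ"),
    ("ref|C", "Escherichia coli", "MAGWTR"),  # taxon not detected in the samples
]
metagenome = [("mg|1", "MSLLNQ"), ("mg|2", "MPEPTIDEK")]
detected = {"Lactobacillus crispatus", "Gardnerella vaginalis"}

hybrid = build_hybrid_database(public, metagenome, detected)
print(len(hybrid), "unique sequences in the hybrid database")
```

Excluding proteins from undetected taxa keeps the search space small, which is the property the abstract credits for the better performance of sample-matched databases.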
IMPORTANCE Metaproteomic analysis can provide valuable insights into the functions of microbial and cellular communities by identifying a broad, untargeted set of proteins. The databases used in the analysis of metaproteomic data influence results by defining what proteins can be identified. Moreover, the size of the database impacts the number of identifications after accounting for false discovery rates (FDRs). Few studies have tested the performance of different strategies for building a protein database to identify proteins from metaproteomic data, and those that have done so largely focused on highly diverse microbial communities. We tested a range of databases on CVL samples and found that a hybrid sample-matched approach, using publicly available proteins from organisms present in the samples, as well as proteins translated from metagenomic sequencing of the samples, had the best performance. However, our results also suggest that public sequence databases will continue to improve as more bacterial genomes are published.
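The link between database size and FDR-filtered identifications can be illustrated with a target-decoy calculation. The abstract does not state which FDR procedure was used, so the target-decoy approach here is an assumption, as are the function name `accepted_targets_at_fdr`, the scores, and the 10% threshold chosen to suit the toy data. The idea is that a larger database produces more high-scoring random (decoy) matches, which raises the score cutoff needed to hold the FDR and leaves fewer accepted identifications.

```python
from typing import List, Tuple


def accepted_targets_at_fdr(psms: List[Tuple[float, bool]], max_fdr: float) -> int:
    """Count target matches accepted at the most permissive score cutoff
    whose estimated FDR (decoys / targets above the cutoff) is <= max_fdr.

    Each PSM is (score, is_decoy); higher scores are better.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = targets
    return best


# Toy, made-up scores: the same ten target matches in both searches.
target_hits = [(s, False) for s in (90, 85, 80, 75, 70, 65, 60, 55, 50, 45)]

# Small database: few random matches, so only one low-scoring decoy.
small_db_psms = target_hits + [(40, True)]
# Large database: more chances for random matches to score well.
large_db_psms = target_hits + [(72, True), (58, True), (40, True)]

small_count = accepted_targets_at_fdr(small_db_psms, max_fdr=0.10)
large_count = accepted_targets_at_fdr(large_db_psms, max_fdr=0.10)
print(small_count, large_count)  # prints "10 4"
```

With identical target scores, the larger search space alone cuts the accepted identifications from ten to four in this toy case.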

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10469846
DOI: http://dx.doi.org/10.1128/msystems.00678-22

Similar Publications

π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing.

Nat Commun

January 2025

Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China.

Xiang Zhang, Tianze Ling, Zhi Jin, Sheng Xu, Zhiqiang Gao

Peptide sequencing via tandem mass spectrometry (MS/MS) is essential in proteomics. Unlike traditional database searches, deep learning excels at de novo peptide sequencing, even for peptides missing from existing databases. Current deep learning models often rely on autoregressive generation, which suffers from error accumulation and slow inference speeds.

Tracing active members in microbial communities by BONCAT and click chemistry-based enrichment of newly synthesized proteins.

ISME Commun

January 2024

Otto-von-Guericke University Magdeburg, Bioprocess Engineering, Universitätsplatz 2, 39106 Magdeburg, Saxony-Anhalt, Germany.

Patrick Hellwig, Daniel Kautzner, Robert Heyer, Anna Dittrich, Daniel Wibberg

A comprehensive understanding of microbial community dynamics is fundamental to the advancement of environmental microbiology, human health, and biotechnology. Metaproteomics, defined as the analysis of all proteins present within a microbial community, provides insights into these complex systems. Microbial adaptation and activity depend to an important extent on newly synthesized proteins (nP), however, the distinction between nP and bulk proteins is challenging.

Stable isotope fingerprinting can directly link intestinal microorganisms with their carbon source and captures diet-induced substrate switching.

bioRxiv

December 2024

Department of Plant and Microbial Biology, North Carolina State University, Raleigh, NC.

Angie Mordant, J Alfredo Blakeley-Ruiz, Manuel Kleiner

Diet has strong impacts on the composition and function of the gut microbiota, with implications for host health. Therefore, it is critical to identify the dietary components that support growth of specific microorganisms. We used protein-based stable isotope fingerprinting (Protein-SIF) to link microbial species in gut microbiota to their carbon sources by measuring each microbe's natural ¹³C content (δ¹³C) and matching it to the ¹³C content of available substrates.

Faecal metaproteomics analysis reveals a high cardiovascular risk profile across healthy individuals and heart failure patients.

Gut Microbes

December 2025

Hypertension Research Laboratory, School of Biological Sciences, Faculty of Science, Monash, Clayton, Australia.

Chaoran Yang, Leticia Camargo Tavares, Han-Chung Lee, Joel R Steele, Rosilene V Ribeiro

The gut microbiota is a crucial link between diet and cardiovascular disease (CVD). Using fecal metaproteomics, a method that concurrently captures human gut and microbiome proteins, we determined the crosstalk between gut microbiome, diet, gut health, and CVD. Traditional CVD risk factors (age, BMI, sex, blood pressure) explained <10% of the proteome variance.

Multi-level analysis of gut microbiome extracellular vesicles-host interaction reveals a connection to gut-brain axis signaling.

Microbiol Spectr

December 2024

NuGut Research Platform, School of Nutrition Sciences, Faculty of Health Sciences, University of Ottawa, Ottawa, Canada.

Walid Mottawea, Basit Yousuf, Salma Sultan, Tamer Ahmed, JuDong Yeo

Microbiota-released extracellular vesicles (MEVs) have emerged as a key player in intercellular signaling. However, their involvement in the gut-brain axis has been poorly investigated. We hypothesize that MEVs cross host cellular barriers and deliver their cargoes of bioactive compounds to the brain.
