Publications by authors named "Baozhen Shan"

De novo peptide sequencing is one of the most fundamental research areas in mass spectrometry-based proteomics. Many methods have been evaluated using only a couple of simple metrics that do not fully reflect their overall performance. Moreover, there has not been an established method to estimate the false discovery rate (FDR) of de novo peptide-spectrum matches.

False changes discovered by quantitative proteomics reduce the trust of biologists in proteomics and limit its use in unlocking biological mechanisms, which suppresses the application of proteomics techniques in the pharmaceutical industry more than it does in academic research. To remove false changes that arise during LC-MS/MS data acquisition, we evaluated the contributions of peptide abundance and number of unique peptides to reproducibility. Proteins with lower abundance or only one unique peptide carry a higher risk of a high coefficient of variation (CV), resulting in less accurate quantification.
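As an illustration of the CV criterion mentioned in this abstract, the following is a minimal sketch rather than the authors' method. It computes the CV as the sample standard deviation divided by the mean of replicate intensities, then flags proteins that are either too variable or supported by too few unique peptides. The function names, the 20% CV cutoff, and the two-peptide minimum are assumptions chosen for the example and do not come from the paper.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    if len(intensities) < 2:
        raise ValueError("need at least two replicate measurements")
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * stdev(intensities) / m


def flag_unreliable(proteins, max_cv=20.0, min_unique_peptides=2):
    """Return names of proteins whose quantification looks unreliable.

    proteins maps a protein name to (replicate intensities, unique peptide count).
    max_cv and min_unique_peptides are hypothetical thresholds for illustration.
    """
    flagged = []
    for name, (intensities, n_unique) in proteins.items():
        too_few_peptides = n_unique < min_unique_peptides
        too_variable = coefficient_of_variation(intensities) > max_cv
        if too_few_peptides or too_variable:
            flagged.append(name)
    return flagged
```

With these assumed thresholds, a protein quantified from a single unique peptide is flagged even when its replicates agree closely, which mirrors the abstract's point that both factors affect reproducibility.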

Considering the substantial impact of venous ulcers on quality of life and healthcare systems, this study evaluated the efficacy and safety of platelet-rich plasma (PRP) in comparison to conventional therapy. A systematic review of four databases identified 16 randomized clinical trials, including 20 study groups. PRP significantly enhanced complete ulcer healing, exhibiting an odds ratio (OR) of 5.

Despite the advantages of fewer missing values by collecting fragment ion data on all analytes in the sample as well as the potential for deeper coverage, the adoption of data-independent acquisition (DIA) in proteomics core facility settings has been slow. The Association of Biomolecular Resource Facilities conducted a large interlaboratory study to evaluate DIA performance in proteomics laboratories with various instrumentation. Participants were supplied with generic methods and a uniform set of test samples.

Here we present GlycanFinder, a database search and de novo sequencing tool for the analysis of intact glycopeptides from mass spectrometry data. GlycanFinder integrates peptide-based and glycan-based search strategies to address the challenge of complex fragmentation of glycopeptides. A deep learning model is designed to capture glycan tree structures and their fragment ions for de novo sequencing of glycans that do not exist in the database.

Integrating data-dependent acquisition (DDA) and data-independent acquisition (DIA) approaches can enable highly sensitive mass spectrometry, especially for immunopeptidomics applications. Here we report a streamlined platform for both DDA and DIA data analysis. The platform integrates deep learning-based solutions of spectral library search, database search, and de novo sequencing under a unified framework, which not only boosts the sensitivity but also accurately controls the specificity of peptide identification.

A promising technique for discovering disease biomarkers is to measure the relative protein abundance in multiple biofluid samples through liquid chromatography with tandem mass spectrometry (LC-MS/MS) based quantitative proteomics. The key step involves detecting each peptide feature in the LC-MS map, along with its charge and intensity. Existing heuristic algorithms suffer from inaccurate parameters and human errors.
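To make the charge-detection step concrete, here is a minimal sketch of one common heuristic, not the method proposed in this paper. It relies on the fact that consecutive isotope peaks of a peptide with charge z are spaced roughly 1.003355/z apart on the m/z axis (the approximate mass difference between carbon-13 and carbon-12). The function name, the tolerance, and the maximum charge are assumptions for illustration; hand-tuned parameters of this kind are the sort of thing the abstract describes as a weakness of heuristic approaches.

```python
C13_SPACING = 1.003355  # approximate 13C - 12C mass difference, in Da


def infer_charge(peak_mzs, max_charge=6, tol=0.01):
    """Guess a charge state from the spacing of consecutive isotope peaks.

    peak_mzs: m/z values believed to belong to one isotope envelope.
    max_charge and tol are hypothetical parameters for this sketch.
    Returns the charge as an int, or None if no charge fits.
    """
    if len(peak_mzs) < 2:
        return None
    peaks = sorted(peak_mzs)
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    avg_gap = sum(gaps) / len(gaps)
    for z in range(1, max_charge + 1):
        # The expected spacing shrinks as the charge grows.
        if abs(avg_gap - C13_SPACING / z) <= tol:
            return z
    return None
```

For example, peaks spaced about 0.5017 m/z apart are consistent with a doubly charged peptide, while a spacing that matches no charge up to `max_charge` yields `None`.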

Liquid chromatography with tandem mass spectrometry (LC-MS/MS) based quantitative proteomics measures differences in relative protein abundance between healthy and disease-afflicted patients, providing information on molecular interactions, signaling pathways, and biomarker identification for drug discovery and clinical research. A typical analysis workflow begins with peptide feature detection and intensity calculation from the LC-MS map. We are the first to propose a deep learning based model, DeepIso, that combines recent advances in Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) to detect peptide features of different charge states, as well as estimate their intensity.

Accurate and sensitive identification of peptides from MS/MS spectra is a very challenging problem in computational shotgun proteomics. To tackle this problem, spectral library search has been one of the competitive solutions. However, most existing library search tools were developed on the basis of one peptide per spectrum, which prevents them from working properly on chimeric spectra where two or more peptides are co-fragmented.

Article Synopsis
  • DeepNovo-DIA is a new method for sequencing peptides using data-independent acquisition (DIA) mass spectrometry.
  • It utilizes neural networks to analyze and interpret complex data involving precursor and fragment ions across various dimensions like mass/charge ratio (m/z) and retention time.
  • This combination of DIA and de novo sequencing has enabled the discovery of new peptides in human antibodies and antigens.

Motivation: Enzymatic digestion under appropriate reducing conditions followed by mass spectrometry analysis has emerged as the primary method for disulfide bond analysis. The large amount of mass spectral data collected in the mass spectrometry experiment requires effective computational approaches to automate the interpretation process. Although different approaches have been developed for this purpose, they ignore the frequently observed internal ion fragments, and they lack a reasonable quality control strategy and a calibrated scoring scheme for the statistical validation and ranking of the reported results.

De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides.

De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies for which genome information is often limited or not available. However, due to limitations in peptide fragmentation and coverage, as well as ambiguities in spectra interpretation, complete de novo assembly of unknown protein sequences remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences.

Glycosylation is one of the most commonly observed post-translational modifications (PTMs) in eukaryotes. It is believed that more than 50% of eukaryotic proteins are glycosylated. To reveal the biological functions of protein-linked glycans involved in numerous biological processes, the high-throughput identification of both glycoproteins and the attached glycan structures becomes fundamentally important.

The milk of the one-humped camel (Camelus dromedarius) reportedly offers medicinal benefits, perhaps because of its unique bioactive components. Milk proteins were determined by (1) two-dimensional gel electrophoresis and peptide mass mapping and (2) liquid chromatography-tandem mass spectrometry (LC-MS/MS) following one-dimensional polyacrylamide gel electrophoresis. Over 200 proteins were identified: some known camel proteins including heavy-chain immunoglobulins and others exhibiting regions of exact homology with proteins from other species.

The workshop "Bioinformatics for Biotechnology Applications (HavanaBioinfo 2012)", held December 8-11, 2012 in Havana, aimed at exploring new bioinformatics tools and approaches for large-scale proteomics, genomics and chemoinformatics. Major conclusions of the workshop include the following: (i) development of new applications and bioinformatics tools for proteomic repository analysis is crucial; current proteomic repositories contain enough data (spectra/identifications) that can be used to increase the annotations in protein databases and to generate new tools for protein identification; (ii) spectral libraries, de novo sequencing and database search tools should be combined to increase the number of protein identifications; (iii) protein probabilities and FDR are not yet sufficiently mature; (iv) computational proteomics software needs to become more intuitive; and at the same time appropriate education and training should be provided to help in the efficient exchange of knowledge between mass spectrometrists and experimental biologists and bioinformaticians in order to increase their bioinformatics background, especially statistics knowledge.

Many software tools have been developed for the automated identification of peptides from tandem mass spectra. The accuracy and sensitivity of the identification software via database search are critical for successful proteomics experiments. A new database search tool, PEAKS DB, has been developed by incorporating the de novo sequencing results into the database search.

Tandem mass spectrometry (MS/MS) has been routinely used to identify peptides from a protein sequence database. To identify post-translationally modified peptides, most existing software requires the specification of a few possible modifications. However, such knowledge of possible modifications is not always available.

Background: Tandem mass spectrometry (MS/MS) has become the primary way for protein identification in proteomics. A good score function for measuring the match quality between a peptide and an MS/MS spectrum is instrumental for the protein identification. Traditionally the to-be-measured peptides are fragmented with the collision induced dissociation (CID) method.

Determining glycan structures is vital to comprehend cell-matrix, cell-cell, and even intracellular biological events. Glycan sequencing, which determines the primary structure of a glycan using tandem mass spectrometry (MS/MS), remains one of the most important tasks in proteomics. Analogous to peptide de novo sequencing, glycan de novo sequencing determines the structure without the aid of a known glycan database.
