Adopting a proteogenomics approach to validate single nucleotide variation events by identifying corresponding single amino acid variant peptides from mass spectrometry (MS)-based proteomics data facilitates translational and clinical research. Although variant peptides are usually identified from MS data with a stringent false discovery rate (FDR), FDR control could fail to eliminate dubious results caused by several issues; thus, postexamination to eliminate dubious results is required. However, comprehensive postexaminations of identification results are still lacking.
For identifying peptides and proteins from mass spectrometry (MS) data, spectral library searching has emerged as a complementary approach to conventional database searching. However, for the spectrum-centric analysis of data-independent acquisition (DIA) data, spectral library searching has not been widely exploited because existing spectral library search tools are mainly designed and optimized for the analysis of data-dependent acquisition (DDA) data. We present Calibr, a spectral library search tool for spectrum-centric DIA data analysis.
Lung adenocarcinoma (LUAD) patients in East Asia predominantly harbor oncogenic mutations. However, there remains a limited understanding of the biological characteristics and therapeutic vulnerabilities of the concurrent mutations of and other genes in LUAD. Here, we performed comprehensive bioinformatics analyses on 88 treatment-naïve East Asian LUAD patients.
Phosphoproteomics can provide insights into cellular signaling dynamics. To achieve deep and robust quantitative phosphoproteomics profiling for minute amounts of sample, we here develop a global phosphoproteomics strategy based on data-independent acquisition (DIA) mass spectrometry and hybrid spectral libraries derived from data-dependent acquisition (DDA) and DIA data. Benchmarking the method using 166 synthetic phosphopeptides shows high sensitivity (<0.
Mass spectrometry-based proteomics using isobaric labeling for multiplex quantitation has become a popular approach for proteomic studies. We present Multi-Q 2, an isobaric-labeling quantitation tool which can yield the largest quantitation coverage and improved quantitation accuracy compared to three state-of-the-art methods. Multi-Q 2 supports identification results from several popular proteomic data analysis platforms for quantitation, offering up to 12% improvement in quantitation coverage for accepting identification results from multiple search engines when compared with MaxQuant and PatternLab.
Concatenated target-decoy database searches are commonly used in proteogenomic research for variant peptide identification. Currently, protein-based and peptide-based sequence databases are applied to store variant sequences for database searches. The protein-based database records a full-length wild-type protein sequence but uses the given variant events to replace the original amino acids, whereas the peptide-based database retains only the in silico digested peptides containing the variants.
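The difference between the two database styles can be sketched as follows. This is a minimal illustration, not the authors' tool: the function names, the example sequence, and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are all assumptions made for the example.

```python
import re


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def apply_variant(sequence, position, ref, alt):
    """Substitute one amino acid; `position` is 1-based."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[: position - 1] + alt + sequence[position:]


def protein_based_entry(sequence, position, ref, alt):
    """Protein-based style: keep the full-length sequence with the variant applied."""
    return apply_variant(sequence, position, ref, alt)


def peptide_based_entries(sequence, position, ref, alt):
    """Peptide-based style: keep only digested peptides that contain the variant."""
    variant_seq = apply_variant(sequence, position, ref, alt)
    entries = []
    start = 0  # 0-based start offset of the current peptide
    for pep in tryptic_digest(variant_seq):
        end = start + len(pep)
        if start < position <= end:  # 1-based variant position lies in this peptide
            entries.append(pep)
        start = end
    return entries


# Hypothetical wild-type sequence with a Y5C variant.
wild_type = "MKTAYIAKQRQISFVK"
full_length = protein_based_entry(wild_type, 5, "Y", "C")
variant_peptides = peptide_based_entries(wild_type, 5, "Y", "C")
```

Here `full_length` keeps all 16 residues, while `variant_peptides` holds only the single tryptic peptide that spans position 5. Digesting the variant sequence rather than the wild type means a substitution that creates or removes a K/R cleavage site is reflected in the resulting peptides.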
Identifying single-amino-acid variants (SAVs) from mass spectrometry-based experiments is critical for validating single-nucleotide variants (SNVs) at the protein level to facilitate biomedical research. Currently, two approaches are usually applied to convert SNV annotations into SAV-harboring protein sequences. One approach generates one sequence containing exactly one SAV, and the other generates one sequence containing all SAVs.
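The two conversion approaches can be illustrated with a short sketch. The function names, the `(position, ref, alt)` tuple layout, and the example sequence are hypothetical, and the sketch assumes at most one SAV per position.

```python
def apply_savs(sequence, savs):
    """Apply a list of (position, ref, alt) substitutions; positions are 1-based."""
    residues = list(sequence)
    for position, ref, alt in savs:
        if residues[position - 1] != ref:
            raise ValueError(f"expected {ref} at position {position}")
        residues[position - 1] = alt
    return "".join(residues)


def one_sav_per_sequence(sequence, savs):
    """First approach: one output sequence per SAV, each carrying exactly one SAV."""
    return [apply_savs(sequence, [sav]) for sav in savs]


def all_savs_in_one_sequence(sequence, savs):
    """Second approach: a single output sequence carrying every SAV."""
    return apply_savs(sequence, savs)


# Hypothetical protein with two annotated SAVs: Y5C and K8N.
wild_type = "MKTAYIAKQR"
savs = [(5, "Y", "C"), (8, "K", "N")]
separate = one_sav_per_sequence(wild_type, savs)
combined = all_savs_in_one_sequence(wild_type, savs)
```

With n SAVs, the first approach yields n sequences and the second yields one, so the two differ in database size and in which combinations of variants a digested peptide can carry.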
N-linked glycosylation is one of the predominant post-translational modifications involved in a number of biological functions. Since experimental characterization of glycosites is challenging, glycosite prediction is crucial. Several predictors have been made available and report high performance.
When conducting proteomics experiments to detect missing proteins and protein isoforms in the human proteome, it is desirable to use a protease that can yield more unique peptides with properties amenable to mass spectrometry analysis. Though trypsin is currently the most widely used protease, some proteins can yield only a limited number of unique peptides by trypsin digestion. Other proteases and multiple proteases have been applied in reported studies to increase the number of identified proteins and protein sequence coverage.
Protein and peptide identification and quantitation are essential tasks in proteomics research and involve a series of steps in analyzing mass spectrometry data. Trans-Proteomic Pipeline (TPP) provides a wide range of useful tools through its web interfaces for analyses such as sequence database search, statistical validation, and quantitation. To utilize the powerful functionality of TPP without the need for manual intervention to launch each step, we developed a software tool, called WinProphet, to create and automatically execute a pipeline for proteomic analyses.
Human embryonic stem cells (hESCs) have the capacity for self-renewal and multilineage differentiation, which are of clinical importance for regenerative medicine. Despite significant progress in hESC studies, the complete hESC proteome atlas, especially the surface protein composition, awaits delineation. According to the latest release of the neXtProt database (January 17, 2018; 19 658 PE1, 2, 3, and 4 human proteins), membrane proteins represent the major category (1047; 48%) among all 2186 missing proteins (MPs).
In proteogenomic studies, many genome-annotated events, for example, single amino acid variation (SAAV) and short INDEL, are often unobserved in shotgun proteomics. Therefore, we propose an analysis pipeline called LeTE-fusion (Le, peptide length; T, theoretical values; E, experimental data) to first investigate whether peptides with certain lengths are observed more often in mass spectrometry (MS)-based proteomics, which may hinder peptide identification causing difficulty in detecting genome-annotated events. By applying LeTE-fusion on different MS-based proteome data sets, we found peptides within 7-20 amino acids are more frequently identified, possibly attributed to MS-related factors instead of proteases.
To confirm the existence of missing proteins, we need to identify at least two unique peptides with a length of 9-40 amino acids of a missing protein in bottom-up mass-spectrometry-based proteomic experiments. However, an identified unique peptide of the missing protein, even identified with a high level of confidence, could possibly coincide with a peptide of a commonly observed protein due to isobaric substitutions, mass modifications, alternative splice isoforms, or single amino acid variants (SAAVs). Besides unique peptides of missing proteins, identified variant peptides (SAAV-containing peptides) could also alternatively map to peptides of other proteins due to the aforementioned issues.
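One of these issues, isobaric substitution, can be checked with a simple sketch. It covers only the leucine/isoleucine case (the two residues have identical mass, so a mass spectrometer cannot tell them apart from mass alone); mass modifications, splice isoforms, and SAAVs would need additional logic. The function names and the two protein sequences are hypothetical.

```python
def normalize_isobaric(sequence):
    """Collapse isoleucine into leucine, since the two residues have identical mass."""
    return sequence.replace("I", "L")


def find_isobaric_matches(peptide, proteins):
    """Return names of proteins that contain the peptide once I and L are treated as equal."""
    key = normalize_isobaric(peptide)
    return [
        name
        for name, sequence in proteins.items()
        if key in normalize_isobaric(sequence)
    ]


# Hypothetical sequences: the peptide is unique to the missing protein by exact
# string match, but matches a commonly observed protein after I/L normalization.
proteins = {
    "missing_protein": "MAAILPEPTIDEK",
    "common_protein": "GGALLPEPTLDEKAA",
}
peptide = "AILPEPTIDEK"
exact_hits = [name for name, seq in proteins.items() if peptide in seq]
isobaric_hits = find_isobaric_matches(peptide, proteins)
```

A peptide whose `isobaric_hits` list is longer than its `exact_hits` list is a candidate for the kind of ambiguous mapping described above.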
Although EGFR tyrosine kinase inhibitors (TKIs) have demonstrated good efficacy in non-small-cell lung cancer (NSCLC) patients harboring EGFR mutations, most patients develop intrinsic and acquired resistance. We quantitatively profiled the phosphoproteome and proteome of drug-sensitive and drug-resistant NSCLC cells under gefitinib treatment. The construction of a dose-dependent responsive kinase-substrate network of 1548 phosphoproteins and 3834 proteins revealed CK2-centric modules as the dominant core network for the potential gefitinib resistance-associated proteins.
MAGIC-web is the first web server, to the best of our knowledge, that performs both untargeted and targeted analyses of mass spectrometry-based glycoproteomics data for site-specific N-linked glycoprotein identification. The first two modules, MAGIC and MAGIC+, are designed for untargeted and targeted analysis, respectively. MAGIC is implemented with our previously proposed novel Y1-ion pattern matching method, which adequately detects Y1- and Y0-ion without prior information of proteins and glycans, and then generates in silico MS(2) spectra that serve as input to a database search engine (e.
Membrane proteins are crucial targets for cancer biomarker discovery and drug development. However, in addition to the inherent challenges of hydrophobicity and low abundance, complete membrane proteome coverage of clinical specimens is usually hindered by the requirement for large amounts of starting material. Toward comprehensive membrane proteomic profiling for small amounts of samples (10 μg), we developed high-pH reverse phase (Hp-RP) combined with stop-and-go extraction tip (StageTip) technique, as a fast (∼15 min.
Experimental evidence at the protein level from mass spectrometry and antibody experiments is essential to characterize the human proteome. neXtProt (2014-09 release) reported 20 055 human proteins, including 16 491 proteins identified at the protein level and 3564 proteins unidentified. Excluding 616 proteins at uncertain level, 2948 proteins were regarded as missing proteins.
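The counts quoted above are mutually consistent, as a quick arithmetic check shows. The variable names are illustrative; the numbers are taken directly from the text.

```python
# Counts quoted from the neXtProt 2014-09 release in the text above.
total_proteins = 20055
identified_at_protein_level = 16491
unidentified = 3564
uncertain = 616

# Identified plus unidentified proteins should account for the full total.
accounted_for = identified_at_protein_level + unidentified

# Missing proteins are the unidentified ones minus those at uncertain level.
missing_proteins = unidentified - uncertain
```

`accounted_for` equals the reported total of 20 055, and `missing_proteins` equals the reported 2948.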
Despite significant efforts in the past decade toward complete mapping of the human proteome, 3564 proteins (neXtProt, 09-2014) are still "missing proteins". Over one-third of these missing proteins are annotated as membrane proteins, owing to their relatively challenging accessibility with standard shotgun proteomics. Using nonsmall cell lung cancer (NSCLC) as a model study, we aim to mine missing proteins from the disease-associated membrane proteome, which may be still largely under-represented.
Methodologies to enrich heterogeneous types of phosphopeptides are critical for comprehensive mapping of the under-explored phosphoproteome. Taking advantage of the distinct binding affinities of Ga(3+) and Fe(3+) for phosphopeptides, we designed a metal-directed immobilized metal ion affinity chromatography for the sequential enrichment of phosphopeptides. In Raji B cells, the sequential Ga(3+)-Fe(3+)-immobilized metal affinity chromatography (IMAC) strategy displayed a 1.
Chromosome 4 is the fourth largest chromosome, containing approximately 191 megabases (~6.4% of the human genome) with 757 protein-coding genes. A number of marker genes for many diseases have been found in this chromosome, including genetic diseases (e.