In the version of this article originally published, the P values for the enrichment of single mutation categories were inadvertently not corrected for multiple testing. After multiple-testing correction, only two of the six mutation categories mentioned are still statistically significant. To reflect this, the text "More specifically, paternally derived DNMs are enriched in transitions in A[.
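The notice does not say which correction procedure was applied. As an illustration only, assuming either a Bonferroni or a Benjamini-Hochberg adjustment across the categories tested, the two standard procedures can be sketched as follows. The function names are hypothetical, and the code is not the authors' analysis.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni: multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest rank down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A category would then count as significant only if its adjusted value falls below the chosen threshold (for example 0.05). This is why raw P values that pass individually can fail after correction.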
Summary: Gap-filling is a necessary step in producing high-quality genome-scale metabolic reconstructions capable of flux-balance simulation. Most available gap-filling tools use an organism-agnostic approach, in which reactions are selected from a database to fill gaps without consideration of the target organism. By contrast, our likelihood-based gap-filling with probabilistic annotations selects candidate reactions using a likelihood score derived specifically from the target organism's genome.
The results of analyzing shotgun proteomics mass spectrometry data can be greatly affected by the choice of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem that is especially acute in the analysis of human samples.
De novo mutations (DNMs) originating in gametogenesis are an important source of genetic variation. We use a data set of 7,216 autosomal DNMs with resolved parent of origin from whole-genome sequencing of 816 parent-offspring trios to investigate differences between maternally and paternally derived DNMs and study the underlying mutational mechanisms. Our results show that the number of DNMs in offspring increases not only with paternal age, but also with maternal age, and that some genome regions show enrichment for maternally derived DNMs.
The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150-1000× compression) that enables such analyses.
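The abstract does not describe how the compressed depth-of-coverage representation works. One generic way to see why per-base depth compresses well is run-length encoding: adjacent positions often share the same depth. The sketch below shows that idea. It is an assumption chosen for illustration, not the authors' method, and the function names are hypothetical.

```python
from typing import List, Tuple


def rle_encode(depths: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal per-base depths into (depth, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for d in depths:
        if runs and runs[-1][0] == d:
            # Same depth as the previous position: extend the current run.
            runs[-1] = (d, runs[-1][1] + 1)
        else:
            runs.append((d, 1))
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (depth, run_length) pairs back into per-base depths."""
    out: List[int] = []
    for depth, length in runs:
        out.extend([depth] * length)
    return out
```

The encoding is lossless, so the original per-base profile is recovered exactly. The compression ratio depends entirely on how long the runs of identical depth are in real data.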
Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It is currently one of the greatest barriers to collaborative efforts such as the Human Proteome Project and to public repositories such as the PRoteomics IDEntifications (PRIDE) database.
PeptideAtlas, SRMAtlas, and PASSEL are Web-accessible resources to support discovery and targeted proteomics research. PeptideAtlas is a multi-species compendium of shotgun proteomic data provided by the scientific community; SRMAtlas is a resource of high-quality, complete proteome SRM assays generated in a consistent manner for the targeted identification and quantification of proteins; and PASSEL is a repository that compiles and represents selected reaction monitoring data, all in an easy-to-use interface. The databases are generated from native mass spectrometry data files that are analyzed in a standardized manner including statistical validation of the results.
The kidney, urine, and plasma proteomes are intimately related: proteins and metabolic waste products are filtered from the plasma by the kidney and excreted via the urine, while kidney proteins may be secreted into the circulation or released into the urine. Shotgun proteomics data sets derived from human kidney, urine, and plasma samples were collated and processed using a uniform software pipeline, and relative protein abundances were estimated by spectral counting. The resulting PeptideAtlas builds yielded 4005, 2491, and 3553 nonredundant proteins at 1% FDR for the kidney, urine, and plasma proteomes, respectively; for kidney and plasma, these are the largest high-confidence protein sets to date.
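The abstract says only that relative abundances were estimated by spectral counting. A common length-normalized variant is the normalized spectral abundance factor (NSAF), in which each protein's spectral count is divided by its length and then by the sum of those ratios. The sketch below assumes NSAF for illustration; the builds may have used a different normalization, and the function name is hypothetical.

```python
from typing import Dict


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor for each protein.

    spectral_counts: number of spectra assigned to each protein.
    lengths: protein length in residues (longer proteins yield more peptides).
    """
    # Spectral abundance factor: counts divided by length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    # Normalize so the values sum to 1 within a sample.
    return {p: v / total for p, v in saf.items()}
```

Dividing by length keeps long proteins from appearing more abundant simply because they produce more detectable peptides.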
Research advancing our understanding of Mycobacterium tuberculosis (Mtb) biology and complex host-Mtb interactions requires consistent and precise quantitative measurements of Mtb proteins. We describe the generation and validation of a compendium of assays to quantify 97% of the 4,012 annotated Mtb proteins by the targeted mass spectrometric method selected reaction monitoring (SRM). Furthermore, we estimate the absolute abundance for 55% of all Mtb proteins, revealing a dynamic range within the Mtb proteome of over four orders of magnitude, and identify previously unannotated proteins.
The Human Proteome Project was launched in September 2010 with the goal of characterizing at least one protein product from each protein-coding gene. Here we assess how much of the proteome has been detected to date via tandem mass spectrometry by analyzing PeptideAtlas, a compendium of human-derived LC-MS/MS proteomics data from many laboratories around the world. All data sets are processed with a consistent set of parameters using the Trans-Proteomic Pipeline and subjected to a 1% protein FDR filter before inclusion in PeptideAtlas.
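The abstract does not say how the 1% protein FDR is estimated. A widely used generic approach is the target-decoy strategy: rank hits by score, estimate the FDR at each cutoff as decoys divided by targets, and keep the longest top-ranked list whose estimate stays within the threshold. The sketch below assumes that approach purely for illustration. It is not the Trans-Proteomic Pipeline's implementation, and the function name and input layout are hypothetical.

```python
from typing import List, Tuple


def filter_at_fdr(hits: List[Tuple[float, bool]], max_fdr: float = 0.01) -> List[float]:
    """Return scores of target hits that pass a target-decoy FDR threshold.

    hits: (score, is_decoy) pairs; higher scores are better.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = decoys = 0
    cutoff_index = -1  # last rank at which the estimated FDR is acceptable
    for i, (_, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        if fdr <= max_fdr:
            cutoff_index = i
    # Keep only target (non-decoy) hits at or above the cutoff.
    return [score for score, is_decoy in ranked[: cutoff_index + 1] if not is_decoy]
```

Taking the deepest acceptable cutoff, rather than stopping at the first rank where the estimate exceeds the threshold, mirrors the usual practice of reporting the largest list that still meets the target FDR.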
Proteome information resources of farm animals are lagging behind those of the classical model organisms despite their important biological and economic relevance. Here, we present a Bovine PeptideAtlas, representing a first collection of Bos taurus proteome data sets within the PeptideAtlas framework. This database was built primarily as a source of information for designing selected reaction monitoring assays for studying milk production and mammary gland health, but it has an intrinsic general value for the farm animal research community.
The rigorous testing of hypotheses on suitable sample cohorts is a major limitation in translational research. This is particularly the case for the validation of protein biomarkers; the lack of accurate, reproducible, and sensitive assays for most proteins has precluded the systematic assessment of hundreds of potential marker proteins described in the literature. Here, we describe a high-throughput method for the development and refinement of selected reaction monitoring (SRM) assays for human proteins.
Public repositories for proteomics data have accelerated proteomics research by enabling more efficient cross-analyses of datasets, supporting the creation of protein and peptide compendia of experimental results, supporting the development and testing of new software tools, and facilitating the manuscript review process. The repositories available to date have been designed to accommodate either shotgun experiments or generic proteomic data files. Here, we describe a new kind of proteomic data repository for the collection and representation of data from selected reaction monitoring (SRM) measurements.
Human blood plasma can be obtained relatively noninvasively and contains proteins from most, if not all, tissues of the body. Therefore, an extensive, quantitative catalog of plasma proteins is an important starting point for the discovery of disease biomarkers. In 2005, we showed that different proteomics measurements using different sample preparation and analysis techniques identify significantly different sets of proteins, and that a comprehensive plasma proteome can be compiled only by combining data from many different experiments.
PeptideAtlas is a web-accessible database of LC-MS/MS shotgun proteomics results from hundreds of experiments conducted in diverse laboratories, with all data processed via a uniform analysis pipeline. A total of 91 experiments on human plasma and serum are included. Using the PeptideAtlas web interface, users can browse and search the Human Plasma PeptideAtlas for identified peptides and identified proteins, view spectra, and select proteotypic peptides.
The Trans-Proteomic Pipeline (TPP) is a suite of software tools for the analysis of MS/MS data sets. The tools encompass most of the steps in a proteomic data analysis workflow in a single, integrated software system. Specifically, the TPP supports all steps from spectrometer output file conversion to protein-level statistical validation, including quantification by stable isotope ratios.
Electron transfer dissociation (ETD) is a fragmentation technique that has recently become commercially available as an alternative to collision-induced dissociation (CID). ETD has several advantages over CID. It is less prone to fragmenting amino acid side chains, especially those that are modified, and thus yields fragment ion spectra with more uniform peak intensities.
Selected reaction monitoring (SRM) uses sensitive and specific mass spectrometric assays to measure target analytes across multiple samples, but it has not been broadly applied in proteomics owing to the tedious assay development process for each protein. We describe a method based on crude synthetic peptide libraries for the high-throughput development of SRM assays. We illustrate the power of the approach by generating and applying validated SRM assays for all Saccharomyces cerevisiae kinases and phosphatases.