Publications by authors named "Valmir C Barbosa"

In cluster analysis, a common first step is to scale the data with the aim of partitioning them into clusters more effectively. Although many techniques have been introduced to this end over the years, it is probably fair to say that the workhorse of this preprocessing phase has been division of the data by the standard deviation along each dimension. Like division by the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data.
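
As a rough illustration of the scaling step mentioned above, the following minimal sketch divides each dimension by its standard deviation. The function name `scale_by_std`, the sample points, and the choice to leave constant dimensions untouched are assumptions made for this example; they are not taken from the article.

```python
from statistics import pstdev


def scale_by_std(rows):
    """Divide every value by the population standard deviation of its column."""
    columns = list(zip(*rows))
    stds = [pstdev(column) for column in columns]
    # Assumption: a constant dimension (std == 0) is left unchanged
    # rather than triggering a division by zero.
    divisors = [s if s > 0 else 1.0 for s in stds]
    return [[value / d for value, d in zip(row, divisors)] for row in rows]


# Hypothetical data: the second dimension has a much larger spread than the first.
points = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = scale_by_std(points)
scaled_stds = [pstdev(column) for column in zip(*scaled)]
```

After scaling, every non-constant dimension has unit standard deviation, so no single dimension dominates a distance-based clustering purely because of its units.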

Complex protein mixtures typically generate many tandem mass spectra produced by different peptides coisolated in the gas phase. Widely adopted proteomic data analysis environments usually fail to identify most of these spectra, succeeding at best in identifying only one of the multiple cofragmenting peptides. We present PatternLab V (PLV), an updated version of PatternLab that integrates the YADA 3 deconvolution algorithm to handle such cases efficiently.

Motivation: There are several well-established paradigms for identifying and pinpointing discriminative peptides/proteins using shotgun proteomic data; examples are peptide-spectrum matching, de novo sequencing, open searches, and even hybrid approaches. Such an arsenal of complementary paradigms can provide deep data coverage, although some discriminative peptides remain unidentified.

Results: We present DiagnoMass, a software tool that groups similar spectra into spectral clusters and then shortlists those clusters that are discriminative for biological conditions.

Motivation: Confident deconvolution of proteomic spectra is critical for several applications such as de novo sequencing, cross-linking mass spectrometry and handling chimeric mass spectra.

Results: In general, all deconvolution algorithms may eventually report mass peaks that are not compatible with the chemical formula of any peptide. We show how to remove these artifacts by considering their mass defects.

Shotgun proteomics aims to identify and quantify the thousands of proteins in complex mixtures such as cell and tissue lysates and biological fluids. This approach uses liquid chromatography coupled with tandem mass spectrometry and typically generates hundreds of thousands of mass spectra that require specialized computational environments for data analysis. PatternLab for proteomics is a unified computational environment for analyzing shotgun proteomic data.

Article Synopsis
  • In proteomics, identifying peptides from mass spectra involves clustering spectral data, and the validation of these clusters is crucial as it affects the choice of algorithms used.
  • A new partition assessment tool is introduced that selectively biases toward the number of peptide ion species, which is beneficial for estimating peptide counts in complex mixtures.
  • The study evaluates eight clustering algorithms across seven datasets, highlighting the trade-offs between different approaches and their implications for proteomic analysis.
Bacterial quorum sensing is the communication that takes place between bacteria as they secrete certain molecules into the intercellular medium that later get absorbed by the secreting cells themselves and by others. Depending on cell density, this uptake has the potential to alter gene expression and thereby affect global properties of the community. We consider the case of multiple bacterial species coexisting, referring to each one of them as a genotype and adopting the usual denomination of the molecules they collectively secrete as public goods.

We present the Mixed-Data Acquisition (MDA) strategy for mass spectrometry data acquisition. MDA combines Data-Dependent Acquisition (DDA) and Data-Independent Acquisition (DIA) in the same run, thus doing away with the requirements for separate DDA spectral libraries. MDA is a natural result from advances in mass spectrometry, such as high scan rates and multiple analyzers, and is tailored toward exploiting these features.

We present a new module integrated into the widely adopted PatternLab for proteomics to enable analysis of isotope-labeled peptides produced using dimethyl or SILAC. The accurate quantitation of proteins lies at the heart of proteomics; dimethylation has been shown to be reliable, inexpensive, and applicable to any sample type. We validate our algorithm using an M.

Background: Worldwide, breast cancer is the main cause of cancer mortality in women. Most cases originate in mammary ductal cells that produce the nipple aspirate fluid (NAF). In cancer patients, this secretome contains proteins associated with the tumor microenvironment.

Motivation: We present the first tool for unbiased quality control of top-down proteomics datasets. Our tool can select high-quality top-down proteomics spectra, serve as a gateway for building top-down spectral libraries and, ultimately, improve identification rates.

Results: We demonstrate that a twofold rate increase for two E.

Analyzing the information content of DNA, though holding the promise to help quantify how the processes of evolution have led to information gain throughout the ages, has remained an elusive goal. Paradoxically, one of the main reasons for this has been precisely the great diversity of life on the planet: if on the one hand this diversity is a rich source of data for information-content analysis, on the other hand there is so much variation as to make the task unmanageable. During the past decade or so, however, succinct fragments of the COI mitochondrial gene, which is present in all animal phyla and in a few others, have been shown to be useful for species identification through DNA barcoding.

Cross-linking coupled with mass spectrometry (XL-MS) has emerged as a powerful strategy for the identification of protein-protein interactions, the characterization of interaction regions, and the acquisition of structural information on proteins and protein complexes. In XL-MS, proteins or complexes are covalently stabilized with cross-linkers and digested, followed by identification of the cross-linked peptides by tandem mass spectrometry (MS/MS). This provides spatial constraints that enable modeling of protein (complex) structures and regions of interaction.

Venoms are a rich source for the discovery of molecules with biotechnological applications, but their analysis is challenging even for state-of-the-art proteomics. Here we report on a large-scale proteomic assessment of the venom of Loxosceles intermedia, the so-called brown spider. Venom was extracted from 200 spiders and fractionated into two aliquots relative to a 10 kDa cutoff mass.

Motivation: Around 75% of all mass spectra remain unidentified by widely adopted proteomic strategies. We present DiagnoProt, an integrated computational environment that can efficiently cluster millions of spectra and use machine learning to shortlist high-quality unidentified mass spectra that are discriminative of different biological conditions.

Results: We exemplify the use of DiagnoProt by shortlisting 4366 high-quality unidentified tandem mass spectra that are discriminative of different types of the Aspergillus fungus.

This work introduces a new methodology for the early detection of epileptic seizures based on the WiSARD weightless neural network model and a new approach to preprocessing the electroencephalogram (EEG) data. WiSARD has, among other advantages, the capacity to perform the training phase very quickly. This speed in training is due to the fact that WiSARD's neurons work like Random Access Memories (RAM) addressed by input patterns.
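
To make the idea of RAM-like neurons addressed by input patterns more concrete, here is a minimal sketch of a generic WiSARD classifier over binary inputs. It is an illustration only, not the authors' implementation: the class names, the labels "a" and "b", the tuple size, and the toy bit patterns are all assumptions, and the EEG preprocessing that would produce the binary input is not covered.

```python
import random


class Discriminator:
    """One class's set of RAM nodes; each node stores the addresses it has seen."""

    def __init__(self, input_bits, tuple_size, seed=0):
        order = list(range(input_bits))
        random.Random(seed).shuffle(order)  # fixed pseudo-random input mapping
        # Each RAM node reads its own tuple of input bit positions.
        self.tuples = [order[i:i + tuple_size] for i in range(0, input_bits, tuple_size)]
        self.rams = [set() for _ in self.tuples]

    def _addresses(self, bits):
        return [tuple(bits[p] for p in positions) for positions in self.tuples]

    def train(self, bits):
        # Training is a single pass: write one address into each RAM node.
        for ram, address in zip(self.rams, self._addresses(bits)):
            ram.add(address)

    def score(self, bits):
        # Count the RAM nodes that recognise the address presented to them.
        return sum(address in ram for ram, address in zip(self.rams, self._addresses(bits)))


class WiSARD:
    """One discriminator per class; the highest-scoring discriminator wins."""

    def __init__(self, input_bits, tuple_size, seed=0):
        self.args = (input_bits, tuple_size, seed)
        self.discriminators = {}

    def train(self, bits, label):
        if label not in self.discriminators:
            self.discriminators[label] = Discriminator(*self.args)
        self.discriminators[label].train(bits)

    def classify(self, bits):
        return max(self.discriminators, key=lambda label: self.discriminators[label].score(bits))


# Hypothetical toy data: two complementary 8-bit patterns.
model = WiSARD(input_bits=8, tuple_size=2)
model.train([1, 1, 1, 1, 0, 0, 0, 0], "a")
model.train([0, 0, 0, 0, 1, 1, 1, 1], "b")
predicted = model.classify([1, 1, 1, 0, 0, 0, 0, 0])  # the "a" pattern with one bit flipped
```

Because training only writes addresses into memory, with no iterative weight updates, each training example is processed in a single pass, which is the source of the speed mentioned in the abstract.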

PatternLab for proteomics is an integrated computational environment that unifies several previously published modules for the analysis of shotgun proteomic data. The contained modules allow for formatting of sequence databases, peptide spectrum matching, statistical filtering and data organization, extracting quantitative information from label-free and chemically labeled data, and analyzing statistics for differential proteomics. PatternLab also has modules to perform similarity-driven studies with de novo sequencing data, to evaluate time-course experiments and to highlight the biological significance of data with regard to the Gene Ontology database.

PepExplorer aids in the biological interpretation of de novo sequencing results; this is accomplished by assembling a list of homolog proteins obtained by aligning results from widely adopted de novo sequencing tools against a target-decoy sequence database. Our tool relies on pattern recognition to ensure that the results satisfy a user-given false-discovery rate (FDR). For this, it employs a radial basis function neural network that considers the precursor charge states, de novo sequencing scores, the peptide lengths, and alignment scores.
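
The sketch below illustrates only the generic target-decoy idea behind meeting a user-given FDR: estimate the FDR of an accepted set as the ratio of decoy hits to target hits, and pick the most permissive score cutoff that stays within the limit. It does not reproduce PepExplorer's radial basis function network; the function name `threshold_for_fdr`, the single combined score per hit, and the example values are assumptions, and scores are assumed to be distinct.

```python
def threshold_for_fdr(hits, max_fdr):
    """Return the lowest score cutoff whose accepted hits have decoys/targets <= max_fdr.

    hits is a list of (score, is_decoy) pairs; higher scores are better.
    Returns None if no cutoff satisfies the limit.
    """
    best_cutoff = None
    targets = decoys = 0
    # Walk from the best-scoring hit downward, growing the accepted set one hit at a time.
    for score, is_decoy in sorted(hits, key=lambda hit: hit[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical hits: (score, is_decoy).
hits = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, False), (0.4, True)]
cutoff = threshold_for_fdr(hits, max_fdr=0.25)
```

With these example values the cutoff is 0.5: accepting every hit scoring at least 0.5 gives one decoy against four targets, an estimated FDR of exactly 0.25.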

Chemical cross-linking has emerged as a powerful approach for the structural characterization of proteins and protein complexes. However, the correct identification of covalently linked (cross-linked or XL) peptides analyzed by tandem mass spectrometry is still an open challenge. Here we present SIM-XL, a software tool that can analyze data generated through commonly used cross-linkers (e.

The production of structurally significant product ions during the dissociation of phosphopeptides is a key to the successful determination of phosphorylation sites. These diagnostic ions can be generated using the widely adopted MS/MS approach, MS3 (Data Dependent Neutral Loss - DDNL), or by multistage activation (MSA). The main purpose of this work is to introduce a false-localization rate (FLR) probabilistic model to enable unbiased phosphoproteomics studies.

Peptide spectrum matching is the current gold standard for protein identification via mass-spectrometry-based proteomics. Peptide spectrum matching compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, can make it possible to infer a peptide sequence directly from a mass spectrum, but interpreting long lists of peptide sequences resulting from large-scale experiments is not trivial.

Accessing localized proteomic profiles has emerged as a fundamental strategy to understand the biology of diseases, as recently demonstrated, for example, in the context of determining cancer resection margins with improved precision. Here, we analyze a gastric cancer biopsy sectioned into 10 parts, each one subjected to MudPIT analysis. We introduce a software tool, named Shotgun Imaging Analyzer and inspired by MALDI imaging, to enable the overlaying of a protein's expression heat map on a tissue picture.

Given two subsets A and B of nodes in a directed graph, the conduciveness of the graph from A to B is the ratio representing how many of the edges outgoing from nodes in A are incoming to nodes in B. When the graph's nodes stand for the possible solutions to certain problems of combinatorial optimization, choosing its edges appropriately has been shown to lead to conduciveness properties that provide useful insight into the performance of algorithms to solve those problems. Here we study the conduciveness of CA-rule graphs, that is, graphs whose node set is the set of all CA rules given a cell's number of possible states and neighborhood size.
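
The definition of conduciveness given above can be sketched directly in code. In this minimal example the graph is a list of directed `(source, destination)` pairs; the function name, the small example graph, and the convention of returning 0.0 when no edge leaves A are assumptions made for illustration.

```python
def conduciveness(edges, a, b):
    """Fraction of the edges leaving nodes in a that arrive at nodes in b."""
    outgoing = [(u, v) for u, v in edges if u in a]
    if not outgoing:
        # Assumption: the ratio is reported as 0.0 when no edge leaves A.
        return 0.0
    into_b = sum(1 for _, v in outgoing if v in b)
    return into_b / len(outgoing)


# Hypothetical directed graph on nodes 1, 2, 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
ratio = conduciveness(edges, a={1, 2}, b={3})
```

Here three edges leave A = {1, 2}, and two of them, (1, 3) and (2, 3), arrive in B = {3}, so the conduciveness from A to B is 2/3.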

Summary: Protein identification by mass spectrometry is commonly accomplished using a peptide sequence matching search algorithm, whose sensitivity varies inversely with the size of the sequence database and the number of post-translational modifications considered. We present the Spectrum Identification Machine, a peptide sequence matching tool that capitalizes on the high-intensity b1-fragment ion of tandem mass spectra of peptides coupled in solution with phenylisothiocyanate to confidently sequence the first amino acid and ultimately reduce the search space. We demonstrate that in complex search spaces, a gain of some 120% in sensitivity can be achieved.
