Motivation: Genomic analyses of many solid cancers have demonstrated extensive genetic heterogeneity between as well as within individual tumors. However, statistical methods for classifying tumors by subtype based on genomic biomarkers generally entail an all-or-none decision, which may be misleading for clinical samples containing a mixture of subtypes and/or normal cell contamination.
Results: We have developed a mixed-membership classification model, called glad, that simultaneously learns a sparse biomarker signature for each subtype as well as a distribution over subtypes for each sample.
Plasmodium parasites, the causal agents of malaria, result in more than 1 million deaths annually. Plasmodium are unicellular eukaryotes with small ∼23 Mb genomes encoding ∼5200 protein-coding genes. The protein-coding genes comprise about half of these genomes.
Type I collagen, the predominant protein of vertebrates, polymerizes with type III and V collagens and non-collagenous molecules into large cable-like fibrils, yet how the fibril interacts with cells and other binding partners remains poorly understood. To help reveal insights into the collagen structure-function relationship, a database was assembled including hundreds of type I collagen ligand binding sites and mutations on a two-dimensional model of the fibril. Visual examination of the distribution of functional sites and statistical analysis of mutation distributions on the fibril suggest it is organized into two domains.
Adv Neural Inf Process Syst
January 2008
Statistical evolutionary models provide an important mechanism for describing and understanding the escape response of a viral population under a particular therapy. We present a new hierarchical model that incorporates spatially varying mutation and recombination rates at the nucleotide level. It also maintains separate parameters for treatment and control groups, which allows us to estimate treatment effects explicitly.
Proc Natl Acad Sci U S A
May 2005
Sequence comparison across multiple organisms aids in the detection of regions under selection. However, resource limitations require a prioritization of genomes to be sequenced. This prioritization should be grounded in two considerations: the lineal scope encompassing the biological phenomena of interest, and the optimal species within that scope for detecting functional elements.
We previously characterized nutrient-specific transcriptional changes in Escherichia coli upon limitation of nitrogen (N) or sulfur (S). These global homeostatic responses presumably minimize the slowing of growth under a particular condition. Here, we characterize responses to slow growth per se that are not nutrient-specific.
We determined global transcriptional responses of Escherichia coli K-12 to sulfur (S)- or nitrogen (N)-limited growth in adapted batch cultures and cultures subjected to nutrient shifts. Using two limitations helped to distinguish between nutrient-specific changes in mRNA levels and common changes related to the growth rate. Both homeostatic and slow growth responses were amplified upon shifts.
Motivation: Phylogenetic shadowing is a comparative genomics principle that allows for the discovery of conserved regions in sequences from multiple closely related organisms. We develop a formal probabilistic framework for combining phylogenetic shadowing with feature-based functional annotation methods. The resulting model, a generalized hidden Markov phylogeny (GHMP), applies to a variety of situations where functional regions are to be inferred from evolutionary constraints.
Proc Natl Acad Sci U S A
August 2003
High-pressure liquid chromatography-tandem mass spectrometry was used to obtain a protein profile of Escherichia coli strain MG1655 grown in minimal medium with glycerol as the carbon source. By using cell lysate from only 3 × 10^8 cells, at least four different tryptic peptides were detected for each of 404 proteins in a short 4-h experiment. At least one peptide with a high reliability score was detected for 986 proteins.
Nonhuman primates represent the most relevant model organisms to understand the biology of Homo sapiens. The recent divergence and associated overall sequence conservation between individual members of this taxon have nonetheless largely precluded the use of primates in comparative sequence studies. We used sequence comparisons of an extensive set of Old World and New World monkeys and hominoids to identify functional regions in the human genome.