As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow the search for causal variants underlying disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and nonsense mutations. Frameshifts and nonsense mutations are likely to impair protein function.
This study compared the complete genome sequences of 16 NL63 strain human coronaviruses (hCoVs) from respiratory specimens of paediatric patients with respiratory disease in Colorado, USA, and characterized the epidemiology and clinical characteristics associated with circulating NL63 viruses over a 3-year period. From 1 January 2009 to 31 December 2011, 92 of 9380 respiratory specimens were found to be positive for NL63 RNA by PCR, an overall prevalence of 1 %. NL63 viruses were circulating during all 3 years, but there was considerable yearly variation in prevalence and the month of peak incidence.
Proc Natl Acad Sci U S A
May 2011
A whole-genome phylogeny of the Escherichia coli/Shigella group was constructed by using the feature frequency profile (FFP) method. This alignment-free approach uses the frequencies of l-mer features of whole genomes to infer phylogenic distances. We present two phylogenies that accentuate different aspects of E.
We present a whole-proteome phylogeny of prokaryotes constructed by comparing feature frequency profiles (FFPs) of whole proteomes. Features are l-mers of amino acids, and each organism is represented by a profile of frequencies of all features. The selection of feature length is critical in the FFP method, and we have developed a procedure for identifying the optimal feature lengths for inferring the phylogeny of prokaryotes, strictly speaking, a proteome phylogeny.
Proc Natl Acad Sci U S A
October 2009
Ten complete mammalian genome sequences were compared by using the "feature frequency profile" (FFP) method of alignment-free comparison. This comparison technique reveals that the whole nongenic portion of mammalian genomes contains evolutionary information that is similar to their genic counterparts, the intron and exon regions. We partitioned the complete genomes of mammals (such as human, chimp, horse, and mouse) into their constituent nongenic, intronic, and exonic components.
Proc Natl Acad Sci U S A
August 2009
The vast sequence divergence among different virus groups has presented a great challenge to alignment-based sequence comparison among different virus families. Using an alignment-free comparison method, we construct the whole-proteome phylogeny for a population of viruses from 11 viral families comprising 142 large dsDNA eukaryote viruses. The method is based on the feature frequency profiles (FFP), where the length of the feature (l-mer) is selected to be optimal for phylogenomic inference.
Proc Natl Acad Sci U S A
February 2009
For comparison of whole-genome (genic + nongenic) sequences, multiple sequence alignment of a few selected genes is not appropriate. One approach is to use an alignment-free method in which feature (or l-mer) frequency profiles (FFP) of whole genomes are used for comparison, a variation of a text or book comparison method that uses word frequency profiles. In this approach it is critical to identify the optimal resolution range of l-mers for the given set of genomes compared.
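The profile comparison described above can be illustrated with a minimal Python sketch. It counts overlapping l-mers in each sequence, normalizes the counts into a frequency profile, and compares two profiles. The function names `ffp` and `js_divergence` are hypothetical, and the use of Jensen-Shannon divergence as the distance is an assumption for illustration; the excerpt does not name the measure, and steps such as selecting the optimal l-mer range are not shown.

```python
from collections import Counter
from math import log2


def ffp(sequence, l):
    """Return the normalized frequency profile of overlapping l-mers.

    Hypothetical helper: maps each l-mer to its relative frequency.
    """
    seq = sequence.upper()
    if l < 1 or len(seq) < l:
        raise ValueError("feature length must be between 1 and len(sequence)")
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two profiles.

    Assumed distance measure for this sketch; it is 0 for identical
    profiles and 1 for profiles that share no features.
    """
    div = 0.0
    for feature in set(p) | set(q):
        pi = p.get(feature, 0.0)
        qi = q.get(feature, 0.0)
        mi = 0.5 * (pi + qi)  # mixture profile at this feature
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


if __name__ == "__main__":
    a = ffp("ACGTACGTTTGACA", 3)
    b = ffp("ACGTTCGTATGACA", 3)
    print(js_divergence(a, b))
```

A pairwise matrix of such values over a set of genomes could then be passed to a standard tree-building method; which method the authors used is not stated in this excerpt.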
Proc Natl Acad Sci U S A
March 2006
A method is presented for scoring the model quality of experimental and theoretical protein structures. The structural model to be evaluated is dissected into small fragments via a sliding window, where each fragment is represented by a vector of multiple phi-psi angles. The sliding window ranges in size from a length of 1-10 phi-psi pairs (3-12 residues).
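The fragment dissection described above can be sketched in Python as follows. The function name `phi_psi_fragments` and the input layout (a list of `(phi, psi)` tuples in degrees, one per residue with defined angles) are assumptions for illustration. The scoring step itself is not described in this excerpt and is not attempted here.

```python
def phi_psi_fragments(angles, window):
    """Slide a window over consecutive (phi, psi) pairs.

    Hypothetical helper: each fragment is returned as one flat vector
    (phi_1, psi_1, ..., phi_w, psi_w) of length 2 * window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        tuple(angle for pair in angles[i:i + window] for angle in pair)
        for i in range(len(angles) - window + 1)
    ]


if __name__ == "__main__":
    # Illustrative angles only, not taken from any real structure.
    backbone = [(-60.0, -45.0), (-120.0, 130.0), (-65.0, -40.0), (-57.0, -47.0)]
    # The abstract describes window sizes of 1 to 10 phi-psi pairs.
    for w in range(1, 4):
        print(w, phi_psi_fragments(backbone, w))
```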
We have mapped protein conformational space from two to seven residue lengths by employing multidimensional scaling on a data matrix composed of pair-wise angular distances for multiple phi-Psi values collected from high-resolution protein structures. The resulting global maps show clustering of peptide conformations that reveals a dramatic reduction of conformational space as sampled by experimentally observed peptides. Each map can be viewed as a higher order phi-Psi plot defining regions of space that are conformationally allowed.
A global conformational space of 6253 dinucleoside monophosphate (DMP) units consisting of RNA and DNA (free and protein/drug-bound) was 'mapped' using high resolution crystal structures cataloged in the Nucleic Acid Database (NDB). The torsion angles of each DMP were clustered in a reduced three-dimensional space using a classical multi-dimensional scaling method. The mapping of the conformational space reveals nine primary clusters which distinguish among the common A-, B- and Z-forms and their various substates, plus five secondary clusters for kinked or bent structures.
One of the principal goals of the structural genomics initiative is to identify the total repertoire of protein folds and obtain a global view of the "protein structure universe." Here, we present a 3D map of the protein fold space in which structurally related folds are represented by spatially adjacent points. Such a representation reveals a high-level organization of the fold space that is intuitively interpretable.