Gene duplication plays a central role in adaptation to novel environments by providing new genetic material for functional divergence and evolution of biological complexity. Several evolutionary models have been proposed for gene duplication to explain how new gene copies are preserved by natural selection, but these models have rarely been tested using empirical data. Opsin proteins, when combined with a chromophore, form a photopigment that is responsible for the absorption of light, the first step in the phototransduction cascade.
Unlabelled: Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine clusters as a whole.
Background: Genome-wide association studies (GWAS) have effectively identified genetic factors for many diseases. Many diseases, including Alzheimer's disease (AD), have epistatic causes, requiring more sophisticated analyses to identify groups of variants which together affect phenotype.
Results: Based on the GWAS statistical model, we developed a multi-SNP GWAS analysis to identify pairs of variants whose common occurrence signaled the Alzheimer's disease phenotype.
BMC Bioinformatics
February 2016
Background: Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches that are used to identify homologous genes and combine them into two fundamentally distinct classes: orthologs and paralogs. Due to only using heuristic filtering based on significance score cutoffs and having no cluster post-processing tools available, these methods can often produce multiple clusters constituting unrelated (non-homologous) sequences.
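The bidirectional BLAST heuristic mentioned above is commonly implemented as a reciprocal-best-hit filter. The following is a minimal sketch of that general idea, not the method of any tool described here. It assumes the search scores have already been parsed into dictionaries keyed by (query, subject); the function names and the toy scores are hypothetical.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is an assumed input format: {(query_id, subject_id): score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)   # genome A searched against genome B
    reverse = best_hits(b_vs_a)   # genome B searched against genome A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy, made-up scores for illustration only.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b1"): 180}
b_vs_a = {("b1", "a1"): 210, ("b2", "a2"): 40}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this toy input only `a1` and `b1` select each other, so `a2` is left unpaired even though its best hit is `b1`. A filter of this kind works on pairs only, which is the limitation the abstract points to: it does not examine the resulting clusters as a whole.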
Motivation: The contig orientation problem, which we formally define as the MAX-DIR problem, has at times been addressed cursorily and at times using various heuristics. In setting forth a linear-time reduction from the MAX-CUT problem to the MAX-DIR problem, we prove the latter is NP-complete. We compare the relative performance of a novel greedy approach with several other heuristic solutions.
Background: Genome assemblers to date have predominantly targeted haploid reference reconstruction from homozygous data. When applied to diploid genome assembly, these assemblers perform poorly, owing to the violation of assumptions during both the contigging and scaffolding phases. Effective tools to overcome these problems are in growing demand.
BMC Bioinformatics
October 2014
Background: Error correction is an important step in increasing the quality of next-generation sequencing data for downstream analysis and use. Polymorphic datasets are a challenge for many bioinformatic software packages that are designed for or assume homozygosity of an input dataset. This assumption ignores the true genomic composition of many organisms that are diploid or polyploid.
Background: Since the advent of microarray technology, numerous methods have been devised to infer gene regulatory relationships from gene expression data. Many approaches infer entire regulatory networks. This produces results that are rich in information and yet so complex that they are often of limited usefulness for researchers.
Background: DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor quality bases.
Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and for use in clinical settings. However, to make the most of these new data, new methodology needs to be developed that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents.
The genetic rules that dictate legume-rhizobium compatibility have been investigated for decades, but the causes of incompatibility occurring at late stages of the nodulation process are not well understood. An evaluation of naturally diverse legume (genus Medicago) and rhizobium (genus Sinorhizobium) isolates has revealed numerous instances in which Sinorhizobium strains induce and occupy nodules that are only minimally beneficial to certain Medicago hosts. Using these ineffective strain-host pairs, we identified gain-of-compatibility (GOC) rhizobial variants.
Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels.
Mapping short next-generation reads to reference genomes is an important element in SNP calling and expression studies. A major limitation to large-scale whole-genome mapping is the large memory requirements for the algorithm and the long run-time necessary for accurate studies. Several parallel implementations have been performed to distribute memory on different processors and to equally share the processing requirements.
Motivation: The advent of next-generation sequencing technologies has increased the accuracy and quantity of sequence data, opening the door to greater opportunities in genomic research.
Results: In this article, we present GNUMAP (Genomic Next-generation Universal MAPper), a program capable of overcoming two major obstacles in the mapping of reads from next-generation sequencing runs. First, we have created an algorithm that probabilistically maps reads to repeat regions in the genome on a quantitative basis.
Int J Bioinform Res Appl
January 2008
Fundamental to Multiple Sequence Alignment (MSA) algorithms is modelling insertions and deletions (gaps). The most prevalent model is to use Gap Open Penalties (GOP) and Gap Extension Penalties (GEP). While GOP and GEP are well understood conceptually, their effects on MSA and consequently on phylogeny scores are not as well understood.
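To make the GOP/GEP model concrete, here is a minimal sketch of an affine gap cost. It assumes one common convention, in which the first gapped position is charged the GOP and each further position the GEP; some alignment programs instead charge GOP plus GEP for every position. The function name and the default penalty values are hypothetical and are not taken from the study.

```python
def affine_gap_cost(gap_length, gop=10.0, gep=0.5):
    """Penalty for one contiguous gap under an affine model.

    Assumed convention: the first gapped position costs `gop`,
    every additional position costs `gep`.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return 0.0
    return gop + (gap_length - 1) * gep


# One gap of length 4 versus two separate gaps of length 2.
one_long_gap = affine_gap_cost(4)
two_short_gaps = 2 * affine_gap_cost(2)
```

With these illustrative values, a single four-position gap is penalized less than two separate two-position gaps, because the opening penalty is paid only once. The ratio of GOP to GEP therefore controls whether an aligner prefers few long gaps or many short ones, which is how these parameters can influence the alignment and, in turn, downstream phylogeny scores.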
Int J Bioinform Res Appl
January 2008
The CYP2D6 gene is responsible for metabolising a large portion of the commonly prescribed drugs. Because of its importance, various approaches have been taken to analyse CYP2D6 and Single Nucleotide Polymorphisms (SNPs) throughout its sequence. This study introduces a novel method to analyse the effects of SNPs on encoded protein complexes by focusing on the biochemical properties of each non-synonymous substitution using the program TreeSAAP.
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses.