Whole-genome sequencing is widely used to investigate population genomic variation in organisms of interest. Assorted tools have been independently developed to call variants from short-read sequencing data aligned to a reference genome, including single nucleotide polymorphisms (SNPs) and structural variations (SVs). We developed SNP-SVant, an integrated, flexible, and computationally efficient bioinformatic workflow that predicts high-confidence SNPs and SVs in organisms without benchmarked variants, which are traditionally used for distinguishing sequencing errors from real variants.
Motivation: Structure-conditioned information statistics have proven useful to predict and visualize tRNA Class-Informative Features (CIFs) and their evolutionary divergences. Although permutation P-values can quantify the significance of CIF divergences between two taxa, their naive Monte Carlo approximation is slow and inaccurate. The Peaks-over-Threshold approach of Knijnenburg et al.
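To illustrate why a naive Monte Carlo approximation of a permutation P-value is slow and coarse, here is a minimal sketch. It is not the method of the article: the function name `permutation_p_value`, the difference-in-means statistic, and the add-one correction are assumptions chosen for illustration. The smallest P-value this estimator can report is 1/(N+1) for N permutations, so resolving very small P-values requires very many permutations.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Naive Monte Carlo permutation P-value for an absolute difference in means.

    Hypothetical illustration only; the statistic and correction are assumptions.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    exceed = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        statistic = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if statistic >= observed:
            exceed += 1

    # Add-one correction: the estimate is never exactly zero,
    # and its smallest possible value is 1 / (n_permutations + 1).
    return (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    p = permutation_p_value([0.0] * 10, [10.0] * 10, n_permutations=1000)
    print(p)
```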
The evolution of tRNA multigene families remains poorly understood, exhibiting unusual phenomena such as functional conversions of tRNA genes through anticodon shift substitutions. We improved FlyBase tRNA gene annotations from twelve Drosophila species, incorporating previously identified ortholog sets to compare substitution rates across tRNA bodies at single-site and base-pair resolution. All rapidly evolving sites fell within the same metal ion-binding pocket that lies at the interface of the two major stacked helical domains.
The development of chemotherapies against eukaryotic pathogens is especially challenging because of both the evolutionary conservation of drug targets between host and parasite, and the evolution of strain-dependent drug resistance. There is a strong need for new nontoxic drugs with broad-spectrum activity against trypanosome parasites such as Leishmania and Trypanosoma. A relatively untested approach is to target macromolecular interactions in parasites rather than small molecular interactions, under the hypothesis that the features specifying macromolecular interactions diverge more rapidly through coevolution.
Background: Eukaryotes acquired the trait of oxygenic photosynthesis through endosymbiosis of the cyanobacterial progenitor of plastid organelles. Despite recent advances in the phylogenomics of Cyanobacteria, the phylogenetic root of plastids remains controversial. Although a single origin of plastids by endosymbiosis is broadly supported, recent phylogenomic studies are contradictory on whether plastids branch early or late within Cyanobacteria.
Advances in structural biology of aminoacyl-tRNA synthetases (aaRSs) have revealed incredible diversity in how aaRSs bind their tRNA substrates. The causes of this diversity remain mysterious. We developed a new class of highly rugged fitness landscape models called match landscapes, through which genes encode the assortative interactions of their gene products through the complementarity and identifiability of their structural features.
Candida albicans is the most common cause of life-threatening fungal infections in humans, especially in immunocompromised individuals. Crucial to its success as an opportunistic pathogen is the considerable dynamism of its genome, which readily undergoes genetic changes generating new phenotypes and shaping the evolution of new strains. Candida africana is an intriguing C.
BMC Genomics
December 2016
Background: While the CCA sequence at the mature 3' end of tRNAs is conserved and critical for translational function, a genetic template for this sequence is not always contained in tRNA genes. In eukaryotes and Archaea, the CCA ends of tRNAs are synthesized post-transcriptionally by CCA-adding enzymes. In Bacteria, tRNA genes template CCA sporadically.
FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science.
Background: Gene expression patterns are determined by rates of mRNA transcription and decay. While transcription is known to regulate many developmental processes, the role of mRNA decay is less extensively defined. A critical step toward defining the role of mRNA decay in neural development is to measure genome-wide mRNA decay rates in neural tissue.
Recent research using eye-tracking typically relies on constrained visual contexts in particular goal-oriented contexts, viewing a small array of objects on a computer screen and performing some overt decision or identification. Eyetracking paradigms that use pictures as a measure of word or sentence comprehension are sometimes touted as ecologically invalid because pictures and explicit tasks are not always present during language comprehension. This study compared the comprehension of sentences with two different grammatical forms: the past progressive (e.
Molecular phylogenetics and phylogenomics are subject to noise from horizontal gene transfer (HGT) and bias from convergence in macromolecular compositions. Extensive variation in size, structure and base composition of alphaproteobacterial genomes has complicated their phylogenomics, sparking controversy over the origins and closest relatives of the SAR11 strains. SAR11 are highly abundant, cosmopolitan aquatic Alphaproteobacteria with streamlined, A+T-biased genomes.
Code-message coevolution (CMC) models represent coevolution of a genetic code and a population of protein-coding genes ("messages"). Formally, CMC models are sets of quasispecies coupled together for fitness through a shared genetic code. Although CMC models display plausible explanations for the origin of multiple genetic code traits by natural selection, useful modern implementations of CMC models are not currently available.
I review recent developments in computational analysis of tRNA identity. I suggest that the tRNA-protein interaction network is hierarchically organized, and coevolutionarily flexible. Its functional specificity of recognition and discrimination persists despite generic structural constraints and perturbative evolutionary forces.
Background: Promoter identification is a first step in the quest to explain gene regulation in bacteria. It has been demonstrated that the initiation of bacterial transcription depends upon the stability and topology of DNA in the promoter region as well as the binding affinity between the RNA polymerase sigma-factor and promoter. However, promoter prediction algorithms to date have not explicitly used an ensemble of these factors as predictors.
Protein structures change during evolution in response to mutations. Here, we analyze the mapping between sequence and structure in a set of structurally aligned protein domains. To avoid artifacts, we restricted our attention only to the core components of these structures.
Genome data are increasingly important in the computational identification of novel regulatory non-coding RNAs (ncRNAs). However, most ncRNA gene-finders are either specialized to well-characterized ncRNA gene families or require comparisons of closely related genomes. We developed a method for de novo screening for ncRNA genes with a nucleotide composition that stands out against the background genome based on a partial sum process.
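As a rough illustration of how a partial sum process can flag a region whose composition stands out from the background, the sketch below accumulates per-base log-likelihood-ratio scores and reports the maximal-scoring segment. This is not the authors' method: the function name `max_scoring_segment`, the scoring scheme, and the example frequencies are all assumptions made for illustration.

```python
import math


def max_scoring_segment(sequence, target_freqs, background_freqs):
    """Return (start, end, score) of the highest-scoring segment, end exclusive.

    Each base scores log(target / background). The running partial sum is
    reset to zero whenever it drops to zero or below (hypothetical scheme).
    """
    best_start, best_end, best_score = 0, 0, 0.0
    running, start = 0.0, 0

    for i, base in enumerate(sequence):
        running += math.log(target_freqs[base] / background_freqs[base])
        if running <= 0.0:
            # The partial sum fell to the floor; restart after this base.
            running = 0.0
            start = i + 1
        elif running > best_score:
            best_start, best_end, best_score = start, i + 1, running

    return best_start, best_end, best_score


if __name__ == "__main__":
    # Assumed toy frequencies: a G+C-rich target against an A+T-rich background.
    target = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
    background = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
    seq = "ATATATGCGCGCGCATATAT"
    start, end, score = max_scoring_segment(seq, target, background)
    print(start, end, seq[start:end], round(score, 3))
```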
Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution.
Expression of minigenes encoding tetra- or pentapeptides MXLX or MXLXV (E peptides), where X is a nonpolar amino acid, renders cells erythromycin resistant whereas expression of minigenes encoding tripeptide MXL does not. By using a 3A' reporter gene system beginning with an E-peptide-encoding sequence, we asked whether the codons UGG and GGG, which are known to promote peptidyl-tRNA drop-off at early positions in mRNA, would result in a phenotype of erythromycin resistance if located after this sequence. We find that UGG or GGG, at either position +4 or +5, without a following stop codon, is associated with an erythromycin resistance phenotype upon gene induction.
There are at least 21 subfunctional classes of tRNAs in most cells that, despite a very highly conserved and compact common structure, must interact specifically with different cliques of proteins or cause grave organismal consequences. Protein recognition of specific tRNA substrates is achieved in part through class-restricted tRNA features called tRNA identity determinants. In earlier work we used TFAM, a statistical classifier of tRNA function, to show evidence of unexpectedly large diversity among bacteria in tRNA identity determinants.
We have earlier published an automated statistical classifier of tRNA function called TFAM. Unlike tRNA gene-finders, TFAM uses information from the total sequences of tRNAs and not just their anticodons to predict their function. Therefore TFAM has an advantage in predicting initiator tRNAs, the amino acid charging identity of nonstandard tRNAs such as suppressors, and the former identity of pseudo-tRNAs.
Background: The somatic DNA molecules of spirotrichous ciliates are present as linear chromosomes containing mostly single-gene coding sequences with short 5' and 3' flanking regions. Only a few conserved motifs have been found in the flanking DNA. Motifs that may play roles in promoting and/or regulating transcription have not been consistently detected.
J Mol Evol
September 2006
The standard genetic code is the nearly universal system for the translation of genes into proteins. The code exhibits two salient structural characteristics: it possesses a distinct organization that makes it extremely robust to errors in replication and translation, and it is highly redundant. The origin of these properties has intrigued researchers since the code was first discovered.
The Shine-Dalgarno (SD+: 5'-AAGGAGG-3') sequence anchors the mRNA by base pairing to the 16S rRNA in the small ribosomal subunit during translation initiation. We have here compared how an SD+ sequence influences gene expression, if located upstream or downstream of an initiation codon. The positive effect of an upstream SD+ is confirmed.