Multiple sequence alignment: in pursuit of homologous DNA positions.

Genome Res

Center for Evolutionary Functional Genomics, Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, Arizona 85287-5301, USA.

Published: February 2007

DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is mere common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to errors and biases inherent in all sequence alignments. A survey of investigations into strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding additional sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations to make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate influences of alignment bias on the accuracy of large-scale comparative genomics.
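To make the point about divergence estimates concrete, the following is a minimal sketch, not taken from the article, of how a single misplaced gap can inflate an estimate of sequence divergence. The function name `p_distance` and the toy sequences are assumptions invented for illustration; the measure shown is the simple proportion of differing sites over ungapped columns.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a nucleotide.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy data (hypothetical): the second sequence is the first with one base deleted.
# Alignment with the gap at the true deletion site.
good_1 = "ACGTACGTAC"
good_2 = "ACGT-CGTAC"

# Same two sequences, but the gap is pushed to the end, shifting downstream columns.
bad_1 = "ACGTACGTAC"
bad_2 = "ACGTCGTAC-"

d_good = p_distance(good_1, good_2)
d_bad = p_distance(bad_1, bad_2)
print(f"correct gap placement: {d_good:.3f}")
print(f"misplaced gap:         {d_bad:.3f}")
```

In this toy case the underlying sequences are identical in both alignments; only the gap placement differs, yet the apparent divergence changes substantially. Real analyses would use model-based distances and many sequences, so this is only an illustration of the mechanism.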

DOI: http://dx.doi.org/10.1101/gr.5232407

Similar Publications

All-at-once RNA folding with 3D motif prediction framed by evolutionary information.

bioRxiv

December 2024

Aayush Karan, Elena Rivas

Structural RNAs exhibit a vast array of recurrent short 3D elements involving non-Watson-Crick interactions that help arrange canonical double helices into tertiary structures. We present CaCoFold-R3D, a probabilistic grammar that predicts these RNA 3D motifs (also termed modules) jointly with RNA secondary structure over a sequence or alignment. CaCoFold-R3D uses evolutionary information present in an RNA alignment to reliably identify canonical helices (including pseudoknots) by covariation.

sc-SPLASH provides ultra-efficient reference-free discovery in barcoded single-cell sequencing.

bioRxiv

December 2024

Roozbeh Dehghannasiri, Marek Kokot, Alexander L Starr, Jamie Maziarz, Tal Gordon

Typical high-throughput single-cell RNA-sequencing (scRNA-seq) analyses are primarily conducted by (pseudo)alignment, through the lens of annotated gene models, and aimed at detecting differential gene expression. This misses diversity generated by other mechanisms that diversify the transcriptome such as splicing and V(D)J recombination, and is blind to sequences missing from imperfect reference genomes. Here, we present sc-SPLASH, a highly efficient pipeline that extends our SPLASH framework for statistics-first, reference-free discovery to barcoded scRNA-seq (10x Chromium) and spatial transcriptomics (10x Visium); we also provide its optimized module for preprocessing and k-mer counting in barcoded data, BKC, as a standalone tool.

Linkage-based ortholog refinement in bacterial pangenomes with CLARC.

bioRxiv

December 2024

Indra González Ojeda, Samantha G Palace, Pamela P Martinez, Taj Azarian, Lindsay R Grant

Bacterial genomes exhibit significant variation in gene content and sequence identity. Pangenome analyses explore this diversity by classifying genes into core and accessory clusters of orthologous groups (COGs). However, strict sequence identity cutoffs can misclassify divergent alleles as different genes, inflating accessory gene counts.

Identification of the locus controlling leaf rolling and its application in maize breeding.

Mol Breed

January 2025

Maize Research Institute, Guangxi Academy of Agricultural Sciences, Nanning, 530007 Guangxi China.

Meng Yang, Aihua Huang, Renlai Wen, Shuyun Tian, Runxiu Mo

Increasing planting density is one of the most important strategies for generating higher maize yields. Moderate leaf rolling decreases mutual shading of leaves and increases the photosynthesis of the population and hence increases the tolerance for high-density planting. Few genes that control leaf rolling in maize have been identified, however, and their applicability for breeding programs remains unclear.

Resolving the source of branch length variation in the Y chromosome phylogeny.

Genome Biol

January 2025

Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.

Yaniv Swiel, Janet Kelso, Stéphane Peyrégne

Background: Genetic variation in the non-recombining part of the human Y chromosome has provided important insight into the paternal history of human populations. However, a significant and yet unexplained branch length variation of Y chromosome lineages has been observed, notably amongst those that are highly diverged from the human reference Y chromosome. Understanding the origin of this variation, which has previously been attributed to changes in generation time, mutation rate, or efficacy of selection, is important for accurately reconstructing human evolutionary and demographic history.
