Background: Finding orthologs remains an important bottleneck in comparative genomics analyses. Authors of software for fast protein sequence comparison typically evaluate its speed and compare its results against the most commonly used software for the task, but they rarely evaluate it for more specific uses, such as finding orthologs as reciprocal best hits (RBH). Here we compared RBH results obtained using software that runs faster than blastp, namely lastal, diamond, and MMseqs2.
Results: We found that lastal required the least time to produce results. However, it yielded fewer results than any other program when comparing the proteins encoded by evolutionarily distant genomes. The program whose number of RBH was closest to that of blastp was diamond run with the "ultra-sensitive" option. However, this option was diamond's slowest; the "very-sensitive" option offered the best balance between speed and RBH results. The speedup was much more evident with eukaryotic genomes, which encode more proteins. For example, lastal took a median of approximately 1.5% of the blastp run time with bacterial proteomes and 0.6% with eukaryotic ones, while diamond with the very-sensitive option took 7.4% and 5.2%, respectively. Although estimated error rates were very similar among the RBH obtained with all programs, those obtained with MMseqs2 had the lowest error rates.
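The relative run times quoted above can be restated as fold speedups over blastp. The short Python sketch below only performs that conversion on the four medians given in the Results; the function name `fold_speedup` and the dictionary layout are illustrative choices, not part of the study.

```python
# Median run time as a percentage of the blastp run time, as quoted in the Results.
percent_of_blastp_time = {
    ("lastal", "bacterial"): 1.5,
    ("lastal", "eukaryotic"): 0.6,
    ("diamond very-sensitive", "bacterial"): 7.4,
    ("diamond very-sensitive", "eukaryotic"): 5.2,
}


def fold_speedup(percent: float) -> float:
    """Convert 'percent of blastp time' into 'times faster than blastp'."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return 100.0 / percent


for (program, proteomes), percent in percent_of_blastp_time.items():
    print(f"{program:24s} {proteomes:11s} ~{fold_speedup(percent):.1f}x faster")
```

Under this reading, lastal is roughly 67 times faster than blastp on bacterial proteomes and roughly 167 times faster on eukaryotic ones, while diamond with the very-sensitive option is roughly 14 and 19 times faster, respectively.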
Conclusions: The fast algorithms for pairwise protein comparison produced results very similar to those of blast in a fraction of the time, with diamond offering the best compromise among speed, sensitivity, and quality, as long as a sensitivity option other than the default was chosen.
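As an illustration of the RBH criterion discussed above, here is a minimal Python sketch. It assumes that hits from each search direction have already been reduced to (query, subject, bit score) tuples and that the best hit is simply the highest bit score. The filtering thresholds used in the study (for example coverage or E-value cutoffs) are not stated in this abstract, so none are applied here. The names `best_hits` and `reciprocal_best_hits` and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed hit representation: (query_id, subject_id, bit_score)
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data (invented): proteome A searched against B, and B against A.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
          ("a2", "b2", 90.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 95.0)]

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)
```

In the toy data only a1 and b1 are reported: b2 is the best hit of a2, but the best hit of b2 is a1, so that pair is not reciprocal.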
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7585182 | PMC |
| http://dx.doi.org/10.1186/s12864-020-07132-6 | DOI Listing |
Alzheimers Dement
December 2024
Amsterdam UMC, Amsterdam, Netherlands.
Background: The TMEM106B protein is critical for proper functioning of the endolysosomal system, which is utilised by all cells to traffic and degrade molecular cargo. Genome-wide association studies identified a haplotype in the TMEM106B gene that is associated with increased risk for Alzheimer's disease (AD), amyotrophic lateral sclerosis (ALS), and frontotemporal lobar degeneration with TAR DNA binding protein inclusions (FTLD-TDP). However, the causal variant that drives the association has thus far remained elusive.
Alzheimers Dement
December 2024
Columbia University Irving Medical Center, New York, NY, USA.
Background: Genetic variations have emerged as crucial players in the etiology of Alzheimer's disease (AD), and they contribute to a better understanding of the disease mechanisms; yet the specific roles of these genetic variants remain uncertain. Animal models that recapitulate disease pathology could uncover previously uncharacterized roles of these genes. Therefore, we generated zebrafish models for AD variants to analyze in depth the molecular and biological functions of these variants.
Sci Data
December 2024
College of Life Science and Technology/Tarim Research Center of Rare Fishes, Tarim University, CN-0997, Alar 843300, Xinjiang Uygur Autonomous Region, Xinjiang, China.
Triplophysa bombifrons, a species of bony fish localized in China, has largely been understudied genetically, with limited data available beyond its mitochondrial genome. This study introduces a chromosome-level genome assembly for T. bombifrons, achieved through the integration of PacBio long-read sequencing and Hi-C chromatin interaction mapping.
Nucleic Acids Res
December 2024
Biology Department, Boston University, 24 Cummington Ave., Boston, 02215, USA.
Exons within transcripts are traditionally classified as first, internal or last exons, each governed by different regulatory mechanisms. We recently described the widespread usage of 'hybrid' exons that serve as terminal or internal exons in different transcripts. Here, we employ an interpretable deep learning pipeline to dissect the sequence features governing the co-regulation of transcription initiation and splicing in hybrid exons.
Nat Commun
December 2024
Beijing Frontier Research Center for Biological Structure, State Key Laboratory of Membrane Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China.
Exceptionally diverse type V CRISPR-Cas systems provide numerous RNA-guided nucleases as powerful tools for DNA manipulation. Two known Cas12e nucleases, DpbCas12e and PlmCas12e, are both effective in genome editing. However, many differences exist in their in vitro dsDNA cleavage activities, reflecting the diversity in Cas12e's enzymatic properties.