Improved DNA-Versus-Protein Homology Search for Protein Fossils.

IEEE/ACM Trans Comput Biol Bioinform

Published: June 2023

Protein fossils, i.e., noncoding DNA descended from coding DNA, arise frequently from transposable elements (TEs), decayed genes, and viral integrations. They can reveal, and mislead about, evolutionary history and relationships. They have been detected by comparing DNA to protein sequences, but current methods are not optimized for this task. We describe a powerful DNA-protein homology search method. We use a 64×21 substitution matrix, which is fitted to sequence data, automatically learning the genetic code. We detect subtly homologous regions by considering alternative possible alignments between them, and calculate significance (probability of occurring by chance between random sequences). Our method detects TE protein fossils much more sensitively than blastx, and faster. Of the ∼7 major categories of eukaryotic TE, three were long thought absent in mammals: we find two of them in the human genome, polinton and DIRS/Ngaro. This method increases our power to find ancient fossils, and perhaps to detect non-standard genetic codes. The alternative-alignments and significance paradigm is not specific to DNA-protein comparison, and could benefit homology search generally. This is an extended version of a conference paper (Yao & Frith, 2021).
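
To make the idea of a 64×21 substitution matrix concrete, here is a minimal sketch, not the authors' implementation. It assumes the matrix has one row per codon and one column per amino acid plus stop, and that scores are log-odds ratios computed from aligned codon/amino-acid pair counts. The counts are invented toy values rather than fitted sequence data, and the names `toy_counts`, `log_odds`, and `inferred_code` are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # 64 rows
AMINO = "ACDEFGHIKLMNPQRSTVWY*"  # 20 amino acids plus stop: 21 columns

# Standard genetic code in TCAG order, used here ONLY to fabricate toy counts.
STANDARD = dict(zip(
    CODONS,
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def toy_counts(match=50.0, background=1.0):
    """Invented codon/amino-acid pair counts (assumption: not real data)."""
    return {
        c: {a: background + (match if STANDARD[c] == a else 0.0) for a in AMINO}
        for c in CODONS
    }


def log_odds(counts):
    """Score(codon, aa) = log2( P(codon, aa) / (P(codon) * P(aa)) )."""
    total = sum(sum(row.values()) for row in counts.values())
    codon_freq = {c: sum(row.values()) / total for c, row in counts.items()}
    aa_freq = {a: sum(counts[c][a] for c in counts) / total for a in AMINO}
    return {
        c: {
            a: math.log2((counts[c][a] / total) / (codon_freq[c] * aa_freq[a]))
            for a in AMINO
        }
        for c in counts
    }


def inferred_code(scores):
    """Best-scoring amino acid per codon: the code implied by the matrix."""
    return {c: max(row, key=row.get) for c, row in scores.items()}


scores = log_odds(toy_counts())
print(len(scores), len(scores["ATG"]))  # matrix dimensions
print(inferred_code(scores)["ATG"])     # amino acid implied for codon ATG
```

In this toy setting the highest-scoring column of each row recovers the translation table that generated the counts, which illustrates how a matrix fitted to data could reflect the genetic code without being told it. A matrix fitted to real alignments would also capture graded similarities between codons and non-cognate amino acids.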

DOI: http://dx.doi.org/10.1109/TCBB.2022.3177855

Similar Publications

Tandem repeats provide evidence for convergent evolution to similar protein structures.

Genome Biol Evol

January 2025

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA 15219.

Erik S Wright

Homology is a key concept underpinning the comparison of sequences across organisms. Sequence-level homology is based on a statistical framework optimized over decades of work. Recently, computational protein structure prediction has enabled large-scale homology inference beyond the limits of accurate sequence alignment.

GOPhage: protein function annotation for bacteriophages by integrating the genomic context.

Brief Bioinform

November 2024

Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong (SAR), China.

Jiaojiao Guan, Yongxin Ji, Cheng Peng, Wei Zou, Xubo Tang

Bacteriophages are viruses that target bacteria, playing a crucial role in microbial ecology. Phage proteins are important in understanding phage biology, such as virus infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation.

Benchmarking protein language models for protein crystallization.

Sci Rep

January 2025

Biotechnology Research Center, Technology Innovation Institute, P.O. Box 9639, Abu Dhabi, United Arab Emirates.

Raghvendra Mall, Rahul Kaushik, Zachary A Martinez, Matt W Thomson, Filippo Castiglione

The problem of protein structure determination is usually solved by X-ray crystallography. Several in silico deep learning methods have been developed to overcome the high attrition rate, cost of experiments and extensive trial-and-error settings, for predicting the crystallization propensities of proteins based on their sequences. In this work, we benchmark the power of open protein language models (PLMs) through the TRILL platform, a bespoke framework democratizing the usage of PLMs for the task of predicting crystallization propensities of proteins.

Crystal structure of the anti-CRISPR protein AcrIE7.

Biochem Biophys Res Commun

January 2025

Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China; Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, 300070, China.

Zhikun Liu, Yingcan Liu, Shuqin Zhang, Yanan Wen, Xiaoshen Wang

Bacterial adaptive immunity, driven by CRISPR-Cas systems, protects against foreign nucleic acids from mobile genetic elements (MGEs), like bacteriophages. The type I-E CRISPR-Cas system employs the Cascade (CRISPR-associated complex for antiviral defense) complex for target DNA cleavage, guided by crRNA. Anti-CRISPR (Acr) proteins, such as AcrIE7, counteract this defense by inhibiting Cascade activity.

Importance of Computer-Aided Drug Design in Modern Pharmaceutical Research.

Curr Drug Discov Technol

December 2024

Department of Pharmaceutical Chemistry, School of Pharmaceutical Sciences, Delhi Pharmaceutical Sciences and Research University, Pushp Vihar Sector-3, M-B Road, New Delhi, 110017, India.

Uma Agarwal, Rajiv Kumar Tonk, Swati Paliwal

Background: Computer-Aided Drug Design (CADD) approaches are essential in the drug discovery and development process. Both academic institutions and pharmaceutical and biotechnology corporations utilize them to enhance the efficacy of bioactive compounds.

Objective: This study aims to entice researchers by investigating the benefits of Computer-Aided Drug Design (CADD) and its fundamental principles.
