Mitochondrial genomes, in particular those of fungi, often encode genes with a large number of Group I and Group II introns that are conserved at both the sequence and the RNA structure level. They provide a rich resource for the investigation of intron and gene structure, self- and protein-guided splicing mechanisms, and intron evolution. Yet, the degree of sequence conservation of introns is limited, and the primary sequence differs considerably among the distinct intron sub-groups. This makes intron identification, classification, structural modeling, and the inference of gene models a most challenging and error-prone task, frequently passed on to an "expert" for manual intervention. To reduce the need for manual curation of intron structures and mitochondrial gene models, computational methods using ERPIN sequence profiles were initially developed in 2007. Here we present a refinement of search models and alignments using the now abundant publicly available fungal mtDNA sequences. In addition, we have tested to what extent members of the originally proposed sub-groups are clearly distinguished and validated by our computational approach. We confirm clearly distinct mitochondrial Group I sub-groups IA1, IA3, IB3, IC1, IC2, and ID. However, the IB1, IB2, and IB4 ERPIN models overlap substantially in their predictions and are therefore combined and reported as IB. We have further explored the conversion of our ERPIN profiles into covariance models (CM). Current limitations and prospects of the CM approach will be discussed.
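
The ERPIN-to-covariance-model conversion mentioned in the abstract can, in principle, be reproduced with standard Infernal tooling. Below is a minimal sketch, assuming Infernal (cmbuild, cmcalibrate, cmsearch) is installed and on the PATH; the alignment files, target FASTA, and bit-score threshold are placeholders, not files or settings from the publication. Each sub-group alignment is compiled into a CM, candidate sequences are scanned, and every intron is assigned to the sub-group whose model scores highest.

```python
#!/usr/bin/env python3
"""Sketch: build covariance models (CMs) from intron sub-group alignments
with Infernal and assign candidate introns to the best-scoring sub-group.
File names and the score threshold are placeholders."""

import subprocess
from collections import defaultdict

# Hypothetical Stockholm alignments, one per Group I sub-group.
SUBGROUP_ALIGNMENTS = {
    "IA1": "IA1.sto",
    "IB": "IB.sto",   # merged IB1/IB2/IB4, mirroring the overlap reported above
    "IC1": "IC1.sto",
}
TARGET_FASTA = "fungal_mtDNA_candidates.fasta"  # placeholder target sequences
SCORE_THRESHOLD = 20.0                          # placeholder bit-score cutoff


def build_and_search(subgroup: str, alignment: str) -> dict[str, float]:
    """Build, calibrate, and search one CM; return the best bit score per target."""
    cm = f"{subgroup}.cm"
    tbl = f"{subgroup}.tbl"
    subprocess.run(["cmbuild", "-F", cm, alignment], check=True)
    subprocess.run(["cmcalibrate", cm], check=True)  # enables E-value statistics
    subprocess.run(["cmsearch", "--tblout", tbl, cm, TARGET_FASTA], check=True)

    best: dict[str, float] = {}
    with open(tbl) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            fields = line.split()
            target, score = fields[0], float(fields[14])  # column 15 = bit score
            best[target] = max(best.get(target, float("-inf")), score)
    return best


def classify() -> dict[str, tuple[str, float]]:
    """Assign each candidate intron to the sub-group whose CM scores highest."""
    scores: dict[str, dict[str, float]] = defaultdict(dict)
    for subgroup, alignment in SUBGROUP_ALIGNMENTS.items():
        for target, score in build_and_search(subgroup, alignment).items():
            scores[target][subgroup] = score

    calls = {}
    for target, per_model in scores.items():
        subgroup, score = max(per_model.items(), key=lambda kv: kv[1])
        if score >= SCORE_THRESHOLD:
            calls[target] = (subgroup, score)
    return calls


if __name__ == "__main__":
    for target, (subgroup, score) in classify().items():
        print(f"{target}\t{subgroup}\t{score:.1f}")
```

Note that cmcalibrate can be slow for large models; it is only needed if E-values, rather than raw bit scores, are used for the classification cutoff.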


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8971849
DOI: http://dx.doi.org/10.3389/fmicb.2022.866187

Publication Analysis

Top Keywords: gene models (8), intron (6), sequence (5), models (5), refining mitochondrial (4), mitochondrial intron (4), intron classification (4), erpin (4), classification erpin (4), erpin identification (4)

Similar Publications

Biophysical constraints limit the specificity with which transcription factors (TFs) can target regulatory DNA. While individual nontarget binding events may be low affinity, the sheer number of such interactions could present a challenge for gene regulation by degrading its precision or possibly leading to an erroneous induction state. Chromatin can prevent nontarget binding by rendering DNA physically inaccessible to TFs, at the cost of energy-consuming remodeling orchestrated by pioneer factors (PFs).


The homo-dodecameric, ring-shaped trp RNA-binding attenuation protein (TRAP) binds up to twelve tryptophan ligands (Trp) and becomes activated to bind a specific sequence in the 5' leader region of the trp operon mRNA, thereby downregulating biosynthesis of Trp. Thermodynamic measurements of Trp binding have revealed a range of cooperative behavior for different TRAP variants, even though the averaged apparent affinities for Trp have been found to be similar. Proximity between the ligand-binding sites and the ligand-coupled disorder-to-order transition have implicated nearest-neighbor interactions in cooperativity.
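
As background for the "range of cooperative behavior" noted here, Trp-binding cooperativity is commonly summarized with a Hill fit. This is a generic phenomenological sketch, not the nearest-neighbor model examined in the publication; K_1/2 and n_H are fitted parameters, and two variants can share a similar K_1/2 while differing in n_H.

```latex
% Fractional saturation of the twelve Trp sites as a function of free Trp:
% n_H = 1 corresponds to independent sites, n_H > 1 to positive cooperativity.
\theta = \frac{[\mathrm{Trp}]^{\,n_H}}{K_{1/2}^{\,n_H} + [\mathrm{Trp}]^{\,n_H}}
```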


Purpose: We aimed to identify the transcriptomic signatures of soft tissue sarcoma (STS) related to radioresistance and establish a model to predict radioresistance.

Materials And Methods: Nine STS cell lines were cultured. Adenosine triphosphate-based viability was determined 5 days after irradiation with 8 Gy of X-rays in a single fraction.


Transfer learning aims to integrate useful information from multi-source datasets to improve the learning performance on target data. This can be effectively applied in genomics when gene associations are learned in a target tissue and data from other tissues can be integrated. However, heavy-tailed distributions and outliers are common in genomics data, which poses challenges to the effectiveness of current transfer learning approaches.
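
To illustrate why heavy tails and outliers undermine standard squared-loss fitting, and why robust losses are often preferred, here is a small synthetic sketch. The data, the single-predictor setup, and the choice of a Huber loss are illustrative assumptions, not the transfer-learning method proposed in the publication.

```python
# Synthetic illustration: a few gross outliers bias a least-squares slope
# estimate, while a Huber-loss fit stays close to the true slope.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)   # true slope = 2
outliers = np.argsort(x)[-5:]                   # corrupt a few high-leverage points
y[outliers] += 30.0

def huber(r, delta=1.0):
    """Quadratic for small residuals, linear for large ones."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

ls_slope = minimize(lambda b: np.sum((y - b[0] * x) ** 2), x0=[0.0]).x[0]
huber_slope = minimize(lambda b: np.sum(huber(y - b[0] * x)), x0=[0.0]).x[0]

print(f"least-squares slope: {ls_slope:.2f}")   # inflated by the outliers
print(f"Huber slope:         {huber_slope:.2f}")  # close to 2
```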


The role of chromatin state in intron retention: A case study in leveraging large scale deep learning models.

PLoS Comput Biol

January 2025

Department of Computer Science, Colorado State University, Fort Collins, Colorado, United States of America.

Complex deep learning models trained on very large datasets have become key enabling tools for current research in natural language processing and computer vision. By providing pre-trained models that can be fine-tuned for specific applications, they enable researchers to create accurate models with minimal effort and computational resources. Large scale genomics deep learning models come in two flavors: the first are large language models of DNA sequences trained in a self-supervised fashion, similar to the corresponding natural language models; the second are supervised learning models that leverage large scale genomics datasets from ENCODE and other sources.
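
As a concrete illustration of the "pre-train then fine-tune" workflow described above, the following sketch fine-tunes a pre-trained sequence model for a two-class task using the Hugging Face transformers and datasets APIs. The checkpoint name, CSV files, label definition, and hyperparameters are placeholders, not the model or data used in the publication.

```python
# Minimal sketch of fine-tuning a pre-trained DNA language model for a
# binary classification task (e.g., intron retained vs. spliced).
# Checkpoint name, data files, and hyperparameters are placeholders.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

CHECKPOINT = "some-org/dna-language-model"  # placeholder pre-trained checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)

# Hypothetical CSVs with columns "sequence" (DNA string) and "label" (0/1).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["sequence"], truncation=True,
                     padding="max_length", max_length=512)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())
```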

