ncRNA Coding Potential Prediction Using BiLSTM and Transformer Encoder-Based Model.

J Chem Inf Model

School of Computer Science and Engineering, Central South University, Changsha 410018, China.

Published: August 2024

AI Article Synopsis

  • Noncoding RNAs (ncRNAs) play critical roles in biological processes, and recent findings suggest that some can be translated into functional peptides, prompting the need for better methods to predict their coding potential.
  • A new model called nBAT, built on BiLSTM and Transformer encoders, uses a learnable position encoding and encodes 43 intrinsic features of ncRNAs into the Transformer encoder, which improves predictions of their coding potential.
  • nBAT outperforms existing methods on various datasets and shows promise for identifying coding potential in new ncRNAs, positioning it as a valuable tool for high-throughput analysis in this field.

Article Abstract

Many noncoding RNAs (ncRNAs) have been identified, and many of them play vital roles in various biological processes, including gene expression regulation, epigenetic regulation, transcription, and control. Recently, a few observations have revealed that some ncRNAs are translated into functional peptides. Many computational methods have been developed to predict the coding potential of transcripts, which supports deeper investigation of their functions. However, most of these methods are designed to distinguish ncRNAs from mRNAs. A highly accurate computational tool for identifying the coding potential of ncRNAs is therefore needed, as it would contribute to the discovery of novel peptides. In this Article, we propose nBAT, a novel BiLSTM And Transformer encoder-based model with encoded intrinsic features, for ncRNA coding potential prediction. In nBAT, we introduce a learnable position encoding mechanism to obtain better embeddings of the ncRNA sequence. In addition, we extract 43 intrinsic features from different perspectives and encode them into the Transformer encoder by calculating their distances. Our performance comparisons show that nBAT outperforms state-of-the-art coding potential prediction methods on different datasets. We also apply the method to new ncRNAs to identify their coding potential, and the results further indicate the competitive performance of nBAT. We expect that nBAT can serve as a useful tool for high-throughput coding potential prediction for ncRNAs.
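The abstract does not say how nBAT's learnable position encoding is implemented. The sketch below illustrates the general idea under stated assumptions. It assumes a BERT-style design in which a trainable position table is added to token embeddings, and that sequences are tokenized into overlapping 3-mers. The tokenization, the embedding size, and the names `kmer_tokens` and `LearnablePositionEmbedding` are hypothetical and are not taken from nBAT.

```python
import random
from itertools import product


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split an RNA sequence into overlapping k-mers (hypothetical tokenization)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class LearnablePositionEmbedding:
    """Sum of a token embedding and a trainable position embedding.

    Hypothetical sketch: both tables are ordinary parameters initialised with
    small random values. A training loop (not shown) would update them by
    gradient descent, which is what makes the position encoding "learnable".
    """

    def __init__(self, vocab: list[str], max_len: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.max_len = max_len
        self.index = {tok: i for i, tok in enumerate(vocab)}
        # One row per vocabulary token.
        self.tok_table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in vocab]
        # One row per sequence position, learned rather than fixed sinusoids.
        self.pos_table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(max_len)]

    def __call__(self, tokens: list[str]) -> list[list[float]]:
        if len(tokens) > self.max_len:
            raise ValueError("sequence longer than max_len")
        # x_t = E_tok[token_t] + E_pos[t]
        return [
            [a + b for a, b in zip(self.tok_table[self.index[tok]], self.pos_table[t])]
            for t, tok in enumerate(tokens)
        ]


# Usage: embed a short transcript, using all 64 RNA 3-mers as the vocabulary.
vocab = ["".join(p) for p in product("ACGU", repeat=3)]
embed = LearnablePositionEmbedding(vocab, max_len=512, dim=8)
tokens = kmer_tokens("AUGAUGGCC")
x = embed(tokens)
```

The same 3-mer ("AUG" at positions 0 and 3) receives different vectors because each position adds its own row of the position table. Unlike a fixed sinusoidal encoding, this row would be adjusted during training. In a full model, `x` would then feed the BiLSTM and Transformer encoder layers, which are not shown here.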

DOI: http://dx.doi.org/10.1021/acs.jcim.4c01097