Caveats to Deep Learning Approaches to RNA Secondary Structure Prediction.

Christoph Flamm, Julia Wielach, Michael T Wolfinger, Stefan Badelt, Ronny Lorenz, Ivo L Hofacker

Front Bioinform

Department of Theoretical Chemistry, University of Vienna, Vienna, Austria.

Published: July 2022

Machine learning (ML), and in particular deep learning, techniques have gained popularity for predicting structures from biopolymer sequences. An interesting case is the prediction of RNA secondary structures, where well-established biophysics-based methods exist. The accuracy of these classical methods is limited by a lack of experimental parameters and by certain simplifying assumptions, and it has seen little improvement over the last decade. This makes RNA folding an attractive target for machine learning, and consequently several deep learning models have been proposed in recent years. However, for ML approaches to be competitive for structure prediction, the models must not just demonstrate good phenomenological fits but also be able to learn a (complex) biophysical model. In this contribution we discuss limitations of current approaches, in particular those due to biases in the training data. Furthermore, we propose to study the capabilities and limitations of ML models by first applying them to synthetic data (obtained from a simplified biophysical model) that can be generated in arbitrary amounts and where all biases can be controlled. We assume that a deep learning model that performs well on these synthetic data would also perform well on real data, and vice versa. We apply this idea by testing several ML models of varying complexity. Finally, we show that the best models are capable of capturing many, but not all, properties of RNA secondary structures. Most severely, the number of predicted base pairs scales quadratically with sequence length, even though a secondary structure can only accommodate a linear number of pairs.
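The two technical points of the abstract, synthetic training data from a simplified model and the linear bound on the number of base pairs, can be illustrated with a small sketch. The abstract does not say which simplified biophysical model was used, so the code below substitutes a Nussinov-style base-pair maximization purely as a stand-in; the pairing rules, the minimum hairpin loop size of 3, and all function names (`nussinov_fold`, `random_sequence`, `max_pairs`, `candidate_pairs`) are assumptions of this illustration, not the authors' method.

```python
import random

# Allowed pairs in this toy model (canonical plus G-U wobble); an assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin


def nussinov_fold(seq):
    """Return a dot-bracket structure with the maximum number of base pairs.

    Stand-in for a simplified folding model; not an energy-based predictor.
    """
    n = len(seq)
    # best[i][j] holds the maximum number of pairs formed within seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - MIN_LOOP):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure using an explicit stack.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # interval too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


def random_sequence(length, rng):
    """Draw a uniformly random RNA sequence of the given length."""
    return "".join(rng.choice("ACGU") for _ in range(length))


def max_pairs(n):
    """Upper bound on pairs in a secondary structure: each base pairs at most once."""
    return n // 2


def candidate_pairs(n):
    """Number of position pairs (i < j) a pair-matrix predictor could switch on."""
    return n * (n - 1) // 2


# Generate a few synthetic (sequence, structure) examples and compare counts.
rng = random.Random(0)
for n in (20, 40, 80):
    seq = random_sequence(n, rng)
    structure = nussinov_fold(seq)
    print(n, structure.count("("), max_pairs(n), candidate_pairs(n))
```

Each printed row lists the sequence length, the number of pairs in the folded structure, the linear bound `n // 2`, and the quadratic number of candidate positions `n * (n - 1) // 2`. A model whose predicted pair count tracks the last column rather than the third is not producing valid secondary structures, which is the scaling failure the abstract describes.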

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580944	PMC
http://dx.doi.org/10.3389/fbinf.2022.835422	DOI Listing


