With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, much of which reports inflated performance because its training and testing sets have significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally dissimilar dataset splits for structural RNAs.
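To make the idea of splitting at the Component level concrete, the following is a minimal sketch and not the RNA3DB implementation. It assumes the Components have already been computed and are available as a mapping from a Component name to its list of chain identifiers; the function name `split_components`, the Component names, and the chain identifiers are all hypothetical. The sketch only shows that whole Components are assigned to one side of the split while aiming for roughly 70% of chains in training. The sequence and structure dissimilarity itself comes from how the Components are built, which this code does not attempt to reproduce.

```python
from typing import Dict, List, Tuple


def split_components(
    components: Dict[str, List[str]],
    train_fraction: float = 0.7,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Assign whole components to train or test; a component is never divided."""
    total_chains = sum(len(chains) for chains in components.values())
    target = train_fraction * total_chains

    train: Dict[str, List[str]] = {}
    test: Dict[str, List[str]] = {}
    train_size = 0

    # Visit the largest components first (ties broken by name for determinism),
    # adding each one to train only if it still fits under the target size.
    ordered = sorted(components.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for name, chains in ordered:
        if train_size + len(chains) <= target:
            train[name] = chains
            train_size += len(chains)
        else:
            test[name] = chains
    return train, test


# Hypothetical components and chain identifiers, for illustration only.
components = {
    "component_a": ["c01", "c02", "c03", "c04", "c05"],
    "component_b": ["c06", "c07", "c08"],
    "component_c": ["c09", "c10"],
    "component_d": ["c11"],
}

train, test = split_components(components, train_fraction=0.7)
train_chains = [c for chains in train.values() for c in chains]
test_chains = [c for chains in test.values() for c in chains]
print(sorted(train), len(train_chains))
print(sorted(test), len(test_chains))
```

Because Components are indivisible, the realized ratio is only approximate; with the toy data above, 7 of 11 chains end up in training. A greedy pass like this is one simple choice among many, and a real split may weigh other criteria when deciding where each Component goes.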


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11377173
DOI: http://dx.doi.org/10.1016/j.jmb.2024.168552


