Robust RNA secondary structure prediction with a mixture of deep learning and physics-based experts.

Biol Methods Protoc

Department of Physics, George Washington University, Washington, DC 20052, United States.

Published: January 2025

A mixture-of-experts (MoE) approach has been developed to mitigate the poor out-of-distribution (OOD) generalization of deep learning (DL) models for single-sequence-based prediction of RNA secondary structure. The main idea behind this approach is to use DL models for in-distribution (ID) test sequences to leverage their superior ID performance, while relying on physics-based models for OOD sequences to ensure robust predictions. One key ingredient of the pipeline, named MoEFold2D, is automated ID/OOD detection via consensus analysis of an ensemble of DL model predictions without requiring access to training data during inference. Specifically, motivated by the clustered distribution of known RNA structures, a collection of distinct DL models is trained by iteratively leaving one cluster out. Each DL model hence serves as an expert on all but one cluster in the training data. Consequently, for an ID sequence, all but one of the DL models make accurate predictions that are consistent with one another, while an OOD sequence yields highly inconsistent predictions among all DL models. Through consensus analysis of DL predictions, test sequences are categorized as ID or OOD. ID sequences are subsequently predicted by averaging the predictions of the DL models in consensus, and OOD sequences are predicted using physics-based models. Instead of remediating generalization gaps with alternative approaches such as transfer learning and sequence alignment, MoEFold2D circumvents unpredictable ID-OOD gaps and combines the strengths of DL and physics-based models to achieve accurate ID and robust OOD predictions.
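
The routing logic described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which agreement metric or thresholds the pipeline uses, so everything specific here is an assumption made for illustration: the pairwise F1 score between binarized base-pair matrices, the cutoffs `f1_cut` and `pair_cut`, and the function names `pairwise_f1` and `route_sequence` are hypothetical and are not taken from MoEFold2D.

```python
import numpy as np


def pairwise_f1(a, b):
    """F1 score between two boolean base-pair matrices (1.0 if both are empty)."""
    tp = np.sum(a & b)
    denom = np.sum(a) + np.sum(b)
    return 1.0 if denom == 0 else 2.0 * tp / denom


def route_sequence(dl_probs, physics_pred, f1_cut=0.8, pair_cut=0.5):
    """Label a sequence ID or OOD and return the routed structure prediction.

    dl_probs: list of K (L x L) base-pair probability matrices, one per DL expert.
    physics_pred: (L x L) prediction from a physics-based model.
    f1_cut, pair_cut: hypothetical thresholds, NOT values from the paper.
    """
    k = len(dl_probs)
    if k < 3:
        raise ValueError("consensus analysis needs at least three DL experts")
    binary = [np.asarray(p) >= pair_cut for p in dl_probs]
    # Mean agreement of each expert with all the other experts.
    agreement = np.array([
        np.mean([pairwise_f1(binary[i], binary[j]) for j in range(k) if j != i])
        for i in range(k)
    ])
    # Set aside the least-agreeing expert: for an ID sequence this should be
    # the one model whose training data excluded the sequence's cluster.
    outlier = int(np.argmin(agreement))
    keep = [i for i in range(k) if i != outlier]
    # Consensus holds only if every remaining pair of experts agrees closely.
    consensus = all(
        pairwise_f1(binary[i], binary[j]) >= f1_cut
        for n, i in enumerate(keep)
        for j in keep[n + 1:]
    )
    if consensus:
        averaged = np.mean([dl_probs[i] for i in keep], axis=0)
        return "ID", averaged >= pair_cut
    # No consensus: treat as OOD and fall back to the physics-based prediction.
    return "OOD", physics_pred
```

Under these assumptions, a sequence on which all but one expert agree is labelled ID and receives the averaged DL prediction, while a sequence with no such consensus is labelled OOD and receives the physics-based prediction unchanged. Note that the decision uses only the model outputs, in line with the abstract's point that no training data is needed at inference time.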

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11729747
DOI: http://dx.doi.org/10.1093/biomethods/bpae097

Similar Publications

Physics-Based Synthetic Data Model for Automated Segmentation in Catalysis Microscopy.

Microsc Microanal

January 2025

Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin 14195, Germany.

In catalysis research, the amount of microscopy data acquired when imaging dynamic processes is often too large for nonautomated quantitative analysis. Developing machine-learned segmentation models is hindered by the need for high-quality annotated training data. We thus substitute expert-annotated data with a physics-based sequential synthetic data model.

In this paper, we tackle the challenge of accurately controlling the position of the valve spool in hydraulic 4/3 two-stage directional control valves utilized in mobile applications. The pilot valve's overlapping design often leads to a significant dead zone, negatively impacting positioning accuracy and necessitating a sophisticated controller design. To overcome these challenges, we introduce a control strategy founded on a control-oriented model.

The 3D structure of RNA critically influences its functionality, and understanding this structure is vital for deciphering RNA biology. Experimental methods for determining RNA structures are labour-intensive, expensive, and time-consuming. Computational approaches have emerged as valuable tools, leveraging physics-based principles and machine learning to predict RNA structures rapidly.

Estimating seismic anisotropy parameters, such as Thomsen's parameters, is crucial for investigating fractured and finely layered geological media. However, many inversion methods rely on complex physical models with initial assumptions, leading to non-reproducible estimates and subjective fracture interpretation. To address these limitations, this study utilizes machine learning methods: support vector regression, extreme gradient boosting, multi-layer perceptron, and a convolutional neural network.
