Prediction of RNA secondary structure from single sequences still needs substantial improvement. The application of machine learning (ML) to this problem has become increasingly popular. However, ML algorithms are prone to overfitting, which limits what they can reveal about the mechanisms that govern RNA folding. It is natural to use high-capacity models for such a difficult task, but poor generalization is expected when too few training examples are available. Here, we report the relation between capacity and performance on a fundamental related problem: determining whether two sequences are fully complementary. Our analysis focused on how model architecture and capacity, as well as dataset size and nature, affect classification accuracy. We observed that low-capacity models are better suited for learning from mislabelled training examples, while large capacities improve the ability to generalize to structurally dissimilar data. We also found that neural networks struggle to grasp the fundamental concept of base complementarity, especially when extrapolating to sequence lengths outside the training range. Given a more complex task like RNA folding, it comes as no surprise that the scarcity of usable examples hinders the application of machine learning techniques in this field.
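The abstract does not spell out how "fully complementary" is defined or how the datasets were built, so the following Python sketch is only an illustration under stated assumptions: strands are read antiparallel, only Watson-Crick pairs (A-U, G-C) count, and G-U wobble pairs are excluded. It shows the exact rule a classifier would have to learn, plus a hypothetical generator (`make_example`, `make_dataset`, `noise_rate` are illustrative names, not from the paper) that flips labels to simulate mislabelled examples and uses disjoint length ranges to mimic lengthwise extrapolation.

```python
import random

# Assumption: only Watson-Crick pairs are allowed; G-U wobble is excluded.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def is_fully_complementary(s1: str, s2: str) -> bool:
    """True if every base of s1 pairs with the base of s2 facing it
    when the two strands are aligned antiparallel."""
    if len(s1) != len(s2):
        return False
    return all((a, b) in PAIRS for a, b in zip(s1, reversed(s2)))


def reverse_complement(seq: str) -> str:
    """Build the unique fully complementary partner of seq (Watson-Crick only)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def make_example(length: int, rng: random.Random, noise_rate: float = 0.0):
    """Hypothetical generator: about half positives (exact reverse complement)
    and half negatives (one random mismatch). With probability noise_rate
    the label is flipped to simulate a mislabelled training example."""
    s1 = "".join(rng.choice("ACGU") for _ in range(length))
    s2 = reverse_complement(s1)
    if rng.random() < 0.5:
        # Replace one base with any base other than the true complement,
        # which breaks pairing at that position.
        i = rng.randrange(length)
        wrong = [b for b in "ACGU" if b != s2[i]]
        s2 = s2[:i] + rng.choice(wrong) + s2[i + 1:]
    label = is_fully_complementary(s1, s2)
    if rng.random() < noise_rate:
        label = not label
    return s1, s2, label


def make_dataset(n: int, lengths, seed: int = 0, noise_rate: float = 0.0):
    """Draw n examples whose lengths are sampled from `lengths`."""
    rng = random.Random(seed)
    return [make_example(rng.choice(lengths), rng, noise_rate) for _ in range(n)]


# Train on short strands with 10% label noise; evaluate on longer, clean strands.
train = make_dataset(1000, lengths=range(5, 11), seed=1, noise_rate=0.1)
test = make_dataset(200, lengths=range(20, 31), seed=2)

print(is_fully_complementary("GGAC", "GUCC"))  # True
print(is_fully_complementary("GGAC", "GUCA"))  # False
```

The rule itself is a one-line check, which is the point of the task: a model that truly learned base complementarity should apply it at any length, so a gap between accuracy on `train`-like lengths and on the longer `test` strands signals memorization rather than a learned pairing rule.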

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10507318
DOI: http://dx.doi.org/10.3389/fgene.2023.1254226
