DNA carries the genetic code of life, with different conformations associated with different biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. We deployed several machine learning algorithms, including LightGBM (a gradient boosting model), to build prediction models. We used nested cross-validation to address the issues of overfitting and selection bias. This strategy provides an unbiased estimate of the generalization performance of a machine learning algorithm while also allowing us to tune its hyperparameters. Furthermore, we built a secondary model based on SHAP (SHapley Additive exPlanations) that offers insight into how the model arrives at its predictions. Our detailed model-building strategy and robust statistical validation protocols tackle the formidable challenge of working with small datasets, which are common in biological and medical research.
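The nested cross-validation idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: a tiny k-nearest-neighbour classifier stands in for LightGBM so that the example needs only the Python standard library, the data are synthetic two-cluster points rather than DNA sequence features, and all function names (`k_folds`, `knn_predict`, `accuracy`, `nested_cv`) are hypothetical. The inner loop picks a hyperparameter using only the outer training data, and the outer loop scores that choice on data the tuning step never saw.

```python
import random
from statistics import mean


def k_folds(n, k, seed):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    nearest = by_distance[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(fit_idx, eval_idx, X, y, k):
    """Fit on fit_idx (k-NN just stores the points) and score on eval_idx."""
    fit_X = [X[i] for i in fit_idx]
    fit_y = [y[i] for i in fit_idx]
    hits = sum(knn_predict(fit_X, fit_y, X[i], k) == y[i] for i in eval_idx)
    return hits / len(eval_idx)


def nested_cv(X, y, grid, outer_k=5, inner_k=3, seed=0):
    """Return per-outer-fold accuracies and the hyperparameter chosen in each."""
    outer = k_folds(len(X), outer_k, seed)
    outer_scores, chosen = [], []
    for o, test_idx in enumerate(outer):
        # Everything outside the current outer fold is available for tuning.
        train_idx = [i for f, fold in enumerate(outer) if f != o for i in fold]
        inner = k_folds(len(train_idx), inner_k, seed + 1 + o)

        def inner_score(param):
            # Inner loop: cross-validate one candidate on the outer training data only.
            scores = []
            for v, val_pos in enumerate(inner):
                val = [train_idx[p] for p in val_pos]
                fit = [train_idx[p]
                       for f, fold in enumerate(inner) if f != v for p in fold]
                scores.append(accuracy(fit, val, X, y, param))
            return mean(scores)

        best = max(grid, key=inner_score)
        chosen.append(best)
        # Outer loop: score the tuned model on the untouched outer test fold.
        outer_scores.append(accuracy(train_idx, test_idx, X, y, best))
    return outer_scores, chosen


# Synthetic stand-in data (an assumption): two Gaussian clusters, 30 points each.
rng = random.Random(42)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (0.0, 3.0) for _ in range(30)]
y = [0] * 30 + [1] * 30

outer_scores, chosen = nested_cv(X, y, grid=[1, 3, 5])
print("outer-fold accuracies:", [round(s, 2) for s in outer_scores])
print("hyperparameter chosen per outer fold:", chosen)
```

The mean of `outer_scores` is the generalization estimate. Because each outer test fold is excluded from hyperparameter selection, the estimate is not inflated by the tuning step, which matters most on small datasets.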


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8441556
DOI: http://dx.doi.org/10.1016/j.patter.2021.100329

