Deep learning-based approaches for multi-omics data integration and analysis.

BioData Min

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Philadelphia, PA, 19104, USA.

Published: October 2024

Background: The rapid growth of deep learning, together with the vast and ever-growing amount of available data, has provided ample opportunity for advances in the fusion and analysis of complex and heterogeneous data types. Different data modalities provide complementary information that can be leveraged to gain a more complete understanding of each subject. In the biomedical domain, multi-omics data includes molecular (genomics, transcriptomics, proteomics, epigenomics, metabolomics, etc.) and imaging (radiomics, pathomics) modalities which, when combined, have the potential to improve performance on prediction, classification, clustering and other tasks. Deep learning encompasses a wide variety of methods, each of which has certain strengths and weaknesses for multi-omics integration.

Method: In this review, we categorize recent deep learning-based approaches by their basic architectures and discuss their unique capabilities in relation to one another. We also discuss some emerging themes advancing the field of multi-omics integration.

Results: Deep learning-based multi-omics integration methods were categorized broadly into non-generative (feedforward neural networks, graph convolutional neural networks, and autoencoders) and generative (variational methods, generative adversarial models, and a generative pretrained model). Generative methods have the advantage of being able to impose constraints on the shared representations to enforce certain properties or incorporate prior knowledge. They can also be used to generate or impute missing modalities. Recent advances achieved by these methods include handling incomplete data and extending beyond the traditional molecular omics data types to integrate other modalities such as imaging data.

Conclusion: We expect to see further growth in methods that can handle missingness, as this is a common challenge in working with complex and heterogeneous data. Additionally, methods that integrate more data types are expected to improve performance on downstream tasks by capturing a comprehensive view of each sample.
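
As a rough illustration of the missingness problem raised above, the following sketch fuses per-modality embeddings into one shared representation by averaging only the modalities observed for each sample. It is not a method from the review: the function name `fuse_modalities`, the array layout, and the choice of mean fusion are all assumptions made for illustration, and the embeddings are taken as already computed by some per-modality encoder.

```python
import numpy as np


def fuse_modalities(embeddings, mask):
    """Average per-modality embeddings over the modalities observed per sample.

    embeddings: array of shape (n_modalities, n_samples, dim); entries for
        unobserved modalities may hold any placeholder, including NaN.
    mask: boolean array of shape (n_modalities, n_samples); True where the
        modality was observed for that sample.
    Returns an array of shape (n_samples, dim).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    mask = np.asarray(mask, dtype=bool)

    # Number of observed modalities per sample, shaped (n_samples, 1).
    counts = mask[..., None].sum(axis=0).astype(float)
    if np.any(counts == 0):
        raise ValueError("every sample needs at least one observed modality")

    # Replace unobserved entries with zero so placeholders never propagate.
    observed = np.where(mask[..., None], embeddings, 0.0)
    return observed.sum(axis=0) / counts


if __name__ == "__main__":
    # Two modalities, two samples, 2-d embeddings (toy values, not real data).
    emb = np.array([
        [[1.0, 2.0], [3.0, 4.0]],        # modality A
        [[3.0, 4.0], [np.nan, np.nan]],  # modality B, missing for sample 1
    ])
    observed_mask = np.array([[True, True], [True, False]])
    print(fuse_modalities(emb, observed_mask))
```

Mean fusion is only one simple option. The generative approaches described in the Results would instead learn to impute the missing modality rather than skip it.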

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11446004
DOI: http://dx.doi.org/10.1186/s13040-024-00391-z
