Motivation: Machine learning (ML) methods are motivated by the need to automate information extraction from large datasets in order to support human users in data-driven tasks. This makes ML an attractive approach for the integrative joint analysis of the vast amounts of omics data produced by next-generation sequencing and other omics assays. A systematic assessment of the current literature can help to identify key trends and potential gaps in methodology and applications. We surveyed the literature on ML-based multi-omic data integration and quantitatively explored the goals, techniques and data involved in this field. We were particularly interested in examining how researchers use ML to deal with the volume and complexity of these datasets.

Results: Our main finding is that the methods used are those that address the challenges of datasets with few samples and many features. Dimensionality reduction methods are used to reduce the feature count alongside models that can also appropriately handle relatively few samples. Popular techniques include autoencoders, random forests and support vector machines. We also found that the field is heavily influenced by the use of The Cancer Genome Atlas dataset, which is accessible and contains many diverse experiments.
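
The "few samples, many features" setting described in the Results can be illustrated with a minimal sketch. It is not taken from the surveyed papers or from the review's repository: the data are synthetic, the two "omics" blocks (`expr`, `meth`) and all sizes are hypothetical, and PCA followed by a nearest-centroid classifier stands in for the more popular techniques named in the Results (autoencoders, random forests, support vector machines). The blocks are concatenated, the feature count is reduced with PCA fitted on the training samples only, and a simple model is trained in the reduced space.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA by SVD of the mean-centered data; return (mean, components)."""
    mean = X.mean(axis=0)
    # Economy SVD: with n samples << p features, vt has at most n rows.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto the fitted principal components."""
    return (X - mean) @ components.T


def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Assign each test sample to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


# Synthetic data (assumption): 40 samples, 800 features across two blocks.
rng = np.random.default_rng(0)
n_samples, n_expr, n_meth = 40, 500, 300
y = np.repeat([0, 1], n_samples // 2)
expr = rng.normal(size=(n_samples, n_expr))  # hypothetical expression block
meth = rng.normal(size=(n_samples, n_meth))  # hypothetical methylation block
expr[y == 1, :40] += 2.0  # a few informative features per block
meth[y == 1, :20] += 2.0

# Early integration: concatenate the blocks feature-wise.
X = np.hstack([expr, meth])

# Hold out 10 samples; fit PCA on the training samples only to avoid leakage.
idx = rng.permutation(n_samples)
train, test = idx[:30], idx[30:]
mean, comps = pca_fit(X[train], n_components=3)
Z_train = pca_transform(X[train], mean, comps)
Z_test = pca_transform(X[test], mean, comps)

pred = nearest_centroid_predict(Z_train, y[train], Z_test)
accuracy = float((pred == y[test]).mean())
print(f"features: {X.shape[1]} -> {Z_train.shape[1]}, held-out accuracy: {accuracy:.2f}")
```

Fitting the reduction on the training split alone matters in this regime: with far more features than samples, components fitted on all samples can absorb noise from the held-out data and inflate the apparent accuracy.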

Availability and Implementation: All data and processing scripts are available in a GitLab repository (https://gitlab.com/polavieja_lab/ml_multi-omics_review/) or on Zenodo (https://doi.org/10.5281/zenodo.7361807).

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9907220
DOI: http://dx.doi.org/10.1093/bioinformatics/btad021

Similar Publications

BMT: A Cross-Validated ThinPrep Pap Cervical Cytology Dataset for Machine Learning Model Training and Validation.

Sci Data

December 2024

Department of Pathology and Laboratory Medicine, Alpert Medical School, Brown University, Providence, RI, 02912, USA.

In the past several years, a few cervical Pap smear datasets have been published for use in clinical training. However, most publicly available datasets consist of pre-segmented single cell images, contain on-image annotations that must be manually edited out, or are prepared using the conventional Pap smear method. Multicellular liquid Pap image datasets are a more accurate reflection of current cervical screening techniques.

Background: High triglyceride (TG) levels affect, and are affected by, other hematological factors. Determining fasting serum triglyceride concentrations as part of a lipid profile is a key step in assessing hematological factors, and these concentrations significantly affect various systemic diseases. This study was carried out to assess the potential relationship between TG concentration and hematological factors.

Generative artificial intelligence (AI), characterized by its ability to generate diverse forms of content including text, images, video and audio, has revolutionized many fields, including medical education. It leverages machine learning to create this content, enabling personalized learning, enhancing resource accessibility, and facilitating interactive case studies. This narrative review explores the integration of generative AI into orthopedic education and training, highlighting its potential, current challenges, and future trajectory.

Bias in machine learning applications to address non-communicable diseases at a population-level: a scoping review.

BMC Public Health

December 2024

Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, 30 Bond Street, Toronto, ON, M5B 1W8, Canada.

Background: Machine learning (ML) is increasingly used in population and public health to support epidemiological studies, surveillance, and evaluation. Our objective was to conduct a scoping review to identify studies that use ML in population health, with a focus on its use in non-communicable diseases (NCDs). We also examine potential algorithmic biases in model design, training, and implementation, as well as efforts to mitigate these biases.

Development and Validation of a Nomogram Based on Multiparametric MRI for Predicting Lymph Node Metastasis in Endometrial Cancer: A Retrospective Cohort Study.

Acad Radiol

December 2024

Department of Radiology, The First Affiliated Hospital of Guangxi Medical University, Nanning, China (Y.T., Y.W., Y.Y., X.Q., Y.H., J.L.); Key Laboratory of Early Prevention and Treatment for Regional High Frequency Tumor (Guangxi Medical University), Ministry of Education, Nanning 530021, Guangxi Zhuang Autonomous Region, PR China (J.L.).

Rationale and Objectives: To develop a radiomics nomogram based on clinical and magnetic resonance features to predict lymph node metastasis (LNM) in endometrial cancer (EC).

Materials and Methods: We retrospectively collected data from 308 patients with EC from two centers. These patients were divided into a training set (n=155), a test set (n=67), and an external validation set (n=86).
