Microarray gene expression data generally suffer from missing values due to a variety of experimental reasons. Since missing data points can adversely affect downstream analysis, many algorithms have been proposed to impute them. In this survey, we provide a comprehensive review of existing missing value imputation algorithms, focusing on their underlying algorithmic techniques, how they utilize local or global information from within the data, and their use of domain knowledge during imputation. In addition, we describe how imputation results can be validated and the different ways to assess the performance of imputation algorithms, and we discuss some possible future research directions. It is hoped that this review will give readers a good understanding of current developments in this field and inspire them to come up with the next generation of imputation algorithms.
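To make the ideas of "local information" and validation concrete, here is a minimal sketch, not taken from the survey itself. It assumes a genes x samples matrix in which NaN marks a missing value, uses a distance-weighted k-nearest-neighbour fill as one well-known example of a local approach, and scores it by hiding known entries and computing a normalized root mean squared error (NRMSE). The names `knn_impute`, `nrmse`, and `evaluate` are hypothetical and chosen only for illustration.

```python
import numpy as np

def knn_impute(X, k=5):
    """Fill NaNs in a genes x samples matrix from the k most similar genes."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    row_means = np.nanmean(X, axis=1)  # fallback when no neighbour is usable
    n_genes = X.shape[0]
    for i in np.where(missing.any(axis=1))[0]:
        obs_i = ~missing[i]
        # Root-mean-square distance over columns observed in both genes.
        dists = np.full(n_genes, np.inf)
        for j in range(n_genes):
            if j == i:
                continue
            shared = obs_i & ~missing[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        order = np.argsort(dists)
        for c in np.where(missing[i])[0]:
            # Closest genes that actually have a value in column c.
            cand = [j for j in order
                    if np.isfinite(dists[j]) and not missing[j, c]][:k]
            if cand:
                w = 1.0 / (dists[cand] + 1e-8)  # closer genes weigh more
                filled[i, c] = np.sum(w * X[cand, c]) / np.sum(w)
            else:
                filled[i, c] = row_means[i]
    return filled

def nrmse(true_vals, imputed_vals):
    """RMSE of the imputed entries, normalized by the spread of the truth."""
    err = np.mean((imputed_vals - true_vals) ** 2)
    return np.sqrt(err / np.var(true_vals))

def evaluate(X_complete, rate=0.1, k=5, seed=0):
    """Hide a random fraction of known entries, impute, and score."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X_complete.shape) < rate
    X_missing = X_complete.copy()
    X_missing[mask] = np.nan
    imputed = knn_impute(X_missing, k=k)
    return nrmse(X_complete[mask], imputed[mask])
```

Calling `evaluate` on a complete matrix gives one score per missing rate; comparing such scores across methods is one simple form of the performance assessment the abstract refers to. A gene whose values are all missing has no usable neighbours or row mean and stays NaN in this sketch.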
DOI: http://dx.doi.org/10.1093/bib/bbq080
Alzheimers Dement (N Y)
January 2025
Indiana Alzheimer Disease Research Center and Center for Neuroimaging, Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, Indiana, USA.
Introduction: The exponential growth of genomic datasets necessitates advanced analytical tools to effectively identify genetic loci from large-scale high throughput sequencing data. This study presents Deep-Block, a multi-stage deep learning framework that incorporates biological knowledge into its AI architecture to identify genetic regions as significantly associated with Alzheimer's disease (AD). The framework employs a three-stage approach: (1) genome segmentation based on linkage disequilibrium (LD) patterns, (2) selection of relevant LD blocks using sparse attention mechanisms, and (3) application of TabNet and Random Forest algorithms to quantify single nucleotide polymorphism (SNP) feature importance, thereby identifying genetic factors contributing to AD risk.
One of the major challenges in genomic data sharing is protecting participants' privacy in collaborative studies and when genomic data are outsourced for analysis tasks, e.g., genotype imputation services and federated collaborative genomic analysis.
Med Care
February 2025
RAND, Health Care, Santa Monica, CA.
Background: Medicare Bayesian Improved Surname and Geocoding (MBISG), which augments an imperfect race-and-ethnicity administrative variable to estimate probabilities that people would self-identify as being in each of 6 mutually exclusive racial-and-ethnic groups, performs very well for Asian American and Native Hawaiian/Pacific Islander (AA&NHPI), Black, Hispanic, and White race-and-ethnicity, somewhat less well for American Indian/Alaska Native (AI/AN), and much less well for Multiracial race-and-ethnicity.
Objectives: To assess whether temporal inconsistency of self-reported race-and-ethnicity might limit improvements in approaches like MBISG.
Methods: Using the Medicare Health Outcomes Survey (HOS) baseline (2013-2018) and 2-year follow-up data (2015-2020), we evaluate the consistency of self-reported race-and-ethnicity coded 2 ways: the 6 mutually exclusive MBISG categories and individual endorsements of each racial-and-ethnic group.
Sci Rep
January 2025
Department of Computer Science, Faculty of Computers and Informatics, Kafrelsheikh University, Kafrelsheikh, Egypt.
Missing pixel imputation is a critical task in image processing, where the presence of high percentages of missing pixels can significantly degrade the performance of downstream tasks such as image segmentation and object detection. This paper introduces a novel approach for missing pixel imputation based on Generative Adversarial Networks (GANs). We propose a new GAN architecture incorporating an identity module and a sperm motility-inspired heuristic during filtration to optimize the selection of pixels used in reconstructing missing data.
Int J Mol Sci
December 2024
Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Zwirki i Wigury 101, 02-089 Warsaw, Poland.
Mass-spectrometry-based proteomics frequently utilizes label-free quantification strategies due to their cost-effectiveness, methodological simplicity, and capability to identify large numbers of proteins within a single analytical run. Despite these advantages, the prevalence of missing values (MV), which can impact up to 50% of the data matrix, poses a significant challenge by reducing the accuracy, reproducibility, and interpretability of the results. Consequently, effective handling of missing values is crucial for reliable quantitative analysis in proteomic studies.