CPARI: a novel approach combining cell partitioning with absolute and relative imputation to address dropout in single-cell RNA-seq data.

Brief Bioinform

School of Computer Science and Engineering, Guilin University of Technology, 12 Jiangan Road, Qixing District, Guilin 541004, China.

Published: November 2024

A key challenge in analyzing single-cell RNA sequencing (scRNA-seq) data is the large number of false zeros, known as "dropout zeros", which are caused by technical limitations such as shallow sequencing depth or inefficient mRNA capture. To address this challenge, we propose a novel imputation model called CPARI, which combines cell partitioning with our designed absolute and relative imputation methods. First, CPARI employs a new approach to select highly variable genes and constructs an average consensus matrix using fuzzy C-means clustering-based blockchain technology to obtain results at different resolutions. Hierarchical clustering is then applied to further refine these blocks, resulting in well-defined cellular partitions. Next, CPARI identifies dropout events and determines the imputation positions of the identified zeros. An autoencoder is trained within each cellular block to learn gene features and reconstruct the data. Our absolute imputation technique is applied first to the identified positions, followed by our relative imputation technique to address the remaining dropout zeros, so that both global consistency and local variation are maintained. In comprehensive analyses on simulated and real scRNA-seq datasets, including quantitative assessment, differential expression analysis, cell clustering, cell trajectory inference, robustness evaluation, and large-scale data imputation, CPARI demonstrates superior performance compared to 12 other state-of-the-art imputation models. Ablation experiments further confirm the significance and necessity of both the cell partitioning and relative imputation components of CPARI. Notably, as a new denoising approach, CPARI can distinguish real biological zeros from dropout zeros, minimizing false positives and maximizing imputation accuracy.
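The cell-partitioning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the consensus matrix is built, so the sketch assumes that each "resolution" is a different number of fuzzy C-means clusters, that fuzzy memberships are hardened with an argmax, that the consensus is the average co-assignment of cell pairs, and that the refinement uses average-linkage hierarchical clustering. All function names and parameters (`fuzzy_c_means`, `average_consensus`, `partition_cells`, `resolutions`, `n_blocks`) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy C-means; returns an (n_cells, n_clusters) membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U


def average_consensus(X, resolutions=(2, 3, 4), seed=0):
    """Average co-assignment of cell pairs over several cluster counts (assumed)."""
    n = X.shape[0]
    consensus = np.zeros((n, n))
    for k in resolutions:
        # Assumption: harden fuzzy memberships to one label per cell.
        labels = fuzzy_c_means(X, k, seed=seed).argmax(axis=1)
        consensus += (labels[:, None] == labels[None, :]).astype(float)
    return consensus / len(resolutions)


def partition_cells(consensus, n_blocks):
    """Refine the consensus into blocks with average-linkage clustering (assumed)."""
    dissimilarity = 1.0 - consensus
    np.fill_diagonal(dissimilarity, 0.0)
    condensed = squareform(dissimilarity, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_blocks, criterion="maxclust")


# Toy data standing in for a cells x highly-variable-genes matrix.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (20, 5)), rng.normal(10.0, 0.5, (20, 5))])
consensus = average_consensus(X)
blocks = partition_cells(consensus, n_blocks=2)
```

In a pipeline of the kind the abstract outlines, a separate model (an autoencoder in CPARI) would then be trained on the cells of each block; that step, and the absolute and relative imputation rules, are not specified in the abstract and are left out of this sketch.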


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11666288
DOI: http://dx.doi.org/10.1093/bib/bbae668


Similar Publications

Background And Objectives: Although a substantial amount of research has focused on negative aspects of caregiving, less research has been conducted investigating positive aspects of providing informal care. The aim of this study was to investigate the longitudinal association between caregiving satisfaction and psychological distress in informal carers of dependent older people, and whether this relationship is mediated by caregiver burden.

Research Design And Methods: Prospective longitudinal study with a probabilistic sample of 332 caregivers of older relatives, with data collected at baseline and at 1-year follow-up.


Genomic selection is a widely used quantitative method of determining the genetic value of an individual from genomic information and phenotypic data. In this study, we used a large, multi-year training population of 3248 individuals from the University of Florida strawberry (Fragaria × ananassa Duchesne) breeding program. We coupled this training population with a test population of 1460 individuals derived from 20 biparental families.


Comprehensive Evaluation of Advanced Imputation Methods for Proteomic Data Acquired via the Label-Free Approach.

Int J Mol Sci

December 2024

Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Zwirki i Wigury 101, 02-089 Warsaw, Poland.

Mass-spectrometry-based proteomics frequently utilizes label-free quantification strategies due to their cost-effectiveness, methodological simplicity, and capability to identify large numbers of proteins within a single analytical run. Despite these advantages, the prevalence of missing values (MV), which can impact up to 50% of the data matrix, poses a significant challenge by reducing the accuracy, reproducibility, and interpretability of the results. Consequently, effective handling of missing values is crucial for reliable quantitative analysis in proteomic studies.


Poor Self-Rated Health (SRHp) is part of a four-item scale for self-assessment. SRH from the 2019 Behavioral Risk Factor Surveillance Survey (BRFSS) is used to test hypotheses linking population-level well-being influenced by bereavement due to the death of a close friend or relative. By linking the prevalence rates of population-level well-being with exposure to bereavement, we extend our knowledge of this exposure beyond single-person studies.


Empirical Bayes Linked Matrix Decomposition.

Mach Learn

October 2024

Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, 55455, MN, USA.

Data for several applications in diverse fields can be represented as multiple matrices that are linked across rows or columns. This is particularly common in molecular biomedical research, in which multiple molecular "omics" technologies may capture different feature sets (e.g.

