Motivation: Single-cell RNA-sequencing (scRNA-seq) technology provides a powerful tool for investigating cell heterogeneity and cell subpopulations by allowing gene expression to be quantified at the single-cell level. However, scRNA-seq data analysis remains challenging because of various sources of technical noise, such as dropout events (i.e. excessive zero counts in the expression matrix).
Results: By taking into account the associations among cells and genes, we propose a novel collaborative matrix factorization-based method called CMF-Impute to impute the dropout entries of a given scRNA-seq expression matrix. We test CMF-Impute and compare it with five other state-of-the-art methods on six popular real scRNA-seq datasets of various sizes and three simulated datasets. For simulated datasets, CMF-Impute outperforms the other methods, imputing dropout values closest to the original expression values as evaluated by both the sum of squared error and the Pearson correlation coefficient. For real datasets, CMF-Impute achieves the most accurate cell classification results regardless of the clustering method used (SC3, or t-SNE followed by k-means), as evaluated by both the adjusted Rand index and normalized mutual information. Finally, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlations, and in inferring cell lineage trajectories.
Availability and implementation: CMF-Impute is written as a MATLAB package, which is available at https://github.com/xujunlin123/CMFImpute.git.
Supplementary Information: Supplementary data are available at Bioinformatics online.
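As a rough illustration of the simulated-data evaluation described in the Results, the sketch below drops entries from a synthetic expression matrix, imputes them, and scores the imputation with the sum of squared error and the Pearson correlation coefficient on the dropped entries. It is not the CMF-Impute algorithm: it assumes a plain masked matrix factorization fitted by alternating ridge regressions, with no cell or gene similarity terms, and it treats every zero as missing. The function names (`mf_impute`, `sse`, `pearson`), the parameters and the simulated data are all hypothetical.

```python
import numpy as np


def sse(truth, estimate, mask):
    """Sum of squared error restricted to the masked (dropped) entries."""
    diff = (truth - estimate)[mask]
    return float(np.sum(diff ** 2))


def pearson(truth, estimate, mask):
    """Pearson correlation restricted to the masked (dropped) entries."""
    return float(np.corrcoef(truth[mask], estimate[mask])[0, 1])


def mf_impute(X, rank=3, lam=0.1, n_iter=50, seed=0):
    """Plain masked matrix factorization (illustrative, NOT CMF-Impute).

    Zeros in X (genes x cells) are treated as missing. The factors W and H
    are fitted on the nonzero entries by alternating ridge regressions, and
    only the zero entries are replaced by the low-rank reconstruction.
    """
    rng = np.random.default_rng(seed)
    observed = X != 0
    n_genes, n_cells = X.shape
    W = rng.normal(scale=0.1, size=(n_genes, rank))
    H = rng.normal(scale=0.1, size=(n_cells, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iter):
        # Update each gene factor using the cells where that gene is observed.
        for i in range(n_genes):
            Hc = H[observed[i]]
            W[i] = np.linalg.solve(Hc.T @ Hc + reg, Hc.T @ X[i, observed[i]])
        # Update each cell factor using the genes observed in that cell.
        for j in range(n_cells):
            Wr = W[observed[:, j]]
            H[j] = np.linalg.solve(Wr.T @ Wr + reg, Wr.T @ X[observed[:, j], j])
    imputed = X.copy()
    reconstruction = W @ H.T
    imputed[~observed] = reconstruction[~observed]
    return imputed


# Synthetic, strictly positive low-rank "expression" matrix (assumed data).
rng = np.random.default_rng(42)
n_genes, n_cells, true_rank = 60, 40, 3
truth = rng.uniform(0.5, 2.0, (n_genes, true_rank)) @ rng.uniform(0.5, 2.0, (n_cells, true_rank)).T

# Simulate dropout by zeroing about 30% of the entries at random.
dropout = rng.random(truth.shape) < 0.3
with_dropout = np.where(dropout, 0.0, truth)

imputed = mf_impute(with_dropout, rank=true_rank)

print("SSE before imputation:", sse(truth, with_dropout, dropout))
print("SSE after imputation: ", sse(truth, imputed, dropout))
print("Pearson on dropped entries:", pearson(truth, imputed, dropout))
```

Both metrics are computed only on the entries that were dropped, since the nonzero entries are left unchanged by this sketch. In real data, true zeros and dropouts cannot be told apart this simply, which is one reason a method may draw on additional structure such as the cell and gene associations mentioned above.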
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa109
Bioinformatics
January 2025
The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
Neuroinformatics
January 2025
Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China.
The position and orientation of transcranial magnetic stimulation (TMS) coil, which we collectively refer to as coil placement, significantly affect both the assessment and modulation of cortical excitability. TMS electric field (E-field) simulation can be used to identify optimal coil placement. However, the present E-field simulation required a laborious segmentation and meshing procedure to determine optimal coil placement.
Commun Med (Lond)
January 2025
Department of Demography, University of California, Berkeley, California, USA.
Background: Digital data sources such as mobile phone call detail records (CDRs) are increasingly being used to estimate population mobility fluxes and to predict the spatiotemporal dynamics of infectious disease outbreaks. Differences in mobile phone operators' geographic coverage, however, may result in biased mobility estimates.
Methods: We leverage a unique dataset consisting of CDRs from three mobile phone operators in Bangladesh and digital trace data from Meta's Data for Good program to compare mobility patterns across these sources.
SAR QSAR Environ Res
December 2024
School of Computing and Data Sciences, FLAME University, Pune, India.
This study illustrates the use of chemical fingerprints with machine learning for blood-brain barrier (BBB) permeability prediction. Employing the Blood Brain Barrier Database (B3DB) dataset for BBB permeability prediction, we extracted nine different fingerprints. Support Vector Machine (SVM) and Extreme Gradient Boosting (XGBoost) algorithms were used to develop models for permeability prediction.
Viruses
November 2024
Institute of Biology, ELTE Eötvös Loránd University, 1117 Budapest, Hungary.
The increasingly widespread application of next-generation sequencing (NGS) in clinical diagnostics and epidemiological research has generated a demand for robust, fast, automated, and user-friendly bioinformatics workflows. To guide the choice of tools for the assembly of full-length viral genomes from NGS datasets, we assessed the performance and applicability of four open-source bioinformatics pipelines (shiver, for which we created a user-friendly Dockerized version referred to as dshiver; SmaltAlign; viral-ngs; and V-pipe) using both simulated and real-world HIV-1 paired-end short-read datasets and default settings. All four pipelines produced consensus genome assemblies with high quality metrics (genome fraction recovery, mismatch and indel rates, variant calling F1 scores) when the reference sequence used for assembly had high similarity to the analyzed sample.