CLAIRE: contrastive learning-based batch correction framework for better balance between batch mixing and preservation of cellular heterogeneity.

Bioinformatics

Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China.

Published: March 2023

Motivation: Integration of growing single-cell RNA sequencing datasets helps better understand cellular identity and function. The major challenge for integration is removing batch effects while preserving biological heterogeneity. Advances in contrastive learning have inspired several contrastive learning-based batch correction methods. However, existing contrastive learning-based methods exhibit a noticeable, ad hoc trade-off between batch mixing and preservation of cellular heterogeneity (the mix-heterogeneity trade-off). Therefore, a deliberate mix-heterogeneity trade-off is expected to yield considerable improvements in scRNA-seq dataset integration.

Results: We develop a novel contrastive learning-based batch correction framework, CLAIRE, which achieves a superior mix-heterogeneity trade-off. The key contribution of CLAIRE is a pair of complementary strategies, a construction strategy and a refinement strategy, that improve the appropriateness of positive pairs. The construction strategy dynamically generates positive pairs by augmenting inter-batch mutual nearest neighbors (MNN) with intra-batch k-nearest neighbors (KNN), which improves the coverage of positive pairs over the whole distribution of cell types shared between batches. The refinement strategy automatically reduces the potential false positive pairs produced by the construction strategy by exploiting the memory effect of deep neural networks. We demonstrate that CLAIRE achieves a superior mix-heterogeneity trade-off compared with existing contrastive learning-based methods. Benchmark results on six real datasets also show that CLAIRE achieves the best integration performance when compared with eight state-of-the-art methods. Finally, comprehensive experiments validate the effectiveness of CLAIRE.
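The idea of augmenting inter-batch MNN pairs with intra-batch KNN can be illustrated with a minimal NumPy sketch. This is not the CLAIRE implementation: the function names (`knn_indices`, `mutual_nearest_neighbors`, `augment_pairs`), the Euclidean metric, the toy data, and the specific augmentation rule (every intra-batch neighbor of one anchor is paired with every intra-batch neighbor of the other) are assumptions chosen only to show how KNN can widen the coverage of MNN-derived positive pairs.

```python
import numpy as np

def knn_indices(query, ref, k, exclude_self=False):
    """Indices of the k nearest rows of `ref` for each row of `query` (Euclidean)."""
    d = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:
        # Only meaningful when query and ref are the same batch.
        np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def mutual_nearest_neighbors(a, b, k):
    """Pairs (i, j) where b[j] is among the k-NN of a[i] and vice versa."""
    a_to_b = knn_indices(a, b, k)
    b_to_a = knn_indices(b, a, k)
    pairs = set()
    for i, nbrs in enumerate(a_to_b):
        for j in nbrs:
            if i in b_to_a[j]:
                pairs.add((i, int(j)))
    return pairs

def augment_pairs(mnn_pairs, knn_a, knn_b):
    """Hypothetical augmentation rule: expand each MNN pair (i, j) to all
    pairs between {i} + KNN_A(i) and {j} + KNN_B(j)."""
    out = set()
    for i, j in mnn_pairs:
        for ia in [i, *knn_a[i]]:
            for jb in [j, *knn_b[j]]:
                out.add((int(ia), int(jb)))
    return out

# Toy data: two cell types per batch; batch B carries an additive batch shift.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 5, [10.0] * 5])
labels_a = np.repeat([0, 1], 20)
labels_b = np.repeat([0, 1], 20)
batch_a = centers[labels_a] + rng.normal(scale=0.5, size=(40, 5))
batch_b = centers[labels_b] + rng.normal(scale=0.5, size=(40, 5)) + 1.0

mnn = mutual_nearest_neighbors(batch_a, batch_b, k=3)
knn_a = knn_indices(batch_a, batch_a, k=3, exclude_self=True)
knn_b = knn_indices(batch_b, batch_b, k=3, exclude_self=True)
augmented = augment_pairs(mnn, knn_a, knn_b)

print(f"MNN pairs: {len(mnn)}, augmented positive pairs: {len(augmented)}")
```

In this sketch the augmented set always contains the original MNN pairs and adds pairs for cells that have no mutual nearest neighbor of their own. On real data some of the added pairs would join different cell types, which is the kind of false positive the refinement strategy is described as reducing.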

Availability and Implementation: The source code and data used in this study are available at https://github.com/CSUBioGroup/CLAIRE-release.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985174
DOI: http://dx.doi.org/10.1093/bioinformatics/btad099
