Learning Cell Annotation under Multiple Reference Datasets by Multisource Domain Adaptation.

J Chem Inf Model

School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, Jiangsu 210094, China.

Published: January 2023

Accurate and efficient cell type annotation is essential for single-cell sequencing analysis. Annotating cells with powerful models trained on well-annotated reference datasets has become increasingly popular. However, as the amount of single-cell data grows, there is an urgent need for annotation methods that can integrate multiple reference datasets to improve cell type annotation performance. Because of unwanted batch effects between individual reference datasets, integrating multiple references remains an open challenge. To address this, we proposed two methods, scMDR and scMultiR, which use multisource domain adaptation to learn cell type-specific information from multiple reference datasets and the query cells. Based on this learned information, scMDR and scMultiR assign the most likely cell type to each query cell. Benchmark experiments demonstrated their state-of-the-art effectiveness for integrative single-cell assignment with multiple reference datasets.
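The abstract does not describe the architecture of scMDR or scMultiR. As a rough illustration of the general idea of annotating query cells from several batch-shifted references, the sketch below swaps in two simple, well-known ingredients rather than the authors' models: a Gaussian-kernel maximum mean discrepancy (MMD), a discrepancy measure commonly used in domain adaptation, to weight each reference by how close it is to the query, and a nearest-centroid classifier per reference whose softmax scores are combined across references. The function names (`rbf_mmd2`, `centroids`, `annotate`), the kernel width `gamma`, and the 2-D toy data are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: NOT the scMDR/scMultiR method, only a toy
# multi-reference annotator using MMD-based source weighting.

def rbf_mmd2(x, y, gamma=1.0):
    """Biased estimate of squared MMD between samples x and y (RBF kernel)."""
    def k(a, b):
        # Pairwise squared Euclidean distances, then Gaussian kernel.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def centroids(x, labels):
    """Mean profile of each cell type within one reference dataset."""
    labels = np.asarray(labels)
    types = sorted(set(labels.tolist()))
    return types, np.stack([x[labels == t].mean(axis=0) for t in types])

def annotate(references, query, gamma=1.0):
    """Combine per-reference centroid scores, weighting references whose
    distribution is closer to the query (smaller MMD) more heavily."""
    all_types = sorted({t for _, lab in references for t in lab})
    weights = np.array([np.exp(-rbf_mmd2(x, query, gamma)) for x, _ in references])
    weights /= weights.sum()
    scores = np.zeros((len(query), len(all_types)))
    for w, (x, lab) in zip(weights, references):
        types, cents = centroids(x, lab)
        # Softmax over negative squared distances to this reference's centroids.
        logits = -((query[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        for j, t in enumerate(types):
            scores[:, all_types.index(t)] += w * p[:, j]
    # Most likely cell type per query cell, plus the reference weights.
    return [all_types[i] for i in scores.argmax(axis=1)], weights

# Toy usage: two cell types in 2-D; `shift` mimics a batch effect.
rng = np.random.default_rng(0)

def make(shift, n=20):
    a = rng.normal([0.0, 0.0], 0.3, size=(n, 2)) + shift
    b = rng.normal([3.0, 3.0], 0.3, size=(n, 2)) + shift
    return np.vstack([a, b]), ["T cell"] * n + ["B cell"] * n

ref1, ref2 = make(0.0), make(0.5)
query, truth = make(0.2)
pred, w = annotate([ref1, ref2], query)
accuracy = sum(p == t for p, t in zip(pred, truth)) / len(truth)
```

Weighting references by a distribution-level discrepancy is only one simple way to let closer references dominate; a learned approach would instead align the feature spaces themselves before classification.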


Source: http://dx.doi.org/10.1021/acs.jcim.2c01277