CellMeSH: probabilistic cell-type identification using indexed literature.

Bioinformatics

Electrical and Computer Engineering Department, University of Washington, Seattle, WA 98195, USA.

Published: February 2022

Motivation: Single-cell RNA sequencing (scRNA-seq) is widely used for analyzing gene expression in multi-cellular systems and provides unprecedented access to cellular heterogeneity. scRNA-seq experiments aim to identify and quantify all cell types present in a sample. Measured single-cell transcriptomes are grouped by similarity and the resulting clusters are mapped to cell types based on cluster-specific gene expression patterns. While the process of generating clusters has become largely automated, annotation remains a laborious ad hoc effort that requires expert biological knowledge.

Results: Here, we introduce CellMeSH-a new automated approach to identifying cell types for clusters based on prior literature. CellMeSH combines a database of gene-cell-type associations with a probabilistic method for database querying. The database is constructed by automatically linking gene and cell-type information from millions of publications using existing indexed literature resources. Compared to manually constructed databases, CellMeSH is more comprehensive and is easily updated with new data. The probabilistic query method enables reliable information retrieval even though the gene-cell-type associations extracted from the literature are noisy. CellMeSH is also able to optionally utilize prior knowledge about tissues or cells for further annotation improvement. CellMeSH achieves top-one and top-three accuracies on a number of mouse and human datasets that are consistently better than existing approaches.

Availability And Implementation: Web server at https://uncurl.cs.washington.edu/db_query and API at https://github.com/shunfumao/cellmesh.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8826164PMC
http://dx.doi.org/10.1093/bioinformatics/btab834DOI Listing

Publication Analysis

Top Keywords

cell types
12
indexed literature
8
gene expression
8
gene-cell-type associations
8
cellmesh
5
cellmesh probabilistic
4
probabilistic cell-type
4
cell-type identification
4
identification indexed
4
literature
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!