AI Article Synopsis

  • The use of single cell RNA sequencing (scRNA-seq) is transforming our knowledge of cell biology, particularly in identifying cell types, understanding disease, and advancing drug development.
  • The growth of scRNA-seq data presents challenges in classifying cell types and finding key marker genes, leading to the adoption of machine learning strategies, such as the NS-Forest algorithm.
  • NS-Forest v4.0 improves upon earlier versions by selecting marker gene combinations more efficiently, introducing a new metric called On-Target Fraction to measure the exclusivity of marker gene expression, and enabling comparisons between user-defined and algorithm-derived marker genes.

Article Abstract

The use of single cell/nucleus RNA sequencing (scRNA-seq) technologies that quantitatively describe cell transcriptional phenotypes is revolutionizing our understanding of cell biology, leading to new insights into cell type identification, disease mechanisms, and drug development. The tremendous growth in scRNA-seq data has posed new challenges in efficiently characterizing data-driven cell types and identifying quantifiable marker genes for cell type classification. The use of machine learning and explainable artificial intelligence has emerged as an effective approach to study large-scale scRNA-seq data. NS-Forest is a random forest machine learning-based algorithm that aims to provide a scalable data-driven solution to identify minimum combinations of necessary and sufficient marker genes that capture cell type identity with maximum classification accuracy. Here, we describe the latest version, NS-Forest version 4.0, and its companion Python package (https://github.com/JCVenterInstitute/NSForest), with several enhancements to select marker gene combinations that exhibit highly selective expression patterns among closely related cell types and to perform marker gene selection more efficiently for large-scale scRNA-seq data atlases with millions of cells. By modularizing the final decision tree step, NS-Forest v4.0 can be used to compare the performance of user-defined marker genes with the NS-Forest computationally derived marker genes based on the decision tree classifiers. To quantify how well the identified markers exhibit the desired pattern of being exclusively expressed at high levels within their target cell types, we introduce the On-Target Fraction metric that ranges from 0 to 1, with a value of 1 assigned to markers that are expressed only within their target cell types and not in cells of any other cell type. NS-Forest v4.0 outperforms previous versions in its ability to identify markers with higher On-Target Fraction values for closely related cell types, and outperforms other marker gene selection approaches at classification, with significantly higher F-beta scores when applied to datasets from three human organs: brain, kidney, and lung.
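The two scores named in the abstract can be illustrated with a minimal sketch. The abstract says only that the On-Target Fraction ranges from 0 to 1 and equals 1 when a marker is expressed solely in its target cell type. The median-based ratio below is an assumption that is consistent with that description, and it is not necessarily the formula used by the NS-Forest package. The F-beta function is the standard definition. The function names, the input layout, and the toy values are all hypothetical.

```python
from statistics import median


def on_target_fraction(expression_by_type, target):
    """Assumed form: target median expression / sum of medians over all cell types.

    expression_by_type maps a cell type name to a list of expression values
    for a single gene (hypothetical input layout, not the package's API).
    """
    medians = {name: median(values) for name, values in expression_by_type.items()}
    total = sum(medians.values())
    if total == 0:
        return 0.0  # median expression is zero in every cell type
    return medians[target] / total


def f_beta(true_positives, false_positives, false_negatives, beta):
    """Standard F-beta score; beta < 1 weights precision more than recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy values, invented for illustration only.
gene_a = {"T cell": [5.0, 6.0, 7.0], "B cell": [0.0, 0.0, 0.0], "NK cell": [0.0, 0.0, 1.0]}
gene_b = {"T cell": [4.0, 4.0, 4.0], "B cell": [4.0, 4.0, 4.0], "NK cell": [0.0, 0.0, 0.0]}

exclusive = on_target_fraction(gene_a, "T cell")  # expressed only in the target type
shared = on_target_fraction(gene_b, "T cell")     # split evenly with one other type
score = f_beta(true_positives=8, false_positives=2, false_negatives=8, beta=0.5)
```

Under this assumed definition, a gene whose median expression is nonzero only in the target type scores 1, and a gene shared equally between two types scores 0.5 for either of them.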

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11071431
DOI: http://dx.doi.org/10.1101/2024.04.22.590194

Publication Analysis

Top Keywords

cell types: 24
marker genes: 20
cell type: 16
cell: 13
scrna-seq data: 12
marker gene: 12
type classification: 8
marker: 8
rna sequencing: 8
large-scale scrna-seq: 8

Similar Publications

A viscoelastic-plastic deformation model of hemisphere-like tip growth in Arabidopsis zygotes.

Quant Plant Biol

December 2024

Department of Mechanical Engineering, Faculty of Systems Science and Technology, Akita Prefectural University, Yurihonjo, Japan.

Plant zygote cells exhibit tip growth, producing a hemisphere-like tip. To understand how this hemisphere-like tip shape is formed, we revisited a viscoelastic-plastic deformation model that enabled us to simultaneously evaluate the shape, stress and strain of Arabidopsis zygote cells undergoing tip growth. Altering the spatial distribution of cell wall extensibility revealed that cosine-type distribution and growth in a normal direction to the surface create a stable hemisphere-like tip shape.

Ion homeostasis is a crucial process in plants that is closely linked to the efficiency of nutrient uptake, stress tolerance and overall plant growth and development. Nevertheless, our understanding of the fundamental processes of ion homeostasis is still incomplete and highly fragmented. Especially at the mechanistic level, we are still in the process of dissecting physiological systems to analyse the different parts in isolation.

-polarized M2-like tumor-associated macrophages accelerate colorectal cancer development via IL-8 secretion.

Anim Cells Syst (Seoul)

December 2024

Department of Oral Biochemistry, Dental and Life Science Institute, School of Dentistry, Pusan National University, Yangsan, Republic of Korea.

(), a periodontal pathogen, has been implicated in the impairment of anti-tumor responses in colorectal cancer (CRC). The tumor microenvironment in CRC involves tumor-associated macrophages (TAMs), which are pivotal in modulating tumor-associated immune responses. The polarization of TAMs towards an M2-like phenotype promotes CRC progression by suppressing the immune system.

Diabetic cardiomyopathy (DCM) is a major complication of type 2 diabetes mellitus (T2DM), but its effective prevention and treatment are still limited. We investigated the effects of GYY4137, a slow-releasing hydrogen sulfide donor, and its downstream mediator forkhead box protein O1 (FOXO1) on T2DM-associated DCM. , T2DM mice were induced by a high-fat diet coupled with streptozotocin injection.

Insect protein hydrolysates (PH) are emerging as valuable compounds with biological activity. The aim of the present study was to assess the potential cytoprotective effects of PH from the Black Soldier Fly (BPH, in the range 0.1-0.
