Background: Single-cell multi-omics technologies, particularly single-cell RNA sequencing (scRNA-seq), have revolutionized our understanding of cellular heterogeneity and development by providing insights into gene expression at the single-cell level. Investigating the influence of genes on cellular behavior is crucial for elucidating cell fate determination and differentiation, cell development processes, and disease mechanisms.

Methods: Inspired by natural language processing (NLP), we present a novel scRNA-seq analysis method that treats genes as analogous to words. We use word2vec to embed gene sequences derived from gene networks, producing a vector representation of each gene. Cells are then represented by summing their gene vectors, and tissues by aggregating their cell vectors.
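The pipeline described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how gene sequences are derived from the network, so uniform random walks are assumed here. The gene names, the toy network, the two-dimensional gene vectors (standing in for trained word2vec output), the rule of summing only genes with a nonzero count, and the use of a mean to aggregate cells into a tissue are all hypothetical.

```python
import random


def random_walks(network, walk_length, walks_per_gene, seed=0):
    """Generate gene 'sentences' by uniform random walks over a gene network.

    Assumption: the source only says sequences are derived from gene
    networks; random walks are one common way to do that.
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(network):
        for _ in range(walks_per_gene):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = network.get(walk[-1], [])
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def sum_vectors(vectors):
    """Element-wise sum of equal-length vectors (empty input gives [])."""
    return [sum(components) for components in zip(*vectors)]


def cell_vector(expression, gene_vectors):
    """Represent a cell as the sum of the vectors of its detected genes."""
    return sum_vectors(
        gene_vectors[gene]
        for gene, count in expression.items()
        if count > 0 and gene in gene_vectors
    )


def tissue_vector(cell_vectors):
    """Aggregate cell vectors into a tissue vector (assumed here: the mean)."""
    n = len(cell_vectors)
    return [x / n for x in sum_vectors(cell_vectors)]


# Hypothetical toy network: gene -> list of neighboring genes.
network = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_A"],
    "GENE_C": ["GENE_A", "GENE_B"],
}
walks = random_walks(network, walk_length=5, walks_per_gene=2)
# In a real run, `walks` would be passed to a word2vec implementation
# to learn gene vectors. Hand-written vectors are used here instead.
gene_vectors = {
    "GENE_A": [1.0, 0.0],
    "GENE_B": [0.0, 1.0],
    "GENE_C": [1.0, 1.0],
}

cell_1 = cell_vector({"GENE_A": 3, "GENE_B": 0, "GENE_C": 1}, gene_vectors)
cell_2 = cell_vector({"GENE_B": 2, "GENE_C": 5}, gene_vectors)
tissue = tissue_vector([cell_1, cell_2])
```

Summing keeps the cell vector in the same space as the gene vectors, so cells and tissues can be compared with the same distance measures used for genes. Weighting each gene by its expression level would be a natural variation, though the abstract does not state whether the authors do this.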

Results: Our NLP-based approach analyzes scRNA-seq data by generating vector representations of genes, cells, and tissues. This multi-scale analysis includes mapping cell states in vector space to reveal developmental trajectories, quantifying cell similarity using Euclidean distance, and constructing inter-tissue relationship networks from aggregated cell vectors.
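As a rough illustration of the similarity and network steps named in the Results, the sketch below computes Euclidean distances between vectors and links tissues whose aggregated vectors lie close together. The tissue names, the toy vectors, and the `max_distance` threshold rule for drawing an edge are assumptions made for this example. The abstract does not specify how the inter-tissue network is constructed.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.dist(u, v)


def tissue_network(tissue_vectors, max_distance):
    """Return edges (tissue_a, tissue_b, distance) for close tissue pairs.

    Assumption: two tissues are linked when the Euclidean distance between
    their aggregated vectors is at most `max_distance`.
    """
    edges = []
    for a, b in combinations(sorted(tissue_vectors), 2):
        d = euclidean(tissue_vectors[a], tissue_vectors[b])
        if d <= max_distance:
            edges.append((a, b, d))
    return edges


# Hypothetical aggregated tissue vectors.
tissues = {
    "tissue_x": [0.0, 0.0],
    "tissue_y": [3.0, 4.0],
    "tissue_z": [0.0, 1.0],
}

edges = tissue_network(tissues, max_distance=2.0)
```

The same `euclidean` helper applies unchanged to cell vectors, for example to rank the cells nearest a given cell state.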

Conclusions: This method offers a computationally efficient approach for analyzing scRNA-seq data by constructing embedding representations similar to those used in large language model pre-training, but without requiring high-performance computing clusters. By generating gene embeddings that capture functional relationships, this method facilitates the study of cell development trajectories, the impact of gene perturbations, cell clustering, and the construction and analysis of tissue networks. This provides a valuable tool for single-cell data analysis.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11877821
DOI: http://dx.doi.org/10.1186/s12967-025-06263-2

Similar Publications

Semiautomated Production of Cell-Free Biosensors.

ACS Synth Biol

March 2025

Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois 60208, United States.

Cell-free synthetic biology biosensors have potential as effective diagnostic technologies for the detection of chemical compounds, such as toxins and human health biomarkers. They have several advantages over conventional laboratory-based diagnostic approaches, including the ability to be assembled, freeze-dried, distributed, and then used at the point of need. This makes them an attractive platform for cheap and rapid chemical detection across the globe.

Historical studies performed nearly a century ago using mouse skin models identified two key steps in cancer evolution: initiation, a likely mutational event, and promotion, driven by inflammation and cell proliferation. Initiation was proposed to be permanent, with promotion as the critical rate-limiting step for cancer development. Here, we carried out whole genome sequencing to demonstrate that initiated cells with thousands of mutagen-induced mutations can persist for long periods and are not removed by cell competition or by immune intervention, thus mimicking the persistence of cells with cancer driver mutations in normal human tissues.

Positive surgical margins following radical prostatectomy significantly contribute to tumor recurrence. While systemic chemotherapy demonstrates limited efficacy in this context, local chemotherapy drug delivery systems based on nanomaterials offer promising strategies to address this issue by modifying drug release kinetics and distribution, thereby enhancing antitumor effects while minimizing the toxicities associated with systemic chemotherapy. In this study, we utilized electrospun nanofibrous mats loaded with docetaxel for sustained drug delivery.

Background: Congenital cytomegalovirus is the leading cause of nongenetic sensorineural hearing loss. Treatment with (val)ganciclovir improves audiologic outcomes. Neutropenia is a common adverse event, but correlates that predict who will develop neutropenia have not been identified.

The development of targeted therapy for patients with multiple myeloma (MM) is hampered by the low frequency of actionable genetic abnormalities. Gain or amplification of chromosome 1q (1q+) is the most frequent arm-level copy number gain in patients with MM and is associated with higher risk of progression and death despite recent therapeutic advances. Thus, developing targeted therapy for MM patients with 1q+ stands to benefit a large portion of patients in need of more effective management.
