Private information leakage from single-cell count matrices.

Cell

Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Columbia University, New York, NY 10032, USA.

Published: November 2024

AI Article Synopsis

  • The growth of publicly available single-cell datasets has greatly improved our understanding of biology, but it raises significant privacy issues.
  • Recent studies on data sharing have mainly focused on bulk gene expression data due to noise and a lack of large single-cell datasets.
  • Our research reveals that individuals in single-cell datasets are at risk of linking attacks that expose sensitive information, and we propose a method for predicting genotypes that operates independently of eQTLs, allowing for the discovery of private information across different studies.

Article Abstract

The increase in publicly available human single-cell datasets, encompassing millions of cells from many donors, has significantly enhanced our understanding of complex biological processes. However, the accessibility of these datasets raises significant privacy concerns. Due to the inherent noise in single-cell measurements and the scarcity of population-scale single-cell datasets, recent private information quantification studies have focused on bulk gene expression data sharing. To address this gap, we demonstrate that individuals in single-cell gene expression datasets are vulnerable to linking attacks, where attackers can infer their sensitive phenotypic information using publicly available tissue or cell-type-specific expression quantitative trait loci (eQTLs) information. We further develop a method for genotype prediction and genotype-phenotype linking that remains effective without relying on eQTL information. We show that variants from one study can be exploited to uncover private information about individuals in another study.
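The linking attack described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method: it assumes a deliberately simple model in which each eQTL gene's per-donor expression shifts linearly with a 0/1/2 genotype plus Gaussian noise, and it links a donor to a genotype panel by counting matching sites. All function names, thresholds, and parameters are hypothetical and chosen only for illustration.

```python
import random


def predict_genotypes(expression, thresholds, effect_signs):
    """Guess a 0/1/2 genotype per eQTL gene from one donor's expression values.

    Assumption: each gene has a (low, high) threshold pair and a known
    eQTL effect direction (+1 or -1), e.g. taken from a public eQTL catalog.
    """
    predicted = []
    for value, (low, high), sign in zip(expression, thresholds, effect_signs):
        level = 0 if value < low else (1 if value < high else 2)
        # A negative effect direction means high expression implies genotype 0.
        predicted.append(level if sign > 0 else 2 - level)
    return predicted


def link(predicted, genotype_panel):
    """Return the panel ID whose genotypes agree with the prediction at the most sites."""
    def concordance(donor_id):
        return sum(p == g for p, g in zip(predicted, genotype_panel[donor_id]))
    return max(genotype_panel, key=concordance)


def simulate_attack(n_donors=20, n_genes=200, noise_sd=0.4, seed=0):
    """Simulate donors, add noise to their expression, and report linking accuracy."""
    rng = random.Random(seed)
    effect_signs = [rng.choice([-1, 1]) for _ in range(n_genes)]
    thresholds = [(-0.5, 0.5)] * n_genes
    panel = {
        f"donor{i}": [rng.choice([0, 1, 2]) for _ in range(n_genes)]
        for i in range(n_donors)
    }
    hits = 0
    for donor_id, genotypes in panel.items():
        # Toy expression model: centered genotype times effect sign, plus noise.
        expression = [
            sign * (g - 1) + rng.gauss(0.0, noise_sd)
            for sign, g in zip(effect_signs, genotypes)
        ]
        predicted = predict_genotypes(expression, thresholds, effect_signs)
        hits += link(predicted, panel) == donor_id
    return hits / n_donors


if __name__ == "__main__":
    print(f"fraction of donors correctly linked: {simulate_attack():.2f}")
```

Even when individual genotype calls are noisy, agreement summed over many eQTL sites can single out one donor, which is the intuition behind linking attacks. Real single-cell data would need per-donor aggregation across cells and far more careful modeling than this toy example.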

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568916
DOI: http://dx.doi.org/10.1016/j.cell.2024.09.012

Similar Publications

Pancreatic ductal adenocarcinoma (PDAC) is characterized by its aggressive nature and dismal prognosis, largely attributed to its unique tumor microenvironment. However, the molecular mechanisms by which tumor-associated macrophages (TAMs) promote PDAC progression, particularly the role of β-catenin signaling in regulating TAM phenotype and function, remain incompletely understood. Initially, we performed comprehensive analyses of RNA-seq and single-cell RNA-seq (scRNA-seq) datasets to investigate OSM and LOXL2 expression patterns in PDAC.

Cross-species comparative single-cell transcriptomics highlights the molecular evolution and genetic basis of male infertility.

Cell Rep

December 2024

State Key Laboratory of Reproductive Medicine and Offspring Health, Nanjing Medical University, Nanjing, Jiangsu, China; Cellular Screening Center, The University of Chicago, Chicago, IL, USA; Department of Neurology, Center for Reproductive Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.

In male animals, spermatogonia in testes differentiate into sperm, one of the most diverse cell types across species. Despite the evolutionary retention of key genes essential for spermatogenesis, the extent of their conservation remains unclear. To explore the genetic basis of spermatogenesis under strong selective pressure, we compare single-cell RNA sequencing (scRNA-seq) datasets from the testes of humans, mice, and fruit flies.

Intratumoural CD8+ CXCR5+ follicular cytotoxic T cells have prognostic value and are associated with CD19+ CD38+ B cells and tertiary lymphoid structures in colorectal cancer.

Cancer Immunol Immunother

December 2024

Department of Clinical Laboratory, State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

Background: Colorectal cancer (CRC) is the most common digestive cancer in the world. Microsatellite stability (MSS) and microsatellite instability (MSI-high) are important molecular subtypes of CRC closely related to tumor occurrence and progression and immunotherapy efficacy. The presence of CD8+ CXCR5+ follicular cytotoxic T (TFC) cells is strongly associated with autoimmune disease and CD8+ effector function.

Purpose: Determining the primary origin of non-organ-confined neuroendocrine tumors (NETs) for accurate diagnosis and management. Neuroendocrine tumors are rare neoplasms with diverse clinical behaviors. Determining their primary origin remains challenging in cases of non-organ-confined NETs.

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that currently affects approximately 1-2% of the global population. Genome-wide studies have identified several loci associated with ASD; however, pinpointing causal variants remains elusive. Therefore, functional studies are essential to discover potential therapeutics for ASD.
