Gene expression based survival prediction for cancer patients: A topic modeling approach.

PLoS One

Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.

Published: April 2020

Cancer is one of the leading causes of death worldwide. Many believe that genomic data will enable us to better predict the survival time of cancer patients, which will lead to better, more personalized treatment options and patient care. Because standard survival prediction models have a hard time coping with the high dimensionality of gene expression data, many projects use dimensionality reduction techniques to overcome this hurdle. We introduce a novel methodology, inspired by topic modeling from the natural language domain, to derive expressive features from high-dimensional gene expression data. In topic modeling, a document is represented as a mixture over a relatively small number of topics, where each topic corresponds to a distribution over words; here, to accommodate the heterogeneity of a patient's cancer, we represent each patient (≈ document) as a mixture over cancer-topics, where each cancer-topic is a mixture over gene expression values (≈ words). This required some extensions to the standard LDA model (e.g., to accommodate the real-valued expression values), leading to our novel discretized Latent Dirichlet Allocation (dLDA) procedure. After using dLDA to learn these cancer-topics, we can express each patient as a distribution over a small number of cancer-topics, then use this low-dimensional "distribution vector" as input to a learning algorithm; here, we ran the recent survival prediction algorithm, MTLR, on this representation of the cancer dataset. We initially focus on the METABRIC dataset, which describes each of n = 1,981 breast cancer patients using r = 49,576 gene expression values from microarrays. Our results show that our approach (dLDA followed by MTLR) provides survival estimates that are more accurate than standard models, in terms of the standard Concordance measure.
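The abstract does not spell out how real-valued expression values become "words". As a minimal sketch of one plausible discretization, and not necessarily the authors' scheme, the snippet below z-scores each gene across patients and emits an over- or under-expression word whose count grows with the size of the deviation. The function name `discretize_expression`, the word labels, and the `max_count` cap are all hypothetical choices made for illustration.

```python
import math


def discretize_expression(rows, max_count=5):
    """Convert a patients-by-genes matrix of real values into bag-of-words counts.

    Assumed scheme (illustrative only): z-score each gene across patients,
    then give the patient the word "g{j}_up" (z > 0) or "g{j}_down" (z < 0)
    with a count equal to round(|z|), capped at max_count.
    """
    n_patients = len(rows)
    n_genes = len(rows[0])
    docs = [dict() for _ in range(n_patients)]
    for j in range(n_genes):
        column = [row[j] for row in rows]
        mean = sum(column) / n_patients
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n_patients)
        if std == 0.0:
            continue  # a constant gene yields no words
        for i, x in enumerate(column):
            z = (x - mean) / std
            count = min(max_count, int(round(abs(z))))
            if count > 0:
                word = f"g{j}_up" if z > 0 else f"g{j}_down"
                docs[i][word] = count
    return docs


# Toy data: 4 patients, 2 genes (the second gene is constant).
toy = [[0.0, 5.0], [0.0, 5.0], [0.0, 5.0], [10.0, 5.0]]
docs = discretize_expression(toy)
```

Each resulting dictionary plays the role of one patient "document". The counts could then be passed to any standard LDA implementation to obtain the per-patient distribution over cancer-topics that serves as the low-dimensional input to a survival model.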
We then validate this "dLDA+MTLR" approach by running it on the n = 883 Pan-kidney (KIPAN) dataset, over r = 15,529 gene expression values (here using the mRNAseq modality), and find that it again achieves excellent results. In both cases, we also show that the resulting model is calibrated, using the recent "D-calibrated" measure. These successes, in two different cancer types and expression modalities, demonstrate the generality and the effectiveness of this approach. The dLDA+MTLR source code is available at https://github.com/nitsanluke/GE-LDA-Survival.
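For readers unfamiliar with the Concordance measure used to compare models on both datasets, here is a minimal sketch of a Harrell-style concordance index for right-censored data. It is an illustration, not the evaluation code from the linked repository. The function name `concordance_index` and its argument layout are assumptions, and pairs with tied survival times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs that the risk scores order correctly.

    times  : observed time for each patient (event or censoring time)
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score; higher should mean shorter survival

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's observed time. Tied risks count as half.
    """
    comparable = 0
    concordant = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: the third patient is censored at time 6.
c = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
```

In this toy example five pairs are comparable and four are ordered correctly, so `c` is 0.8. A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking.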

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857918
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0224446

Publication Analysis

Top Keywords

gene expression: 24
survival prediction: 12
topic modeling: 8
expression data: 8
small number: 8
expression values: 8
expression: 7
gene: 6
cancer: 6
survival: 5

