Cancer is one of the leading causes of death worldwide. Many believe that genomic data will enable us to better predict the survival time of cancer patients, which will lead to better, more personalized treatment options and patient care. As standard survival prediction models have a hard time coping with the high dimensionality of gene expression data, many projects use dimensionality reduction techniques to overcome this hurdle. We introduce a novel methodology, inspired by topic modeling from the natural language domain, to derive expressive features from high-dimensional gene expression data. There, a document is represented as a mixture over a relatively small number of topics, where each topic corresponds to a distribution over words; here, to accommodate the heterogeneity of a patient's cancer, we represent each patient (≈ document) as a mixture over cancer-topics, where each cancer-topic is a mixture over gene expression values (≈ words). This required some extensions to the standard LDA model (e.g., to accommodate the real-valued expression values), leading to our novel discretized Latent Dirichlet Allocation (dLDA) procedure. After using dLDA to learn these cancer-topics, we can express each patient as a distribution over a small number of cancer-topics, then use this low-dimensional "distribution vector" as input to a learning algorithm; here, we ran the recent survival prediction algorithm MTLR (multi-task logistic regression) on this representation of the cancer dataset. We initially focus on the METABRIC dataset, which describes each of n = 1,981 breast cancer patients using r = 49,576 gene expression values from microarrays. Our results show that our approach (dLDA followed by MTLR) provides survival estimates that are more accurate than those of standard models, in terms of the standard Concordance measure.
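To make the document/word analogy concrete, here is a minimal Python sketch of the general shape of such a pipeline: real-valued expression values are discretized into (gene, bin) "words", and a standard collapsed Gibbs sampler for LDA then yields each patient's distribution over topics. This is not the authors' dLDA. The equal-frequency binning scheme, the hypothetical function names `discretize` and `lda_gibbs`, and the hyperparameters `n_bins`, `alpha` and `beta` are all assumptions made for illustration; the linked repository holds the actual implementation.

```python
import random


def discretize(expr, n_bins=3):
    """Turn each patient's real-valued expression vector into a bag of words.

    ASSUMPTION (illustrative only, not necessarily dLDA's scheme): for every
    gene, patients are ranked by expression and split into n_bins
    equal-frequency bins; the word for patient p and gene g is (g, bin).
    """
    n_patients, n_genes = len(expr), len(expr[0])
    docs = [[] for _ in range(n_patients)]
    for g in range(n_genes):
        order = sorted(range(n_patients), key=lambda p: expr[p][g])
        for rank, p in enumerate(order):
            docs[p].append((g, rank * n_bins // n_patients))
    return docs


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Standard collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    n_words = len(vocab)
    ndk = [[0] * n_topics for _ in docs]            # tokens of topic k in document d
    nkw = [[0] * n_words for _ in range(n_topics)]  # tokens of word w assigned to topic k
    nk = [0] * n_topics                             # total tokens assigned to topic k
    z = []                                          # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_words * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior-mean estimate of each document's (patient's) topic proportions.
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]
```

Calling `lda_gibbs(discretize(expr), n_topics=K)` returns one length-K row per patient; each row is a low-dimensional "distribution vector" of the kind that would then be passed to a survival model such as MTLR, which is not implemented here.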
We then validate this "dLDA+MTLR" approach by running it on the Pan-kidney (KIPAN) dataset of n = 883 patients, over r = 15,529 gene expression values (here using the mRNAseq modality), and find that it again achieves excellent results. In both cases, we also show that the resulting model is calibrated, using the recent "D-calibration" measure. These successes, in two different cancer types and expression modalities, demonstrate the generality and the effectiveness of this approach. The dLDA+MTLR source code is available at https://github.com/nitsanluke/GE-LDA-Survival.
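The two evaluation measures named above, Concordance and D-calibration, can be sketched in plain Python. `concordance_index` is Harrell's C, skipping tied event times for simplicity. `d_calibration_counts` builds the D-calibration histogram: each uncensored patient adds 1 to the bin containing its predicted survival probability at its event time, and each censored patient's unit of mass is spread uniformly over [0, S(c)]. The function names, the 10-bin default, and the use of one scalar risk score per patient (for example, the negative of a predicted median survival time read off an MTLR curve) are assumptions of this sketch, not details taken from the paper.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable patient pairs ranked correctly by risk.

    A pair (i, j) is comparable when times[i] < times[j] and patient i had an
    observed event; it is concordant when risk[i] > risk[j]. Tied risks count
    1/2. Tied times are skipped for simplicity.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def d_calibration_counts(surv_at_time, events, n_bins=10):
    """Histogram of predicted survival probabilities S_i(t_i) for D-calibration.

    surv_at_time[i] is the model's survival probability for patient i at that
    patient's observed (event or censoring) time. An uncensored patient adds 1
    to the bin containing it; a censored patient's unit of mass is spread
    uniformly over [0, S_i(c_i)], since its event happens later.
    """
    counts = [0.0] * n_bins
    width = 1.0 / n_bins
    for s, observed in zip(surv_at_time, events):
        b = min(int(s * n_bins), n_bins - 1)  # index of the bin containing s
        if observed or s <= 0.0:              # s <= 0: degenerate censored case
            counts[b] += 1.0
        else:
            counts[b] += (s - b / n_bins) / s  # part of [0, s] inside bin b
            for lower in range(b):
                counts[lower] += width / s     # each full bin below s


    return counts


def d_calibration_statistic(counts):
    """Pearson chi-square statistic against equal expected counts per bin."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)
```

A D-calibrated model should produce roughly equal counts in every bin. The statistic can be compared against a chi-square distribution with `n_bins - 1` degrees of freedom (for example, via `scipy.stats.chisquare` on the counts) to obtain a p-value.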
Source | URL
---|---
PMC | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6857918
PLOS | http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0224446