Motivation: Unsupervised clustering is important in disease subtyping, among other genomic applications. As genomic data have become more multifaceted, how to cluster across data sources for more precise subtyping has become an increasingly important area of research. Many of the methods proposed so far, including iCluster and Cluster of Cluster Assignments (COCAs), make the unreasonable assumption of a common clustering across all data sources; methods that do not make this assumption are fewer and tend to be computationally intensive.

Results: We propose a Bayesian parametric model for integrative, unsupervised clustering across data sources. In our two-way latent structure model, samples are clustered in relation to each specific data source, which distinguishes the model from methods like COCAs and iCluster, but cluster labels have across-dataset meaning, allowing cluster information to be shared between data sources. A common scaling across data sources is not required. Inference is performed with a Gibbs sampler, which we improve with a warm-start strategy and modified density functions to make convergence faster and more robust. Posterior interpretation allows for inference on common clusterings occurring among subsets of data sources. An interesting statistical formulation of the model results in sampling from closed-form posteriors despite the incorporation of a complex latent structure. We fit the model with Gaussian and more general densities, the choice of which influences the degree of across-dataset cluster label sharing. Uniquely among integrative clustering models, our formulation makes no nestedness assumptions about samples across data sources, so a sample missing data from one genomic source can still be clustered according to its existing data sources. We apply our model to a Norwegian breast cancer cohort of ductal carcinoma in situ and invasive tumors, comprising somatic copy-number alteration (CNA), methylation and expression datasets. We find enrichment in the Her2 subtype and ductal carcinoma among those observations exhibiting greater cluster correspondence across expression and CNA data. In general, there are few pan-genomic clusterings, suggesting that models assuming a common clustering across genomic data sources might yield misleading results.
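
To make the idea of Gibbs sampling from closed-form conditional posteriors concrete, here is a minimal sketch for a much simpler setting: a one-dimensional Gaussian mixture on a single data source with a known component standard deviation. It is not the two-way latent structure model and does not reflect the twl package's interface. The function name, the priors (Normal on the means, symmetric Dirichlet on the weights), the data-driven initialization and all default values are assumptions made for illustration.

```python
import math
import random


def gibbs_gaussian_mixture(data, k=2, sigma=1.0, prior_sd=10.0,
                           alpha=1.0, n_iter=300, burn_in=100, seed=0):
    """Gibbs sampler for a 1-D Gaussian mixture with known component sd.

    Assumed priors (illustrative only):
      mu_j ~ Normal(0, prior_sd^2), weights ~ Dirichlet(alpha, ..., alpha).
    Returns the post-burn-in average of each component mean and the
    labels from the final sweep.
    """
    rng = random.Random(seed)
    n = len(data)
    # Data-driven start: spread the initial means across the data range.
    lo, hi = min(data), max(data)
    mu = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    weights = [1.0 / k] * k
    labels = [0] * n
    mu_sums = [0.0] * k
    kept = 0

    for it in range(n_iter):
        # 1) Labels: categorical draw from the closed-form conditional.
        for i, x in enumerate(data):
            logp = [math.log(weights[j]) - 0.5 * ((x - mu[j]) / sigma) ** 2
                    for j in range(k)]
            m = max(logp)  # subtract the max for numerical stability
            probs = [math.exp(v - m) for v in logp]
            labels[i] = rng.choices(range(k), weights=probs)[0]

        # 2) Means: conjugate Normal update given the current labels.
        counts = [0] * k
        sums = [0.0] * k
        for x, z in zip(data, labels):
            counts[z] += 1
            sums[z] += x
        for j in range(k):
            precision = counts[j] / sigma ** 2 + 1.0 / prior_sd ** 2
            post_mean = (sums[j] / sigma ** 2) / precision
            mu[j] = rng.gauss(post_mean, math.sqrt(1.0 / precision))

        # 3) Weights: Dirichlet draw via normalized Gamma variates.
        g = [rng.gammavariate(alpha + counts[j], 1.0) for j in range(k)]
        total = sum(g)
        weights = [v / total for v in g]

        # Accumulate post-burn-in draws of the means.
        if it >= burn_in:
            kept += 1
            for j in range(k):
                mu_sums[j] += mu[j]

    return [s / kept for s in mu_sums], labels
```

Each of the three steps draws from a standard distribution, which is what "closed-form posteriors" buys in practice: no accept/reject step or tuning is needed. On simulated data with two well-separated groups, the averaged means should land near the group centers.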

Availability and Implementation: The model is implemented in an R package called twl ('two-way latent'), available on CRAN. Data for analysis are available within the R package.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btz381
