Cancer is caused by an accumulation of somatic mutations and copy number alterations (CNAs). Besides mutations, these copy number changes are key characteristics of cancer development. Nonetheless, some tumors show hardly any CNAs, a remarkable phenomenon in oncogenesis.
Epidemiol Methods
January 2024
Objectives: The addition of two-way interactions is a classic problem in statistics, and comes with the challenge of quadratically increasing dimension. We aim a) to devise an estimation method that can handle this challenge and b) to aid interpretation of the resulting model by developing computational tools for quantifying variable importance.
Methods: Existing strategies typically overcome the dimensionality problem by only allowing interactions between relevant main effects.
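To make the "quadratically increasing dimension" concrete: with p main effects there are p(p-1)/2 distinct two-way interactions. The following is a minimal illustrative sketch of that count, not code from the article; the function names `interaction_pairs` and `design_width` are hypothetical.

```python
from itertools import combinations
from math import comb


def interaction_pairs(names):
    """Return all unordered pairs of distinct main effects (hypothetical helper)."""
    return list(combinations(names, 2))


def design_width(p):
    """Columns in a design with p main effects plus all two-way interactions."""
    return p + comb(p, 2)


if __name__ == "__main__":
    # The number of interaction terms grows roughly as p^2 / 2.
    for p in (10, 100, 1000):
        print(f"p={p}: interactions={comb(p, 2)}, total columns={design_width(p)}")
```

Restricting interactions to pairs of relevant main effects, as the existing strategies mentioned above do, amounts to calling `interaction_pairs` on a much shorter list of names.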
Aims/hypothesis: People with type 2 diabetes are heterogeneous in their disease trajectory, with some progressing more quickly to insulin initiation than others. Although classical biomarkers such as age, HbA1c and diabetes duration are associated with glycaemic progression, it is unclear how well such variables predict insulin initiation or requirement and whether newly identified markers have added predictive value.
Methods: In two prospective cohort studies as part of IMI-RHAPSODY, we investigated whether clinical variables and three types of molecular markers (metabolites, lipids, proteins) can predict time to insulin requirement using different machine learning approaches (lasso, ridge, GRridge, random forest).
J Comput Graph Stat
November 2022
Elastic net penalization is widely used in high-dimensional prediction and variable selection settings. Auxiliary information on the variables, for example, groups of variables, is often available. Group-adaptive elastic net penalization exploits this information to potentially improve performance by estimating group penalties, thereby penalizing important groups of variables less than other groups.
Motivation: In many high-dimensional prediction or classification tasks, complementary data on the features are available, e.g. prior biological knowledge on (epi)genetic markers.
Background And Aim: Phenotypic expression of hypertrophic cardiomyopathy (HCM) and disease course are associated with unfavorable metabolic health. We investigated if Western diet (WD) feeding is sufficient to trigger cardiac hypertrophy and dysfunction in heterozygous (HET) knock-in mice.
Methods And Results: Wild-type (WT) and HET mice (3-months-old) were fed a WD or normal chow (NC) for 8 weeks.
Background: High-dimensional prediction considers data with more variables than samples. Generic research goals are to find the best predictor or to select variables. Results may be improved by exploiting prior information in the form of co-data, providing complementary data not on the samples, but on the variables.
Objectives: Human papillomavirus- (HPV) positive oropharyngeal squamous cell carcinoma (OPSCC) differs biologically and clinically from HPV-negative OPSCC and has a better prognosis. This study aims to analyze the value of magnetic resonance imaging (MRI)-based radiomics in predicting HPV status in OPSCC and aims to develop a prognostic model in OPSCC including HPV status and MRI-based radiomics.
Materials And Methods: Manual delineation of 249 primary OPSCCs (91 HPV-positive and 159 HPV-negative) on pretreatment native T1-weighted MRIs was performed and used to extract 498 radiomic features per delineation.
Purpose: Patients with vanishing white matter (VWM) experience unremitting chronic neurological decline and stress-provoked episodes of rapid, partially reversible decline. Cerebral white matter abnormalities are progressive, without improvement, and are therefore unlikely to be related to the episodes. We determined which radiological findings are related to episodic decline.
Objectives: To externally validate a pre-treatment MR-based radiomics model predictive of locoregional control in oropharyngeal squamous cell carcinoma (OPSCC) and to assess the impact of differences between datasets on the predictive performance.
Methods: Radiomic features, as defined in our previously published radiomics model, were extracted from the primary tumor volumes of 157 OPSCC patients in a different institute. The developed radiomics model was validated using this cohort.
Patients with inflammatory bowel disease (IBD) produce enhanced immunoglobulin A (IgA) against the microbiota compared to healthy individuals, which has been correlated with disease severity. Since IgA complexes can potently activate myeloid cells via the IgA receptor FcαRI (CD89), excessive IgA production may contribute to IBD pathology. However, the cellular mechanisms that contribute to dysregulated IgA production in IBD are poorly understood.
The features in a high-dimensional biomedical prediction problem are often well described by low-dimensional latent variables (or factors). We use this to include unlabeled features and additional information on the features when building a prediction model. Such additional feature information is often available in biomedical applications.
Preclinical models have been the workhorse of cancer research, producing massive amounts of drug response data. Unfortunately, translating response biomarkers derived from these datasets to human tumors has proven to be particularly challenging. To address this challenge, we developed TRANSACT, a computational framework that builds a consensus space to capture biological processes common to preclinical models and human tumors and exploits this space to construct drug response predictors that robustly transfer from preclinical models to human tumors.
Hypertrophic Cardiomyopathy (HCM) is a common inherited heart disease with poor risk prediction due to incomplete penetrance and a lack of clear genotype-phenotype correlations. Advanced imaging techniques have shown altered myocardial energetics already in preclinical gene variant carriers. To determine whether disturbed myocardial energetics with the potential to serve as biomarkers are also reflected in the serum metabolome, we analyzed the serum metabolome of asymptomatic carriers in comparison to healthy controls and obstructive HCM patients (HOCM).
Deconvolution of bulk gene expression profiles into the cellular components is pivotal to portraying tissue's complex cellular make-up, such as the tumor microenvironment. However, the inherently variable nature of gene expression requires a comprehensive statistical model and reliable prior knowledge of individual cell types that can be obtained from single-cell RNA sequencing. We introduce BLADE (Bayesian Log-normAl Deconvolution), a unified Bayesian framework to estimate both cellular composition and gene expression profiles for each cell type.
High levels of methylated DNA in urine represent an emerging biomarker for non-small cell lung cancer (NSCLC) detection and are the subject of ongoing research. This study aimed to investigate the circadian variation of urinary cell-free DNA (cfDNA) abundance and methylation levels of cancer-associated genes in NSCLC patients. In this prospective study of 23 metastatic NSCLC patients with active disease, patients were asked to collect six urine samples during the morning, afternoon, and evening of two subsequent days.
Consensus molecular subtypes (CMSs) can guide precision treatment of colorectal cancer (CRC). We aim to identify methylation markers to distinguish between CMS2 and CMS3 in patients with CRC, for which an easy test is currently lacking. To this aim, fresh-frozen tumor tissue of 239 patients with stage I-III CRC was analyzed.
Clinical research often focuses on complex traits in which many variables play a role in mechanisms driving, or curing, diseases. Clinical prediction is hard when data is high-dimensional, but additional information, like domain knowledge and previously published studies, may be helpful to improve predictions. Such complementary data, or co-data, provide information on the covariates, such as genomic location or P-values from external studies.
In precision medicine, a common problem is drug sensitivity prediction from cancer tissue cell lines. These types of problems entail modelling multivariate drug responses on high-dimensional molecular feature sets in typically >1000 cell lines. The dimensions of the problem require specialised models and estimation methods.
Objectives: Head and neck squamous cell carcinoma (HNSCC) shows a remarkable heterogeneity between tumors, which may be captured by a variety of quantitative features extracted from diagnostic images, termed radiomics. The aim of this study was to develop and validate MRI-based radiomic prognostic models in oral and oropharyngeal cancer.
Materials And Methods: Native T1-weighted images of four independent, retrospective (2005-2013) patient cohorts (n = 102, n = 76, n = 89, and n = 56) were used to delineate primary tumors and to extract 545 quantitative features.
Motivation: Machine learning in the biomedical sciences should ideally provide predictive and interpretable models. When predicting outcomes from clinical or molecular features, applied researchers often want to know which features have effects, whether these effects are positive or negative and how strong these effects are. Regression analysis includes this information in the coefficients but typically renders less predictive models than more advanced machine learning techniques.
In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (i) $p$-values from a previous study and (ii) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection but is not straightforward.
Screening to detect colorectal cancer (CRC) in an early or premalignant state is an effective method to reduce CRC mortality rates. Current stool-based screening tests, e.g.
Motivation: Cell lines and patient-derived xenografts (PDXs) have been used extensively to understand the molecular underpinnings of cancer. While core biological processes are typically conserved, these models also show important differences compared to human tumors, hampering the translation of findings from pre-clinical models to the human setting. In particular, employing drug response predictors generated on data derived from pre-clinical models to predict patient response remains a challenging task.
Accurate diagnosis of pancreatic head lesions remains challenging as no minimally invasive biomarkers are available to discriminate distal cholangiocarcinoma (CCA) from pancreatic ductal adenocarcinoma (PDAC). The aim of this study is to identify specific circulating microRNAs (miRNAs) to diagnose distal CCA. In the discovery phase, PCR profiling of 752 miRNAs was performed on fourteen patients with distal CCA and age- and sex-matched healthy controls.