Objectives: Most bipolar disorder (BD) patients initially present with depressive symptoms, resulting in a delayed diagnosis of BD and poor clinical outcomes. This study aims to identify features predictive of the conversion from Major Depressive Disorder (MDD) to BD by leveraging electronic health record (EHR) data from the Clínica San Juan de Dios Manizales in Colombia.
Methods: We employed a multivariable Cox regression model to identify important predictors of conversion from MDD to BD.
Motivation: Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance.
Results: While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors.
We deployed the Blended Genome Exome (BGE), a DNA library blending approach that generates low-pass whole-genome (1-4× mean depth) and deep whole-exome (30-40× mean depth) data in a single sequencing run. This technology is cost-effective, empowers most genomic discoveries possible with deep whole-genome sequencing, and provides an unbiased method to capture the diversity of common SNP variation across the globe. To evaluate this new technology at scale, we applied BGE to sequence >53,000 samples from the Populations Underrepresented in Mental Illness Associations Studies (PUMAS) Project, which included participants across African, African American, and Latin American populations.
Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance. While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors.
Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Here, we present a novel framework for genome-wide detection of sets of variants that carry non-redundant information on the phenotypes and are therefore more likely to be causal in a biological sense. Crucially, our framework requires only summary statistics obtained from standard genome-wide marginal association testing.
Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.
Background: Geographical variations in mood and psychotic disorders have been found in upper-income countries. We looked for geographic variation in these disorders in Colombia, a middle-income country. We analyzed electronic health records from the Clínica San Juan de Dios Manizales (CSJDM), which provides comprehensive mental healthcare for the one million inhabitants of Caldas.
Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the International Classification of Diseases (ICD), the directed acyclic graph structure of the Gene Ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential guarantees.
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches.
This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations replicated under different conditions may be more interesting.
Proc Natl Acad Sci U S A
October 2021
We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing interpretable findings while controlling the false discovery rate. In contrast with standard approaches, our method can leverage sophisticated multivariate algorithms but makes no parametric assumptions about the unknown relation between genotypes and phenotype. Instead, we recognize that genotypes can be considered as a random sample from an appropriate model, encapsulating our knowledge of genetic inheritance and human populations.
Structural variation in the complement 4 gene (C4) confers genetic risk for schizophrenia. The variation includes increased C4A copy number, which predicts increased C4A mRNA expression. C4-anaphylatoxin (C4-ana) is a C4 protein fragment released upon C4 protein activation that has the potential to change the blood-brain barrier (BBB).
We introduce a multiple testing procedure that controls global error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses that are organized hierarchically in a tree structure. We describe a fast algorithm and prove that it controls relevant error rates given certain assumptions on the dependence between the p-values.
Systematic and extensive investigation of enzymes is needed to understand their extraordinary efficiency and meet current challenges in medicine and engineering. We present HT-MEK (High-Throughput Microfluidic Enzyme Kinetics), a microfluidic platform for high-throughput expression, purification, and characterization of more than 1500 enzyme variants per experiment. For 1036 mutants of the alkaline phosphatase PafA (phosphate-irrepressible alkaline phosphatase of Flavobacterium), we performed more than 670,000 reactions and determined more than 5000 kinetic and physical constants for multiple substrates and inhibitors.
We introduce a method to draw causal inferences (inferences immune to all possible confounding) from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants.
The distal lung contains terminal bronchioles and alveoli that facilitate gas exchange and is affected by disorders including interstitial lung disease, cancer, and SARS-CoV-2-associated COVID-19 pneumonia. Investigations of these localized pathologies have been hindered by a lack of 3D in vitro human distal lung culture systems. Further, human distal lung stem cell identification has been impaired by quiescence, anatomic divergence from mouse and lack of lineage tracing and clonogenic culture.
Bipolar disorder is a highly heritable illness, associated with alterations of brain structure. As such, identification of genes influencing inter-individual differences in brain morphology may help elucidate the underlying pathophysiology of bipolar disorder (BP). To identify quantitative trait loci (QTL) that contribute to phenotypic variance of brain structure, structural neuroimages were acquired from family members (n = 527) of extended pedigrees heavily loaded for bipolar disorder ascertained from genetically isolated populations in Latin America.
Background: Severe mental illness diagnoses have overlapping symptomatology and shared genetic risk, motivating cross-diagnostic investigations of disease-relevant quantitative measures. We analysed relationships between neurocognitive performance, symptom domains, and diagnoses in a large sample of people with severe mental illness not ascertained for a specific diagnosis (cases), and people without mental illness (controls) from a single, homogeneous population.
Methods: In this case-control study, cases with severe mental illness were ascertained through electronic medical records at Clínica San Juan de Dios de Manizales (Manizales, Caldas, Colombia) and the Hospital Universitario San Vicente Fundación (Medellín, Antioquía, Colombia).
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures.
Current evidence from case/control studies indicates that genetic risk for psychiatric disorders derives primarily from numerous common variants, each with a small phenotypic impact. The literature describing apparent segregation of bipolar disorder (BP) in numerous multigenerational pedigrees suggests that, in such families, large-effect inherited variants might play a greater role. To identify roles of rare and common variants on BP, we conducted genetic analyses in 26 Colombia and Costa Rica pedigrees ascertained for bipolar disorder 1 (BP1), the most severe and heritable form of BP.
Background: Disturbed sleep and activity are prominent features of bipolar disorder type I (BP-I). However, the relationship of sleep and activity characteristics to brain structure and behavior in euthymic BP-I patients and their non-BP-I relatives is unknown. Additionally, underlying genetic relationships between these traits have not been investigated.
We tackle the problem of selecting from among a large number of variables those that are "important" for an outcome. We consider situations where groups of variables are also of interest. For example, each variable might be a genetic polymorphism, and we might want to study how a trait depends on variability in genes, segments of DNA that typically contain multiple such polymorphisms.