Front Comput Neurosci
September 2024
Late-onset Alzheimer disease (AD) is a highly complex disease with multiple subtypes, as demonstrated by its disparate risk factors, pathological manifestations, and clinical traits. Discovery of biomarkers to diagnose specific AD subtypes is a key step towards understanding biological mechanisms underlying this enigmatic disease, generating candidate drug targets, and selecting participants for drug trials. Popular statistical methods for evaluating candidate biomarkers, fold change (FC) and area under the receiver operating characteristic curve (AUC), were designed for homogeneous data, and we demonstrate the inherent weaknesses of these approaches when they are used to evaluate subtypes representing less than half of the diseased cases.
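As a toy illustration of the dilution effect described above (not the authors' analysis), the following sketch computes FC and AUC for a hypothetical marker that is elevated in only 30% of cases. All measurements and the helper names `auc` and `fold_change` are invented for illustration.

```python
from statistics import mean


def auc(cases, controls):
    """Mann-Whitney estimate of AUC: P(case > control) + 0.5 * P(tie)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def fold_change(cases, controls):
    """Ratio of mean marker level in cases to mean level in controls."""
    return mean(cases) / mean(controls)


# Hypothetical marker levels (arbitrary units), assumed for this sketch only.
controls = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]
subtype = [3.0, 3.2, 2.9]                          # 30% of cases: marker elevated
other_cases = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0]  # 70% of cases: like controls
cases = subtype + other_cases

# Scores for the subtype alone versus the pooled, heterogeneous case group.
auc_subtype = auc(subtype, controls)
auc_pooled = auc(cases, controls)
fc_subtype = fold_change(subtype, controls)
fc_pooled = fold_change(cases, controls)

print(f"AUC  subtype only: {auc_subtype:.2f}   pooled cases: {auc_pooled:.2f}")
print(f"FC   subtype only: {fc_subtype:.2f}   pooled cases: {fc_pooled:.2f}")
```

In this toy data the marker separates the subtype from controls perfectly, yet both scores shrink once the subtype is pooled with the remaining cases, which is the situation a biomarker screen faces when subtype labels are unknown.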
Identification of proteins dysregulated by COVID-19 infection is critically important for better understanding of its pathophysiology, building prognostic models, and identifying new targets. Plasma proteomic profiling of 4,301 proteins was performed in two independent datasets, and the proteins were tested for association with three COVID-19 outcomes (infection, ventilation, and death). We identified 1,449 proteins consistently associated in both datasets with any of these three outcomes.
The heritability of autism spectrum disorder (ASD), based on 680,000 families and five countries, is estimated to be nearly 80%, yet heritability estimates reported from SNP-based studies are consistently lower, and few significant loci have been identified with genome-wide association studies. This gap in genomic information may reside in rare variants, interaction among variants (epistasis), or cryptic structural variation (SV) and may provide mechanisms that underlie ASD. Here we use a method to identify potential SVs based on non-Mendelian inheritance patterns in pedigrees using parent-child genotypes from ASD families and demonstrate that they are enriched in ASD-risk genes.
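The abstract does not describe the method itself, so the following is only a generic sketch of what a non-Mendelian inheritance check on trio genotypes can look like. It assumes biallelic markers coded as alternate-allele counts (0, 1, 2); the function names and marker IDs are hypothetical. A flagged marker is merely a candidate, since genotyping error produces the same signal.

```python
def gametes(genotype):
    """Alleles (0 = reference, 1 = alternate) a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent."""
    return any(f + m == child for f in gametes(father) for m in gametes(mother))


def flag_markers(trio_genotypes):
    """Return marker IDs whose (father, mother, child) genotypes break Mendelian rules."""
    return [
        marker
        for marker, (father, mother, child) in trio_genotypes.items()
        if not is_mendelian(father, mother, child)
    ]


# Hypothetical trio, for illustration only: marker -> (father, mother, child).
trio = {
    "marker_1": (0, 1, 1),  # consistent
    "marker_2": (2, 0, 1),  # consistent
    "marker_3": (2, 1, 0),  # inconsistent: father can only transmit the alternate allele
    "marker_4": (0, 0, 2),  # inconsistent: neither parent carries the alternate allele
}

print(flag_markers(trio))
```

A pattern such as `marker_3`, where a child appears homozygous for the allele one parent lacks, is the kind of inconsistency a hemizygous deletion can produce, which is why runs of such markers are of interest as potential SVs.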
Identification of the plasma proteomic changes of Coronavirus disease 2019 (COVID-19) is essential to understanding the pathophysiology of the disease and developing predictive models and novel therapeutics. We performed plasma deep proteomic profiling from 332 COVID-19 patients and 150 controls and pursued replication in an independent cohort (297 cases and 76 controls) to find potential biomarkers and causal proteins for three COVID-19 outcomes (infection, ventilation, and death). We identified and replicated 1,449 proteins associated with any of the three outcomes (841 for infection, 833 for ventilation, and 253 for death) that can be queried on a web portal ( https://covid.
Network modeling transforms data into a structure of nodes and edges such that edges represent relationships between pairs of objects, then extracts clusters of densely connected nodes in order to capture high-dimensional relationships hidden in the data. This efficient and flexible strategy holds potential for unveiling complex patterns concealed within massive datasets, but standard implementations overlook several key issues that can undermine research efforts. These issues range from data imputation and discretization to correlation metrics, clustering methods, and validation of results.
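A minimal sketch of the basic pipeline described above, under stated assumptions: Pearson correlation is used as the edge metric, the 0.9 threshold is arbitrary, and connected components stand in for a real clustering method. The profile names and values are invented. Each of these choices is exactly the kind of decision the abstract warns can affect results.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def build_edges(profiles, threshold):
    """Connect every pair of nodes whose absolute correlation meets the threshold."""
    names = sorted(profiles)
    return [
        (u, v)
        for i, u in enumerate(names)
        for v in names[i + 1:]
        if abs(pearson(profiles[u], profiles[v])) >= threshold
    ]


def connected_components(nodes, edges):
    """Group nodes into clusters by depth-first traversal of the edge list."""
    neighbors = {node: set() for node in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in neighbors[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical measurement profiles (one list of six samples per object).
profiles = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],
    "c": [1, 2, 3, 4, 5, 7],
    "d": [1, -1, 1, -1, 1, -1],
    "e": [2, -2, 2, -2, 2, -2],
}

edges = build_edges(profiles, threshold=0.9)
clusters = connected_components(profiles, edges)
print(clusters)
```

Connected components are the crudest possible cluster definition: a single spurious edge merges two groups, which is one reason denser-subgraph methods such as Markov clustering are often preferred in practice.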
The conundrums of choosing candidate genes, via differential expression between treated and mock specimens, are tackled by Ghandikota et al. in this issue of in their efforts to tease out genetic patterns that are characteristic of coronavirus disease 2019 (COVID-19) outcomes.
We demonstrate a selection of network and machine learning techniques useful in the analysis of complex datasets, including 2-way similarity networks, Markov clustering, enrichment statistical networks, FCROS differential analysis, and random forests. We demonstrate each of these techniques on the Populus trichocarpa gene expression atlas.
The well-documented latitudinal clines of genes affecting human skin color presumably arise from the need for protection from intense ultraviolet radiation (UVR) vs. the need to use UVR for vitamin D synthesis. Sampling 751 subjects from a broad range of latitudes and skin colors, we investigated possible multilocus correlated adaptation of skin color genes with the vitamin D receptor gene (VDR), using a vector correlation metric and network method called BlocBuster.
The substantial progress in the last few years toward uncovering genetic causes and risk factors for autism spectrum disorders (ASDs) has opened new experimental avenues for identifying the underlying neurobiological mechanism of the condition. The bounty of genetic findings has led to a variety of data-driven exploratory analyses aimed at deriving new insights about the shared features of these genes. These approaches leverage data from a variety of sources such as co-expression in transcriptomic studies, protein-protein interaction networks, gene ontology (GO) annotations, or multi-level combinations of all of these.
Gephyrin is a highly conserved gene that is vital for the organization of proteins at inhibitory receptors, molybdenum cofactor biosynthesis and other diverse functions. Its specific function is intricately regulated and its aberrant activities have been observed in a number of human diseases. Here we report a remarkable yin-yang haplotype pattern encompassing gephyrin.
Hundreds of genetic markers have shown associations with various complex diseases, yet the "missing heritability" remains alarmingly elusive. Combinatorial interactions may account for a substantial portion of this missing heritability, but their discoveries have been impeded by computational complexity and genetic heterogeneity. We present BlocBuster, a novel systems-level approach that efficiently constructs genome-wide, allele-specific networks that accurately segregate homogenous combinations of genetic factors, tests the associations of these combinations with the given phenotype, and rigorously validates the results using a series of unbiased validation methods.
Complex diseases are often associated with sets of multiple interacting genetic factors and possibly with unique sets of the genetic factors in different groups of individuals (genetic heterogeneity). We introduce a novel concept of custom correlation coefficient (CCC) between single nucleotide polymorphisms (SNPs) that addresses genetic heterogeneity by measuring subset correlations autonomously. It is used to develop a 3-step process to identify candidate multi-SNP patterns: (1) pairwise (SNP-SNP) correlations are computed using CCC; (2) clusters of so-correlated SNPs are identified; and (3) frequencies of these clusters in disease cases and controls are compared to identify disease-associated multi-SNP patterns.
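The abstract does not define CCC, so steps (1) and (2) are not reproduced here. The sketch below illustrates only step (3), under assumptions of its own: a multi-SNP pattern is taken to be a fixed set of SNP genotypes (coded 0, 1, 2), and a one-sided Fisher exact test is used as one possible way to compare carrier frequencies. The SNP names, genotypes, and function names are hypothetical.

```python
from math import comb


def carries(genotypes, pattern):
    """True if an individual has every SNP genotype listed in the pattern."""
    return all(genotypes.get(snp) == value for snp, value in pattern.items())


def count_carriers(individuals, pattern):
    """Number of individuals carrying the full multi-SNP pattern."""
    return sum(carries(genotypes, pattern) for genotypes in individuals)


def fisher_one_sided(a, b, c, d):
    """P(carriers among cases >= a) for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls, columns are carriers/non-carriers; the
    probability is the upper tail of the hypergeometric distribution.
    """
    n_cases, n_carriers, n_total = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n_total, n_cases)


# Hypothetical cluster from step (2), expressed as SNP -> genotype.
pattern = {"snp1": 1, "snp2": 2}

# Hypothetical genotype data, for illustration only.
cases = [
    {"snp1": 1, "snp2": 2},
    {"snp1": 1, "snp2": 2},
    {"snp1": 1, "snp2": 2},
    {"snp1": 0, "snp2": 2},
]
controls = [
    {"snp1": 0, "snp2": 0},
    {"snp1": 1, "snp2": 0},
    {"snp1": 0, "snp2": 2},
    {"snp1": 0, "snp2": 1},
]

case_carriers = count_carriers(cases, pattern)
control_carriers = count_carriers(controls, pattern)
p_value = fisher_one_sided(
    case_carriers,
    len(cases) - case_carriers,
    control_carriers,
    len(controls) - control_carriers,
)
print(case_carriers, control_carriers, round(p_value, 4))
```

Counting carriers of the whole pattern, rather than testing each SNP separately, is what lets a pattern present in only a subset of cases stand out even when no single SNP differs much between groups.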
Motivation: Inference of haplotypes from genotype data is crucial and challenging for many vitally important studies. The first, and most critical, step is the ascertainment of a biologically sound model to be optimized. Many models that have been proposed rely partially or entirely on reducing the number of unique haplotypes in the solution.