Bayesian network modeling (BN modeling, or BNM) is an interpretable machine learning method for constructing probabilistic graphical models from the data. In recent years, it has been extensively applied to diverse types of biomedical datasets. Concurrently, our ability to perform long-timescale molecular dynamics (MD) simulations on proteins and other materials has increased exponentially.
While there are currently over 40 replicated genes with mapped risk alleles for Late Onset Alzheimer's disease (LOAD), the Apolipoprotein E locus E4 haplotype is still the biggest driver of risk, with odds ratios for neuropathologically confirmed E44 carriers exceeding 30 (95% confidence interval 16.59-58.75).
Cooperative interactions in protein-protein interfaces demonstrate the interdependency or the linked network-like behavior of interface interactions and their effect on the coupling of proteins. Cooperative interactions could also cause ripple or allosteric effects at a distance in protein-protein interfaces. Although they are critically important in protein-protein interfaces, it is challenging to determine which amino acid pair interactions are cooperative.
Enhancers are fundamental to gene regulation. Post-translational modifications by the small ubiquitin-like modifiers (SUMO) modify chromatin regulation enzymes, including histone acetylases and deacetylases. However, it remains unclear whether SUMOylation regulates enhancer marks, acetylation at the 27th lysine residue of the histone H3 protein (H3K27Ac).
While there are currently over 40 replicated genes with mapped risk alleles for Late Onset Alzheimer's disease (LOAD), the Apolipoprotein E locus E4 haplotype is still the biggest driver of risk, with odds ratios for neuropathologically confirmed E44 carriers exceeding 30 (95% confidence interval 16.59-58.75).
Cancers (Basel)
December 2023
Next-generation cancer and oncology research needs to take full advantage of the multimodal structured, or graph, information, with the graph data types ranging from molecular structures to spatially resolved imaging and digital pathology, biological networks, and knowledge graphs. Graph Neural Networks (GNNs) efficiently combine the graph structure representations with the high predictive performance of deep learning, especially on large multimodal datasets. In this review article, we survey the landscape of recent (2020-present) GNN applications in the context of cancer and oncology research, and delineate six currently predominant research areas.
Cooperative interactions in protein-protein interfaces demonstrate the interdependency or the linked network-like behavior of interface interactions and their effect on the coupling of proteins. Cooperative interactions could also cause ripple or allosteric effects at a distance in protein-protein interfaces. Although they are critically important in protein-protein interfaces, it is challenging to determine which amino acid pair interactions are cooperative.
Modern artificial neural networks (ANNs) have long been designed on foundations of mathematics as opposed to their original foundations of biomimicry. However, the structure and function of these modern ANNs are often analogous to real-life biological networks. We propose that the ubiquitous information-theoretic principles underlying the development of ANNs are similar to the principles guiding the macro-evolution of biological networks and that insights gained from one field can be applied to the other.
Bayesian Network (BN) modeling is a prominent and increasingly popular computational systems biology method. It aims to construct network graphs from the large heterogeneous biological datasets that reflect the underlying biological relationships. Currently, a variety of strategies exist for evaluating BN methodology performance, ranging from utilizing artificial benchmark datasets and models, to specialized biological benchmark datasets, to simulation studies that generate synthetic data from predefined network models.
Cancer immunotherapy, specifically immune checkpoint blockade, has been found to be effective in the treatment of metastatic cancers. However, only a subset of patients achieve clinical responses. Elucidating pretreatment biomarkers predictive of sustained clinical response is a major research priority.
We propose a novel two-stage analysis strategy to discover candidate genes associated with particular cancer outcomes in large multimodal genomic cancer databases, such as The Cancer Genome Atlas (TCGA). During the first stage, we use mixed mutual information to perform variable selection; during the second stage, we use scalable Bayesian network (BN) modeling to identify candidate genes and their interactions. Two crucial features of the proposed approach are (i) the ability to handle mixed data types (continuous and discrete, genomic, epigenomic, etc.
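As a rough illustration of the first, variable-selection stage described above, the sketch below ranks candidate variables by their mutual information with an outcome. It is a simplified stand-in and not the authors' implementation: it handles discrete variables only (the abstract refers to mixed mutual information over mixed data types), it uses a plain plug-in estimator, and the names `mutual_information`, `rank_features`, and `top_k` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences.

    Assumption: both variables are discrete; continuous variables would
    need binning or a different estimator.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(features, outcome, top_k):
    """Return the names of the top_k features by mutual information with outcome.

    `features` maps a (hypothetical) variable name to its list of values.
    """
    scores = {name: mutual_information(values, outcome)
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In a two-stage workflow of this kind, the variables retained by `rank_features` would then be passed to a separate Bayesian network structure-learning step, which is not shown here.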
The challenges in recapitulating in vivo human T cell development in laboratory models have posed a barrier to understanding human thymopoiesis. Here, we used single-cell RNA sequencing (sRNA-seq) to interrogate the rare CD34+ progenitor and the more differentiated CD34- fractions in the human postnatal thymus. CD34+ thymic progenitors comprised a spectrum of specification and commitment states characterized by multilineage priming followed by gradual T cell commitment.
Recent developments in sequencing and growth of bioinformatics resources provide us with vast depositories of protein network and single nucleotide polymorphism data. It allows us to re-examine, on a larger and more comprehensive scale, the relationship between protein-protein interactions and protein variability and evolutionary rates. This relationship has remained far from unambiguously resolved for quite a long time, reflecting shifting analysis approaches in the literature, and growing data availability.
The identity/recognition of tRNAs, in the context of aminoacyl tRNA synthetases (and other molecules), is a complex phenomenon that has major implications ranging from the origins and evolution of translation machinery and genetic code to the evolution and speciation of tRNAs themselves to human mitochondrial diseases to artificial genetic code engineering. Deciphering it via laboratory experiments, however, is difficult and necessarily time- and resource-consuming. In this study, we propose a mathematically rigorous two-pronged in silico approach to identifying and classifying tRNA positions important for tRNA identity/recognition, rooted in machine learning and information-theoretic methodology.
Proc Natl Acad Sci U S A
November 2017
Long-range intrachromosomal interactions play an important role in 3D chromosome structure and function, but our understanding of how various factors contribute to the strength of these interactions remains poor. In this study we used a recently developed analysis framework for Bayesian network (BN) modeling to analyze publicly available datasets for intrachromosomal interactions. We investigated how 106 variables affect the pairwise interactions of over 10 million 5-kb DNA segments in the B-lymphocyte cell line GM12878.
Bayesian network (BN) reconstruction is a prototypical systems biology data analysis approach that has been successfully used to reverse engineer and model networks reflecting different layers of biological organization (ranging from genetic to epigenetic to cellular pathway to metabolomic). It is especially relevant in the context of modern (ongoing and prospective) studies that generate heterogeneous high-throughput omics datasets. However, there are both theoretical and practical obstacles to the seamless application of BN modeling to such big data, including computational inefficiency of optimal BN structure search algorithms, ambiguity in data discretization, mixing data types, imputation and validation, and, in general, limited scalability in both reconstruction and visualization of BNs.
Data on biological mechanisms of aging are mostly obtained from cross-sectional study designs. An inherent disadvantage of this design is that inter-individual differences can mask small but biologically significant age-dependent changes. A serially sampled design (same individual at different time points) would overcome this problem but is often limited by the relatively small numbers of available paired samples and the statistics being used.
Pharmacogenetics aims to elucidate the genetic factors underlying the individual's response to pharmacotherapy. Coupled with the recent (and ongoing) progress in high-throughput genotyping, sequencing and other genomic technologies, pharmacogenetics is rapidly transforming into pharmacogenomics, while pursuing the primary goals of identifying and studying the genetic contribution to drug therapy response and adverse effects, and existing drug characterization and new drug discovery. Accomplishment of both of these goals hinges on gaining a better understanding of the underlying biological systems; however, reverse-engineering biological system models from the massive datasets generated by the large-scale genetic epidemiology studies presents a formidable data analysis challenge.