Objective: Plasma metabolite profiling has uncovered several nonglycemic markers of incident type 2 diabetes (T2D). We investigated whether such biomarkers provide information about specific aspects of T2D etiology, such as impaired fasting glucose and impaired glucose tolerance, and whether their association with T2D risk varies by race.
Research Design and Methods: Untargeted plasma metabolite profiling was performed on participants in the FINRISK 2002 cohort (n = 7,564).
As the field of artificial intelligence evolves rapidly, these hallmarks are intended to capture fundamental, complementary concepts necessary for the progress and timely adoption of predictive modeling in precision oncology. Through these hallmarks, we hope to establish standards and guidelines that enable the symbiotic development of artificial intelligence and precision oncology.
Advancements in genomic and proteomic technologies have powered the creation of large gene and protein networks ("interactomes") for understanding biological systems. However, the proliferation of interactomes complicates the selection of networks for specific applications. Here, we present a comprehensive evaluation of 45 current human interactomes, encompassing protein-protein interactions as well as gene regulatory, signaling, colocalization, and genetic interaction networks.
Hepatitis B virus (HBV) infections promote liver cancer initiation by inducing inflammation and cellular stress. Despite the primarily indirect effect on oncogenesis, HBV is associated with a recurrent genomic phenotype in HCC, suggesting that it impacts the biology of established HCC. Characterization of the interaction of HBV with host proteins and the mechanistic contributions of HBV to HCC initiation and maintenance could provide insights into HCC biology and uncover therapeutic vulnerabilities.
Gene set enrichment is a mainstay of functional genomics, but it relies on gene function databases that are incomplete. Here we evaluate five large language models (LLMs) for their ability to discover the common functions represented by a gene set, supported by molecular rationale and a self-confidence assessment. For curated gene sets from Gene Ontology, GPT-4 suggests functions similar to the curated name in 73% of cases, with higher self-confidence predicting higher similarity.
Towards comprehensively investigating the genotype-phenotype relationships governing the human pluripotent stem cell state, we generated an expressed genome-scale CRISPRi Perturbation Cell Atlas in KOLF2.1J human induced pluripotent stem cells (hiPSCs) mapping transcriptional and fitness phenotypes associated with 11,739 targeted genes. Using the transcriptional phenotypes, we created a minimum distortion embedding map of the pluripotent state, demonstrating rich recapitulation of protein complexes, such as strong co-clustering of MRPL, BAF, SAGA, and Ragulator family members.
Cancers are driven by alterations in diverse genes, creating dependencies that can be therapeutically targeted. However, many genetic dependencies have proven inconsistent across tumors. Here we describe SCHEMATIC, a strategy to identify a core network of highly penetrant, actionable genetic interactions.
Proteins exhibit cell-type-specific functions and interactions, yet most ways of representing proteins lack any biological or environmental context. To address this gap, recent work by Li et al. introduces PINNACLE, a geometric deep learning approach that generates contextualized representations of proteins by combined analysis of protein interactions and multiorgan single-cell transcriptomics.
Bioinformatics
September 2024
The data deluge in biology calls for computational approaches that can integrate multiple datasets of different types to build a holistic view of biological processes or structures of interest. An emerging paradigm in this domain is the unsupervised learning of data embeddings that can be used for downstream clustering and classification tasks. While such approaches for integrating data of similar types are becoming common, there is scarcer work on consolidating different data modalities such as network and image information.
Reactive changes of glial cells during neuroinflammation impact brain disorders and disease progression. Elucidating the mechanisms that control reactive gliosis may help us to understand brain pathophysiology and improve outcomes. Here, we report that adult ablation of autism spectrum disorder (ASD)-associated CHD8 in astrocytes attenuates reactive gliosis via remodeling chromatin accessibility, changing gene expression.
Alcohol Clin Exp Res (Hoboken)
September 2024
Background: Genome-wide association studies (GWAS) have identified hundreds of common variants associated with alcohol consumption. In contrast, genetic studies of alcohol consumption that use rare variants are still in their early stages. No prior studies of alcohol consumption have examined whether common and rare variants implicate the same genes and molecular networks, leaving open the possibility that the two approaches might identify distinct biology.
Defining the subset of cellular factors governing SARS-CoV-2 replication can provide critical insights into viral pathogenesis and identify targets for host-directed antiviral therapies. While a number of genetic screens have previously reported SARS-CoV-2 host dependency factors, these approaches relied on utilizing pooled genome-scale CRISPR libraries, which are biased towards the discovery of host proteins impacting early stages of viral replication. To identify host factors involved throughout the SARS-CoV-2 infectious cycle, we conducted an arrayed genome-scale siRNA screen.
Single-gene missense mutations remain challenging to interpret. Here, we deploy scalable functional screening by sequencing (SEUSS), a Perturb-seq method, to generate mutations at protein interfaces of RUNX1 and quantify their effect on activities of downstream cellular programs. We evaluate single-cell RNA profiles of 115 mutations in myelogenous leukemia cells and categorize them into three functionally distinct groups, wild-type (WT)-like, loss-of-function (LoF)-like, and hypomorphic, that we validate in orthogonal assays.
Motivation: Predicting cancer drug response requires a comprehensive assessment of many mutations present across a tumor genome. While current drug response models generally use a binary mutated/unmutated indicator for each gene, not all mutations in a gene are equivalent.
Results: Here, we construct and evaluate a series of predictive models based on leading methods for quantitative mutation scoring.
In vitro evolution and whole genome analysis has proven to be a powerful method for studying the mechanism of action of small molecules in many haploid microbes but has generally not been applied to human cell lines in part because their diploid state complicates the identification of variants that confer drug resistance. To determine if haploid human cells could be used in MOA studies, we evolved resistance to five different anticancer drugs (doxorubicin, gemcitabine, etoposide, topotecan, and paclitaxel) using a near-haploid cell line (HAP1) and then analyzed the genomes of the drug resistant clones, developing a bioinformatic pipeline that involved filtering for high frequency alleles predicted to change protein sequence, or alleles which appeared in the same gene for multiple independent selections with the same compound. Applying the filter to sequences from 28 drug resistant clones identified a set of 21 genes which was strongly enriched for known resistance genes or known drug targets (TOP1, TOP2A, DCK, WDR33, SLCO3A1).
This article describes the Cell Maps for Artificial Intelligence (CM4AI) project and its goals, methods, standards, current datasets, software tools, status, and future directions. CM4AI is the in the U.S.
Annu Rev Biomed Data Sci
August 2024
While the primary sequences of human proteins have been cataloged for over a decade, determining how these are organized into a dynamic collection of multiprotein assemblies, with structures and functions spanning biological scales, is an ongoing venture. Systematic and data-driven analyses of these higher-order structures are emerging, facilitating the discovery and understanding of cellular phenotypes. At present, knowledge of protein localization and function has been primarily derived from manual annotation and curation in resources such as the Gene Ontology, which are biased toward richly annotated genes in the literature.
Advancements in genomic and proteomic technologies have powered the use of gene and protein networks ("interactomes") for understanding genotype-phenotype translation. However, the proliferation of interactomes complicates the selection of networks for specific applications. Here, we present a comprehensive evaluation of 46 current human interactomes, encompassing protein-protein interactions as well as gene regulatory, signaling, colocalization, and genetic interaction networks.
Polypharmacology drugs (compounds that inhibit multiple proteins) have many applications but are difficult to design. To address this challenge, we have developed POLYGON, an approach to polypharmacology based on generative reinforcement learning. POLYGON embeds chemical space and iteratively samples it to generate new molecular structures; these are rewarded by the predicted ability to inhibit each of two protein targets and by drug-likeness and ease-of-synthesis.
Genome-wide association studies (GWAS) have identified hundreds of common variants associated with alcohol consumption. In contrast, rare variants have only begun to be studied for their role in alcohol consumption. No studies have examined whether common and rare variants implicate the same genes and molecular networks.
Cyclin-dependent kinase 4 and 6 inhibitors (CDK4/6is) have revolutionized breast cancer therapy. However, <50% of patients have an objective response, and nearly all patients develop resistance during therapy. To elucidate the underlying mechanisms, we constructed an interpretable deep learning model of the response to palbociclib, a CDK4/6i, based on a reference map of multiprotein assemblies in cancer.
Rapid proliferation is a hallmark of cancer associated with sensitivity to therapeutics that cause DNA replication stress (RS). Many tumors exhibit drug resistance, however, via molecular pathways that are incompletely understood. Here, we develop an ensemble of predictive models that elucidate how cancer mutations impact the response to common RS-inducing (RSi) agents.
The data-intensive fields of genomics and machine learning (ML) are in an early stage of convergence. Genomics researchers increasingly seek to harness the power of ML methods to extract knowledge from their data; conversely, ML scientists recognize that genomics offers a wealth of large, complex, and well-annotated datasets that can be used as a substrate for developing biologically relevant algorithms and applications. The National Human Genome Research Institute (NHGRI) inquired with researchers working in these two fields to identify common challenges and receive recommendations to better support genomic research efforts using ML approaches.