Machine learning models that embed graphs in non-Euclidean spaces have shown substantial benefits in a variety of contexts, but their application has not been studied extensively in the biological domain, particularly with respect to biological pathway graphs. Such graphs exhibit a variety of complex network structures, presenting challenges to existing embedding approaches. Learning high-quality embeddings for biological pathway graphs is important for researchers looking to understand the underpinnings of disease and train high-quality predictive models on these networks.
Exponential increases in microbial and viral genomic data demand transformational advances in scalable, generalizable frameworks for their interpretation. Standard homology-based functional analyses are hindered by the rapid divergence of microbial and especially viral genomes and proteins that significantly decreases the volume of usable data. Here, we present Protein Set Transformer (PST), a protein-based genome language model that models genomes as sets of proteins without considering sparsely available functional labels.
Objectives: Rapid evolution of SARS-CoV-2 has resulted in the emergence of numerous variants, posing significant challenges to public health surveillance. Clinical genome sequencing, while valuable, has limitations in capturing the full epidemiological dynamics of circulating variants in the general population. This study aimed to monitor the SARS-CoV-2 variant community dynamics and evolution using receptor-binding domain (RBD) amplicon sequencing of wastewater samples.
One snapshot of the peer review process for "Transcriptome data are insufficient to control false discoveries in regulatory network inference" (Kernfeld et al., 2024).
Fully capturing cellular state requires examining genomic, epigenomic, transcriptomic, proteomic, and other assays for a biological sample and comprehensive computational modeling to reason with the complex and sometimes conflicting measurements. Modeling these so-called multi-omic data is especially beneficial in disease analysis, where observations across omic data types may reveal unexpected patient groupings and inform clinical outcomes and treatments. We present Multi-omic Pathway Analysis of Cancer (MPAC), a computational framework that interprets multi-omic data through prior knowledge from biological pathways.
Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure, and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose Mutational Effect Transfer Learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling.
Eye infections from bacterial contamination of bulk-refillable liquid soap dispensers and artificial tear eye drops continue to occur, resulting in adverse health outcomes that include impaired vision or eye enucleation. (), a common cause of eye infections, can grow in eye drop containers and refillable soap dispensers to high numbers. To assess the risk of eye infection, a quantitative microbial risk assessment for was conducted to predict the probability of an eye infection for two potential exposure scenarios: (i) individuals using bacteria-contaminated eye drops and (ii) contact lens wearers washing their hands with bacteria-contaminated liquid soap prior to placing the lens.
Background: Tracking infectious diseases at the community level is challenging due to asymptomatic infections and the logistical complexities of mass surveillance. Wastewater surveillance has emerged as a valuable tool for monitoring infectious disease agents including SARS-CoV-2 and Mpox virus. However, detecting the Mpox virus in wastewater is particularly challenging due to its relatively low prevalence in the community.
Quantitative microbial risk assessment (QMRA) can be used to evaluate health risks associated with recreational beach use. This study developed a site-specific risk assessment using a novel approach that combined quantitative PCR-based measurement of microbial source tracking (MST) genetic markers (human, dog, and gull fecal bacteria) with a QMRA analysis of potential pathogen risk. Water samples (n = 24) from two recreational beaches were collected and analyzed for MST markers as part of a broader Beach Exposure And Child Health Study that examined child behavior interactions with the beach environment.
Wastewater is a discarded human by-product, but its analysis may help us understand the health of populations. Epidemiologists first analyzed wastewater to track outbreaks of poliovirus decades ago, but so-called wastewater-based epidemiology was reinvigorated to monitor SARS-CoV-2 levels while bypassing the difficulties and pitfalls of individual testing. Current approaches overlook the activity of most human viruses and preclude a deeper understanding of human virome community dynamics.
Int J Environ Res Public Health
September 2023
Aquifer storage and recovery (ASR) can augment water supplies and hydrologic flows under varying climatic conditions. However, imposing drinking water regulations on ASR practices, including pre-treatment before injection into the aquifer, remains arguable. Microbial inactivation data-, , poliovirus type 1 and -were used in a human health risk assessment to identify how the storage time of recharged water in the Floridan Aquifer enhances pathogen inactivation, thereby mitigating the human health risks associated with ingestion.
Traditional small-molecule drug discovery is a time-consuming and costly endeavor. High-throughput chemical screening can only assess a tiny fraction of drug-like chemical space. The strong predictive power of modern machine-learning methods for virtual chemical screening enables training models on known active and inactive compounds and extrapolating to much larger chemical libraries.
Wastewater surveillance has proved to be a valuable tool to track the COVID-19 pandemic. However, most studies using wastewater surveillance data revolve around establishing correlations and lead time relative to reported case data. In this perspective, we advocate for the integration of wastewater surveillance data with dynamic within-host and between-host models to better understand, monitor, and predict viral disease outbreaks.
HIV-1 spreads efficiently through direct cell-to-cell transmission at virological synapses (VSs) formed by interactions between HIV-1 envelope proteins (Env) on the surface of infected cells and CD4 receptors on uninfected target cells. Env-CD4 interactions bring the infected and uninfected cellular membranes into close proximity and induce transport of viral and cellular factors to the VS for efficient virion assembly and HIV-1 transmission. Using novel, cell-specific stable isotope labeling and quantitative mass spectrometric proteomics, we identified extensive changes in the levels and phosphorylation states of proteins in HIV-1 infected producer cells upon mixing with CD4+ target cells under conditions inducing VS formation.
Wastewater surveillance has been widely used to track and estimate SARS-CoV-2 incidence. While both infectious and recovered individuals shed virus into wastewater, epidemiological inferences using wastewater often only consider the viral contribution from the former group. Yet, the persistent shedding in the latter group could confound wastewater-based epidemiological inference, especially during the late stage of an outbreak when the recovered population outnumbers the infectious population.
In the 21st century, several emergent viruses have posed a global threat. Each pathogen has emphasized the value of rapid and scalable vaccine development programs. The ongoing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has made the importance of such efforts especially clear.
Over the past 150 years, vaccines have revolutionized the relationship between people and disease. During the COVID-19 pandemic, technologies such as mRNA vaccines have received attention due to their novelty and successes. However, more traditional vaccine development platforms have also yielded important tools in the worldwide fight against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
Int J Environ Health Res
January 2024
The border city of El Paso, Texas, and its water utility, El Paso Water, initiated a SARS-CoV-2 wastewater monitoring program to assess virus trends and the appropriateness of a wastewater monitoring program for the community. Nearly weekly sample collection at four wastewater treatment facilities (WWTFs), serving distinct regions of the city, was analyzed for SARS-CoV-2 genes using the CDC 2019-Novel coronavirus Real-Time RT-PCR diagnostic panel. Virus concentrations ranged from 86.
Viruses must balance their reliance on host cell machinery for replication while avoiding host defense. Influenza A viruses are zoonotic agents that frequently switch hosts, causing localized outbreaks with the potential for larger pandemics. The host range of influenza virus is limited by the need for successful interactions between the virus and cellular partners.
Pac Symp Biocomput
December 2022
Protein subcellular localization is an important factor in normal cellular processes and disease. While many protein localization resources treat it as static, protein localization is dynamic and heavily influenced by biological context. Biological pathways are graphs that represent a specific biological context and can be inferred from large-scale data.
Rapid and accurate diagnosis of infections is fundamental to containment of disease. Several monkeypox virus (MPV) real-time diagnostic assays have been recommended by the CDC; however, the specificity of the primers and probes in these assays for the ongoing MPV outbreak has not been investigated. We analyzed the primer and probe sequences present in the CDC recommended MPV generic real-time PCR assay by aligning those sequences against 1730 MPV complete genomes reported in 2022 worldwide.