Over the last four years, each successive wave of the COVID-19 pandemic has been caused by variants with mutations that improve the transmissibility of the virus. Despite this, we still lack tools for predicting clinically important features of the virus. In this study, we show that it is possible to predict the PCR cycle threshold (Ct) values from clinical detection assays using sequence data.
As genomic and related data continue to expand, research biologists are often hampered by the computational hurdles required to analyze their data. The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Centers (BRC) to assist researchers with their analysis of genome sequence and other omics-related data. Recently, the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD), and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs merged to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) at https://www.
Having the ability to predict the protein-encoding gene content of an incomplete genome or metagenome-assembled genome is important for a variety of bioinformatic tasks. In this study, as a proof of concept, we built machine learning classifiers for predicting variable gene content in genomes using only the nucleotide k-mers from a set of 100 conserved genes as features. Protein families were used to define orthologs and a single classifier was built for predicting the presence or absence of each protein family occurring in 10%-90% of all genomes.
There is mounting evidence of SARS-CoV-2 spillover from humans into many domestic, companion, and wild animal species. Research indicates that humans have infected white-tailed deer, and that deer-to-deer transmission has occurred, indicating that deer could be a wildlife reservoir and a source of novel SARS-CoV-2 variants. We examined the hypothesis that the Omicron variant is actively and asymptomatically infecting the free-ranging deer of New York City.
Plasmids are important genetic elements that facilitate horizontal gene transfer between bacteria and contribute to the spread of virulence and antimicrobial resistance. Most bacterial genome sequences in the public archives exist in draft form with many contigs, making it difficult to determine if a contig is of chromosomal or plasmid origin. Using a training set of contigs comprising 10,584 chromosomes and 10,654 plasmids from the PATRIC database, we evaluated several machine learning models including random forest, logistic regression, XGBoost, and a neural network for their ability to classify chromosomal and plasmid sequences using nucleotide k-mers as features.
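As a rough illustration of what "nucleotide k-mers as features" can mean, the sketch below turns a DNA sequence into a fixed-length vector of k-mer counts. It is not the authors' pipeline: the function name `kmer_counts`, the choice of `k`, and the decision to ignore k-mers containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=3):
    """Return a count vector over all 4**k DNA k-mers (hypothetical helper).

    The vector is ordered lexicographically over the alphabet ACGT.
    K-mers containing other characters (for example N) are ignored,
    which is an assumption of this sketch.
    """
    seq = seq.upper()
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed vocabulary so every sequence maps to the same feature layout.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocab]


if __name__ == "__main__":
    # Toy contig; real contigs would be far longer.
    features = kmer_counts("ACGTACGTNACGT", k=2)
    print(len(features), sum(features))
```

Vectors like this, one per contig, could then be passed to any standard classifier; the choice of model and of `k` is a separate decision that the abstract does not specify here.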
We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.
High-throughput genome sequencing technologies enable the investigation of complex genetic interactions, including the horizontal gene transfer of plasmids and bacteriophages. However, identifying these elements from assembled reads remains challenging due to genome sequence plasticity and the difficulty in assembling complete sequences. In this study, we developed a classifier, using random forest, to identify whether sequences originated from bacterial chromosomes, plasmids, or bacteriophages.
The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.
Plasmids play a major role in facilitating the spread of antimicrobial resistance between bacteria. Understanding the host range and dissemination trajectories of plasmids is critical for surveillance and prevention of antimicrobial resistance. Identification of plasmid host ranges could be improved using automated pattern detection methods compared to homology-based methods due to the diversity and genetic plasticity of plasmids.
White-tailed deer are highly susceptible to infection by SARS-CoV-2, with multiple reports of widespread spillover of virus from humans to free-living deer. While the recently emerged SARS-CoV-2 B.1.
Genetic variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continue to dramatically alter the landscape of the coronavirus disease 2019 (COVID-19) pandemic. The recently described variant of concern designated Omicron (B.1.
The ARTIC Network provides a common resource of PCR primer sequences and recommendations for amplifying SARS-CoV-2 genomes. The initial tiling strategy was developed with the reference genome Wuhan-01, and subsequent iterations have addressed areas of low amplification and sequence drop out. Recently, a new version (V4) was released, based on new variant genome sequences, in response to the realization that some V3 primers were located in regions with key mutations.
Genetic variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have repeatedly altered the course of the coronavirus disease 2019 (COVID-19) pandemic. Delta variants are now the focus of intense international attention because they are causing widespread COVID-19 globally and are associated with vaccine breakthrough cases. We sequenced 16,965 SARS-CoV-2 genomes from samples acquired March 15, 2021, through September 20, 2021, in the Houston Methodist hospital system.
Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data.
Certain genetic variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are of substantial concern because they may be more transmissible or detrimentally alter the pandemic course and disease features in individual patients. SARS-CoV-2 genome sequences from 12,476 patients in the Houston Methodist health care system diagnosed from January 1 through May 31, 2021 are reported here. Prevalence of the B.
Since the beginning of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, there has been international concern about the emergence of virus variants with mutations that increase transmissibility, enhance escape from the human immune response, or otherwise alter biologically important phenotypes. In late 2020, several variants of concern emerged globally, including the UK variant (B.1.
We sequenced the genomes of 5,085 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains causing two coronavirus disease 2019 (COVID-19) disease waves in metropolitan Houston, TX, an ethnically diverse region with 7 million residents. The genomes were from viruses recovered in the earliest recognized phase of the pandemic in Houston and from viruses recovered in an ongoing massive second wave of infections. The virus was originally introduced into Houston many times independently.
Chronic airways infection with methicillin-resistant Staphylococcus aureus (MRSA) is associated with worse respiratory disease in cystic fibrosis (CF) patients. Ceftaroline is a cephalosporin that inhibits the penicillin-binding protein (PBP2a) uniquely produced by MRSA. We analyzed 335 S.
A growing number of studies are using machine learning models to accurately predict antimicrobial resistance (AMR) phenotypes from bacterial sequence data. Although these studies are showing promise, the models are typically trained using features derived from comprehensive sets of AMR genes or whole genome sequences and may not be suitable for use when genomes are incomplete. In this study, we explore the possibility of predicting AMR phenotypes using incomplete genome sequence data.
We sequenced the genomes of 5,085 SARS-CoV-2 strains causing two COVID-19 disease waves in metropolitan Houston, Texas, an ethnically diverse region with seven million residents. The genomes were from viruses recovered in the earliest recognized phase of the pandemic in Houston, and an ongoing massive second wave of infections. The virus was originally introduced into Houston many times independently.
The laboratory identification of antibacterial resistance is a cornerstone of infectious disease medicine. In vitro antimicrobial susceptibility testing has long been based on the growth response of organisms in pure culture to a defined concentration of antimicrobial agents. By comparing individual isolates to wild-type susceptibility patterns, strains with acquired resistance can be identified.
Variation in the genome of , an important pathogen, can have dramatic impacts on the bacterium's ability to cause disease. We therefore asked whether it was possible to predict the virulence of isolates based on their genomic content. We applied a machine learning approach to a genetically and phenotypically diverse collection of 115 clinical isolates using genomic information and corresponding virulence phenotypes in a mouse model of bacteremia.