Publications by authors named "Zhengqiao Zhao"

Focal gene amplifications are among the most common cancer-associated mutations but have proven challenging to engineer in primary cells and model organisms. Here we describe a general strategy to engineer large (more than 1 Mbp) focal amplifications mediated by extrachromosomal DNAs (ecDNAs) in a spatiotemporally controlled manner in cells and in mice. By coupling ecDNA formation with expression of selectable markers, we track the dynamics of ecDNA-containing cells under physiological conditions and in the presence of specific selective pressures.


Through the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation.

Article Synopsis
  • Evaluating metagenomic software is crucial for enhancing the interpretation of metagenomes, and the CAMI II challenge focused on this by using complex datasets from numerous genomes and plasmids.
  • The analysis of 5,002 results from 76 software versions showed significant advancements in assembly, especially with long-read data, although challenges remained with related strains and genome recovery.
  • Findings indicated that while taxon profilers improved, they struggled with viruses and Archaea, highlighting the need for better reproducibility in clinical pathogen detection and guiding researchers in method selection based on efficiency and performance metrics.

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences.


Machine learning algorithms can learn mechanisms of antimicrobial resistance from DNA sequence data without any a priori information. Interpreting a trained machine learning algorithm can be exploited for validating the model and obtaining new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide k-mers, have been proposed for presenting DNA sequences to the model.
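
As a rough illustration of the k-mer counting mentioned above, the following minimal sketch turns a DNA sequence into a fixed-length count vector. The function names `count_kmers` and `kmer_feature_vector`, and the choice to skip windows containing non-ACGT characters, are assumptions made for this example and are not taken from the article.

```python
from collections import Counter
from itertools import product


def count_kmers(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_feature_vector(sequence: str, k: int) -> list:
    """Return counts for all 4**k possible k-mers in lexicographic order."""
    counts = count_kmers(sequence, k)
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[kmer] for kmer in vocabulary]


if __name__ == "__main__":
    print(count_kmers("ACGTACGT", 3))
    print(len(kmer_feature_vector("ACGTACGT", 3)))
```

A vector of this kind could be passed to any standard classifier; which k and which model suit resistance prediction is not specified here.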


Background: It is a computational challenge for current metagenomic classifiers to keep up with the pace of training data generated from genome sequencing projects, such as the exponentially growing NCBI RefSeq bacterial genome database. When new reference sequences are added to training data, statically trained classifiers must be rerun on all data, resulting in a highly inefficient process. The rich literature of "incremental learning" addresses the need to update an existing classifier to accommodate new data without sacrificing much accuracy compared to retraining the classifier with all data.
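
The idea of updating a classifier without retraining on all data can be sketched with a toy multinomial naive Bayes model over k-mer counts. This is an illustrative assumption, not the article's implementation: the class name `IncrementalKmerNB`, the method `partial_fit`, the uniform class prior, and the smoothing constant `alpha` are all hypothetical choices for this example.

```python
import math
from collections import Counter, defaultdict


class IncrementalKmerNB:
    """Toy naive Bayes classifier whose per-class k-mer counts can be updated in place."""

    def __init__(self, k: int = 3, alpha: float = 1.0):
        self.k = k
        self.alpha = alpha                       # additive smoothing constant
        self.kmer_counts = defaultdict(Counter)  # label -> k-mer counts
        self.totals = Counter()                  # label -> total k-mers seen

    def _kmers(self, sequence: str) -> list:
        sequence = sequence.upper()
        return [sequence[i:i + self.k] for i in range(len(sequence) - self.k + 1)]

    def partial_fit(self, sequence: str, label: str) -> None:
        """Add one reference sequence; earlier sequences are not revisited."""
        kmers = self._kmers(sequence)
        self.kmer_counts[label].update(kmers)
        self.totals[label] += len(kmers)

    def predict(self, sequence: str):
        """Return the label with the highest smoothed log-likelihood (uniform prior)."""
        vocab_size = 4 ** self.k
        query = self._kmers(sequence)
        best_label, best_score = None, -math.inf
        for label, counts in self.kmer_counts.items():
            denom = self.totals[label] + self.alpha * vocab_size
            score = sum(math.log((counts[km] + self.alpha) / denom) for km in query)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = IncrementalKmerNB(k=3)
    model.partial_fit("AAAAAAAAAA", "taxon_a")
    model.partial_fit("CCCCCCCCCC", "taxon_c")
    model.partial_fit("GGGGGGGGGG", "taxon_g")  # new reference added later
    print(model.predict("GGGG"))
```

Because the model stores only sufficient statistics (counts), adding a new reference genome costs one pass over that genome rather than a pass over the whole database.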


We propose an efficient framework for genetic subtyping of SARS-CoV-2, the novel coronavirus that causes the COVID-19 pandemic. Efficient viral subtyping enables visualization and modeling of the geographic distribution and temporal dynamics of disease spread. Subtyping thereby advances the development of effective containment strategies and, potentially, therapeutic and vaccine strategies.


Microbiome research has increased dramatically in recent years, driven by advances in technology and significant reductions in the cost of analysis. Such research has unlocked a wealth of data, which has yielded tremendous insight into the nature of the microbial communities, including their interactions and effects, both within a host and in an external environment as part of an ecological community. Understanding the role of microbiota, including their dynamic interactions with their hosts and other microbes, can enable the engineering of new diagnostic techniques and interventional strategies that can be used in a diverse spectrum of fields, spanning from ecology and agriculture to medicine and from forensics to exobiology.


Analysis of microbiome data involves identifying co-occurring groups of taxa associated with sample features of interest (e.g., disease state).


Advances in high-throughput sequencing have increased the availability of microbiome sequencing data that can be exploited to characterize microbiome community structure in situ. We explore using word and sentence embedding approaches for nucleotide sequences since they may be a suitable numerical representation for downstream machine learning applications (especially deep learning). This work involves first encoding ("embedding") each sequence into a dense, low-dimensional, numeric vector space.
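
To make the embedding step concrete, here is a minimal sketch that treats k-mers as "words" and averages their vectors to obtain one dense vector per sequence. It rests on loud assumptions: the per-k-mer vectors below are deterministic pseudo-random placeholders standing in for vectors that would normally be learned from data, mean pooling is only one of several possible sentence-embedding schemes, and the names `kmer_vector` and `embed_sequence` are hypothetical.

```python
import random


def kmer_vector(kmer: str, dim: int) -> list:
    """Placeholder vector for a k-mer (a stand-in for a learned embedding)."""
    rng = random.Random(kmer)  # string seed makes the placeholder reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embed_sequence(sequence: str, k: int = 3, dim: int = 8) -> list:
    """Embed a sequence as the mean of its overlapping k-mer vectors."""
    sequence = sequence.upper()
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        return [0.0] * dim  # sequence shorter than k: return a zero vector
    total = [0.0] * dim
    for kmer in kmers:
        for j, value in enumerate(kmer_vector(kmer, dim)):
            total[j] += value
    return [value / len(kmers) for value in total]


if __name__ == "__main__":
    print(embed_sequence("ACGTACGTAC"))
```

The resulting fixed-length vector can be fed to a downstream model regardless of the original read length.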
