Global efforts to control COVID-19 are threatened by the rapid emergence of novel SARS-CoV-2 variants that may display undesirable characteristics such as immune escape, increased transmissibility, or increased pathogenicity. Early prediction of the emergence of new strains with these features is critical for pandemic preparedness. We present a supervised and causally predictive model that uses unsupervised latent space features of SARS-CoV-2 genome sequences. The model was trained and validated on 0.9 million sequences for the period December 2019 to June 2021, and the frozen model was prospectively validated from July 2021 to December 2021. It captured the rise in cases 2 months ahead of the Delta and Omicron surges in most countries, including the prediction of a surge in India as early as the beginning of November 2021. Entropy analysis of the unsupervised embeddings reveals explore-exploit cycles in genomic feature space, adding interpretability to the deep-learning-based model. We also conducted codon-level analysis of our model to assess the interpretability and biological validity of our unsupervised features. The application is openly available as an interactive web application for prospective genomic surveillance of COVID-19 across the globe.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9024110
DOI: http://dx.doi.org/10.3389/fgene.2022.858252
