Exponential increases in microbial and viral genomic data demand transformational advances in scalable, generalizable frameworks for their interpretation. Standard homology-based functional analyses are hindered by the rapid divergence of microbial and especially viral genomes and proteins that significantly decreases the volume of usable data. Here, we present Protein Set Transformer (PST), a protein-based genome language model that models genomes as sets of proteins without considering sparsely available functional labels.
View Article and Find Full Text PDFObjectives: Rapid evolution of SARS-CoV-2 has resulted in the emergence of numerous variants, posing significant challenges to public health surveillance. Clinical genome sequencing, while valuable, has limitations in capturing the full epidemiological dynamics of circulating variants in the general population. This study aimed to monitor the SARS-CoV-2 variant community dynamics and evolution using receptor-binding domain (RBD) amplicon sequencing of wastewater samples.
View Article and Find Full Text PDFOne snapshot of the peer review process for "Transcriptome data are insufficient to control false discoveries in regulatory network inference" (Kernfeld et al., 2024)..
View Article and Find Full Text PDF