Publications by Peter Koo | LitMetric

Publications by authors named "Peter Koo"

Page 1 of 3

Uncertainty-aware genomic deep learning with knowledge distillation.

Jessica Zhou Kaeli Rizzo Ziqi Tang Peter K Koo

bioRxiv

November 2024

Deep neural networks (DNNs) have advanced predictive modeling for regulatory genomics, but challenges remain in ensuring the reliability of their predictions and understanding the key factors behind their decision making. Here we introduce DEGU (Distilling Ensembles for Genomic Uncertainty-aware models), a method that integrates ensemble learning and knowledge distillation to improve the robustness and explainability of DNN predictions. DEGU distills the predictions of an ensemble of DNNs into a single model, capturing both the average of the ensemble's predictions and the variability across them, with the latter representing epistemic (or model-based) uncertainty.

View Article and Find Full Text PDF

Analysis of single-cell CRISPR perturbations indicates that enhancers predominantly act multiplicatively.

Jessica L Zhou Karthik Guruvayurappan Shushan Toneyan Hsiuyi V Chen Aaron R Chen Peter Koo

Cell Genom

November 2024

A single gene may have multiple enhancers, but how they work in concert to regulate transcription is poorly understood. To analyze enhancer interactions throughout the genome, we developed a generalized linear modeling framework, GLiMMIRS, for interrogating enhancer effects from single-cell CRISPR experiments. We applied GLiMMIRS to a published dataset and tested for interactions between 46,166 enhancer pairs and corresponding genes, including 264 "high-confidence" enhancer pairs.

View Article and Find Full Text PDF

Interpreting cis-regulatory interactions from large-scale deep neural networks.

Shushan Toneyan Peter K Koo

Nat Genet

November 2024

The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with orthogonal experimental data, providing insights into generalization but offering limited insights into their decision-making process. Existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences.

View Article and Find Full Text PDF

Explainable AI for computational pathology identifies model limitations and tissue biomarkers.

Jakub R Kaczmarzyk Joel H Saltz Peter K Koo

ArXiv

November 2024

Introduction: Deep learning models hold great promise for digital pathology, but their opaque decision-making processes undermine trust and hinder clinical adoption. Explainable AI methods are essential to enhance model transparency and reliability.

Methods: We developed HIPPO, an explainable AI framework that systematically modifies tissue regions in whole slide images to generate image counterfactuals, enabling quantitative hypothesis testing, bias detection, and model evaluation beyond traditional performance metrics.

View Article and Find Full Text PDF

Massive experimental quantification of amyloid nucleation allows interpretable deep learning of protein aggregation.

Mike Thompson Mariano Martín Trinidad Sanmartín Olmo Chandana Rajesh Peter K Koo

bioRxiv

October 2024

Protein aggregation is a pathological hallmark of more than fifty human diseases and a major problem for biotechnology. Methods have been proposed to predict aggregation from sequence, but these have been trained and evaluated on small and biased experimental datasets. Here we directly address this data shortage by experimentally quantifying the amyloid nucleation of >100,000 protein sequences.

View Article and Find Full Text PDF

Evaluating the representational power of pre-trained DNA language models for regulatory genomics.

Ziqi Tang Nirali Somia Yiyang Yu Peter K Koo

bioRxiv

September 2024

The emergence of genomic language models (gLMs) offers an unsupervised approach to learning a wide diversity of -regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown that pre-trained gLMs can be leveraged to improve predictive performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of -regulatory biology remains an open question.

View Article and Find Full Text PDF

EvoAug-TF: extending evolution-inspired data augmentations for genomic deep learning to TensorFlow.

Yiyang Yu Shivani Muthukumar Peter K Koo

Bioinformatics

March 2024

Summary: Deep neural networks (DNNs) have been widely applied to predict the molecular functions of the non-coding genome. DNNs are data hungry and thus require many training examples to fit data well. However, functional genomics experiments typically generate limited amounts of data, constrained by the activity levels of the molecular function under study inside the cell.

View Article and Find Full Text PDF

EvoAug-TF: Extending evolution-inspired data augmentations for genomic deep learning to TensorFlow.

Yiyang Yu Shivani Muthukumar Peter K Koo

bioRxiv

January 2024

Unlabelled: Deep neural networks (DNNs) have been widely applied to predict the molecular functions of regulatory regions in the non-coding genome. DNNs are data hungry and thus require many training examples to fit data well. However, functional genomics experiments typically generate limited amounts of data, constrained by the activity levels of the molecular function under study inside the cell.

View Article and Find Full Text PDF

Current approaches to genomic deep learning struggle to fully capture human genetic variation.

Ziqi Tang Shushan Toneyan Peter K Koo

Nat Genet

December 2023

View Article and Find Full Text PDF

Interpreting -regulatory mechanisms from genomic deep neural networks using surrogate models.

Evan E Seitz David M McCandlish Justin B Kinney Peter K Koo

bioRxiv

March 2024

Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. Interpreting genomic DNNs in terms of biological mechanisms, however, remains difficult. Here we introduce SQUID, a genomic DNN interpretability framework based on surrogate modeling.

View Article and Find Full Text PDF

Interpreting -Regulatory Interactions from Large-Scale Deep Neural Networks for Genomics.

Shushan Toneyan Peter K Koo

bioRxiv

March 2024

The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with experimental perturbation assays, which provides insights into the generalization capabilities within the studied loci but offers a limited perspective of what drives their predictions. Moreover, existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences.

View Article and Find Full Text PDF

Evaluating deep learning for predicting epigenomic profiles.

Shushan Toneyan Ziqi Tang Peter K Koo

Nat Mach Intell

December 2022

Deep learning has been successful at predicting epigenomic profiles from DNA sequences. Most approaches frame this task as a binary classification relying on peak callers to define functional activity. Recently, quantitative models have emerged to directly predict the experimental coverage values as a regression.

View Article and Find Full Text PDF

ChampKit: A framework for rapid evaluation of deep neural networks for patch-based histopathology classification.

Jakub R Kaczmarzyk Rajarsi Gupta Tahsin M Kurc Shahira Abousamra Joel H Saltz Peter K Koo

Comput Methods Programs Biomed

September 2023

Background And Objective: Histopathology is the gold standard for diagnosis of many cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for many tasks, including the detection of immune cells and microsatellite instability. However, it remains difficult to identify optimal models and training configurations for different histopathology classification tasks due to the abundance of available architectures and the lack of systematic evaluations.

View Article and Find Full Text PDF

Selecting deep neural networks that yield consistent attribution-based interpretations for genomics.

Antonio Majdandzic Chandana Rajesh Amber Tang Shushan Toneyan Ethan Labelson Peter K Koo

Proc Mach Learn Res

November 2022

Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well.

View Article and Find Full Text PDF

Analysis of single-cell CRISPR perturbations indicates that enhancers act multiplicatively and provides limited evidence for epistatic-like interactions.

Jessica Zhou Karthik Guruvayurappan Shushan Toneyan Hsiuyi V Chen Aaron R Chen Peter Koo

bioRxiv

July 2024

A single gene may have multiple enhancers, but how they work in concert to regulate transcription is poorly understood. To analyze enhancer interactions throughout the genome, we developed a generalized linear modeling framework, GLiMMIRS, for interrogating enhancer effects from single-cell CRISPR experiments. We applied GLiMMIRS to a published dataset and tested for interactions between 46,166 enhancer pairs and corresponding genes, including 264 'high-confidence' enhancer pairs.

View Article and Find Full Text PDF

Correcting gradient-based interpretations of deep neural networks for genomics.

Antonio Majdandzic Chandana Rajesh Peter K Koo

Genome Biol

May 2023

Post hoc attribution methods can provide insights into the learned patterns from deep neural networks (DNNs) trained on high-throughput functional genomics data. However, in practice, their resultant attribution maps can be challenging to interpret due to spurious importance scores for seemingly arbitrary nucleotides. Here, we identify a previously overlooked attribution noise source that arises from how DNNs handle one-hot encoded DNA.

View Article and Find Full Text PDF

EvoAug: improving generalization and interpretability of genomic deep neural networks with evolution-inspired data augmentations.

Nicholas Keone Lee Ziqi Tang Shushan Toneyan Peter K Koo

Genome Biol

May 2023

Deep neural networks (DNNs) hold promise for functional genomics prediction, but their generalization capability may be limited by the amount of available data. To address this, we propose EvoAug, a suite of evolution-inspired augmentations that enhance the training of genomic DNNs by increasing genetic variation. Random transformation of DNA sequences can potentially alter their function in unknown ways, so we employ a fine-tuning procedure using the original non-transformed data to preserve functional integrity.

View Article and Find Full Text PDF

Light and temperature regulate mA-RNA modification to regulate growth in plants.

Oliver Artz Amanda Ackermann Laura Taylor Peter K Koo Ullas V Pedmale

bioRxiv

January 2023

N6-methyladenosine is a highly dynamic, abundant mRNA modification which is an excellent potential mechanism for fine tuning gene expression. Plants adapt to their surrounding light and temperature environment using complex gene regulatory networks. The role of mA in controlling gene expression in response to variable environmental conditions has so far been unexplored.

View Article and Find Full Text PDF

ResidualBind: Uncovering Sequence-Structure Preferences of RNA-Binding Proteins with Deep Neural Networks.

Peter K Koo Matt Ploenzke Praveen Anand Steffan Paul Antonio Majdandzic

Methods Mol Biol

January 2023

Deep neural networks have demonstrated improved performance at predicting sequence specificities of DNA- and RNA-binding proteins. However, it remains unclear why they perform better than previous methods that rely on k-mers and position weight matrices. Here, we highlight a recent deep learning-based software package, called ResidualBind, that analyzes RNA-protein interactions using only RNA sequence as an input feature and performs global importance analysis for model interpretability.

View Article and Find Full Text PDF

ETV6 dependency in Ewing sarcoma by antagonism of EWS-FLI1-mediated enhancer activation.

Yuan Gao Xue-Yan He Xiaoli S Wu Yu-Han Huang Shushan Toneyan Peter K Koo

Nat Cell Biol

February 2023

The EWS-FLI1 fusion oncoprotein deregulates transcription to initiate the paediatric cancer Ewing sarcoma. Here we used a domain-focused CRISPR screen to implicate the transcriptional repressor ETV6 as a unique dependency in this tumour. Using biochemical assays and epigenomics, we show that ETV6 competes with EWS-FLI1 for binding to select DNA elements enriched for short GGAA repeat sequences.

View Article and Find Full Text PDF

Learning single-cell chromatin accessibility profiles using meta-analytic marker genes.

Risa Karakida Kawaguchi Ziqi Tang Stephan Fischer Chandana Rajesh Rohit Tripathy Peter K Koo

Brief Bioinform

January 2023

Motivation: Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a valuable resource to learn cis-regulatory elements such as cell-type specific enhancers and transcription factor binding sites. However, cell-type identification of scATAC-seq data is known to be challenging due to the heterogeneity derived from different protocols and the high dropout rate.

Results: In this study, we perform a systematic comparison of seven scATAC-seq datasets of mouse brain to benchmark the efficacy of neuronal cell-type annotation from gene sets.

View Article and Find Full Text PDF

End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman.

Samantha Petti Nicholas Bhattacharya Roshan Rao Justas Dauparas Neil Thomas Peter Koo

Bioinformatics

January 2023

Motivation: Multiple sequence alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for.

Results: Here, we implement a smooth and differentiable version of the Smith-Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion.

View Article and Find Full Text PDF

Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention.

Nicholas Bhattacharya Neil Thomas Roshan Rao Justas Dauparas Peter K Koo

Pac Symp Biocomput

January 2022

The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment. Increasingly large Transformers are being pretrained on unlabeled, unaligned protein sequence databases and showing competitive performance on protein contact prediction.

View Article and Find Full Text PDF

Improving representations of genomic sequence motifs in convolutional networks with exponential activations.

Peter K Koo Matt Ploenzke

Nat Mach Intell

March 2021

Deep convolutional neural networks (CNNs) trained on regulatory genomic sequences tend to build representations in a distributed manner, making it a challenge to extract learned features that are biologically meaningful, such as sequence motifs. Here we perform a comprehensive analysis on synthetic sequences to investigate the role that CNN activations have on model interpretability. We show that employing an exponential activation to first layer filters consistently leads to interpretable and robust representations of motifs compared to other commonly used activations.

View Article and Find Full Text PDF

Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks.

Peter K Koo Antonio Majdandzic Matthew Ploenzke Praveen Anand Steffan B Paul

PLoS Comput Biol

May 2021

Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. To gain insights into why a DNN makes a given prediction, model interpretability methods, such as attribution methods, can be employed to identify motif-like representations along a given sequence. Because explanations are given on an individual sequence basis and can vary substantially across sequences, deducing generalizable trends across the dataset and quantifying their effect size remains a challenge.

View Article and Find Full Text PDF