To enable large-scale analyses of transcription regulation in model species, we developed DeepArk, a set of deep learning models of the cis-regulatory activities for four widely studied species: Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, and Mus musculus. DeepArk accurately predicts the presence of thousands of different context-specific regulatory features, including chromatin states, histone marks, and transcription factors. In vivo studies show that DeepArk can predict the regulatory impact of any genomic variant (including rare or previously unobserved variants) and enables the regulatory annotation of understudied model species.
The microbiome is a new frontier for building predictors of human phenotypes. However, machine learning in the microbiome is fraught with issues of reproducibility, driven in large part by the wide range of analytic models and metagenomic data types available. We aimed to build robust metagenomic predictors of host phenotype by comparing prediction performances and biological interpretation across 8 machine learning methods and 4 different types of metagenomic data.
To enable the application of deep learning in biology, we present Selene (https://selene.flatironinstitute.org/), a PyTorch-based deep learning library for fast and easy development, training, and application of deep learning model architectures for any biological sequence data.
Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood.
Motivation: Across biology, we are seeing rapid developments in the scale of data production without a corresponding increase in data analysis capabilities.
Results: Here, we present Aether (http://aether.kosticlab.