DrivR-Base: a feature extraction toolkit for variant effect prediction model construction.

Bioinformatics

MRC Integrative Epidemiology Unit, Bristol Medical School (PHS), University of Bristol, Bristol BS8 2BN, United Kingdom.

Published: March 2024

Motivation: Recent advances in sequencing technologies have led to the discovery of numerous variants in the human genome. However, understanding their precise roles in disease remains challenging because of their complex functional mechanisms. Various methods have been developed to predict the pathogenic significance of these genetic variants. Typically, these methods take an integrative approach, drawing on diverse data sources that provide important insights into genomic function. Despite the abundance of publicly available data sources and databases, navigating them and extracting and pre-processing features for machine learning models can be highly challenging and time-consuming. Furthermore, researchers often invest substantial effort in feature extraction, only to discover later that the resulting features are uninformative.

Results: In this article, we introduce DrivR-Base, an innovative resource that efficiently extracts and integrates molecular information (features) related to single nucleotide variants. These features capture information at both the genomic position of a variant and its associated protein position. They are derived from a wide array of databases and tools, including structural properties obtained from AlphaFold, regulatory information sourced from ENCODE, and predicted variant consequences from the Variant Effect Predictor. DrivR-Base is easily deployable via a Docker container, which ensures reproducibility and ease of access across diverse computational environments. The resulting features can be used as input to machine learning models designed to predict the pathogenic impact of human genome variants in disease. These feature sets also have applications beyond pathogenicity prediction, including haploinsufficiency prediction and the development of drug repurposing tools. We describe the resource's development, practical applications, and potential for future expansion and enhancement.
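
The abstract does not describe DrivR-Base's output format or interface, so the following is only a minimal, generic sketch of the integration step it mentions: joining per-variant features from several sources into one row per variant for a machine learning model. It assumes each source has already been reduced to a table keyed by (chromosome, position, reference allele, alternate allele). The function name build_feature_matrix, the source labels "structural" and "regulatory", the feature names, and all values are hypothetical and are not taken from DrivR-Base.

```python
from typing import Dict, List, Tuple

# Hypothetical variant key: (chromosome, 1-based position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]
# Hypothetical per-source table: variant key -> {feature name: numeric value}.
FeatureTable = Dict[VariantKey, Dict[str, float]]


def build_feature_matrix(
    variants: List[VariantKey],
    sources: Dict[str, FeatureTable],
    missing: float = float("nan"),
) -> Tuple[List[str], List[List[float]]]:
    """Join features from several sources into one row per variant.

    Columns are named "<source>:<feature>" so that identically named features
    from different sources do not collide. A variant absent from a source, or
    lacking a feature in it, gets the `missing` placeholder (NaN by default).
    """
    # Every (source, feature) pair seen anywhere, in a stable sorted order.
    pairs = sorted(
        {
            (src, feat)
            for src, table in sources.items()
            for feats in table.values()
            for feat in feats
        }
    )
    columns = [f"{src}:{feat}" for src, feat in pairs]
    # One row per input variant, in the order the variants were given.
    rows = [
        [sources[src].get(variant, {}).get(feat, missing) for src, feat in pairs]
        for variant in variants
    ]
    return columns, rows


# Toy inputs with made-up values, for illustration only.
v1 = ("chr1", 1000, "A", "G")
v2 = ("chr2", 2000, "C", "T")
sources = {
    "structural": {v1: {"relative_solvent_accessibility": 0.35}},
    "regulatory": {v1: {"open_chromatin": 1.0}, v2: {"open_chromatin": 0.0}},
}
columns, rows = build_feature_matrix([v1, v2], sources)
print(columns)  # ['regulatory:open_chromatin', 'structural:relative_solvent_accessibility']
print(rows)     # [[1.0, 0.35], [0.0, nan]]
```

Filling gaps with NaN rather than zero is a deliberate choice in this sketch. It keeps "no data for this variant" distinct from a genuine zero value and leaves imputation to the downstream model.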

Availability and Implementation: DrivR-Base source code is available at https://github.com/amyfrancis97/DrivR-Base.

Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11057939
DOI: http://dx.doi.org/10.1093/bioinformatics/btae197