Predicting target genes of non-coding regulatory variants with IRT.

Bioinformatics

Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA 94305, USA.

Published: August 2020

Summary: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of the trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in eQTL studies. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 using a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows that IRT is significantly enriched for identifying the true target genes relative to negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each nearby candidate gene to output a score reflecting the likelihood of a regulatory effect on that gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
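The two evaluation ideas mentioned in the summary, position-based cross-validation and top-k gene ranking, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the fixed `block_size` used to group variants by genomic position, and the toy scores are all assumptions made for illustration. The abstract does not specify how IRT defines its position-based splits, so the block scheme below is only one plausible reading.

```python
def position_based_folds(variants, n_folds=5, block_size=1_000_000):
    """Assign each variant to a fold by genomic block (hypothetical scheme).

    variants: list of (chrom, pos) tuples.
    Variants falling in the same block always share a fold, so nearby
    variants cannot appear on both sides of a train/test split.
    """
    blocks = sorted({(chrom, pos // block_size) for chrom, pos in variants})
    block_to_fold = {block: i % n_folds for i, block in enumerate(blocks)}
    return [block_to_fold[(chrom, pos // block_size)] for chrom, pos in variants]


def top_k_accuracy(scores, true_gene, k):
    """Fraction of variants whose true target gene ranks in the top k.

    scores: {variant_id: {gene: score}}, higher score = more likely target.
    true_gene: {variant_id: gene}.
    """
    hits = 0
    for variant, gene_scores in scores.items():
        ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
        if true_gene[variant] in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data (made up for illustration, not taken from GTEx or the paper).
variants = [("chr1", 100_000), ("chr1", 900_000), ("chr1", 1_500_000), ("chr2", 200_000)]
folds = position_based_folds(variants, n_folds=2)

scores = {
    "v1": {"A": 0.9, "B": 0.2, "C": 0.1},
    "v2": {"A": 0.3, "B": 0.5, "C": 0.4},
}
true_gene = {"v1": "A", "v2": "C"}
top1 = top_k_accuracy(scores, true_gene, k=1)
top2 = top_k_accuracy(scores, true_gene, k=2)
```

Grouping by position makes the test harder than a random split because a model can no longer benefit from near-duplicate neighbors of test variants seen during training, which is consistent with the lower ROC-AUC reported for the position-based setting.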

Availability and Implementation: Code and data used in this work are available at https://github.com/miaecle/eQTL_Trees.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7575052
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa254

Similar Publications

Missing Regulation Between Genetic Association and Transcriptional Abundance for Hypercholesterolemia Genes.

Genes (Basel)

January 2025

Division of Genetics and Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA.

Aaron Hakim Noah J Connally Gavin R Schnitzler Michael H Cho Z Gordon Jiang

Low-density lipoprotein cholesterol (LDL-C) is a well-established risk factor for cardiovascular disease, and it plays a causal role in the development of atherosclerosis. Genome-wide association studies (GWASs) have successfully identified hundreds of genetic variants associated with LDL-C. Most of these risk loci fall in non-coding regions of the genome, and it is unclear how these non-coding variants affect circulating lipid levels.

Interplay between genetics and epigenetics in lung fibrosis.

Int J Biochem Cell Biol

January 2025

Centre for Respiratory Research, Translational Medical Sciences, School of Medicine, University of Nottingham, UK; Nottingham NIHR Biomedical Research Centre, Nottingham, UK; Biodiscovery Institute, University Park, University of Nottingham, UK.

Anita Valand Poojitha Rajasekar Louise V Wain Rachel L Clifford

Lung fibrosis, including idiopathic pulmonary fibrosis (IPF), is a complex and devastating disease characterised by the progressive scarring of lung tissue leading to compromised respiratory function. Aberrantly activated fibroblasts deposit extracellular matrix components into the surrounding lung tissue, impairing lung function and capacity for gas exchange. Both genetic and epigenetic factors have been found to play a role in the pathogenesis of lung fibrosis, with emerging evidence highlighting the interplay between these two regulatory mechanisms.

Genetic predisposition to Behcet's disease mediated by a IL10RA enhancer polymorphism.

Heliyon

January 2025

The First Affiliated Hospital of Chongqing Medical University, Chongqing Branch (Municipality Division) of National Clinical Research Center for Ocular Diseases, Chongqing, PR China.

Handan Tan Zhenyu Zhong Xiaojie Feng Xiang Luo Qingfeng Cao

Background: Several studies suggested the genetic association between IL10RA variants and susceptibility to Behcet's disease (BD). However, the precise mechanism of the association is still unknown. The purpose of this study was to investigate the mechanism underlying the genetic associations between IL10RA polymorphisms and the risk of BD.

Multi-omics analysis reveals novel causal pathways in psoriasis pathogenesis.

J Transl Med

January 2025

Department of Academic Research, The Second Hospital of Shandong University, Jinan, Shandong, China.

Hua Guo Jinyang Gao Liping Gong Yanqing Wang

Background: To elucidate the genetic and molecular mechanisms underlying psoriasis by employing an integrative multi-omics approach, using summary-data-based Mendelian randomization (SMR) to infer causal relationships among DNA methylation, gene expression, and protein levels in relation to psoriasis risk.

Methods: We conducted SMR analyses integrating genome-wide association study (GWAS) summary statistics with methylation quantitative trait loci (mQTL), expression quantitative trait loci (eQTL), and protein quantitative trait loci (pQTL) data. Publicly available datasets were utilized, including psoriasis GWAS data from the European Molecular Biology Laboratory-European Bioinformatics Institute and the UK Biobank.

Regulatory non-coding somatic mutations as drivers of neuroblastoma.

Br J Cancer

January 2025

University of Naples Federico II, Department of Molecular Medicine and Medical Biotechnology, Naples, Italy.

Annalaura Montella Matilde Tirelli Vito Alessandro Lasorsa Vincenzo Aievola Vincenza Cerbone

Background: Emerging evidence suggests that non-coding somatic single nucleotide variants (SNVs) in cis-regulatory elements (CREs) contribute to cancer by disrupting gene expression networks. However, the role of non-coding SNVs in cancer, particularly neuroblastoma, remains largely unclear.

Methods: The effect of SNVs on CRE activity was evaluated by luciferase assays.
