Summary: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows a significant enrichment in IRT identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each nearby candidate gene to output a score reflecting the likelihood of a regulatory effect on that gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
Availability and Implementation: Code and data used in this work are available at https://github.com/miaecle/eQTL_Trees.
Supplementary Information: Supplementary data are available at Bioinformatics online.
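The two evaluation ideas named in the summary, position-based cross-validation and top-k gene-ranking accuracy, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 1 Mb `block_size`, the round-robin fold assignment and the toy scores are all assumptions made for illustration. The aim is only to show why a position-based split is stricter than a random one (nearby variants never straddle the train/test boundary) and how a top-k figure is computed from per-gene scores.

```python
def position_based_folds(variants, n_folds=5, block_size=1_000_000):
    """Assign each (chrom, pos) variant to a fold by genomic block.

    Hypothetical scheme: all variants in the same block (chromosome plus
    pos // block_size) share a fold, so neighbouring variants are never
    split between training and test data.
    """
    block_to_fold = {}
    folds = []
    for chrom, pos in variants:
        block = (chrom, pos // block_size)
        if block not in block_to_fold:
            # Round-robin over blocks in order of first appearance.
            block_to_fold[block] = len(block_to_fold) % n_folds
        folds.append(block_to_fold[block])
    return folds


def top_k_accuracy(scores, true_gene, k):
    """Fraction of variants whose true target gene ranks in the top k.

    scores: {variant_id: {gene_id: score}}, higher score = more likely target.
    true_gene: {variant_id: gene_id}.
    """
    hits = 0
    for variant_id, gene_scores in scores.items():
        ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
        if true_gene[variant_id] in ranked[:k]:
            hits += 1
    return hits / len(scores)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    variants = [("chr1", 100_000), ("chr1", 900_000),
                ("chr1", 1_500_000), ("chr2", 100_000)]
    print(position_based_folds(variants, n_folds=2))  # [0, 0, 1, 0]

    scores = {
        "v1": {"A": 0.9, "B": 0.2, "C": 0.1},
        "v2": {"A": 0.3, "B": 0.8, "C": 0.5},
    }
    true_gene = {"v1": "A", "v2": "C"}
    print(top_k_accuracy(scores, true_gene, k=1))  # 0.5
    print(top_k_accuracy(scores, true_gene, k=2))  # 1.0
```

In practice the fold labels would be passed to whatever cross-validation routine trains the classifier; the sketch deliberately stops before any model fitting.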
Full text | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7575052 | PMC |
http://dx.doi.org/10.1093/bioinformatics/btaa254 | DOI |
Genes (Basel)
January 2025
Division of Genetics and Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA.
Low-density lipoprotein cholesterol (LDL-C) is a well-established risk factor for cardiovascular disease, and it plays a causal role in the development of atherosclerosis. Genome-wide association studies (GWASs) have successfully identified hundreds of genetic variants associated with LDL-C. Most of these risk loci fall in non-coding regions of the genome, and it is unclear how these non-coding variants affect circulating lipid levels.
Int J Biochem Cell Biol
January 2025
Centre for Respiratory Research, Translational Medical Sciences, School of Medicine, University of Nottingham, UK; Nottingham NIHR Biomedical Research Centre, Nottingham, UK; Biodiscovery Institute, University Park, University of Nottingham, UK.
Lung fibrosis, including idiopathic pulmonary fibrosis (IPF), is a complex and devastating disease characterised by the progressive scarring of lung tissue leading to compromised respiratory function. Aberrantly activated fibroblasts deposit extracellular matrix components into the surrounding lung tissue, impairing lung function and capacity for gas exchange. Both genetic and epigenetic factors have been found to play a role in the pathogenesis of lung fibrosis, with emerging evidence highlighting the interplay between these two regulatory mechanisms.
Heliyon
January 2025
The First Affiliated Hospital of Chongqing Medical University, Chongqing Branch (Municipality Division) of National Clinical Research Center for Ocular Diseases, Chongqing, PR China.
Background: Several studies suggested the genetic association between IL10RA variants and susceptibility to Behcet's disease (BD). However, the precise mechanism of the association is still unknown. The purpose of this study was to investigate the mechanism underlying the genetic associations between IL10RA polymorphisms and the risk of BD.
J Transl Med
January 2025
Department of Academic Research, The Second Hospital of Shandong University, Jinan, Shandong, China.
Background: To elucidate the genetic and molecular mechanisms underlying psoriasis by employing an integrative multi-omics approach, using summary-data-based Mendelian randomization (SMR) to infer causal relationships among DNA methylation, gene expression, and protein levels in relation to psoriasis risk.
Methods: We conducted SMR analyses integrating genome-wide association study (GWAS) summary statistics with methylation quantitative trait loci (mQTL), expression quantitative trait loci (eQTL), and protein quantitative trait loci (pQTL) data. Publicly available datasets were utilized, including psoriasis GWAS data from the European Molecular Biology Laboratory-European Bioinformatics Institute and the UK Biobank.
Br J Cancer
January 2025
University of Naples Federico II, Department of Molecular Medicine and Medical Biotechnology, Naples, Italy.
Background: Emerging evidence suggests that non-coding somatic single nucleotide variants (SNVs) in cis-regulatory elements (CREs) contribute to cancer by disrupting gene expression networks. However, the role of non-coding SNVs in cancer, particularly neuroblastoma, remains largely unclear.
Methods: The effect of SNVs on CRE activity was evaluated by luciferase assays.