AI Article Synopsis

  • SNPs can be correlated due to linkage disequilibrium (LD), affecting the performance of Random Forest (RF) analyses in association studies.
  • Different methods, such as focusing on SNPs in linkage equilibrium (LE), adjusting importance measures, or using haplotypes instead of SNPs, can help address challenges posed by SNPs in LD.
  • The findings indicate that using a revised importance measure with the original RF is the most effective approach, especially when the genetic model is unclear, while haplotype-based methods tend to underperform as LD increases.

Article Abstract

Background: Single nucleotide polymorphisms (SNPs) may be correlated due to linkage disequilibrium (LD). Association studies look for both direct and indirect associations with disease loci. In a Random Forest (RF) analysis, correlation between a true risk SNP and SNPs in LD may lead to diminished variable importance for the true risk SNP. One approach to address this problem is to select SNPs in linkage equilibrium (LE) for analysis. Here, we explore alternative methods for dealing with SNPs in LD: change the tree-building algorithm by building each tree in an RF only with SNPs in LE, modify the importance measure (IM), and use haplotypes instead of SNPs to build an RF.
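As a point of reference for the LD/LE distinction above, the following minimal sketch computes two standard pairwise LD statistics, D and r^2, from haplotype and allele frequencies. The abstract does not say which LD measure the authors used; the choice of r^2, the function name `ld_stats`, and the example frequencies are assumptions made for illustration only.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    (Hypothetical helper; not taken from the article.)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # r2 is D squared, scaled by the allele-frequency variances at both loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Linkage equilibrium: the haplotype frequency equals the product of allele frequencies.
d_le, r2_le = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5)

# Complete LD: allele A always travels with allele B.
d_ld, r2_ld = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)

print(d_le, r2_le)
print(d_ld, r2_ld)
```

With these made-up frequencies, the first pair is in equilibrium (D and r^2 are both zero) and the second is in complete LD (r^2 equals one). Selecting SNPs "in LE" in practice usually means keeping pairs whose r^2 falls below some threshold, which is a study-specific choice not given here.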

Results: We evaluated the performance of our alternative methods by simulation of a spectrum of complex genetic models. When a haplotype rather than an individual SNP is the risk factor, we find that the original Random Forest method performed on SNPs provides good performance. When individual, genotyped SNPs are the risk factors, we find that the stronger the genetic effect, the stronger the effect LD has on the performance of the original RF. A revised importance measure used with the original RF is relatively robust to LD among SNPs; this revised importance measure used with the revised RF is sometimes inflated. Overall, we find that the revised importance measure used with the original RF is the best choice when the genetic model and the number of SNPs in LD with risk SNPs are unknown. For the haplotype-based method, under a multiplicative heterogeneity model, we observed a decrease in the performance of RF with increasing LD among the SNPs in the haplotype.
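The dilution effect described above can be illustrated with a deliberately tiny toy. The sketch below is not the authors' revised importance measure and does not train a forest: it hand-builds decision stumps, uses a fixed reversal in place of a random permutation so the result is reproducible, and scores on all samples rather than out-of-bag samples. The data and the names `stump_accuracy`, `permutation_importance`, `risk`, and `proxy` are all hypothetical.

```python
def stump_accuracy(feature, labels):
    """Accuracy of a stump that predicts 'case' (1) for carriers (genotype >= 1)."""
    hits = sum(int(g >= 1) == y for g, y in zip(feature, labels))
    return hits / len(labels)


def permutation_importance(forest, data, labels, target):
    """Mean per-tree drop in accuracy when the target SNP's column is permuted.

    forest: list of SNP names, one per stump (the SNP that stump splits on)
    data:   dict mapping SNP name -> list of genotypes
    """
    permuted = dict(data)
    # Assumption: a fixed reversal stands in for the usual random shuffle.
    permuted[target] = data[target][::-1]
    drops = []
    for snp in forest:
        before = stump_accuracy(data[snp], labels)
        after = stump_accuracy(permuted[snp], labels)
        drops.append(before - after)
    return sum(drops) / len(drops)


# Toy data: carrier status of the risk SNP fully determines disease status,
# and the proxy SNP is an exact copy of the risk SNP (perfect LD).
risk = [0, 0, 0, 0, 1, 1, 1, 1]
data = {"risk": risk, "proxy": list(risk)}
labels = list(risk)

# Without a proxy available, every stump splits on the risk SNP.
forest_no_ld = ["risk"] * 4
# With a proxy available, half of the stumps split on the proxy instead.
forest_with_ld = ["risk", "risk", "proxy", "proxy"]

imp_no_ld = permutation_importance(forest_no_ld, data, labels, "risk")
imp_with_ld = permutation_importance(forest_with_ld, data, labels, "risk")
print(imp_no_ld, imp_with_ld)
```

In this toy, permuting the risk SNP only hurts the stumps that split on it, so its averaged importance is halved once the proxy takes over half of the stumps. That is the general mechanism by which LD can lower the importance of a true risk SNP; how strongly it shows up in a real forest depends on the data and settings.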

Conclusion: Our results suggest that strategically revising the tree-building step or the importance measure calculation of the Random Forest method can increase power when LD exists between SNPs. We conclude that the revised Random Forest method performed on SNPs has the advantage of not requiring genotype phase, making it a viable tool for studies involving thousands of SNPs, such as candidate gene studies and follow-up of top candidates from genome-wide association studies.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2666661
DOI: http://dx.doi.org/10.1186/1471-2105-10-78

Publication Analysis

Top Keywords

random forest: 20
snps: 16
forest method: 12
revised measure: 12
snps linkage: 8
linkage disequilibrium: 8
association studies: 8
true risk: 8
risk snp: 8
alternative methods: 8

Similar Publications

This study investigates the electronic properties and photovoltaic (PV) performance of newly designed bithiophene-based dyes, focusing on their light harvesting efficiency (LHE), open-circuit voltage (Voc), fill factor (FF), and short-circuit current density (Jsc). These new dyes were designed with the help of machine learning (ML) to identify the best donor-acceptor designs. For this, we collected 2567 different electron donor groups and calculated their bandgap with the help of the Random Forest (RF) regression method.

IL-33, a neutrophil extracellular trap-related gene involved in the progression of diabetic kidney disease.

Inflamm Res

January 2025

Department of Nephrology, First Affiliated Hospital of Naval Medical University, Shanghai Changhai Hospital, Shanghai, China.

Background: Chronic inflammation is well recognized as a key factor related to renal function deterioration in patients with diabetic kidney disease (DKD). Neutrophil extracellular traps (NETs) play an important role in amplifying inflammation. With respect to NET-related genes, the aim of this study was to explore the mechanism of DKD progression and therefore identify potential intervention targets.

Age-related cognitive impairment and dementia pose a significant global health, social, and economic challenge. While Alzheimer's disease (AD) has historically been viewed as the leading cause of dementia, recent evidence reveals the considerable impact of vascular cognitive impairment and dementia (VCID), which now accounts for nearly half of all dementia cases. The Mediterranean diet, characterized by high consumption of fruits, vegetables, whole grains, fish, and olive oil, has been widely recognized for its cardiovascular benefits and may also reduce the risk of cognitive decline and dementia.

COX-2 Inhibitor Prediction With KNIME: A Codeless Automated Machine Learning-Based Virtual Screening Workflow.

J Comput Chem

January 2025

Pharmaceutical Chemistry Research Laboratory 1, Department of Pharmaceutical Engineering & Technology, Indian Institute of Technology (Banaras Hindu University), Varanasi, India.

Cyclooxygenase-2 (COX-2) is an enzyme that plays a crucial role in inflammation by converting arachidonic acid into prostaglandins. Overexpression of this enzyme is associated with conditions such as cancer, arthritis, and Alzheimer's disease (AD), where it contributes to neuroinflammation. In silico virtual screening is pivotal in early-stage drug discovery; however, the absence of coding or machine learning expertise can impede the development of reliable computational models capable of accurately predicting inhibitor compounds based on their chemical structure.

Background: Non-small cell lung cancer (NSCLC) is a fatal disease, and radioresistance is an important factor leading to treatment failure and disease progression. The objective of this research was to detect radioresistance-related genes (RRRGs) with prognostic value in NSCLC.

Methods: Weighted gene coexpression network analysis (WGCNA) and differentially expressed gene (DEG) analysis were performed to identify RRRGs using expression profiles from the TCGA and GEO databases.
