Genome-wide association study data analyses often face two significant challenges: (i) high dimensionality of single-nucleotide polymorphism (SNP) genotypes and (ii) imputation of missing values. SNPs are not independent due to physical linkage and natural selection. The correlation of nearby SNPs is known as linkage disequilibrium (LD), which can be used for LD conceptual SNP bin mapping, missing genotype inference and SNP dimension reduction. We used a stochastic process to describe the SNP signals and proposed two types of autocorrelations to measure the information redundancy of nearby SNPs. Based on the calculated autocorrelation coefficients, we constructed LD bins. We adopted a k-nearest neighbors (kNN) algorithm to impute the missing genotypes. We proposed several novel methods to find the optimal synthetic marker to represent each SNP bin. We also proposed methods to evaluate the information lost or conserved when dimension-reduced synthetic markers are used in place of the original genome-wide markers. Our performance assessments on real-life SNP data from a rice recombinant inbred line (RIL) population and a rice HapMap project show that the new methods produce satisfactory results. We implemented these functional modules in C/C++ and streamlined them into a web-based pipeline named PIP-SNP (https://bioinfo.noble.org/PIP_SNP/) for processing SNP data.
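
The binning and imputation steps described above can be illustrated with a minimal Python sketch. It is not the PIP-SNP implementation (which is written in C/C++), and the abstract does not define the two autocorrelation measures, so the sketch substitutes a plain Pearson correlation between adjacent SNP columns. The function names, the 0/1/2 genotype coding with `None` for missing calls, the mismatch-based sample distance, the 0.8 threshold and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation over positions where both values are observed."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation signal
    return sxy / sqrt(sxx * syy)


def build_bins(genotypes, threshold=0.8):
    """Group consecutive SNP columns while adjacent |r| stays >= threshold.

    Assumption: adjacent-SNP Pearson correlation stands in for the
    autocorrelation coefficients mentioned in the abstract.
    """
    n_snps = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_snps)]
    bins = [[0]]
    for j in range(1, n_snps):
        if abs(pearson(columns[j - 1], columns[j])) >= threshold:
            bins[-1].append(j)
        else:
            bins.append([j])
    return bins


def distance(a, b):
    """Fraction of mismatching genotypes over SNPs observed in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(genotypes, k=3):
    """Fill each missing call with the majority genotype of the k nearest samples."""
    imputed = [list(row) for row in genotypes]  # leave the input untouched
    for i, row in enumerate(genotypes):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Only samples with an observed call at SNP j can act as donors.
            donors = sorted(
                (distance(row, other), idx)
                for idx, other in enumerate(genotypes)
                if idx != i and other[j] is not None
            )
            votes = Counter(genotypes[idx][j] for _, idx in donors[:k])
            if votes:
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


# Toy data: rows are samples, columns are SNPs in genomic order.
genotypes = [
    [0, 0, 0, 2, 2],
    [0, 0, None, 2, 2],
    [2, 2, 2, 0, 2],
    [2, 2, 2, 0, 0],
    [0, 0, 0, 2, 0],
    [2, 2, 2, 0, 0],
]

print(build_bins(genotypes))        # [[0, 1, 2, 3], [4]]
print(knn_impute(genotypes)[1][2])  # 0
```

In the toy data the first four SNPs are perfectly correlated (positively or negatively) and fall into one bin, while the fifth SNP correlates only weakly with its neighbor and starts a new bin. The missing call in the second sample is filled by a majority vote among its three closest samples.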


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8256826
DOI: http://dx.doi.org/10.1093/nargab/lqab060


Similar Publications

Congenital heart disease (CHD) represents nearly one-third of congenital birth defects annually, with ventricular septal defect (VSD) being the most common type. The aim of this study was to explore the role of specific GATA binding protein 6 gene (GATA6) mutations as a potential etiological factor in the development of VSD through an in silico approach. Data were collected from the human gene databases DisGeNET and GeneCards, with protein-protein interaction networks constructed via STRING and Cytoscape.


Environmental gradients shape genetic variation in the desert moss, Syntrichia caninervis Mitt. (Pottiaceae).

Sci Rep

January 2025

Department of Biological Sciences, California State University Los Angeles, 5151 State University Dr, Los Angeles, CA, 90032, USA.

The moss Syntrichia caninervis Mitt. is distributed throughout drylands globally, and often anchors ecologically significant communities known as biological soil crusts (biocrusts). The species occupies a variety of dryland habitats with varying levels of drought and temperature stress, suggesting the potential for ecological specialization within S.


Phylogenetic analyses are crucial for understanding microbial evolution and infectious disease transmission. Bacterial phylogenies are often inferred from SNP alignments, with SNPs as the fundamental signal within these data. SNP alignments can be reduced to a 'strict core' by removing those sites that do not have data present in every sample.


Genome-wide association studies (GWAS) are hypothesis-free studies that estimate the association between polymorphisms across the genome and a trait of interest. To increase power and to estimate the direct effects of these single-nucleotide polymorphisms (SNPs) on a trait, GWAS are often conditioned on a covariate (such as body mass index or smoking status). This adjustment can introduce bias in the estimated effect of the SNP on the trait.


Background: Root rot is a major disease affecting alfalfa (Medicago sativa L.), causing significant yield losses and economic damage. The primary pathogens include Fusarium spp.

