Motivation: Linkage disequilibrium (LD) block construction is required for research in population genetics and genetic epidemiology, including the specification of sets of single nucleotide polymorphisms (SNPs) for multi-SNP-based association analysis and the identification of haplotype blocks in high-density sequencing data. Existing methods based on a narrow-sense definition do not allow intermediate regions of low LD between strongly associated SNP pairs, and they tend to split high-density SNP data into small blocks with high between-block correlation.

Results: We present Big-LD, a block partition method based on interval graph modeling of LD bins, which are clusters of SNPs in strong pairwise LD that are not necessarily physically consecutive. Big-LD uses an agglomerative approach that starts by identifying small communities of SNPs, i.e. the SNPs in each LD bin region, and proceeds by merging these communities. We determine the number of blocks using a method that finds a maximum-weight independent set. Big-LD produces larger LD blocks than existing methods such as MATILDE, Haploview, MIG++ or S-MIG++, and its LD blocks agree better with recombination hotspot locations determined by sperm-typing experiments. The observed average runtime of Big-LD for 13 288 240 non-monomorphic SNPs from 1000 Genomes Project autosome data (286 East Asians) is about 5.83 h, which is a significant improvement over the existing methods.
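As an illustration of the maximum-weight independent set step, the sketch below solves the problem on an interval graph with the standard weighted interval scheduling dynamic program. It is not the Big-LD implementation. The function name, the representation of each LD bin as a closed interval of SNP indices, and the choice of weight (for example, the number of SNPs in the bin) are assumptions made for this example only.

```python
from bisect import bisect_left


def max_weight_independent_intervals(intervals):
    """Pick non-overlapping intervals with maximum total weight.

    `intervals` is a list of (start, end, weight) tuples describing closed
    intervals [start, end]. Here each tuple stands for a hypothetical LD bin
    region on SNP indices; the weight is an assumption (e.g. SNP count).
    Returns (total_weight, chosen_intervals).
    """
    ivs = sorted(intervals, key=lambda iv: iv[1])  # sort by right endpoint
    ends = [iv[1] for iv in ivs]
    n = len(ivs)

    # best[i] = optimal total weight using only the first i sorted intervals
    best = [0] * (n + 1)
    for i, (start, _end, weight) in enumerate(ivs, 1):
        # p = number of earlier intervals that end strictly before `start`
        p = bisect_left(ends, start, 0, i - 1)
        best[i] = max(best[i - 1], best[p] + weight)

    # Walk back through the table to recover one optimal selection.
    chosen = []
    i = n
    while i > 0:
        start, _end, weight = ivs[i - 1]
        p = bisect_left(ends, start, 0, i - 1)
        if best[p] + weight >= best[i - 1]:
            chosen.append(ivs[i - 1])
            i = p
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


# Hypothetical bins: (first SNP index, last SNP index, weight)
bins = [(0, 4, 3), (3, 9, 4), (5, 8, 2), (10, 12, 2)]
total, selected = max_weight_independent_intervals(bins)
print(total, selected)
```

Because the vertices are intervals on a line, sorting by right endpoint lets the optimum be found in O(n log n) time, whereas the maximum-weight independent set problem is NP-hard on general graphs.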

Availability and Implementation: Source code and documentation are available for download at http://github.com/sunnyeesl/BigLD.

Contact: yyoo@snu.ac.kr.

Supplementary Information: Supplementary data are available at Bioinformatics online.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5860363
DOI: http://dx.doi.org/10.1093/bioinformatics/btx609
