Fast genotyping of known SNPs through approximate k-mer matching.

Bioinformatics

Computer Science and AI Lab and Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Published: September 2016

AI Article Synopsis

  • The increase in next-generation sequencing (NGS) data necessitates faster algorithms, particularly for genotyping known variants, which is crucial for studying genetic traits and diseases.
  • LAVA (Lightweight Assignment of Variant Alleles) is a new algorithm that genotypes known SNPs by approximately matching mid-size k-mers (k = 32) instead of fully aligning reads. It is up to about an order of magnitude faster than standard NGS genotyping pipelines and, for Affymetrix SNPs, significantly more accurate.
  • LAVA can run with as little as about 5 GB of RAM and is publicly available, making it suitable for large-scale population studies and a flexible NGS-based alternative to SNP arrays.

Article Abstract

Motivation: As the volume of next-generation sequencing (NGS) data increases, faster algorithms become necessary. Although speeding up individual components of a sequence analysis pipeline (e.g. read mapping) can reduce the computational cost of analysis, such approaches do not take full advantage of the particulars of a given problem. One problem of great interest, genotyping a known set of variants (e.g. dbSNP or Affymetrix SNPs), is important for characterization of known genetic traits and causative disease variants within an individual, as well as the initial stage of many ancestral and population genomic pipelines (e.g. GWAS).

Results: We introduce lightweight assignment of variant alleles (LAVA), an NGS-based genotyping algorithm for a given set of SNP loci, which takes advantage of the fact that approximate matching of mid-size k-mers (with k = 32) can typically uniquely identify loci in the human genome without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix's Genome-Wide Human SNP Array 6.0 up to about an order of magnitude faster than standard NGS genotyping pipelines. For Affymetrix SNPs, LAVA has significantly higher SNP calling accuracy than existing pipelines while using as little as ∼5 GB of RAM. As such, LAVA represents a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays.
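
The idea that approximate matching of 32-mers can identify a SNP locus without full alignment can be illustrated with a small Python sketch. This is not LAVA's implementation; it is a minimal sketch under stated assumptions. Each SNP is given in a hypothetical input format (an ID, flanking reference context with at least 31 bases on each side, the SNP offset, and the two alleles). "Approximate" is taken to mean at most one mismatch (Hamming distance 1), and LAVA's genotype caller is replaced by a toy count-threshold rule. The functions encode, neighbors, build_snp_dictionary, read_hits and call_genotypes, as well as the thresholds min_reads and min_frac, are illustrative names. Packing each base into 2 bits makes a 32-mer fit exactly in a 64-bit word, which makes k = 32 a convenient size for integer-keyed lookups.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

K = 32  # k-mer length from the abstract; 2 bits/base x 32 bases = 64 bits
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}
COMP = str.maketrans("ACGT", "TGCA")

# Hypothetical input format: (snp_id, context, offset, ref_allele, alt_allele)
Snp = Tuple[str, str, int, str, str]
Hit = Tuple[str, str]  # (snp_id, allele)


def encode(kmer: str) -> Optional[int]:
    """Pack a k-mer into one integer, 2 bits per base; None if it has a non-ACGT base."""
    code = 0
    for base in kmer:
        if base not in ENC:
            return None
        code = (code << 2) | ENC[base]
    return code


def neighbors(code: int) -> Iterator[int]:
    """Yield every K-mer code at Hamming distance exactly 1 (3 * K codes)."""
    for pos in range(K):
        shift = 2 * pos
        current = (code >> shift) & 3
        for b in range(4):
            if b != current:
                # Clear the 2-bit slot for this base, then write the substitute.
                yield (code & ~(3 << shift)) | (b << shift)


def build_snp_dictionary(snps: List[Snp]) -> Dict[int, Hit]:
    """Map every K-mer covering a SNP, for both alleles, to (snp_id, allele).

    K-mers claimed by more than one (snp_id, allele) are dropped as ambiguous.
    """
    table: Dict[int, Hit] = {}
    ambiguous: Set[int] = set()
    for snp_id, context, off, ref, alt in snps:
        for allele in (ref, alt):
            seq = context[:off] + allele + context[off + 1:]
            for start in range(off - K + 1, off + 1):
                if start < 0 or start + K > len(seq):
                    continue  # not enough flanking sequence for this window
                code = encode(seq[start:start + K])
                if code is None:
                    continue
                if table.get(code, (snp_id, allele)) != (snp_id, allele):
                    ambiguous.add(code)
                table[code] = (snp_id, allele)
    for code in ambiguous:
        del table[code]
    return table


def read_hits(read: str, table: Dict[int, Hit]) -> Set[Hit]:
    """Return the (snp_id, allele) pairs supported by one read, scanning both strands."""
    hits: Set[Hit] = set()
    for seq in (read, read.translate(COMP)[::-1]):
        for start in range(len(seq) - K + 1):
            code = encode(seq[start:start + K])
            if code is None:
                continue
            if code in table:  # an exact match always wins
                hits.add(table[code])
                continue
            approx = {table[n] for n in neighbors(code) if n in table}
            if len(approx) == 1:  # accept only an unambiguous 1-mismatch match
                hits |= approx
    return hits


def call_genotypes(reads: Iterable[str], snps: List[Snp],
                   min_reads: int = 2, min_frac: float = 0.2) -> Dict[str, str]:
    """Toy count-threshold caller (hypothetical; not LAVA's statistical model)."""
    table = build_snp_dictionary(snps)
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for snp_id, allele in read_hits(read.upper(), table):
            counts[snp_id][allele] += 1  # each read counts once per allele
    calls: Dict[str, str] = {}
    for snp_id, _, _, ref, alt in snps:
        r, a = counts[snp_id][ref], counts[snp_id][alt]
        total = r + a
        if total < min_reads:
            calls[snp_id] = "./."  # too little coverage to call
        elif a / total < min_frac:
            calls[snp_id] = "0/0"
        elif r / total < min_frac:
            calls[snp_id] = "1/1"
        else:
            calls[snp_id] = "0/1"
    return calls
```

In this sketch an exact k-mer hit always takes precedence, and a one-mismatch hit is accepted only when it points to a single (SNP, allele) pair. Without that rule, an error-free reference-allele read would also match the alternate-allele k-mer, which differs from it at exactly one position. A realistic tool would likely also need to discard k-mers that occur elsewhere in the genome and to use a rolling encoding instead of re-encoding every window.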

Availability And Implementation: LAVA software is available at http://lava.csail.mit.edu

Contact: bab@mit.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5013917
DOI: http://dx.doi.org/10.1093/bioinformatics/btw460
