Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes.

BMC Bioinformatics

Spanish National Genotyping Center (CeGen), Genomic Medicine Group, CIBERER, University of Santiago de Compostela, Galicia, Spain.

Published: March 2009

Background: Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies.

Results: To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes, we have built a set of data processing scripts that take raw data from the major SNP variation databases (e.g. HapMap, Perlegen), split them into single genotypes, group those genotypes into populations, and merge them with complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices, from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases.
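The processing steps described in the Results can be illustrated with a minimal Python sketch. It is not the authors' scripts: the record layout, the sample data, the annotation fields, and the function names `allele_frequencies` and `build_mart` are all assumptions made for illustration. It shows single genotypes being grouped by population, reduced to allele frequency estimates, and merged with descriptive information of the kind that could be extracted from dbSNP.

```python
from collections import Counter, defaultdict

# Hypothetical flattened records, one genotype per row:
# (snp_id, population, sample_id, genotype). The values are made up.
RECORDS = [
    ("rs0001", "CEU", "s1", "AA"),
    ("rs0001", "CEU", "s2", "AG"),
    ("rs0001", "YRI", "s3", "GG"),
    ("rs0001", "YRI", "s4", "AG"),
    ("rs0001", "YRI", "s5", "NN"),  # missing call, ignored below
]

# Hypothetical descriptive annotation, standing in for fields taken from dbSNP.
ANNOTATION = {"rs0001": {"chromosome": "1", "alleles": "A/G"}}


def allele_frequencies(records, missing="N"):
    """Return {(snp_id, population): {allele: frequency}}."""
    counts = defaultdict(Counter)
    for snp_id, population, _sample, genotype in records:
        for allele in genotype:
            if allele != missing:
                counts[(snp_id, population)][allele] += 1
    freqs = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        freqs[key] = {allele: n / total for allele, n in counter.items()}
    return freqs


def build_mart(records, annotation):
    """Merge per-population allele frequencies with SNP annotation."""
    rows = []
    for (snp_id, population), freqs in sorted(allele_frequencies(records).items()):
        row = {"snp_id": snp_id, "population": population, "freqs": freqs}
        row.update(annotation.get(snp_id, {}))  # unannotated SNPs keep only the base fields
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in build_mart(RECORDS, ANNOTATION):
        print(row)
```

Keeping the genotypes in a single flattened layout before grouping is what would let samples from different repositories be pooled into one population, provided their allele coding has been normalized first.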

Conclusion: The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, and the design permits easy incorporation of new external data and the computation of supplementary statistical indices of interest.


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2665053 (PMC)
http://dx.doi.org/10.1186/1471-2105-10-S3-S5 (DOI)


