AI Article Synopsis

  • Missing data is a frequent issue in high-throughput DNA sequencing, often stemming from low-quality samples or experimental errors.
  • The authors propose new statistical methods that allow for the analysis of DNA sequences with missing data without needing to exclude problematic bases or individuals.
  • These methods include modifications to traditional neutrality tests and can also be applied to other types of data, like DNA microarrays, providing a comprehensive framework for such analyses.

Article Abstract

Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θW, Tajima's D, Fay and Wu's H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.
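As a rough illustration of estimating variability without discarding bases or individuals, the sketch below computes a Watterson-style estimate of θ per site from an alignment in which some bases are missing. It is a minimal sketch and not the authors' implementation. It assumes that each site x contributes the harmonic term a_{n_x} = 1 + 1/2 + ... + 1/(n_x - 1), where n_x is the number of sequences with a called base at that site, and that the number of segregating sites S is divided by the sum of these terms. The function names and the set of missing-data symbols are hypothetical choices made for this example.

```python
from typing import Sequence

# Symbols treated as missing data (an assumption made for this sketch).
MISSING = {"N", "-", "?"}


def harmonic(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i (zero when n < 2)."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta_missing(alignment: Sequence[str]) -> float:
    """Per-site Watterson-style estimate that tolerates missing bases.

    Each column uses only the sequences with a called base, so no site
    or individual has to be dropped from the whole analysis.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must have equal length")

    segregating = 0
    denominator = 0.0
    for column in zip(*alignment):
        observed = [b.upper() for b in column if b.upper() not in MISSING]
        n_x = len(observed)
        if n_x < 2:
            # A site with fewer than two called bases carries no information.
            continue
        denominator += harmonic(n_x)
        if len(set(observed)) > 1:
            segregating += 1

    if denominator == 0.0:
        raise ValueError("no site has at least two called bases")
    return segregating / denominator


if __name__ == "__main__":
    # Toy alignment; the second site has one missing base in the third sequence.
    toy = ["ACGT", "ACGA", "ANGT"]
    print(watterson_theta_missing(toy))
```

With complete data every site has the same n_x = n, so the denominator reduces to L · a_n and the result is the usual per-site Watterson estimate S / (L · a_n).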

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3416018
DOI: http://dx.doi.org/10.1534/genetics.112.139949

Publication Analysis

Top Keywords

neutrality tests: 16
missing data: 16
sequences missing: 8
data: 6
neutrality: 4
tests sequences: 4
missing: 4
data missing: 4
data common: 4
common dna: 4
