Gene-mapping studies routinely check for Mendelian transmission of marker alleles in a pedigree as a way of screening for genotyping errors and mutations. For the analysis of family data sets, it is usually necessary to resolve or remove genotyping errors before the data are analyzed. To deal with its large-scale data flow, the Center for Inherited Disease Research (CIDR) formalized its data cleaning approach in a set of rules based on PedCheck output. Using carefully designed simulations, we examine how well CIDR's data cleaning rules work in practice. We found that genotype errors in siblings are detected more often than those in parents for less polymorphic SNPs, and vice versa for more polymorphic SNPs. From these computer simulations, we conclude that some of CIDR's rules work poorly in some circumstances, and we suggest a set of modified data cleaning rules that may work better than CIDR's rules.
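The Mendelian transmission check that this screening relies on can be illustrated with a minimal sketch. It assumes a fully genotyped father-mother-child trio at a single autosomal marker with no missing data. The function name `is_mendelian_consistent` and the genotype encoding are hypothetical and chosen for illustration; this is not PedCheck's algorithm or CIDR's rule set.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one allele from the father and one from the mother.

    Each genotype is a pair of allele labels, e.g. ("A", "G").
    Assumes an autosomal marker and no missing genotypes.
    """
    # Every unordered genotype the parents could transmit.
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


# Consistent: the child inherits A from the father and G from the mother.
trio_ok = is_mendelian_consistent(("A", "G"), ("G", "G"), ("A", "G"))

# Inconsistent: neither parent carries a G allele.
trio_bad = is_mendelian_consistent(("A", "A"), ("A", "A"), ("A", "G"))

# Two heterozygous parents are compatible with every child genotype,
# so a genotyping error in such a child would pass this check unnoticed.
trio_masked = all(
    is_mendelian_consistent(("A", "G"), ("A", "G"), child)
    for child in [("A", "A"), ("A", "G"), ("G", "G")]
)
```

The last case shows why a check of this kind cannot catch every error: some erroneous genotypes remain compatible with Mendelian transmission, and whether an error is exposed depends on the genotypes of the other family members.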

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333839 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0172807 (PLOS)
