Inspired by well-established material and pedagogy from The Carpentries (Wilson, 2016), we developed a two-day workshop curriculum that teaches introductory R programming for managing, analyzing, plotting, and reporting data with packages from the tidyverse (Wickham et al., 2019), along with the Unix shell and version control with git and GitHub. Although the official Software Carpentry curriculum is comprehensive, we found that it contains too much content for a two-day workshop.
Summary: LocusZoom.js is a JavaScript library for creating interactive web-based visualizations of genetic association study results. It can display one or more traits in the context of relevant biological data (such as gene models and other genomic annotation), and allows interactive refinement of analysis models (by selecting linkage disequilibrium reference panels, identifying sets of likely causal variants, or comparing results to the GWAS catalog).
Detecting and estimating DNA sample contamination are important steps to ensure high-quality genotype calls and reliable downstream analysis. Existing methods rely on population allele frequency information for accurate estimation of contamination rates. Correctly specifying population allele frequencies for each individual at an early stage of sequence analysis is impractical or even impossible for large-scale sequencing centers that simultaneously process samples from multiple studies across diverse populations.
A detailed understanding of the genome-wide variability of single-nucleotide germline mutation rates is essential to studying human genome evolution. Here, we use ~36 million singleton variants from 3560 whole-genome sequences to infer fine-scale patterns of mutation rate heterogeneity. Mutability is jointly affected by adjacent nucleotide context and diverse genomic features of the surrounding region, including histone modifications, replication timing, and recombination rate, sometimes suggesting specific mutagenic mechanisms.
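As a rough illustration of the "adjacent nucleotide context" part of this idea, relative mutability can be estimated per 3-mer as the number of singletons centred on that context divided by the number of times the context occurs in the reference. The sketch below is a toy under stated assumptions: `context_mutability` is a hypothetical helper, positions are 0-based, and it ignores strand collapsing and the regional genomic features the study also models, so it is not the authors' method.

```python
from collections import Counter


def context_mutability(reference: str, singleton_positions: list[int]) -> dict[str, float]:
    """Relative mutability of each 3-mer context (hypothetical helper).

    Rate = singletons whose centre base sits in the context / number of
    times that context occurs in the reference. Strand collapsing and
    regional genomic features are deliberately ignored.
    """
    reference = reference.upper()
    # Every interior base defines one 3-mer "opportunity" for mutation.
    opportunities = Counter(reference[i - 1 : i + 2] for i in range(1, len(reference) - 1))
    # Observed singletons, keyed by the 3-mer centred on the mutated base.
    observed = Counter(
        reference[p - 1 : p + 2] for p in singleton_positions if 1 <= p < len(reference) - 1
    )
    return {ctx: observed[ctx] / n for ctx, n in opportunities.items()}


rates = context_mutability("ACGTACGT", [1, 5])
print(rates)  # {'ACG': 1.0, 'CGT': 0.0, 'GTA': 0.0, 'TAC': 0.0}
```

Both toy singletons fall on the C of an ACG (CpG) context, so only that context gets a nonzero rate. Real analyses would normalise by genome-wide opportunity counts and add covariates for features such as replication timing.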
Importance: Complex disorders, such as bipolar disorder (BD), likely result from the influence of both common and rare susceptibility alleles. While common variation has been widely studied, rare variant discovery has only recently become feasible with next-generation sequencing.
Objective: To use a combined family-based and case-control approach to exome sequencing in BD, with multiplex families as an initial discovery strategy, followed by association testing in a large case-control meta-analysis.
DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses.
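One way to "model contamination during genotype calling" is to treat each read as coming from the contaminant with probability α, and to sum over the unknown contaminant genotype under Hardy-Weinberg proportions. The sketch below is a simplified illustration under stated assumptions, not the published method: it assumes a biallelic site with a known population allele frequency, a symmetric per-base error rate, and an unrelated contaminant. `genotype_likelihoods` and `call_genotype` are hypothetical names.

```python
def hwe(p: float) -> list[float]:
    """Hardy-Weinberg genotype frequencies for 0, 1, 2 copies of the alt allele."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def obs_prob(is_alt: bool, alt_fraction: float, error: float) -> float:
    """P(observed base | alt-allele fraction of the source DNA), symmetric error model."""
    p_alt = alt_fraction * (1 - error) + (1 - alt_fraction) * error
    return p_alt if is_alt else 1 - p_alt


def genotype_likelihoods(reads, alpha, pop_af, error=0.01):
    """L(g) for the sample genotype g, summing over an unrelated contaminant's genotype."""
    contaminant_prior = hwe(pop_af)
    likelihoods = []
    for g in range(3):
        total = 0.0
        for gc, weight in enumerate(contaminant_prior):
            prod = 1.0
            for is_alt in reads:
                # Each read comes from the contaminant with probability alpha.
                prod *= (1 - alpha) * obs_prob(is_alt, g / 2, error) + alpha * obs_prob(
                    is_alt, gc / 2, error
                )
            total += weight * prod
        likelihoods.append(total)
    return likelihoods


def call_genotype(reads, alpha, pop_af, error=0.01):
    """Maximum a posteriori genotype under an HWE prior."""
    likelihoods = genotype_likelihoods(reads, alpha, pop_af, error)
    posterior = [prior * lik for prior, lik in zip(hwe(pop_af), likelihoods)]
    return max(range(3), key=lambda g: posterior[g])


reads = [True] * 3 + [False] * 17  # 3 alt reads out of 20
print(call_genotype(reads, alpha=0.0, pop_af=0.5))   # 1 (heterozygous)
print(call_genotype(reads, alpha=0.15, pop_af=0.5))  # 0 (homozygous reference)
```

With the same 3 alternate reads out of 20, the uncontaminated model calls a heterozygote. Allowing 15% contamination explains those reads as contaminant DNA and calls the sample homozygous reference. This is the kind of genotyping error the paragraph describes.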
Whole genome sequencing was completed on 1,325 individuals from 602 families, identifying 27 million autosomal variants. Genetic association tests were conducted for those individuals who had been assessed for one or more of 17 endophenotypes (N = 802-1,185 per endophenotype). No significant associations were found.
Nat Genet
September 2013
DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone.
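The sequence-reads-only approach can be sketched as a maximum-likelihood search for the contamination fraction α across many sites. At each site, both the sample and contaminant genotypes are unknown and are drawn from Hardy-Weinberg proportions given the population allele frequency. This is a simplified illustration, not the paper's implementation. It assumes known population allele frequencies, a symmetric base-error rate, independent reads, and a grid search. `estimate_contamination` and `simulate_sites` are hypothetical names, and the simulation is synthetic test data.

```python
import math
import random


def hwe(p):
    """Hardy-Weinberg genotype frequencies for 0, 1, 2 alt alleles."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def alt_prob(alt_fraction, error):
    """P(read shows alt) given the alt fraction of the source DNA."""
    return alt_fraction * (1 - error) + (1 - alt_fraction) * error


def site_log_likelihood(n_alt, n_ref, pop_af, alpha, error):
    """Log-likelihood of one site's read counts (binomial coefficient omitted; constant in alpha)."""
    prior = hwe(pop_af)
    total = 0.0
    for g in range(3):           # sample genotype
        for gc in range(3):      # contaminant genotype
            q = (1 - alpha) * alt_prob(g / 2, error) + alpha * alt_prob(gc / 2, error)
            total += prior[g] * prior[gc] * q ** n_alt * (1 - q) ** n_ref
    return math.log(total)


def estimate_contamination(sites, error=0.01, step=0.01, max_alpha=0.5):
    """Grid-search MLE of alpha; sites are (n_alt, n_ref, pop_af) tuples."""
    grid = [i * step for i in range(int(round(max_alpha / step)) + 1)]

    def loglik(alpha):
        return sum(site_log_likelihood(a, r, af, alpha, error) for a, r, af in sites)

    return max(grid, key=loglik)


def simulate_sites(n_sites, depth, alpha, error=0.01, seed=1):
    """Synthetic read counts from a sample mixed with an unrelated contaminant."""
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        af = rng.uniform(0.05, 0.95)
        g = sum(rng.random() < af for _ in range(2))
        gc = sum(rng.random() < af for _ in range(2))
        n_alt = 0
        for _ in range(depth):
            source = gc if rng.random() < alpha else g
            true_alt = rng.random() < source / 2
            n_alt += true_alt != (rng.random() < error)  # flip with prob. error
        sites.append((n_alt, depth - n_alt, af))
    return sites


print(estimate_contamination(simulate_sites(2000, 20, alpha=0.1)))  # close to 0.1
```

Collapsing each site to alternate and reference counts keeps the grid search cheap. A production tool would use base qualities and a continuous optimizer instead of a fixed grid.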
Bipolar disorder (BP) is a disabling and often life-threatening disorder that affects approximately 1% of the population worldwide. To identify genetic variants that increase the risk of BP, we genotyped 2,076 bipolar cases and 1,676 controls of European ancestry on the Illumina HumanHap550 BeadChip. The samples came from the National Institute of Mental Health Human Genetics Initiative Repository, the Prechter Repository, and collections in London, Toronto, and Dundee. We imputed SNP genotypes, tested for SNP-BP association in each sample, and then performed a meta-analysis across samples.
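The abstract does not say which meta-analysis method was used. A common choice for combining per-sample SNP effect estimates is the inverse-variance weighted fixed-effects model, sketched below as an assumption rather than the authors' procedure. `fixed_effects_meta` is a hypothetical helper, and the betas are assumed to be per-sample log odds ratios with their standard errors.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of per-sample estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


beta, se, z, p = fixed_effects_meta([0.10, 0.20], [0.10, 0.10])
print(round(beta, 3), round(se, 4), round(p, 3))  # 0.15 0.0707 0.034
```

Two equally precise samples get equal weight, so the combined estimate is their mean. The combined standard error shrinks by a factor of √2 relative to either sample alone.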