The promise of precision medicine lies in data diversity. More than the sheer size of biomedical data, it is the layering of multiple data modalities, offering complementary perspectives, that is thought to enable the identification of patient subgroups with shared pathophysiology. In the present study, we use autism to test this notion.
The original version of this paper contained an incorrect primer sequence. In the Methods subsection "Rampage libraries," the text for modification 3 stated that the reverse primer used for library indexing was 5'-CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTGACTGGAGT-3'. The correct sequence of the oligonucleotide used is 5'-CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3'.
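The relationship between the two primer sequences in the correction above can be checked mechanically. The following is a minimal sketch, not part of the original notice: it assumes the run of `X` characters is a placeholder for a sample index (barcode), and the helper name `fill_index` and the example index `ACGTACGT` are hypothetical.

```python
# Primer sequences copied from the correction notice (5' to 3').
ORIGINAL = "CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTGACTGGAGT"
CORRECTED = "CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

# The corrected primer extends the originally printed one at its 3' end.
assert CORRECTED.startswith(ORIGINAL)
missing_suffix = CORRECTED[len(ORIGINAL):]


def fill_index(template: str, index: str) -> str:
    """Replace the run of 'X' placeholders with a concrete index sequence.

    Assumption: the X run stands for a sample index of the same length.
    """
    placeholder_len = template.count("X")
    placeholder = "X" * placeholder_len
    if placeholder_len == 0 or placeholder not in template:
        raise ValueError("template must contain one contiguous run of X")
    if len(index) != placeholder_len:
        raise ValueError(f"index must be {placeholder_len} nt long")
    if set(index) - set("ACGT"):
        raise ValueError("index must contain only A, C, G, T")
    return template.replace(placeholder, index)


if __name__ == "__main__":
    print("3' bases absent from the printed primer:", missing_suffix)
    # Hypothetical index, used only to illustrate the substitution.
    print(fill_index(CORRECTED, "ACGTACGT"))
```

The check confirms that the originally printed sequence is a truncated prefix of the corrected oligonucleotide rather than a different primer.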
Specialized RNA-seq methods are required to identify the 5' ends of transcripts, which are critical for studies of gene regulation, but these methods have not been systematically benchmarked. We directly compared six such methods, including the performance of five methods on a single human cellular RNA sample and a new spike-in RNA assay that helps circumvent challenges resulting from uncertainties in annotation and RNA processing. We found that the 'cap analysis of gene expression' (CAGE) method performed best for mRNA and that most of its unannotated peaks were supported by evidence from other genomic methods.
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence.
Allelic expression analysis has become important for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. We analyze the properties of allelic expression read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting such errors, show that our quality control measures improve the detection of relevant allelic expression, and introduce tools for the high-throughput production of allelic expression data from RNA-sequencing data.
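To make the idea of allelic expression read count data concrete, the sketch below tests whether the reference and alternate read counts at one heterozygous site depart from an expected ratio, using a two-sided binomial test. This is a generic illustration under stated assumptions and is not the method or tooling described in the abstract: the function names are hypothetical, the expected reference ratio `p_ref` defaults to 0.5, and setting it to another value is only one simple way to allow for mapping bias.

```python
from math import exp, lgamma, log


def _binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials, computed in log space."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def allelic_imbalance_p(ref_count: int, alt_count: int, p_ref: float = 0.5) -> float:
    """Two-sided binomial p-value for allelic imbalance at one site.

    ref_count / alt_count: reads supporting each allele (hypothetical inputs).
    p_ref: expected fraction of reference reads under no imbalance; must lie
           strictly between 0 and 1.
    """
    if not 0.0 < p_ref < 1.0:
        raise ValueError("p_ref must be strictly between 0 and 1")
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    observed = _binom_pmf(ref_count, n, p_ref)
    # Sum the probability of every outcome that is no more likely than the
    # observed one; the small tolerance absorbs floating-point noise.
    total = 0.0
    for k in range(n + 1):
        prob = _binom_pmf(k, n, p_ref)
        if prob <= observed * (1.0 + 1e-7):
            total += prob
    return min(1.0, total)


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    for ref, alt in [(5, 5), (10, 0), (30, 12)]:
        print(ref, alt, allelic_imbalance_p(ref, alt))
```

A test like this treats reads as independent draws, so it is sensitive to the error sources the abstract lists, such as double-counted reads or genotyping errors, which is why filtering is normally applied before any such test.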
This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.
Experimental data exists for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.
Background: The dramatic reduction in the cost of sequencing has allowed many researchers to join in the effort of sequencing and annotating prokaryotic genomes. Annotation methods vary considerably and may fail to identify some genes. Here we draw attention to a large number of likely genes missing from annotations using common tools such as Glimmer and BLAST.
Bioinformatics
October 2009
Motivation: The roughness of energy landscapes is a major obstacle to protein structure prediction, since it forces conformational searches to spend much time struggling to escape numerous traps. Specifically, beta-sheet formation is prone to stray, since many possible combinations of hydrogen bonds are dead ends in terms of beta-sheet assembly. It has been shown that cooperative terms for backbone hydrogen bonds ease this problem by augmenting hydrogen bond patterns that are consistent with beta sheets.