Alternative splicing and multiple transcription start and termination sites can produce a diverse repertoire of mRNA transcript variants from a given gene. While the full picture of the human transcriptome is still incomplete, publicly available RNA datasets have enabled the assembly of transcripts. Using publicly available deep sequencing data from 927 human samples across 48 tissues, we quantify known and new transcript variants, provide the interactive, browser-based application Splice-O-Mat, and demonstrate its relevance using adhesion G protein-coupled receptors (aGPCRs) as an example.
Motivation: Pooling multiple samples increases the efficiency and lowers the cost of DNA sequencing. One approach to multiplexing is to use short DNA indices to uniquely identify each sample. After sequencing, reads must be assigned in silico to the sample of origin, a process referred to as demultiplexing.
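As a rough illustration of what demultiplexing involves, the sketch below assigns an index read to the sample with the uniquely closest index sequence. It is a naive nearest-match rule under an assumed mismatch tolerance, not the method of the article; the names `hamming`, `assign_sample`, `max_mismatches` and the example indices are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(
    index_read: str,
    sample_indices: Dict[str, str],
    max_mismatches: int = 1,  # assumed tolerance, chosen for illustration
) -> Optional[str]:
    """Return the sample with the uniquely closest index, else None.

    None is returned when no index lies within max_mismatches or when
    two samples are equally close (an ambiguous assignment).
    """
    if not sample_indices:
        return None
    scored = sorted(
        (hamming(index_read, idx), name) for name, idx in sample_indices.items()
    )
    best_dist, best_name = scored[0]
    if best_dist > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None
    return best_name


# Hypothetical index set used only for this example.
samples = {"sampleA": "ACGTAC", "sampleB": "TTGCAA"}
```

A read whose index is `ACGTAA` would go to `sampleA` (one mismatch), while `GGGGGG` would be left unassigned because no index is within the tolerance.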
The sequencing of libraries containing molecules shorter than the read length, such as in ancient or forensic applications, may result in the production of reads that include the adaptor, and in paired reads that overlap one another. Challenges for the processing of such reads are the accurate identification of the adaptor sequence and accurate reconstruction of the original sequence most likely to have given rise to the observed read(s). We introduce an algorithm that removes the adaptors and reconstructs the original DNA sequences using a Bayesian maximum a posteriori probability approach.
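To make the idea of a maximum a posteriori base call concrete, here is a minimal sketch for a single position covered by both reads of an overlapping pair. It assumes a uniform prior over the four bases, independent errors in the two reads, and errors spread evenly over the three alternative bases. These are simplifying assumptions for illustration and not the algorithm of the article; `phred_to_error` and `map_base` are hypothetical names, and ambiguous bases such as `N` are not handled.

```python
from typing import Tuple

BASES = "ACGT"


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def map_base(b1: str, q1: int, b2: str, q2: int) -> Tuple[str, float]:
    """Return the most probable true base and its posterior probability.

    b1/q1 and b2/q2 are the observed bases and Phred qualities from the
    two overlapping reads at the same position.
    """
    e1, e2 = phred_to_error(q1), phred_to_error(q2)
    likelihoods = {}
    for true_base in BASES:
        # Probability of each observation if true_base were the real base.
        p1 = 1 - e1 if b1 == true_base else e1 / 3
        p2 = 1 - e2 if b2 == true_base else e2 / 3
        likelihoods[true_base] = p1 * p2
    total = sum(likelihoods.values())
    # With a uniform prior, the posterior is the normalised likelihood.
    best = max(BASES, key=lambda b: likelihoods[b])
    return best, likelihoods[best] / total
```

When the two reads agree, the posterior for the shared base exceeds that of either read alone; when they disagree, the call follows the read with the higher quality score.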
We present the DNA sequence of 17,367 protein-coding genes in two Neandertals from Spain and Croatia and analyze them together with the genome sequence recently determined from a Neandertal from southern Siberia. Comparisons with present-day humans from Africa, Europe, and Asia reveal that genetic diversity among Neandertals was remarkably low, and that they carried a higher proportion of amino acid-changing (nonsynonymous) alleles inferred to alter protein structure or function than present-day humans. Thus, Neandertals across Eurasia had a smaller long-term effective population than present-day humans.
Motivation: The conversion of the raw intensities obtained from next-generation sequencing platforms into nucleotide sequences with well-calibrated quality scores is a critical step in the generation of good sequence data. While recent model-based approaches can yield highly accurate calls, they require a substantial amount of processing time and/or computational resources. We previously introduced Ibis, a fast and accurate basecaller for the Illumina platform.
Proc Natl Acad Sci U S A
February 2013
Hominins with morphology similar to present-day humans appear in the fossil record across Eurasia between 40,000 and 50,000 y ago. The genetic relationships between these early modern humans and present-day human populations have not been established. We have extracted DNA from a 40,000-y-old anatomically modern human from Tianyuan Cave outside Beijing, China.
We present a DNA library preparation method that has allowed us to reconstruct a high-coverage (30×) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of "missing evolution" in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans.
Using DNA extracted from a finger bone found in Denisova Cave in southern Siberia, we have sequenced the genome of an archaic hominin to about 1.9-fold coverage. This individual is from a group that shares a common origin with Neanderthals.
Although the last few years have seen great progress in DNA sequence retrieval from fossil specimens, some of the characteristics of ancient DNA remain poorly understood. This is particularly true for blocking lesions, i.e.
Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development.
High-throughput sequencing technologies have opened up a new avenue for studying extinct organisms. Here we identify and quantify biases introduced by particular characteristics of ancient DNA samples. These analyses demonstrate the importance of closely related genomic sequence for correctly identifying and classifying bona fide endogenous DNA fragments.
DNA sequences determined from ancient organisms have high error rates, primarily due to uracil bases created by cytosine deamination. We use synthetic oligonucleotides, as well as DNA extracted from mammoth and Neandertal remains, to show that treatment with uracil-DNA-glycosylase and endonuclease VIII removes uracil residues from ancient DNA and repairs most of the resulting abasic sites, leaving undamaged parts of the DNA fragments intact. Neandertal DNA sequences determined with this protocol have greatly increased accuracy.
We present a method of targeted DNA sequence retrieval from DNA sources which are heavily degraded and contaminated with microbial DNA, as is typical of ancient bones. The method greatly reduces sample destruction and sequencing demands relative to direct PCR or shotgun sequencing approaches. We used this method to reconstruct the complete mitochondrial DNA (mtDNA) genomes of five Neandertals from across their geographic range.
The Illumina Genome Analyzer generates millions of short sequencing reads. We present Ibis (Improved base identification system), an accurate, fast and easy-to-use base caller that significantly reduces the error rate and increases the output of usable reads. Ibis is faster and more robust with respect to chemistry and technology than other publicly available packages.
Although the emergence of high-throughput sequencing technologies has enabled whole-genome sequencing from extinct organisms, little progress has been made in accelerating targeted sequencing from highly degraded DNA. Here, we present a novel and highly sensitive method for targeted sequencing of ancient and degraded DNA, which couples multiplex PCR directly with sample barcoding and high-throughput sequencing. Using this approach, we obtained a 96% complete mitochondrial genome data set from 31 cave bear (Ursus spelaeus) samples using only two 454 Life Sciences (Roche) GS FLX runs.
Analysis of Neandertal DNA holds great potential for investigating the population history of this group of hominins, but progress has been limited due to the rarity of samples and damaged state of the DNA. We present a method of targeted ancient DNA sequence retrieval that greatly reduces sample destruction and sequencing demands and use this method to reconstruct the complete mitochondrial DNA (mtDNA) genomes of five Neandertals from across their geographic range. We find that mtDNA genetic diversity in Neandertals that lived 38,000 to 70,000 years ago was approximately one-third of that in contemporary modern humans.
A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual with 8341 mtDNA sequences identified among 4.8 Gb of DNA generated from approximately 0.3 g of bone.
We present a tool suited for searching for many short nucleotide sequences in large databases, allowing for a predefined number of gaps and mismatches. The command-line-driven program implements a non-deterministic automata matching algorithm on a keyword tree of the search strings. Both queries with and without ambiguity codes can be searched.
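The general technique named in this abstract, simulating a non-deterministic automaton over a keyword tree, can be sketched as follows. This is a simplified illustration that tolerates mismatches only (gaps and ambiguity codes are left out), not the tool's actual implementation; `build_trie`, `search` and `max_mismatches` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def build_trie(patterns: List[str]) -> List[dict]:
    """Build a keyword tree; node 0 is the root."""
    trie: List[dict] = [{"children": {}, "pattern": None}]
    for pattern in patterns:
        node = 0
        for ch in pattern:
            nxt: Optional[int] = trie[node]["children"].get(ch)
            if nxt is None:
                trie.append({"children": {}, "pattern": None})
                nxt = len(trie) - 1
                trie[node]["children"][ch] = nxt
            node = nxt
        trie[node]["pattern"] = pattern
    return trie


def search(
    text: str, patterns: List[str], max_mismatches: int = 1
) -> List[Tuple[int, str, int]]:
    """Return sorted (start, pattern, mismatches) hits in text."""
    trie = build_trie(patterns)
    hits = set()
    # Each active state is (trie node, match start, mismatches used).
    active = set()
    for pos, ch in enumerate(text):
        active.add((0, pos, 0))  # a match may start at every position
        next_active = set()
        for node, start, used in active:
            # Follow every edge: a differing edge label costs one mismatch.
            for edge, child in trie[node]["children"].items():
                cost = used + (edge != ch)
                if cost > max_mismatches:
                    continue
                if trie[child]["pattern"] is not None:
                    hits.add((start, trie[child]["pattern"], cost))
                if trie[child]["children"]:
                    next_active.add((child, start, cost))
        active = next_active
    return sorted(hits)
```

Because all search strings share one tree, common prefixes are compared once per text position, which is what makes this kind of approach attractive when many short queries are searched at the same time.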
Parallel tagged sequencing (PTS) is a molecular barcoding method designed to adapt the recently developed high-throughput 454 parallel sequencing technology for use with multiple samples. Unlike other barcoding methods, PTS can be applied to any type of double-stranded DNA (dsDNA) sample, including shotgun DNA libraries and pools of PCR products, and requires no amplification or gel purification steps. The method relies on attaching sample-specific barcoding adapters, which include sequence tags and a restriction site, to blunt-end repaired DNA samples by ligation and strand-displacement.
Proc Natl Acad Sci U S A
September 2007
High-throughput direct sequencing techniques have recently opened the possibility to sequence genomes from Pleistocene organisms. Here we analyze DNA sequences determined from a Neandertal, a mammoth, and a cave bear. We show that purines are overrepresented at positions adjacent to the breaks in the ancient DNA, suggesting that depurination has contributed to its degradation.
High-throughput 454 DNA sequencing technology allows much faster and more cost-effective sequencing than traditional Sanger sequencing. However, the technology imposes inherent limitations on the number of samples that can be processed in parallel. Here we introduce parallel tagged sequencing (PTS), a simple, inexpensive and flexible barcoding technique that can be used for parallel sequencing of any number and type of double-stranded nucleic acid samples.