The epidermal growth factor receptor, EGFR, is frequently activated in lung cancer and glioblastoma by genomic alterations including missense mutations. The different mutation spectra in these diseases are reflected in divergent responses to EGFR inhibition: significant patient benefit in lung cancer, but limited in glioblastoma. Here, we report a comprehensive mutational analysis of EGFR function.
Structural variants (SVs) rearrange large segments of DNA and can have profound consequences in evolution and human disease. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral in the interpretation of single-nucleotide variants (SNVs). However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs.
Unlike most tumor suppressor genes, the most common genetic alterations in tumor protein p53 (TP53) are missense mutations. Mutant p53 protein is often abundantly expressed in cancers and specific allelic variants exhibit dominant-negative or gain-of-function activities in experimental models. To gain a systematic view of p53 function, we interrogated loss-of-function screens conducted in hundreds of human cancer cell lines and performed TP53 saturation mutagenesis screens in an isogenic pair of TP53 wild-type and null cell lines.
Summary: We present an updated version of our computational pipeline, PathSeq, for the discovery and identification of microbial sequences in genomic and transcriptomic libraries from eukaryotic hosts. This pipeline is available in the Genome Analysis Toolkit (GATK) as a suite of configurable tools that can report the microbial composition of DNA or RNA short-read sequencing samples and identify unknown sequences for downstream assembly of novel organisms. GATK PathSeq enables sample analysis in minutes at low cost.
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements.
Clinical exome sequencing routinely identifies missense variants in disease-related genes, but functional characterization is rarely undertaken, leading to diagnostic uncertainty. For example, mutations in PPARG cause Mendelian lipodystrophy and increase risk of type 2 diabetes (T2D). Although approximately 1 in 500 people harbor missense variants in PPARG, most are of unknown consequence.
Complete knowledge of the genetic variation in individual human genomes is a crucial foundation for understanding the etiology of disease. Genetic variation is typically characterized by sequencing individual genomes and comparing reads to a reference. Existing methods do an excellent job of detecting variants in approximately 90% of the human genome; however, calling variants in the remaining 10% of the genome (largely low-complexity sequence and segmental duplications) is challenging.
Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs.
Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence).
The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae.
Background: Copy number variants (CNVs) account for substantial variation between genomes and are a major source of normal and pathogenic phenotypic differences. The dog is an ideal model to investigate mutational mechanisms that generate CNVs as its genome lacks a functional ortholog of the PRDM9 gene implicated in recombination and CNV formation in humans. Here we comprehensively assay CNVs using high-density array comparative genomic hybridization in 50 dogs from 17 dog breeds and 3 gray wolves.
Exceptionally accurate genome reference sequences have proven to be of great value to microbial researchers. Thus, to date, about 1800 bacterial genome assemblies have been "finished" at great expense with the aid of manual laboratory and computational processes that typically iterate over a period of months or even years. By applying a new laboratory design and new assembly algorithm to 16 samples, we demonstrate that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation.
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome.
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach.
Soft-tissue sarcomas, which result in approximately 10,700 diagnoses and 3,800 deaths per year in the United States, show remarkable histologic diversity, with more than 50 recognized subtypes. However, knowledge of their genomic alterations is limited. We describe an integrative analysis of DNA sequence, copy number and mRNA expression in 207 samples encompassing seven major subtypes.
Domestic animals are excellent models for genetic studies of phenotypic evolution. They have evolved genetic adaptations to a new environment, the farm, and have been subjected to strong human-driven selection leading to remarkable phenotypic changes in morphology, physiology and behaviour. Identifying the genetic changes underlying these developments provides new insight into general mechanisms by which genetic variation shapes phenotypic diversity.
Phytophthora infestans is the most destructive pathogen of potato and a model organism for the oomycetes, a distinct lineage of fungus-like eukaryotes that are related to organisms such as brown algae and diatoms. As the agent of the Irish potato famine in the mid-nineteenth century, P. infestans has had a tremendous effect on human history, resulting in famine and population displacement.
We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation.
Chromosome 17 is unusual among the human chromosomes in many respects. It is the largest human autosome with orthology to only a single mouse chromosome, mapping entirely to the distal half of mouse chromosome 11. Chromosome 17 is rich in protein-coding genes, having the second highest gene density in the genome.
Background: Visual assessment of cerebrospinal fluid (CSF) for xanthochromia (yellow color) is practiced by the majority of laboratories worldwide as a means of diagnosing intracranial bleeds.
Methods: CSF samples were analyzed colorimetrically and spectrophotometrically to detect bilirubin, either at low concentrations or in the presence of hemolyzed blood.
Results: The experiments provide the physiological and colorimetric basis for abandoning visual assessment of CSF for xanthochromia.
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences.