Objectives: The aim of this data paper is to describe a collection of 33 genomic, transcriptomic and epigenomic sequencing datasets of the B-cell acute lymphoblastic leukemia (ALL) cell line REH. REH is one of the most frequently used cell lines for functional studies of pediatric ALL, and these data provide a multi-faceted characterization of its molecular features. The datasets described herein, generated with short- and long-read sequencing technologies, can both provide insights into the complex aberrant karyotype of REH, and be used as reference datasets for sequencing data quality assessment or for methods development.
Background: Ovarian cancer is the eighth most common cancer among women and has a 5-year survival of only 30-50%. The survival is close to 90% for patients in stage I but only 20% for patients in stage IV. The presently available biomarkers have insufficient sensitivity and specificity for early detection and there is an urgent need to identify novel biomarkers.
Background: Cytosine modifications in DNA such as 5-methylcytosine (5mC) underlie a broad range of developmental processes, maintain cellular lineage specification, and can define or stratify types of cancer and other diseases. However, the wide variety of approaches available to interrogate these modifications has created a need for harmonized materials, methods, and rigorous benchmarking to improve genome-wide methylome sequencing applications in clinical and basic research. Here, we present a multi-platform assessment and cross-validated resource for epigenetics research from the FDA's Epigenomics Quality Control Group.
The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence.
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers.
The powerful HiSeq X sequencers with their patterned flowcell technology and fast turnaround times are instrumental for many large-scale genomic and epigenomic studies. However, assessment of DNA methylation by sodium bisulfite treatment results in sequencing libraries of low diversity, which may impact data quality and yield. In this report we assess the quality of WGBS data generated on the HiSeq X system in comparison with data generated on the HiSeq 2500 system and the newly released NovaSeq system.
The omnigenic model of complex disease stipulates that the majority of the heritability will be explained by the effects of common variation on genes in the periphery of core disease pathways. Rare variant associations, expected to explain far less of the heritability, may be enriched in core disease genes and thus will be instrumental in the understanding of complex disease pathogenesis and their potential therapeutic targets. Here, using complementary whole-exome sequencing, high-density imputation, and in vitro cellular assays, we identify candidate core genes in the pathogenesis of systemic lupus erythematosus (SLE).
Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry.
The sequencing of highly virulent Escherichia coli O104:H4 strains isolated during the outbreak of bloody diarrhea and hemolytic uremic syndrome in Europe in 2011 revealed a genome that contained a Shiga toxin encoding prophage and a plasmid encoding enteroaggregative fimbriae. Here, we present the draft genome sequence of strain E112/10, isolated in Sweden from a patient who had travelled to Tunisia in 2010. This strain was found to differ from the outbreak strains by only 38 SNPs in non-repetitive regions, 16 of which mapped to the branch leading to the outbreak strain. We identified putatively adaptive mutations in genes for transporters, outer surface proteins and enzymes involved in the metabolism of carbohydrates.
A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. The assignment of a functional role for the identified disease-associated SNP is not straightforward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function, while allele-specific gene expression (ASE) analysis has not yet gained widespread use in disease mapping studies.
Genome-wide association analysis on monozygotic twin pairs offers a route to the discovery of gene-environment interactions through testing for variability loci associated with sensitivity to individual environment/lifestyle. We present a genome-wide scan of loci associated with intra-pair differences in serum lipid and apolipoprotein levels. We report data for 1,720 monozygotic female twin pairs from the GenomEUtwin project with 2.
The insulin-like growth factor 1 receptor (IGF-1R) plays crucial roles in developmental and cancer biology. Most of its biological effects have been ascribed to its tyrosine kinase activity, which propagates signaling through the phosphatidylinositol 3-kinase and mitogen-activated protein kinase pathways. Here, we report that IGF-1 promotes the modification of IGF-1R by small ubiquitin-like modifier protein-1 (SUMO-1) and its translocation to the nucleus.
Population structure can provide novel insight into the human past, and recognizing and correcting for such stratification is a practical concern in gene mapping by many association methodologies. We investigate these patterns, primarily through principal component (PC) analysis of whole genome SNP polymorphism, in 2099 individuals from populations of Northern European origin (Ireland, United Kingdom, Netherlands, Denmark, Sweden, Finland, Australia, and HapMap European-American). The major trends (PC1 and PC2) demonstrate an ability to detect geographic substructure, even over a small area like the British Isles, and this information can then be applied to finely dissect the ancestry of the European-Australian and European-American samples.
Screening for gene copy-number alterations (CNAs) has improved by applying genome-wide microarrays, where SNP arrays also allow analysis of loss of heterozygosity (LOH). We here analyzed 10 chronic lymphocytic leukemia (CLL) samples using four different high-resolution platforms: BAC arrays (32K), oligonucleotide arrays (185K, Agilent), and two SNP arrays (250K, Affymetrix and 317K, Illumina). Cross-platform comparison revealed 29 concordantly detected CNAs, including known recurrent alterations, which confirmed that all platforms are powerful tools when screening for large aberrations.
View Article and Find Full Text PDFBackground: The aim of this study was to investigate the effect of the plasma concentration of irbesartan, a specific angiotensin II type 1 receptor (AT1R) antagonist, and the blood pressure response in relation to AT1R gene polymorphisms.
Methods: Plasma irbesartan was analyzed in 42 patients with mild-to-moderate hypertension and left ventricular hypertrophy from the Swedish Irbesartan Left Ventricular Hypertrophy Investigation vs. Atenolol (SILVHIA) trial, who were treated with irbesartan as monotherapy for 12 weeks.
We studied how well the European CEU samples used in the Haplotype Mapping Project (HapMap) represent five European populations by analyzing nuclear family samples from the Swedish, Finnish, Dutch, British and Australian (European ancestry) populations. The number of samples from each population (about 30 parent-offspring trios) was similar to that in the HapMap sample sets. A panel of 186 single nucleotide polymorphisms (SNPs) distributed over the 1.
Objective: To determine whether genetic variants of the interferon regulatory factor 5 (IRF-5) and Tyk-2 genes are associated with rheumatoid arthritis (RA).
Methods: Five single-nucleotide polymorphisms (SNPs) in IRF5 and 3 SNPs in Tyk2 were analyzed in a Swedish cohort of 1,530 patients with RA and 881 controls. A replication study was performed in a Dutch cohort of 387 patients with RA and 181 controls.
To survey the quality of SNP genotyping, a joint Nordic quality assessment (QA) round was organized between 11 laboratories in the Nordic and Baltic countries. The QA round involved blinded genotyping of 47 DNA samples for 18 or 6 randomly selected SNPs. The methods used by the participating laboratories included all major platforms for small- to medium-size SNP genotyping.
This study demonstrates an array-based platform to simultaneously genotype single nucleotide polymorphisms (SNPs) and some short insertions/deletions (indels) by the integration of the universal tag/anti-tag (TAT) system, liquid-phase primer extension (LIPEX), and a novel two-color detection strategy on an array format (TATLIPEXA). The TAT system permits a universal chip to be used for many applications, and LIPEX simplifies sample preparation while significantly improving sensitivity. More importantly, all SNPs and some short indels can be interrogated in a single reaction with only two fluorescent ddNTPs.
Hypertension is prevalent, affecting approximately 20-25% of the adult population in the Western world. Primary hypertension is a multifactorial, complex disorder in which many genes and genetic variants are assumed to interact with environmental factors to produce the specific blood pressure level of a given individual. Family and twin studies show that between 30 and 60% of blood pressure variation is determined by genetic factors.
Background: Our aim was to determine whether the change in left ventricular (LV) mass in response to antihypertensive treatment could be predicted by multivariate analysis of single nucleotide polymorphisms (SNPs) in candidate genes reflecting pathways likely to be involved in blood pressure control.
Methods: Patients with mild to moderate primary hypertension and LV hypertrophy were randomized in a double-blind fashion to treatment with either the angiotensin II type 1 receptor antagonist irbesartan (n = 48) or the beta1 adrenoreceptor blocker atenolol (n = 49). A microarray-based minisequencing system was used for genotyping 74 SNPs in 25 genes.
Background: Each of the human genes or transcriptional units is likely to contain single nucleotide polymorphisms that may give rise to sequence variation between individuals and tissues on the level of RNA. Based on recent studies, differential expression of the two alleles of heterozygous coding single nucleotide polymorphisms (SNPs) may be frequent for human genes. Methods with high accuracy to be used in a high throughput setting are needed for systematic surveys of expressed sequence variation.