Gene promoter and enhancer sequences are bound by transcription factors and are depleted of methylated CpG sites (cytosines preceding guanines in DNA). The absence of methylated CpGs in these sequences typically correlates with increased gene expression, indicating a regulatory role for methylation. We used nanopore sequencing to determine haplotype-specific methylation rates of 15.
Background: In 2021, the American College of Medical Genetics and Genomics (ACMG) recommended reporting actionable genotypes in 73 genes associated with diseases for which preventive or therapeutic measures are available. Evaluations of the association of actionable genotypes in these genes with life span are currently lacking.
Methods: We assessed the prevalence of coding and splice variants in genes on the ACMG Secondary Findings, version 3.
Microsatellites are polymorphic tracts of short tandem repeats with one to six base-pair (bp) motifs and are some of the most polymorphic variants in the genome. Using 6084 Icelandic parent-offspring trios we estimate 63.7 (95% CI: 61.
Memory T-cell responses following SARS-CoV-2 infection have been extensively investigated but many studies have been small with a limited range of disease severity. Here we analyze SARS-CoV-2 reactive T-cell responses in 768 convalescent SARS-CoV-2-infected (cases) and 500 uninfected (controls) Icelanders. The T-cell responses are stable three to eight months after SARS-CoV-2 infection, irrespective of disease severity and even those with the mildest symptoms induce broad and persistent T-cell responses.
Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank.
Age-related hearing impairment (ARHI) is the most common sensory disorder in older adults. We conducted a genome-wide association meta-analysis of 121,934 ARHI cases and 591,699 controls from Iceland and the UK. We identified 21 novel sequence variants, of which 13 are rare, under either additive or recessive models.
Long-read sequencing (LRS) promises to improve the characterization of structural variants (SVs). We generated LRS data from 3,622 Icelanders and identified a median of 22,636 SVs per individual (a median of 13,353 insertions and 9,474 deletions). We discovered a set of 133,886 reliably genotyped SV alleles and imputed them into 166,281 individuals to explore their effects on diseases and other traits.
The success of genome-wide association studies (GWAS) in identifying common, low-penetrance variant-cancer associations for the past decade is undisputed. However, discovering additional high-penetrance cancer mutations in unknown cancer predisposing genes requires detection of variant-cancer association of ultra-rare coding variants. Consequently, large-scale next-generation sequence data with associated phenotype information are needed.
Iron is essential for many biological functions and iron deficiency and overload have major health implications. We performed a meta-analysis of three genome-wide association studies from Iceland, the UK and Denmark of blood levels of ferritin (N = 246,139), total iron binding capacity (N = 135,430), iron (N = 163,511) and transferrin saturation (N = 131,471). We found 62 independent sequence variants associating with iron homeostasis parameters at 56 loci, including 46 novel loci.
Thousands of genomic structural variants (SVs) segregate in the human population and can impact phenotypic traits and diseases. Their identification in whole-genome sequence data of large cohorts is a major computational challenge. Most current approaches identify SVs in single genomes and afterwards merge the identified variants into a joint call set across many genomes.
A major challenge to long read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average with a median error rate as low as 0.
Despite the important role that monozygotic twins have played in genetics research, little is known about their genomic differences. Here we show that monozygotic twins differ on average by 5.2 early developmental mutations and that approximately 15% of monozygotic twins have a substantial number of these early developmental mutations specific to one of them.
Motivation: Data analysis requires reliable data. In genetics this includes verifying that the sample is not contaminated with another, a problem ubiquitous in biology.
Results: In humans and other diploid species, DNA contamination from the same species can be detected by the presence of three haplotypes between polymorphic SNPs.
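The three-haplotype idea can be illustrated with a minimal sketch. It assumes that reads spanning a pair of nearby SNPs have already been reduced to allele pairs. The function names, the `min_support` threshold, and the input format are hypothetical and are not taken from the published method. A diploid sample carries at most two haplotypes across any SNP pair, so a third well-supported haplotype suggests DNA from a second individual.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Haplotype = Tuple[str, str]  # alleles seen on one read at (SNP 1, SNP 2)

def observed_haplotypes(read_alleles: Iterable[Haplotype],
                        min_support: int = 2) -> Set[Haplotype]:
    """Return haplotypes seen on at least `min_support` reads.

    The support threshold is an illustrative guard against single
    sequencing errors, not a value from the source.
    """
    counts = Counter(read_alleles)
    return {hap for hap, n in counts.items() if n >= min_support}

def looks_contaminated(read_alleles: Iterable[Haplotype],
                       min_support: int = 2) -> bool:
    """Flag a SNP pair when three or more haplotypes are well supported."""
    return len(observed_haplotypes(read_alleles, min_support)) >= 3

# Two true haplotypes plus one isolated error read: not flagged.
clean = [("A", "C")] * 5 + [("G", "T")] * 4 + [("A", "T")]
# The same reads plus three reads carrying a third haplotype: flagged.
mixed = clean + [("G", "C")] * 3

print(looks_contaminated(clean))
print(looks_contaminated(mixed))
```

A practical tool would aggregate evidence over many SNP pairs and model sequencing error explicitly; this sketch only shows the counting step for a single pair.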
Pelvic organ prolapse (POP) is a downward descent of one or more of the pelvic organs, resulting in a protrusion of the vaginal wall and/or uterus. We performed a genome-wide association study of POP using data from Iceland and the UK Biobank, a total of 15,010 cases with hospital-based diagnosis code and 340,734 female controls, and found eight sequence variants at seven loci associating with POP (P < 5 × 10⁻⁸); seven common (minor allele frequency >5%) and one with minor allele frequency of 4.87%.
Summary: popSTR2 is an update and augmentation of our previous work 'popSTR: a population-based microsatellite genotyper'. To make genotyping sensitive to inter-sample differences, we supply a kernel to estimate sample-specific slippage rates. For clinical sequencing purposes, a panel of known pathogenic repeat expansions is provided along with a script that scans and flags for manual inspection markers indicative of a pathogenic expansion.
Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each has a relatively large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short reads.
Genetic diversity arises from recombination and de novo mutation (DNM). Using a combination of microarray genotype and whole-genome sequence data on parent-child pairs, we identified 4,531,535 crossover recombinations and 200,435 DNMs. The resulting genetic map has a resolution of 682 base pairs.
Two familial forms of colorectal cancer (CRC), Lynch syndrome (LS) and familial adenomatous polyposis (FAP), are caused by rare mutations in DNA mismatch repair genes (MLH1, MSH2, MSH6, PMS2) and the genes APC and MUTYH, respectively. No information is available on the presence of high-risk CRC mutations in the Romanian population. We performed whole-genome sequencing of 61 Romanian CRC cases with a family history of cancer and/or early onset of disease, focusing the analysis on candidate variants in the LS and FAP genes.