Background: Estimating the genetic component of a complex phenotype is a complicated problem, mainly because there are many allele effects to estimate from a limited number of phenotypes. In spite of this difficulty, linear methods with variable selection have been able to give good predictions of additive effects of individuals. However, prediction of non-additive genetic effects is challenging with the usual prediction methods.
View Article and Find Full Text PDFMotivation: Cellular identity and behavior is controlled by complex gene regulatory networks. Transcription factors (TFs) bind to specific DNA sequences to regulate the transcription of their target genes. On the basis of these TF motifs in cis-regulatory elements we can model the influence of TFs on gene expression.
View Article and Find Full Text PDFRemodeling of chromatin accessibility is necessary for successful reprogramming of fibroblasts to neurons. However, it is still not fully known which transcription factors can induce a neuronal chromatin accessibility profile when overexpressed in fibroblasts. To identify such transcription factors, we used ATAC-sequencing to generate differential chromatin accessibility profiles between human fibroblasts and iNeurons, an in vitro neuronal model system obtained by overexpression of Neurog2 in induced pluripotent stem cells (iPSCs).
View Article and Find Full Text PDFCell-based small molecule screening is an effective strategy leading to new medicines. Scientists in the pharmaceutical industry as well as in academia have made tremendous progress in developing both large-scale and smaller-scale screening assays. However, an accessible and universal technology for measuring large numbers of molecular and cellular phenotypes in many samples in parallel is not available.
View Article and Find Full Text PDFWhole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression.
View Article and Find Full Text PDFNeurons derived from human induced Pluripotent Stem Cells (hiPSCs) provide a promising new tool for studying neurological disorders. In the past decade, many protocols for differentiating hiPSCs into neurons have been developed. However, these protocols are often slow with high variability, low reproducibility, and low efficiency.
View Article and Find Full Text PDFBackground: During early embryonic development, one of the two X chromosomes in mammalian female cells is inactivated to compensate for a potential imbalance in transcript levels with male cells, which contain a single X chromosome. Here, we use mouse female embryonic stem cells (ESCs) with non-random X chromosome inactivation (XCI) and polymorphic X chromosomes to study the dynamics of gene silencing over the inactive X chromosome by high-resolution allele-specific RNA-seq.
Results: Induction of XCI by differentiation of female ESCs shows that genes proximal to the X-inactivation center are silenced earlier than distal genes, while lowly expressed genes show faster XCI dynamics than highly expressed genes.
Objectives: Pharmacogenetic studies of tumour necrosis factor inhibitors (TNFi) response in patients with rheumatoid arthritis (RA) have largely relied on the changes in complex disease scores, such as disease activity score 28 (DAS28), as a measure of treatment response. It is expected that genetic architecture of such complex score is heterogeneous and not very suitable for pharmacogenetic studies. We aimed to select the most optimal phenotype for TNFi response using heritability estimates.
View Article and Find Full Text PDFThrombocytopenia with absent radii (TAR) syndrome is a rare disorder combining specific skeletal abnormalities with a reduced platelet count. Rare proximal microdeletions of 1q21.1 are found in the majority of patients but are also found in unaffected parents.
View Article and Find Full Text PDFNearly three-quarters of the 143 genetic signals associated with platelet and erythrocyte phenotypes identified by meta-analyses of genome-wide association (GWA) studies are located at non-protein-coding regions. Here, we assessed the role of candidate regulatory variants associated with cell type-restricted, closely related hematological quantitative traits in biologically relevant hematopoietic cell types. We used formaldehyde-assisted isolation of regulatory elements followed by next-generation sequencing (FAIRE-seq) to map regions of open chromatin in three primary human blood cells of the myeloid lineage.
View Article and Find Full Text PDFThe blood group Vel was discovered 60 years ago, but the underlying gene is unknown. Individuals negative for the Vel antigen are rare and are required for the safe transfusion of patients with antibodies to Vel. To identify the responsible gene, we sequenced the exomes of five individuals negative for the Vel antigen and found that four were homozygous and one was heterozygous for a low-frequency 17-nucleotide frameshift deletion in the gene encoding the 78-amino-acid transmembrane protein SMIM1.
View Article and Find Full Text PDFShort insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations.
View Article and Find Full Text PDFAnaemia is a chief determinant of global ill health, contributing to cognitive impairment, growth retardation and impaired physical capacity. To understand further the genetic factors influencing red blood cells, we carried out a genome-wide association study of haemoglobin concentration and related parameters in up to 135,367 individuals. Here we identify 75 independent genetic loci associated with one or more red blood cell phenotypes at P < 10(-8), which together explain 4-9% of the phenotypic variance per trait.
View Article and Find Full Text PDFThe exon-junction complex (EJC) performs essential RNA processing tasks. Here, we describe the first human disorder, thrombocytopenia with absent radii (TAR), caused by deficiency in one of the four EJC subunits. Compound inheritance of a rare null allele and one of two low-frequency SNPs in the regulatory regions of RBM8A, encoding the Y14 subunit of EJC, causes TAR.
View Article and Find Full Text PDFGenome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated.
View Article and Find Full Text PDFGray platelet syndrome (GPS) is a predominantly recessive platelet disorder that is characterized by mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated individuals and identified NBEAL2 as the causative gene; it has no previously known function but is a member of a gene family that is involved in granule development. Silencing of nbeal2 in zebrafish abrogated thrombocyte formation.
View Article and Find Full Text PDFSummary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project.
View Article and Find Full Text PDFSmall insertions and deletions (indels) are a common and functionally important type of sequence polymorphism. Most of the focus of studies of sequence variation is on single nucleotide variants (SNVs) and large structural variants. In principle, high-throughput sequencing studies should allow identification of indels just as SNVs.
View Article and Find Full Text PDFRecent studies have shown that linkage disequilibrium (LD) between single-nucleotide polymorphism (SNP) markers is widespread. Assuming linkage equilibrium has been shown to cause false positives in linkage studies where parental genotypes are not available. Therefore, linkage analysis methods that can deal with LD are required to accurately analyze SNP marker data sets.
View Article and Find Full Text PDFWe propose an analytical approximation method for the estimation of multipoint identity by descent (IBD) probabilities in pedigrees containing a moderate number of distantly related individuals. We show that in large pedigrees where cases are related through untyped ancestors only, it is possible to formulate the hidden Markov model of the Lander-Green algorithm in terms of the IBD configurations of the cases. We use a first-order Markov approximation to model the changes in this IBD-configuration variable along the chromosome.
View Article and Find Full Text PDFWe present CVMHAPLO, a probabilistic method for haplotyping in general pedigrees with many markers. CVMHAPLO reconstructs the haplotypes by assigning in every iteration a fixed number of the ordered genotypes with the highest marginal probability, conditioned on the marker data and ordered genotypes assigned in previous iterations. CVMHAPLO makes use of the cluster variation method (CVM) to efficiently estimate the marginal probabilities.
View Article and Find Full Text PDFBackground: Computing exact multipoint LOD scores for extended pedigrees rapidly becomes infeasible as the number of markers and untyped individuals increase. When markers are excluded from the computation, significant power may be lost. Therefore accurate approximate methods which take into account all markers are desirable.
View Article and Find Full Text PDF