Publications by authors named "Yang I Li"

Only a third of immune-associated loci from genome-wide association studies (GWAS) colocalize with expression quantitative trait loci (eQTLs). To learn about causal genes and mechanisms at the remaining loci, we created a unified single-cell chromatin accessibility (scATAC-seq) map in peripheral blood comprising a total of 282,424 cells from 48 individuals. Clustering and topic modeling of scATAC data identified discrete cell types and continuous cell states, which helped reveal disease-relevant cellular contexts and allowed mapping of genetic effects on chromatin accessibility across these contexts.

View Article and Find Full Text PDF

Alternative splicing (AS) in human genes is widely viewed as a mechanism for enhancing proteomic diversity. AS can also impact gene expression levels without increasing protein diversity by producing 'unproductive' transcripts that are targeted for rapid degradation by nonsense-mediated decay (NMD). However, the relative importance of this regulatory mechanism remains underexplored.

View Article and Find Full Text PDF

Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology, statistical methods, and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation.

View Article and Find Full Text PDF

Alternative splicing (AS) is pervasive in human genes, yet the specific function of most AS events remains unknown. It is widely assumed that the primary function of AS is to diversify the proteome; however, AS can also influence gene expression levels by producing transcripts that are rapidly degraded by nonsense-mediated decay (NMD). Currently, there are no precise estimates for how often the coupling of AS and NMD (AS-NMD) impacts gene expression levels because rapidly degraded NMD transcripts are challenging to capture.

View Article and Find Full Text PDF

Obesity-associated morbidity is exacerbated by abdominal obesity, which can be measured as the waist-to-hip ratio adjusted for the body mass index (WHRadjBMI). Here we identify genes associated with obesity and WHRadjBMI and characterize allele-sensitive enhancers that are predicted to regulate WHRadjBMI genes in women. We found that several waist-to-hip ratio-associated variants map within primate-specific Alu retrotransposons harboring a DNA motif associated with adipocyte differentiation.

View Article and Find Full Text PDF

Long introns with short exons in vertebrate genes are thought to require spliceosome assembly across exons (exon definition), rather than introns, thereby requiring transcription of an exon to splice an upstream intron. Here, we developed CoLa-seq (co-transcriptional lariat sequencing) to investigate the timing and determinants of co-transcriptional splicing genome wide. Unexpectedly, 90% of all introns, including long introns, can splice before transcription of a downstream exon, indicating that exon definition is not obligatory for most human introns.

View Article and Find Full Text PDF

Article Synopsis
  • Researchers investigated the role of ADAR-mediated RNA editing in understanding genetic variants linked to inflammatory diseases, highlighting its significance in disease mechanisms.
  • They identified over 30,000 cis-RNA editing quantitative trait loci (edQTLs) across different human tissues, revealing a strong connection with autoimmune diseases.
  • The study suggests that reduced RNA editing may enhance immune responses and inflammation, implicating dsRNA editing as an important, yet overlooked, factor in common inflammatory diseases.

View Article and Find Full Text PDF

Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks.

View Article and Find Full Text PDF

Background: Alternative cleavage and polyadenylation (APA), an RNA processing event, occurs in over 70% of human protein-coding genes. APA results in mRNA transcripts with distinct 3' ends. Most APA occurs within 3' UTRs, which harbor regulatory elements that can impact mRNA stability, translation, and localization.

View Article and Find Full Text PDF

Background: The vast majority of trait-associated variants identified using genome-wide association studies (GWAS) are noncoding, and therefore assumed to impact gene regulation. However, the majority of trait-associated loci are unexplained by regulatory quantitative trait loci (QTLs).

Results: We perform a comprehensive characterization of the putative mechanisms by which GWAS loci impact human immune traits.

View Article and Find Full Text PDF

Single-cell RNA sequencing (scRNA-seq) technology is poised to replace bulk cell RNA sequencing for many biological and medical applications as it allows users to measure gene expression levels in a cell type-specific manner. However, data produced by scRNA-seq often exhibit batch effects that can be specific to a cell type, to a sample, or to an experiment, which prevent integration or comparisons across multiple experiments. Here, we present Dmatch, a method that leverages an external expression atlas of human primary cells and kernel density matching to align multiple scRNA-seq experiments for downstream biological analysis.

View Article and Find Full Text PDF

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci.

View Article and Find Full Text PDF

Motivation: Next-generation sequencing is rapidly improving diagnostic rates in rare Mendelian diseases, but even with whole genome or whole exome sequencing, the majority of cases remain unsolved. Increasingly, RNA sequencing is being used to solve many cases that evade diagnosis through sequencing alone. Specifically, the detection of aberrant splicing in many rare disease patients suggests that identifying RNA splicing outliers is particularly useful for determining causal Mendelian disease genes.

View Article and Find Full Text PDF

Background: Single-cell RNA-sequencing (scRNA-seq) is a rapidly evolving technology that enables measurement of gene expression levels at an unprecedented resolution. Despite the explosive growth in the number of cells that can be assayed by a single experiment, scRNA-seq still has several limitations, including high rates of dropouts, which result in a large number of genes having zero read count in the scRNA-seq data, and complicate downstream analyses.

Methods: To overcome this problem, we treat zeros as missing values and develop nonparametric deep learning methods for imputation.

View Article and Find Full Text PDF

The molecular mechanisms that govern the maturation of oligodendrocyte lineage cells remain unclear. Emerging studies have shown that N6-methyladenosine (m6A), the most common internal RNA modification of mammalian mRNA, plays a critical role in various developmental processes. Here, we demonstrate that oligodendrocyte lineage progression is accompanied by dynamic changes in m6A modification on numerous transcripts.

View Article and Find Full Text PDF

Most complex traits, including diseases, have a large genetic component. Identifying the genetic variants and genes underlying phenotypic variation remains one of the most important objectives of current biomedical research. Unlike Mendelian or familial diseases, which are usually caused by mutations in the coding regions of individual genes, complex diseases are thought to result from the cumulative effects of a large number of variants, the vast majority of which are noncoding.

View Article and Find Full Text PDF

Early genome-wide association studies (GWASs) led to the surprising discovery that, for typical complex traits, most of the heritability is due to huge numbers of common variants with tiny effect sizes. Previously, we argued that new models are needed to understand these patterns. Here, we provide a formal model in which genetic contributions to complex traits are partitioned into direct effects from core genes and indirect effects from peripheral genes acting in trans.

View Article and Find Full Text PDF

Quantification of gene expression levels at the single cell level has revealed that gene expression can vary substantially even across a population of homogeneous cells. However, it is currently unclear what genomic features control variation in gene expression levels, and whether common genetic variants may impact gene expression variation. Here, we take a genome-wide approach to identify expression variance quantitative trait loci (vQTLs).

View Article and Find Full Text PDF

Genome-wide association studies (GWAS) have identified over 41 susceptibility loci associated with Parkinson's Disease (PD), but identifying putative causal genes and the underlying mechanisms remains challenging. Here, we leverage large-scale transcriptomic datasets to prioritize genes that are likely to affect PD by using a transcriptome-wide association study (TWAS) approach. Using this approach, we identify 66 gene associations whose predicted expression or splicing levels in dorsolateral prefrontal cortex (DLPFC) and peripheral monocytes are significantly associated with PD risk.

View Article and Find Full Text PDF

To provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence-resolved 99,604 insertions, deletions, and inversions, including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.

View Article and Find Full Text PDF

The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population.

View Article and Find Full Text PDF

Here we use deep sequencing to identify sources of variation in mRNA splicing in the dorsolateral prefrontal cortex (DLPFC) of 450 subjects from two aging cohorts. Hundreds of aberrant pre-mRNA splicing events are reproducibly associated with Alzheimer's disease. We also generate a catalog of splicing quantitative trait loci (sQTL) effects: splicing of 3,006 genes is influenced by genetic variation.

View Article and Find Full Text PDF

The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity.

View Article and Find Full Text PDF

Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes.

View Article and Find Full Text PDF
