X-linked genetic disorders typically affect females less severely than males owing to the presence of a second X chromosome not carrying the deleterious variant. However, the phenotypic expression in females is highly variable, which may be explained by an allelic skew in X-chromosome inactivation. Accurate measurement of X inactivation skew is crucial to understand and predict disease phenotype in carrier females, with prediction especially relevant for degenerative conditions.
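As an illustration of what "measuring skew" can mean in practice, the Python sketch below summarises skew from allele-specific read counts at a single heterozygous site in an expressed X-linked gene. The input type, the helper name `x_inactivation_skew` and the folded-ratio definition are illustrative assumptions, not the method used in the study.

```python
def x_inactivation_skew(ref_reads: int, alt_reads: int) -> tuple[float, float]:
    """Return (allelic ratio, skew) from allele-specific read counts.

    Hypothetical helper: the ratio is the fraction of reads carrying the
    reference allele; skew folds that ratio so that balanced (random)
    inactivation gives 0.5 and complete skew towards either allele gives 1.0.
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no informative reads at this site")
    ratio = ref_reads / total
    skew = max(ratio, 1 - ratio)  # direction-independent measure
    return ratio, skew

ratio, skew = x_inactivation_skew(ref_reads=80, alt_reads=20)
print(f"ratio={ratio:.2f}, skew={skew:.2f}")  # prints: ratio=0.80, skew=0.80
```

Folding the ratio makes the measure independent of which allele is favoured; an 80:20 split is often used as a conventional cut-off for calling inactivation "skewed".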
scPipe is a flexible R/Bioconductor package originally developed to analyse platform-independent single-cell RNA-Seq data. To expand its preprocessing capability to accommodate new single-cell technologies, we further developed scPipe to handle single-cell ATAC-Seq and multi-modal (RNA-Seq and ATAC-Seq) data. After executing multiple data cleaning steps to remove duplicated reads, low abundance features and cells of poor quality, an object is created that contains a sparse count matrix with features of interest in the rows and cells in the columns.
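The end product described above, a sparse features-by-cells count matrix after removal of low-abundance features and poor-quality cells, can be sketched in plain Python with a dictionary keyed by (feature, cell). This is not scPipe's implementation or API; the function names, the assumption that input reads are already deduplicated, and the minimum-count thresholds are all hypothetical.

```python
from collections import defaultdict

def build_sparse_counts(records):
    """Build a dict-of-keys sparse matrix {(feature, cell): count}.

    `records` is an iterable of (feature, cell) pairs, one per already
    deduplicated read; only non-zero entries are stored.
    """
    counts = defaultdict(int)
    for feature, cell in records:
        counts[(feature, cell)] += 1
    return dict(counts)

def filter_counts(counts, min_feature_total=2, min_cell_total=2):
    """Drop low-abundance features (rows), then poorly covered cells (columns).

    Thresholds are illustrative placeholders, not scPipe defaults.
    """
    feature_totals = defaultdict(int)
    for (feature, _), n in counts.items():
        feature_totals[feature] += n
    kept = {k: n for k, n in counts.items()
            if feature_totals[k[0]] >= min_feature_total}

    cell_totals = defaultdict(int)
    for (_, cell), n in kept.items():
        cell_totals[cell] += n
    return {k: n for k, n in kept.items()
            if cell_totals[k[1]] >= min_cell_total}

reads = [("peak1", "cellA"), ("peak1", "cellA"), ("peak1", "cellB"),
         ("peak2", "cellB"), ("peak3", "cellC")]
matrix = filter_counts(build_sparse_counts(reads))
print(matrix)  # prints: {('peak1', 'cellA'): 2}
```

Storing only non-zero entries is what keeps single-cell matrices tractable, since most feature-cell combinations have zero counts. Filtering features before cells is one possible order; the two steps interact, so the order is a design choice.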
Version 1.0 introduced intuitive, point-and-click interactive graphics for differential gene expression analysis. Here, we present a major update that brings improved interactivity and reproducibility using high-level visualization frameworks for R and JavaScript.
Background: Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied.
A modified Chromium 10x droplet-based protocol that subsamples cells for both short-read and long-read (nanopore) sequencing together with a new computational pipeline (FLAMES) is developed to enable isoform discovery, splicing analysis, and mutation detection in single cells. We identify thousands of unannotated isoforms and find conserved functional modules that are enriched for alternative transcript usage in different cell types and species, including ribosome biogenesis and mRNA splicing. Analysis at the transcript level allows data integration with scATAC-seq on individual promoters, improved correlation with protein expression data, and linked mutations known to confer drug resistance to transcriptome heterogeneity.
A key benefit of long-read nanopore sequencing technology is the ability to detect modified DNA bases, such as 5-methylcytosine. The lack of R/Bioconductor tools for the effective visualization of nanopore methylation profiles between samples from different experimental groups led us to develop the NanoMethViz R package. Our software can handle methylation output generated from a range of different methylation callers and manages large datasets using a compressed data format.
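Methylation profiles of this kind are typically built by summarising per-read calls into a per-site proportion of methylated reads. The Python sketch below shows that aggregation step only; it is not NanoMethViz code, and the input layout ((chromosome, position) site, per-read methylation probability), the 0.5 call threshold and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def site_methylation(calls, threshold=0.5):
    """Fraction of reads called methylated at each CpG site.

    `calls` is an iterable of (site, probability) pairs, one per read per
    site; a read counts as methylated when its probability >= threshold.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for site, prob in calls:
        total[site] += 1
        if prob >= threshold:
            methylated[site] += 1
    return {site: methylated[site] / total[site] for site in total}

calls = [(("chr1", 100), 0.9), (("chr1", 100), 0.2),
         (("chr1", 100), 0.7), (("chr1", 250), 0.1)]
print(site_methylation(calls))
```

Comparing experimental groups then amounts to comparing these per-site proportions (or smoothed versions of them) across samples.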
CD1c presents lipid-based antigens to CD1c-restricted T cells, which are thought to be a major component of the human T cell pool. However, the study of CD1c-restricted T cells is hampered by the presence of an abundantly expressed, non-T cell receptor (TCR) ligand for CD1c on blood cells, confounding analysis of TCR-mediated CD1c tetramer staining. Here, we identified the CD36 family (CD36, SR-B1, and LIMP-2) as ligands for CD1c, CD1b, and CD1d proteins and showed that CD36 is the receptor responsible for non-TCR-mediated CD1c tetramer staining of blood cells.
Application of Oxford Nanopore Technologies' long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to high sequencing error rates and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information.
Despite advances in single-cell multi-omics, a single stem or progenitor cell can only be tested once. We developed clonal multi-omics, in which daughters of a clone act as surrogates of the founder, thereby allowing multiple independent assays per clone. With SIS-seq, clonal siblings in parallel "sister" assays are examined either for gene expression by RNA sequencing (RNA-seq) or for fate in culture.
NAR Genom Bioinform
September 2020
RNA-seq datasets can contain millions of intron reads per library that are typically removed from downstream analysis. Only reads overlapping annotated exons are considered informative, since mature mRNA is assumed to be the major component sequenced, especially for poly(A) RNA libraries. In this study, we show that intron reads are informative and, through exploratory data analysis of read coverage, that intron signal is representative of both pre-mRNAs and intron retention.
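To make the exon/intron distinction concrete, the Python sketch below labels an aligned read as exonic, intronic or outside a gene from interval overlaps. It is a simplified, hypothetical rule (any exon overlap counts as exonic; strand, spliced alignments and overlapping genes are ignored) and is not the counting procedure used in the study.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def classify_read(read_start, read_end, gene_start, gene_end, exons):
    """Label a read as 'exon', 'intron' or 'outside' for a single gene.

    Hypothetical rule: any overlap with an annotated exon counts as exonic;
    a read inside the gene body with no exon overlap counts as intronic.
    """
    if not overlaps(read_start, read_end, gene_start, gene_end):
        return "outside"
    if any(overlaps(read_start, read_end, s, e) for s, e in exons):
        return "exon"
    return "intron"

exons = [(0, 200), (500, 700), (900, 1000)]
for read in [(250, 350), (150, 260), (1200, 1300)]:
    print(read, classify_read(*read, 0, 1000, exons))
```

Under an exon-only counting scheme, every read labelled "intron" here would be discarded, which is the signal the study argues is informative.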
Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.
Motivation: Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking the many methods is therefore of critical importance and since analysis of single-cell data usually involves multi-step pipelines, effective evaluation of pipelines involving different combinations of methods is required.
Tumors are composed of phenotypically heterogeneous cancer cells that often resemble various differentiation states of their lineage of origin. Within this hierarchy, it is thought that an immature subpopulation of tumor-propagating cancer stem cells (CSCs) differentiates into non-tumorigenic progeny, providing a rationale for therapeutic strategies that specifically eradicate CSCs or induce their differentiation. The clinical success of these approaches depends on CSC differentiation being unidirectional rather than reversible, yet this question remains unresolved even in prototypically hierarchical malignancies, such as acute myeloid leukemia (AML).
The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, continues to grow with new packages added each week, motivating the development of software tools focused on exposing package metadata to developers and users. The resulting BiocPkgTools package facilitates access to extensive metadata in computable form covering the Bioconductor package ecosystem, facilitating downstream applications such as custom reporting, data and text mining of Bioconductor package text descriptions, graph analytics over package dependencies, and custom search approaches. The BiocPkgTools package has been incorporated into the Bioconductor project, installs using standard procedures, and runs on any system supporting R.
Single-cell RNA-sequencing (scRNA-seq) technology has undergone rapid development in recent years, leading to an explosion in the number of tailored data analysis methods. However, the current lack of gold-standard benchmark datasets makes it difficult for researchers to systematically compare the performance of the many methods available. Here, we generated a realistic benchmark experiment that included single cells and admixtures of cells or RNA to create 'pseudo cells' from up to five distinct cancer cell lines.
Motivation: Graphics for RNA-sequencing and microarray gene expression analyses may contain upwards of tens of thousands of points. Details about certain genes or samples of interest are easily obscured in such dense summary displays. Incorporating interactivity into summary plots would enable additional information to be displayed on demand and facilitate intuitive data exploration.
Innate lymphoid cells (ILCs) are enriched at mucosal surfaces, where they provide immune surveillance. All ILC subsets develop from a common progenitor that gives rise to pre-committed progenitors for each of the ILC lineages. Currently, the temporal control of gene expression that guides the emergence of these progenitors is poorly understood.
The ability to easily and efficiently analyse RNA-sequencing data is a key strength of the Bioconductor project. Starting with counts summarised at the gene-level, a typical analysis involves pre-processing, exploratory data analysis, differential expression testing and pathway analysis with the results obtained informing future experiments and validation studies. In this workflow article, we analyse RNA-sequencing data from the mouse mammary gland, demonstrating use of the popular package to import, organise, filter and normalise the data, followed by the package with its method, linear modelling and empirical Bayes moderation to assess differential expression and perform gene set testing.
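The filtering and normalisation steps mentioned above can be illustrated with a simplified Python sketch based on counts per million (CPM). It scales by library size only, whereas the workflow itself may also apply sample-specific normalisation factors; the thresholds (1 CPM in at least 2 samples), the 0.5 offset and the function names are illustrative assumptions rather than the packages' defaults.

```python
import math

def library_sizes(counts):
    """Total reads per sample; `counts` maps gene -> list of per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]

def filter_by_cpm(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    libs = library_sizes(counts)
    kept = {}
    for gene, row in counts.items():
        n_expressed = sum(c / lib * 1e6 >= min_cpm for c, lib in zip(row, libs))
        if n_expressed >= min_samples:
            kept[gene] = row
    return kept

def log_cpm(counts, offset=0.5):
    """log2 counts per million, with a small offset so zero counts stay finite."""
    libs = library_sizes(counts)
    return {gene: [math.log2((c + offset) / lib * 1e6) for c, lib in zip(row, libs)]
            for gene, row in counts.items()}

counts = {"g1": [100, 200, 150], "g2": [0, 1, 0], "g3": [900, 800, 850]}
filtered = filter_by_cpm(counts)   # g2 is expressed in only one sample
print(sorted(filtered))            # prints: ['g1', 'g3']
print(log_cpm(filtered)["g1"])
```

Expressing counts per million makes samples with different sequencing depths comparable, and the log scale with an offset is the usual input for exploratory plots and linear modelling.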
Variations in sample quality are frequently encountered in small RNA-sequencing experiments and pose a major challenge for differential expression analysis. Removal of high-variation samples reduces noise, but at the cost of reduced power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level.
Pooled library sequencing screens that perturb gene function in a high-throughput manner are becoming increasingly popular in functional genomics research. Irrespective of the mechanism by which loss of function is achieved, via either RNA interference using short hairpin RNAs (shRNAs) or genetic mutation using single guide RNAs (sgRNAs) with the CRISPR-Cas9 system, there is a need to establish optimal analysis tools to handle such data. Our open-source processing pipeline in edgeR provides a complete analysis solution for screen data, which begins with the raw sequence reads and ends with a ranked list of candidate genes for downstream biological validation.
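The final step, turning guide-level counts into a ranked gene list, can be sketched in Python as follows. This is a deliberately simple stand-in (median log2 fold change across each gene's guides), not the statistical procedure in edgeR; the pseudocount, the median aggregation, the guide and gene names, and the assumption that both samples list the same guides are all illustrative.

```python
import math
from statistics import median

def guide_log2fc(control, treated, pseudocount=0.5):
    """Per-guide log2 fold change between library-size-normalised counts.

    `control` and `treated` map guide -> raw read count and are assumed to
    contain the same guides.
    """
    c_total = sum(control.values())
    t_total = sum(treated.values())
    return {g: math.log2(((treated[g] + pseudocount) / t_total) /
                         ((control[g] + pseudocount) / c_total))
            for g in control}

def rank_genes(lfc, guide_to_gene):
    """Aggregate guides to genes by median log2FC; most depleted genes first."""
    by_gene = {}
    for guide, value in lfc.items():
        by_gene.setdefault(guide_to_gene[guide], []).append(value)
    return sorted(((gene, median(vals)) for gene, vals in by_gene.items()),
                  key=lambda item: item[1])

control = {"sg1": 100, "sg2": 100, "sg3": 100, "sg4": 100}
treated = {"sg1": 10, "sg2": 20, "sg3": 100, "sg4": 270}
mapping = {"sg1": "geneA", "sg2": "geneA", "sg3": "geneB", "sg4": "geneB"}
print(rank_genes(guide_log2fc(control, treated), mapping))
```

Sorting in ascending order suits a depletion (negative selection) screen; for a positive selection screen the ranking would be reversed. Aggregating over several guides per gene reduces the influence of any single ineffective or off-target guide.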