Colorectal carcinoma (CRC) is a common cause of mortality, but a comprehensive description of its genomic landscape is lacking. Here we perform whole-genome sequencing of 2,023 CRC samples from participants in the UK 100,000 Genomes Project, thereby providing a highly detailed somatic mutational landscape of this cancer. Integrated analyses identify more than 250 putative CRC driver genes, many not previously implicated in CRC or other cancers, including several recurrent changes outside the coding genome.
Tumor genomic profiling is increasingly seen as a prerequisite to guide the treatment of patients with cancer. To explore the value of whole-genome sequencing (WGS) in broadening the scope of cancers potentially amenable to a precision therapy, we analysed whole-genome sequencing data on 10,478 patients spanning 35 cancer types recruited to the UK 100,000 Genomes Project. We identified 330 candidate driver genes, including 74 that are new to any cancer.
Maturation of eukaryotic pre-mRNAs via splicing and polyadenylation is modulated across cell types and conditions by a variety of RNA-binding proteins (RBPs). Although there exist over 1,500 RBPs in human cells, their binding motifs and functions remain to be elucidated, especially in the complex environment of tissues and in the context of diseases. To overcome the lack of methods for the systematic and automated detection of sequence motif-guided pre-mRNA processing regulation from RNA sequencing (RNA-Seq) data, we have developed MAPP (Motif Activity on Pre-mRNA Processing).
The development of cancer is an evolutionary process involving the sequential acquisition of genetic alterations that disrupt normal biological processes, enabling tumor cells to rapidly proliferate and eventually invade and metastasize to other tissues. We investigated the genomic evolution of prostate cancer through the application of three separate classification methods, each designed to investigate a different aspect of tumor evolution. Integrating the results revealed the existence of two distinct types of prostate cancer that arise from divergent evolutionary trajectories, designated as the Canonical and Alternative evolutionary disease types.
The usage of alternative terminal exons results in messenger RNA (mRNA) isoforms that differ in their 3' untranslated regions (3' UTRs) and often also in their protein-coding sequences. Alternative 3' UTRs contain different sets of cis-regulatory elements known to regulate mRNA stability, translation and localization, all of which are vital to cell identity and function. In previous work, we revealed that ∼25 percent of the experimentally observed RNA 3' ends are located within regions currently annotated as intronic, indicating that many 3' end isoforms remain to be uncovered.
Mutational signature analysis is commonly performed in cancer genomic studies. Here, we present SigProfilerExtractor, an automated tool for extraction of mutational signatures, and benchmark it against another 13 bioinformatics tools by using 34 scenarios encompassing 2,500 simulated signatures found in 60,000 synthetic genomes and 20,000 synthetic exomes. For simulations with 5% noise, reflecting high-quality datasets, SigProfilerExtractor outperforms other approaches by elucidating between 20% and 50% more true-positive signatures while yielding 5-fold fewer false-positive signatures.
Viruses have evolved numerous mechanisms to exploit the molecular machinery of their host cells, including the broad spectrum of host RNA-binding proteins (RBPs). However, the RBP interactomes of most viruses are largely unknown. To shed light on the interaction landscape of RNA viruses with human host cell RBPs, we have analysed 197 single-stranded RNA (ssRNA) viral genome sequences and found that the majority of ssRNA virus genomes are significantly enriched or depleted in motifs for specific human RBPs, suggesting selection pressure on these interactions.
The International Virus Bioinformatics Meeting 2022 took place online on 23-25 March 2022 and attracted about 380 participants from all over the world. The goal of the meeting was to provide a meaningful and interactive scientific environment to promote discussion and collaboration and to inspire and suggest new research directions and questions. The participants created a highly interactive scientific environment even without physical face-to-face interactions.
We report an autosomal recessive, multi-organ tumor predisposition syndrome, caused by bi-allelic loss-of-function germline variants in the base excision repair (BER) gene MBD4. We identified five individuals with bi-allelic MBD4 variants within four families, and these individuals had a personal and/or family history of adenomatous colorectal polyposis, acute myeloid leukemia, and uveal melanoma. MBD4 encodes a glycosylase involved in repair of G:T mismatches resulting from deamination of 5-methylcytosine.
The novel betacoronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused a worldwide pandemic (COVID-19) after emerging in Wuhan, China. Here we analyzed public host and viral RNA sequencing data to better understand how SARS-CoV-2 interacts with human respiratory cells. We identified genes, isoforms and transposable element families that are specifically altered in SARS-CoV-2-infected respiratory cells.
Generated by 3' end cleavage and polyadenylation at alternative polyadenylation (poly(A)) sites, alternative terminal exons account for much of the variation between human transcript isoforms. More than a dozen protocols have been developed so far for capturing and sequencing RNA 3' ends from a variety of cell types and species. In previous studies, we have used these data to uncover novel regulatory signals and cell type-specific isoforms.
Sequencing of RNA 3' ends has uncovered numerous sites that do not correspond to the termination sites of known transcripts. Through their 3' untranslated regions, protein-coding RNAs interact with RNA-binding proteins and microRNAs, which regulate many properties, including RNA stability and subcellular localization. We developed the terminal exon characterization (TEC) tool ( http://tectool.
miRNAs are small RNAs that regulate gene expression post-transcriptionally. By repressing the translation and promoting the degradation of target mRNAs, miRNAs may reduce the cell-to-cell variability in protein expression, induce correlations between target expression levels, and provide a layer through which targets can influence each other's expression as "competing RNAs" (ceRNAs). However, experimental evidence for these behaviors is limited.
The length of 3' untranslated regions (3' UTRs) is regulated in relation to cellular state. To uncover key regulators of poly(A) site use in specific conditions, we have developed PAQR, a method for quantifying poly(A) site use from RNA sequencing data, and KAPAC, an approach that infers activities of oligomeric sequence motifs on poly(A) site choice. Application of PAQR and KAPAC to RNA sequencing data from normal and tumor tissue samples uncovers motifs that can explain changes in cleavage and polyadenylation in specific cancers.
Studies in the last decade have revealed a complex and dynamic variety of pre-mRNA cleavage and polyadenylation reactions. mRNAs with long 3' untranslated regions (3' UTRs) are generated in differentiated cells, whereas proliferating cells preferentially express transcripts with short 3' UTRs. We describe the A-seq protocol, now in its second version, which was developed to map polyadenylation sites genome-wide and study the regulation of pre-mRNA 3' end processing.
Background: The transition between epithelial and mesenchymal phenotypes (EMT) occurs in a variety of contexts. It is critical for mammalian development and it is also involved in tumor initiation and progression. Master transcription factor (TF) regulators of this process are conserved between mouse and human.
The unprecedented outbreak of Ebola in West Africa resulted in over 28,000 cases and 11,000 deaths, underlining the need for a better understanding of the biology of this highly pathogenic virus to develop specific counter strategies. Two filoviruses, the Ebola and Marburg viruses, result in a severe and often fatal infection in humans. However, bats are natural hosts and survive filovirus infections without obvious symptoms.
Alternative polyadenylation (APA) is a general mechanism of transcript diversification in mammals, which has been recently linked to proliferative states and cancer. Different 3' untranslated region (3' UTR) isoforms interact with different RNA-binding proteins (RBPs), which modify the stability, translation, and subcellular localization of the corresponding transcripts. Although the heterogeneity of pre-mRNA 3' end processing has been established with high-throughput approaches, the mechanisms that underlie systematic changes in 3' UTR lengths remain to be characterized.
Background: Understanding the regulation of gene expression, including transcription start site usage, alternative splicing, and polyadenylation, requires accurate quantification of expression levels down to the level of individual transcript isoforms. To comparatively evaluate the accuracy of the many methods that have been proposed for estimating transcript isoform abundance from RNA sequencing data, we have used both synthetic data and an independent experimental method for quantifying the abundance of transcript ends at the genome-wide level.
Results: We found that many tools have good accuracy and yield better estimates of gene-level expression compared to commonly used count-based approaches, but they vary widely in memory and runtime requirements.