Subclonal reconstruction algorithms use bulk DNA sequencing data to quantify parameters of tumor evolution, allowing an assessment of how cancers initiate, progress and respond to selective pressures. We launched the ICGC-TCGA (International Cancer Genome Consortium-The Cancer Genome Atlas) DREAM Somatic Mutation Calling Tumor Heterogeneity and Evolution Challenge to benchmark existing subclonal reconstruction algorithms. This 7-year community effort used cloud computing to benchmark 31 subclonal reconstruction algorithms on 51 simulated tumors.
Summary: Existing clustering methods for characterizing cell populations from single-cell RNA sequencing are constrained by several limitations stemming from the fact that clusters often cannot be homogeneous, particularly for transitioning populations. On the other hand, dominant cell populations within samples can be identified independently by their strong gene co-expression signatures using methods unrelated to partitioning. Here, we introduce a clustering method, CASCC (co-expression-assisted single-cell clustering), designed to improve biological accuracy using gene co-expression features identified using an unsupervised adaptive attractor algorithm.
Cancer Metastasis Rev
September 2024
We identified a progenitor cell population highly enriched in samples from invasive and chemo-resistant carcinomas, characterized by a well-defined multigene signature including APOD, DCN, and LUM. This cell population has previously been labeled as consisting of inflammatory cancer-associated fibroblasts (iCAFs). The same signature characterizes naturally occurring fibro-adipogenic progenitors (FAPs) as well as stromal cells abundant in normal adipose tissue.
Cancer aggressiveness has been linked with obesity, and studies have shown that adipose tissue can enhance cancer progression. In this issue of Cancer Research, Hosni and colleagues discover a paracrine mechanism mediated by adipocyte precursor cells through which urothelial carcinomas become resistant to erdafitinib, a recently approved therapy inhibiting fibroblast growth factor receptors (FGFR). They identified neuregulin 1 (NRG1) secreted by adipocyte precursor cells as an activator of HER3 signaling that enables resistance.
Over the last ten years, many studies have referred to a particular type of cancer-associated fibroblast associated with poor prognosis, invasiveness, metastasis and resistance to therapy in multiple cancer types, characterized by a gene expression signature with prominent presence of the genes COL11A1, THBS2 and INHBA. Identifying the underlying biological mechanisms responsible for the creation of these fibroblasts may facilitate the discovery of targets for potential pan-cancer therapeutics. Using a novel computational approach for single-cell gene expression data analysis that identifies the dominant cell populations in a sequence of samples from patients at various stages, we conclude that these fibroblasts are produced by a pan-cancer cellular transition originating from a particular type of adipose-derived stromal cells naturally present in the stromal vascular fraction of normal adipose tissue, having a characteristic gene expression signature.
Analysis of large gene expression datasets from biopsies of cancer patients can identify co-expression signatures representing particular biomolecular events in cancer. Some of these signatures involve genomically co-localized genes resulting from the presence of copy number alterations (CNAs), for which analysis of the expression of the underlying genes provides valuable information about their combined role as oncogenes or tumor suppressor genes. Here we focus on the discovery and interpretation of such signatures that are present in multiple cancer types due to driver amplifications and deletions in particular regions of the genome, based on a comprehensive analysis combining gene expression and CNA data from The Cancer Genome Atlas.
Summary: We developed 2DImpute, an imputation method for correcting false zeros (known as dropouts) in single-cell RNA-sequencing (scRNA-seq) data. It prevents excessive correction by predicting the false zeros and imputing their values using the interrelationships between both genes and cells in the expression matrix. We showed that 2DImpute outperforms several leading imputation methods by applying it to datasets from various scRNA-seq protocols.
IEEE/ACM Trans Comput Biol Bioinform
January 2022
Bulk samples from the same patient are heterogeneous in nature, comprising different subpopulations (subclones) of cancer cells. Cells in a tumor subclone are characterized by a unique mutational genotype profile. Resolving tumor heterogeneity by estimating the genotypes, cellular proportions and the number of subclones present in the tumor can help in understanding cancer progression and treatment.
Tumor DNA sequencing data can be interpreted by computational methods that analyze genomic heterogeneity to infer evolutionary dynamics. A growing number of studies have used these approaches to link cancer evolution with clinical progression and response to therapy. Although the inference of tumor phylogenies is rapidly becoming standard practice in cancer genome analyses, standards for evaluating them are lacking.
Tumors are heterogeneous in the sense that they consist of multiple subpopulations of cells, referred to as subclones, each of which is characterized by a distinct profile of genomic variations such as somatic mutations. Inferring the underlying clonal landscape has become an important topic in that it can help in understanding cancer development and progression, and thereby help in improving treatment. We describe a novel state-space model, based on the feature allocation framework and an efficient sequential Monte Carlo (SMC) algorithm, using the somatic mutation data obtained from tumor samples to estimate the number of subclones, as well as their characterization.
Similar environmental risk factors have been implicated in different neuropsychiatric disorders (including major psychiatric and neurodegenerative diseases), indicating the existence of common epigenetic mechanisms underlying the pathogenesis shared by different illnesses. To investigate such commonality, we applied an unsupervised computational approach identifying several consensus co-expression and co-methylation signatures from a data cohort of postmortem prefrontal cortex (PFC) samples from individuals with six different neuropsychiatric disorders (schizophrenia, bipolar disorder, major depression, alcoholism, Alzheimer's and Parkinson's), as well as healthy controls. Among our results, we identified a pair of strongly interrelated co-expression and co-methylation (E-M) signatures showing consistent and significant disease association in multiple types of disorders.
We performed an extensive immunogenomic analysis of more than 10,000 tumors comprising 33 diverse cancer types by utilizing data compiled by TCGA. Across cancer types, we identified six immune subtypes (wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant), characterized by differences in macrophage or lymphocyte signatures, Th1:Th2 cell ratio, extent of intratumoral heterogeneity, aneuploidy, extent of neoantigen load, overall cell proliferation, expression of immunomodulatory genes, and prognosis. Specific driver mutations correlated with lower (CTNNB1, NRAS, or IDH1) or higher (BRAF, TP53, or CASP8) leukocyte levels across all cancers.
Exploring linkage disequilibrium (LD) patterns among single nucleotide polymorphism (SNP) sites can improve the accuracy and cost-effectiveness of genomic association studies, whereby representative (tag) SNPs are identified to sufficiently represent the genomic diversity in populations. There has been a considerable amount of effort in developing efficient algorithms to select tag SNPs from the growing large-scale data sets. Methods using the classical pairwise-LD and multi-locus LD measures have been proposed that aim to reduce the computational complexity and to increase the accuracy, respectively.
Cancer Epidemiol Biomarkers Prev
December 2014
Background: The winning model of the Sage Bionetworks/DREAM Breast Cancer Prognosis Challenge made use of several molecular features, called attractor metagenes, as well as another metagene defined by the average expression level of the two genes FGD3 and SUSD3. This is a follow-up study toward developing a breast cancer prognostic test derived from and improving upon that model.
Methods: We designed a feature selector facility calculating the prognostic scores of combinations of features, including those that we had used earlier, as well as those used in existing breast cancer biomarker assays, identifying the optimal selection of features for the test.
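To illustrate the FGD3-SUSD3 metagene described above, the following minimal sketch computes a metagene as the per-sample average expression of its member genes. The data layout (a dictionary from gene symbol to per-sample values), the function name `metagene_score` and all numbers are hypothetical and are not taken from the study.

```python
from statistics import mean

def metagene_score(expression, genes):
    """Average the expression of `genes` sample by sample.

    `expression` maps a gene symbol to a list of per-sample expression
    values (hypothetical layout; all lists must have the same length).
    """
    profiles = [expression[g] for g in genes]
    # zip(*profiles) groups the values of all member genes for one sample
    return [mean(values) for values in zip(*profiles)]

# Made-up expression values for three samples, for illustration only
expression = {
    "FGD3": [2.0, 4.0, 6.0],
    "SUSD3": [4.0, 6.0, 8.0],
}
scores = metagene_score(expression, ["FGD3", "SUSD3"])
print(scores)
```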
EURASIP J Bioinform Syst Biol
May 2014
Copy number variations (CNVs) are abundant in the human genome. They have been associated with complex traits in genome-wide association studies (GWAS) and are expected to continue playing an important role in identifying the etiology of disease phenotypes. As a result of high-throughput whole-genome single-nucleotide polymorphism (SNP) arrays, we currently have datasets that simultaneously have integer copy numbers in CNV regions as well as SNP genotypes.
Janus kinase-2 (JAK2) supports breast cancer growth, and clinical trials testing JAK2 inhibitors are under way. In addition to the tumor epithelium, JAK2 is also expressed in other tissues including immune cells; whether the JAK2 mRNA levels in breast tumors correlate with outcomes has not been evaluated. Using a case-control design, JAK2 mRNA was measured in 223 archived breast tumors and associations with distant recurrence were evaluated by logistic regression.
Background: Anti-angiogenesis is a validated strategy to treat cancer, with efficacy in controlling both primary tumor growth and metastasis. The role of the Notch family of proteins in tumor angiogenesis is still emerging, but recent data suggest that Notch signaling may function in the physiologic response to loss of VEGF signaling, and thus participate in tumor adaptation to VEGF inhibitors.
Methods: We asked whether combining Notch and VEGF blockade would enhance suppression of tumor angiogenesis and growth, using the NGP neuroblastoma model.
Background: DNA pooling constitutes a cost-effective alternative in genome-wide association studies. In DNA pooling, equimolar amounts of DNA from different individuals are mixed into one sample and the frequency of each allele in each position is observed in a single genotype experiment. The identification of haplotype frequencies from pooled data in addition to single locus analysis is of separate interest within these studies as haplotypes could increase statistical power and provide additional insight.
The accuracy with which cancer phenotypes can be predicted by selecting and combining molecular features is compromised by the large number of potential features available. In an effort to design a robust prognostic model to predict breast cancer survival, we hypothesized that signatures consisting of genes that are coexpressed in multiple cancer types should correspond to molecular events that are prognostic in all cancers, including breast cancer. We previously identified several such signatures, called attractor metagenes, in an analysis of multiple tumor types.
Mining gene expression profiles has proven valuable for identifying signatures serving as surrogates of cancer phenotypes. However, the similarities of such signatures across different cancer types have not been strong enough to conclude that they represent a universal biological mechanism shared among multiple cancer types. Here we present a computational method for generating signatures using an iterative process that converges to one of several precise attractors defining signatures representing biomolecular events, such as cell transdifferentiation or the presence of an amplicon.
Background: Typically, the first phase of a genome-wide association study (GWAS) includes genotyping across hundreds of individuals and validation of the most significant SNPs. Allelotyping of pooled genomic DNA is a common approach to reduce the overall cost of the study. Knowledge of haplotype structure can provide additional information to single locus analyses.
Many large genome-wide association studies include nuclear families with more than one child (trio families), allowing for analysis of differences between siblings (sib pair analysis). Statistical power can be increased when haplotypes are used instead of genotypes. Currently, haplotype inference in families with more than one child can be performed either using the familial information or statistical information derived from the population samples but not both.
Gene expression profiling has provided insights into different cancer types and revealed tissue-specific expression signatures. Alterations in microRNA expression contribute to the pathogenesis of many types of human diseases. Few studies have integrated all levels of gene expression, miRNA and methylation to uncover correlations between these data types.