Challenges in projecting clustering results across gene expression-profiling datasets.

J Natl Cancer Inst

Department of Experimental Oncology, Fondazione IRCCS Istituto Nazionale dei Tumori, Milano, Italy.

Published: November 2007

Background: Gene expression microarray studies for several types of cancer have been reported to identify previously unknown subtypes of tumors. For breast cancer, a molecular classification consisting of five subtypes based on gene expression microarray data has been proposed. These subtypes have been reported to exist across several breast cancer microarray studies, and they have demonstrated some association with clinical outcome. A classification rule based on the method of centroids has been proposed for identifying the subtypes in new collections of breast cancer samples; the method is based on the similarity of the new profiles to the mean expression profile of the previously identified subtypes.

Methods: Previously identified centroids of five breast cancer subtypes were used to assign 99 breast cancer samples, including a subset of 65 estrogen receptor-positive (ER+) samples, to five breast cancer subtypes based on microarray data for the samples. The effect of mean centering the genes (i.e., transforming the expression of each gene so that its mean expression is equal to 0) on subtype assignment by the method of centroids was assessed. Further studies of the effect of mean centering and of class prevalence in the test set on the accuracy of method-of-centroids classifications of ER status were carried out using training and test sets for which ER status had been independently determined by ligand-binding assay and in which the proportions of ER+ and ER- samples were systematically varied.
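As an illustration only, the two operations described in the Methods (mean centering of genes and assignment by the method of centroids) might be sketched in Python as follows. The sketch assumes a samples-by-genes matrix layout and uses Pearson correlation as the similarity measure, which the abstract does not specify; the function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def mean_center_genes(expr):
    """Subtract each gene's mean across samples (rows = samples, columns = genes)."""
    expr = np.asarray(expr, dtype=float)
    return expr - expr.mean(axis=0, keepdims=True)

def assign_by_centroids(expr, centroids):
    """Return, for each sample, the index of its most similar centroid.

    Similarity is Pearson correlation across genes (an assumption of this
    sketch). Profiles are assumed to be non-constant across genes.
    """
    expr = np.asarray(expr, dtype=float)
    centroids = np.asarray(centroids, dtype=float)

    def standardize(profiles):
        # Center and scale each profile across genes so that the scaled
        # dot product below equals the Pearson correlation coefficient.
        profiles = profiles - profiles.mean(axis=1, keepdims=True)
        return profiles / profiles.std(axis=1, keepdims=True)

    corr = standardize(expr) @ standardize(centroids).T / expr.shape[1]
    return corr.argmax(axis=1)

# Toy data: two hypothetical centroids over four genes, two new samples.
toy_centroids = np.array([[1.0, -1.0, 1.0, -1.0],
                          [-1.0, 1.0, -1.0, 1.0]])
toy_samples = np.array([[2.0, -2.0, 1.5, -1.0],
                        [-1.0, 2.0, -2.0, 1.0]])
toy_labels = assign_by_centroids(toy_samples, toy_centroids)
```

Note that `mean_center_genes` uses the gene means of whatever sample set it is given, so its output depends on the composition of that set.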

Results: When all 99 samples were considered, mean centering before application of the method of centroids appeared to be helpful for correctly assigning samples to subtypes, as evidenced by the expression of genes that had previously been used as markers to identify the subtypes. However, when only the 65 ER+ samples were considered for classification, many samples appeared to be misclassified, as evidenced by an unexpected distribution of ER+ samples among the resultant subtypes. When genes were mean centered before classification of samples for ER status, the accuracy of the ER subgroup assignments was highly dependent on the proportion of ER+ samples in the test set; this effect of subtype prevalence was not seen when gene expression data were not mean centered.
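The dependence on class prevalence can be illustrated with a small simulation. The following is a toy sketch, not a reproduction of the study: the signature, noise level, sample counts, and the function name `simulate_accuracy` are all assumptions chosen for illustration, and the test data are generated on the same scale as the centroids (no platform or batch offset).

```python
import numpy as np

def simulate_accuracy(prevalence, center, n_samples=200, n_genes=50,
                      noise_sd=0.5, seed=0):
    """Accuracy of centroid-based ER assignment on simulated test data.

    prevalence: fraction of ER+ samples in the simulated test set.
    center:     whether to mean center each gene within the test set first.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical signature: half the genes are up in ER+, half are down.
    signature = np.where(np.arange(n_genes) < n_genes // 2, 1.0, -1.0)
    centroids = np.vstack([signature, -signature])  # row 0 = ER+, row 1 = ER-

    n_pos = int(round(prevalence * n_samples))
    is_pos = np.arange(n_samples) < n_pos
    signal = np.where(is_pos[:, None], signature, -signature)
    expr = signal + rng.normal(0.0, noise_sd, size=(n_samples, n_genes))

    if center:
        # Gene means are taken over the test set, so they shift with prevalence.
        expr = expr - expr.mean(axis=0, keepdims=True)

    # Pearson correlation of every sample with each of the two centroids.
    corr = np.corrcoef(expr, centroids)[:n_samples, n_samples:]
    predicted_pos = corr[:, 0] > corr[:, 1]
    return float(np.mean(predicted_pos == is_pos))

balanced_centered = simulate_accuracy(0.5, center=True)
all_pos_centered = simulate_accuracy(1.0, center=True)
all_pos_uncentered = simulate_accuracy(1.0, center=False)
```

In this toy setup, centering a test set that contains only ER+ samples subtracts the ER+ signal itself from every gene, so the assignments are driven by noise, whereas the uncentered data retain the signal regardless of prevalence.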

Conclusions: Simple corrections, such as mean centering of genes, that are aimed at correcting for microarray platform or batch effects can have undesirable consequences because patient population effects can easily be confused with these assay-related effects. Careful thought should be given to the comparability of the patient populations before attempting to force data comparability for purposes of assigning subtypes to independent subjects.

Source
http://dx.doi.org/10.1093/jnci/djm216
