Novel and simple transformation algorithm for combining microarray data sets.

Ki-Yeol Kim, Dong Hyuk Ki, Ha Jin Jeong, Hei-Cheul Jeung, Hyun Cheol Chung, Sun Young Rha

BMC Bioinformatics

Oral Cancer Research Institute, Yonsei University College of Dentistry, Seoul, Korea.

Published: June 2007

Background: In microarray technology, variability in the experimental environment, such as differences in RNA source, microarray production, or platform, can cause bias. Such systematic differences present a substantial obstacle to the analysis of microarray data and lead to inconsistent and unreliable information. One of the most pressing challenges in the field is therefore how to integrate results from different microarray experiments, or how to combine data sets before a specific analysis.

Results: Two microarray data sets based on a 17k cDNA microarray system were used, consisting of 82 normal colon mucosa and 72 colorectal cancer tissues. Each data set was prepared from either total RNA or amplified mRNA, and the difference in RNA source between the two data sets was detected with an ANOVA (analysis of variance) model. A simple integration method was introduced, based on the distributions of gene expression ratios among different microarray data sets. The method transforms the gene expression ratios of a data set into the form of a reference data set on a gene-by-gene basis. Hierarchical clustering analysis, density and box plots, and mixture scores with correlation coefficients showed that the two data sets were well intermingled after transformation, indicating that the proposed method minimized the experimental bias. In addition, no RNA source effect was detected after the proposed transformation was applied. In the mixed data set, the two previously identified subgroups, normal and tumor, were well separated, and the efficiency of integration was more pronounced in the tumor group than in the normal group. The transformation was slightly more effective when a data set with strong homogeneity within the same experimental group was used as the reference data set.
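The abstract does not give the transformation formula, so the following is only a minimal sketch of one plausible gene-by-gene approach: each gene's log-ratios in the data set being transformed are rescaled to the mean and standard deviation of the same gene in the reference data set. The location-scale matching, the function name transform_to_reference, and the toy values are all assumptions for illustration, not the authors' published procedure.

```python
from statistics import mean, stdev


def transform_to_reference(target, reference):
    """Rescale each gene in `target` to the reference mean and SD.

    Both arguments map a gene name to a list of log-ratios, one value
    per sample (at least two samples per gene are needed for stdev).
    Assumption: per-gene location-scale matching stands in for the
    distribution-based transformation described in the abstract.
    """
    transformed = {}
    for gene, values in target.items():
        ref_values = reference[gene]
        t_mean, t_sd = mean(values), stdev(values)
        r_mean, r_sd = mean(ref_values), stdev(ref_values)
        if t_sd == 0:
            # No spread to rescale: every value maps to the reference mean.
            transformed[gene] = [r_mean for _ in values]
        else:
            # Standardize within the target, then move to the reference scale.
            transformed[gene] = [
                (v - t_mean) / t_sd * r_sd + r_mean for v in values
            ]
    return transformed


# Hypothetical toy data: two genes, three samples per data set.
reference = {"geneA": [0.0, 1.0, 2.0], "geneB": [-1.0, 0.0, 1.0]}
target = {"geneA": [3.0, 5.0, 7.0], "geneB": [10.0, 10.5, 11.0]}

combined_ready = transform_to_reference(target, reference)
for gene, values in combined_ready.items():
    print(gene, [round(v, 3) for v in values])
```

After such a step, the transformed samples could be pooled with the reference samples and checked with the diagnostics the abstract mentions, for example hierarchical clustering or box plots, to see whether samples still group by RNA source.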

Conclusion: The proposed method is simple but useful for combining several data sets obtained under different experimental conditions. With this method, biologically useful information can be detected by applying various analytic methods to the combined data set, which has an increased sample size.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914088
DOI: http://dx.doi.org/10.1186/1471-2105-8-218

Publication Analysis

Top Keywords: data sets, data set, microarray data, data, microarray, microarray technology, combine data, rna source, gene expression, expression ratios

Similar Publications

Nutritional intelligence in the food system: Combining food, health, data and AI expertise.

Nutr Bull

January 2025

Queen's University Belfast, Belfast, UK.

Danielle I McCarthy

Transformative change is needed across the food system to improve health and environmental outcomes. As food, nutrition, environmental and health data are generated beyond human scale, there is an opportunity for technological tools to support multifactorial, integrated, scalable approaches to address the complexities of dietary behaviour change. Responsible technology could act as a mechanistic conduit between research, policy, industry and society, enabling timely, informed decision making and action by all stakeholders across the food system.


Chromosome-level Genome Assembly of Korean Long-tailed Chicken and Pangenome of 40 Gallus gallus Assemblies.

Sci Data

January 2025

Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.

Hanshin D Shin Wonchoul Park Han-Ha Chai Youngho Lee Jaehoon Jung

This study presents the first chromosome-level genome assembly of the Korean long-tailed chicken (KLC), a unique breed of Gallus gallus known as Ginkkoridak. Our assembly achieved a super contig N50 of 5.7 Mbp and a scaffold N50 exceeding 90 Mb, with a genome completeness of 96.


The competitive esports physiological, affective, and video dataset.

Sci Data

January 2025

Department of Psychology, Stanford University, Stanford, USA.

Maciej Behnke Wadim Krzyżaniak Jan Nowak Szymon Kupiński Patrycja Chwiłkowska

Esports refers to competitive video gaming in which individuals compete against each other in organized tournaments for prize money. Here, we present the Competitive Esports Physiological, Affective, and Video (CEPAV) dataset, in which 300 male Counter Strike: Global Offensive gamers participated in a study aimed at optimizing affect during esports tournaments. The CEPAV dataset includes (1) physiological data, capturing the players' cardiovascular responses before, during, and after more than 3000 CS: GO matches; (2) self-reported affective data, detailing the affective states experienced before gameplay; and (3) video data, providing a visual record of 552 in-laboratory gaming sessions.


Chromosome-level genome assembly, annotation, and population genomic resource of argali (Ovis ammon).

Sci Data

January 2025

Key Laboratory of Ecological Safety and Sustainable Development in Arid Lands, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, 830011, China.

Mu-Yang Wang Bao-Lin Zhang Qi-Qi Liang Xin-Ming Lian Ke Zhang

Argali stands as the largest species among wild sheep in Central and East Asia, with a concerning rate of decline estimated at 30%. The intraspecific taxonomy of argali remains contentious due to limited genomic data and unclear geographic separation. In this study, we constructed a chromosome-level genome assembly and annotation for the Tibetan argali (O.


Chromosome-scale genome assembly of three-spotted seahorse (Hippocampus trimaculatus) with a unique karyotype.

Sci Data

January 2025

Laboratory of Aquatic Genomics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, 518057, China.

Ning Li Xinhui Zhang Xin Liu Xueqiang Lin Cancan Hu

The three-spotted seahorse (Hippocampus trimaculatus) is a unique fish with important economic and medicinal value, and its total chromosome number is potentially quite different from that of other seahorse species. Herein, we constructed a chromosome-level genome assembly for this special seahorse by integrating MGI short-read, PacBio HiFi long-read and Hi-C sequencing techniques. A 416.
