Reuse of imputed data in microarray analysis increases imputation efficiency.

BMC Bioinformatics

School of Engineering, Information and Communications University, 103-6 Munji-dong, Yusung-gu, Daejon 305-714, South Korea.

Published: October 2004

Background: Imputing missing values is necessary for the efficient use of DNA microarray data, because many clustering algorithms and some statistical analyses require a complete data set. A few imputation methods for DNA microarray data have been introduced, but their efficiency was low and the validity of the values they impute had not been fully checked.

Results: We developed a new cluster-based imputation method called the sequential K-nearest neighbor (SKNN) method. It imputes missing values sequentially, starting from the gene with the fewest missing values, and reuses the imputed values in later imputations. Despite this reuse of imputed values, the new method substantially improves both accuracy and computational complexity over the conventional KNN-based method and over other methods based on maximum likelihood estimation. SKNN outperformed the other imputation methods in particular on data with high missing rates and large numbers of experiments. Applying Expectation Maximization (EM) to the SKNN method improved accuracy but increased computational time in proportion to the number of iterations. The Multiple Imputation (MI) method, which is well known but had not previously been applied to microarray data, showed accuracy similarly high to that of SKNN, with a slightly greater dependence on the type of data set.
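The sequential strategy described in the Results can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function name `sknn_impute`, the use of Euclidean distance over the target gene's observed experiments, inverse-distance weighting of the K neighbors, and the small `eps` guard against zero distances are all choices made here for illustration. Rows are genes, columns are experiments, and `None` marks a missing value.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def sknn_impute(data: List[Row], k: int = 10, eps: float = 1e-12) -> List[List[float]]:
    """Sequential K-nearest-neighbor (SKNN) imputation, minimal sketch.

    Assumptions (not taken from the paper): Euclidean distance on the
    observed columns of the target gene and inverse-distance weighting.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n_cols = len(data[0])

    # Initial reference set: only genes with no missing values.
    reference = [[float(v) for v in row] for row in data if None not in row]
    if not reference:
        raise ValueError("at least one complete gene is required")

    # Complete genes are copied through unchanged.
    result: List[Optional[List[float]]] = [
        [float(v) for v in row] if None not in row else None for row in data
    ]

    # Visit incomplete genes from fewest to most missing values.
    order = sorted(
        (i for i, row in enumerate(data) if None in row),
        key=lambda i: sum(v is None for v in data[i]),
    )
    for i in order:
        row = data[i]
        observed = [j for j in range(n_cols) if row[j] is not None]

        # Distance to every reference gene, using observed columns only.
        scored = [
            (math.sqrt(sum((row[j] - ref[j]) ** 2 for j in observed)), ref)
            for ref in reference
        ]
        scored.sort(key=lambda pair: pair[0])
        neighbors = scored[:k]
        weights = [1.0 / (d + eps) for d, _ in neighbors]
        total = sum(weights)

        # Each missing entry becomes the weighted mean of the neighbors' values.
        filled = [
            float(v) if v is not None
            else sum(w * ref[j] for w, (_, ref) in zip(weights, neighbors)) / total
            for j, v in enumerate(row)
        ]
        result[i] = filled

        # The step that distinguishes SKNN: the freshly imputed gene joins the
        # reference set, so later, more incomplete genes may use it as a neighbor.
        reference.append(filled)
    return result
```

For example, with `k=1` and the matrix `[[0, 0, 0, 0], [10, 10, 10, 10], [4, 4, 4, None], [4, 4, None, None]]`, the third gene is filled from the all-zero gene, and the fourth gene then finds its nearest neighbor in the freshly imputed third gene, giving `[4, 4, 4, 0]`. A conventional KNN that searched only the originally complete genes would return `[4, 4, 0, 0]` instead, which is the difference the reuse of imputed data makes.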

Conclusions: Sequential reuse of imputed data in KNN-based imputation greatly increases imputation efficiency. The SKNN method should be practically useful for salvaging data from microarray experiments with large numbers of missing entries. It generates reliable imputed values that can be used in further cluster-based analysis of microarray data.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC528735
DOI: http://dx.doi.org/10.1186/1471-2105-5-160

Similar Publications

Enhancers, a class of distal cis-regulatory elements located in non-coding regions of DNA, play a key role in gene regulation. Identifying enhancers from DNA sequence data is difficult because enhancers are scattered freely across non-coding regions, lack specific sequence features, and can lie far from their target promoters. This study therefore presents a stacking ensemble learning method to accurately identify enhancers and classify them as strong or weak.

Classification of Epileptic EEG Signals Using Synchrosqueezing Transform and Machine Learning.

Int J Neural Syst

May 2021

Department of Electrical and Electronics Engineering, Izmir University of Economics, Balcova 35330, Izmir, Turkey.

Epilepsy is a very common neurological disease worldwide. Patients' electroencephalography (EEG) signals are frequently used to detect epileptic seizure segments. In this paper, a high-resolution time-frequency (TF) representation called the Synchrosqueezing Transform (SST) is used to detect epileptic seizures.
