Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier.

J Biomed Inform

Department of Computer Science and Engineering, NIT Rourkela, Orissa 769008, India.

Published: April 2016

AI Article Synopsis

  • Microarray-based gene expression profiling is crucial for cancer-related tasks like classification and diagnosis, but it generates large, constantly changing datasets.
  • Identifying significantly expressed genes is essential for understanding cancer, and current methods typically use a two-phase approach: feature selection/extraction followed by classification.
  • The paper proposes new statistical methods using MapReduce for feature selection and implements a K-nearest neighbor classifier, demonstrating that these models process big data more efficiently than traditional methods within a Hadoop framework.
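
The feature-selection phase summarized above can be sketched as a map step that emits per-gene partial statistics and a reduce step that sums them. This is a minimal local simulation in plain Python, not the paper's Hadoop implementation. The choice of a Welch t-statistic, the class labels "A" and "B", and all function names (`map_sample`, `reduce_stats`, `welch_t`, `select_genes`) are assumptions made for illustration, since the synopsis does not say which statistical tests are used.

```python
from collections import defaultdict
from math import sqrt


def map_sample(label, expression):
    """Map step: emit ((gene, label), (count, sum, sum of squares)) per gene."""
    for gene, x in enumerate(expression):
        yield (gene, label), (1, x, x * x)


def reduce_stats(pairs):
    """Reduce step: add up the partial statistics for each (gene, label) key."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])
    for key, (n, s, ss) in pairs:
        acc = totals[key]
        acc[0] += n
        acc[1] += s
        acc[2] += ss
    return totals


def welch_t(totals, gene, class_a, class_b):
    """Welch t-statistic for one gene (assumes at least 2 samples per class)."""
    def mean_var(label):
        n, s, ss = totals[(gene, label)]
        mean = s / n
        var = (ss - n * mean * mean) / (n - 1)  # unbiased sample variance
        return n, mean, var

    na, ma, va = mean_var(class_a)
    nb, mb, vb = mean_var(class_b)
    denom = sqrt(va / na + vb / nb)
    return (ma - mb) / denom if denom > 0 else 0.0


def select_genes(samples, class_a, class_b, top_k):
    """Rank genes by |t| and return the indices of the top_k genes."""
    pairs = (kv for label, expr in samples for kv in map_sample(label, expr))
    totals = reduce_stats(pairs)
    n_genes = len(samples[0][1])
    scores = {g: abs(welch_t(totals, g, class_a, class_b)) for g in range(n_genes)}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Because each mapper only emits counts, sums, and sums of squares, the partial results can be combined in any order, which is what makes this kind of statistic a natural fit for MapReduce.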

Article Abstract

Microarray-based gene expression profiling has emerged as an efficient technique for classification, prognosis, diagnosis, and treatment of cancer. Frequent changes in the behavior of this disease generate an enormous volume of data. Microarray data satisfies both the veracity and velocity properties of big data, as it keeps changing with time. Analyzing microarray datasets quickly is therefore essential. These datasets often contain a large amount of expression data, but only a fraction of it corresponds to genes that are significantly expressed. The precise identification of the genes of interest that are responsible for causing cancer is imperative in microarray data analysis. Most existing schemes employ a two-phase process: feature selection/extraction followed by classification. In this paper, various statistical methods (tests) based on MapReduce are proposed for selecting relevant features. After feature selection, a MapReduce-based K-nearest neighbor (mrKNN) classifier is employed to classify the microarray data. These algorithms are successfully implemented in a Hadoop framework. A comparative analysis of these MapReduce-based models is carried out using microarray datasets of various dimensions. The results show that these models consume much less execution time than conventional models when processing big data.
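
One common way to express K-nearest-neighbor classification in MapReduce terms is sketched below: each map task finds the k nearest training samples within its own partition, and a single reduce step merges those candidates and takes a majority vote. This is an illustrative local simulation in plain Python under stated assumptions, not the paper's mrKNN or its Hadoop code. Euclidean distance, the partition layout, and the names `map_partition`, `reduce_neighbors`, and `mr_knn_predict` are all hypothetical.

```python
import heapq
from collections import Counter
from math import dist


def map_partition(partition, query, k):
    """Map step: return the k nearest (distance, label) pairs in one partition."""
    candidates = ((dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, candidates)


def reduce_neighbors(local_results, k):
    """Reduce step: merge per-partition candidates, vote among the global k nearest."""
    merged = heapq.nsmallest(k, (pair for result in local_results for pair in result))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def mr_knn_predict(partitions, query, k=3):
    """Classify one query sample against training data split into partitions."""
    local_results = [map_partition(p, query, k) for p in partitions]
    return reduce_neighbors(local_results, k)
```

Keeping only k candidates per partition bounds the data passed to the reducer at k times the number of partitions, and the global k nearest neighbors are always among those candidates.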

Source
http://dx.doi.org/10.1016/j.jbi.2016.03.002

Publication Analysis

Top Keywords

microarray data: 12
analysis microarray: 8
big data: 8
microarray datasets: 8
data: 7
microarray: 5
analysis: 4
microarray leukemia: 4
leukemia data: 4
data efficient: 4
