Compressed kNN: K-Nearest Neighbors with Data Compression.

Entropy (Basel)

Computer Technology Department, University of Alicante, 03080 Alicante, Spain.

Published: February 2019

The kNN (k-nearest neighbors) classification algorithm is one of the most widely used non-parametric classification methods. However, its memory consumption grows with the size of the dataset, which makes it impractical to apply to large volumes of data. Variations of this method have been proposed, such as condensed kNN, which divides the training dataset into clusters to be classified; other variations reduce the input dataset before applying the algorithm. This paper presents a variation of the kNN algorithm, of the structure-less kNN type, that works with categorical data. Categorical data, due to their nature, can be compressed to decrease the memory requirements at classification time. The method adds a preliminary compression phase and then applies the algorithm to the compressed data. This allows the whole dataset to be kept in memory while considerably reducing the amount of memory required. Experiments on known datasets show a reduction in the volume of information stored in memory while the accuracy of the classification is maintained. They also show a slight decrease in processing time because the information is decompressed in real time (on-the-fly) while the algorithm is running.
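The abstract does not say which compression scheme the authors use, so the following is only a minimal sketch of the general idea under stated assumptions: each categorical column is dictionary-encoded into small integer codes, the codes of a row are bit-packed into a single integer, and the classifier decodes each stored row on the fly while computing a Hamming (mismatch-count) distance. The class name `PackedCategoricalKNN`, the packing layout, the distance, and the majority vote are all illustrative choices, not the paper's method.

```python
from collections import Counter


class PackedCategoricalKNN:
    """Illustrative kNN over bit-packed categorical rows (hypothetical, not the paper's code)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, rows, labels):
        n_cols = len(rows[0])
        # One dictionary per column maps each category to a small integer code.
        self.codebooks = [{} for _ in range(n_cols)]
        for row in rows:
            for book, value in zip(self.codebooks, row):
                book.setdefault(value, len(book))
        # Bits per column: just enough to represent the largest code in that column.
        self.widths = [max(1, (len(book) - 1).bit_length()) for book in self.codebooks]
        # Only the packed integers are kept in memory, not the original rows.
        self.packed = [self._pack(row) for row in rows]
        self.labels = list(labels)
        return self

    def _pack(self, row):
        # Place the first column in the lowest bits, the next one above it, and so on.
        word, shift = 0, 0
        for book, width, value in zip(self.codebooks, self.widths, row):
            word |= book[value] << shift
            shift += width
        return word

    def _unpack(self, word):
        # Lazily yield one column code at a time (on-the-fly decompression).
        for width in self.widths:
            yield word & ((1 << width) - 1)
            word >>= width

    def predict(self, query):
        # Categories never seen in training get code -1, which matches no stored code.
        q = [book.get(v, -1) for book, v in zip(self.codebooks, query)]
        scored = []
        for word, label in zip(self.packed, self.labels):
            dist = sum(a != b for a, b in zip(q, self._unpack(word)))
            scored.append((dist, label))
        scored.sort(key=lambda pair: pair[0])
        votes = Counter(label for _, label in scored[: self.k])
        return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
rows = [("red", "small"), ("red", "large"), ("blue", "small"),
        ("blue", "large"), ("green", "large")]
labels = ["a", "a", "b", "b", "b"]
model = PackedCategoricalKNN(k=1).fit(rows, labels)
print(model.predict(("red", "small")))  # exact match with the first row, so "a"
```

In this toy layout a row with a 3-category column and a 2-category column occupies 3 bits instead of two strings; how much a real dataset shrinks depends on the number of distinct categories per column.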

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7514715 (PMC)
http://dx.doi.org/10.3390/e21030234 (DOI)

Publication Analysis

Top Keywords

k-nearest neighbors: 8
apply algorithm: 8
categorical data: 8
data: 6
algorithm: 5
memory: 5
compressed k-nearest: 4
neighbors data: 4
data compression: 4
compression k-nearest: 4

Similar Publications

Supervised machine learning statistical models for visual outcome prediction in macular hole surgery: a single-surgeon, standardized surgery study.

Int J Retina Vitreous

January 2025

Department of Retina and Vitreous, Narayana Nethralaya, #121/C, 1st R Block, Chord Road, Rajaji Nagar, Bengaluru, 560010, India.

Purpose: To evaluate the predictive accuracy of various machine learning (ML) statistical models in forecasting postoperative visual acuity (VA) outcomes following macular hole (MH) surgery using preoperative optical coherence tomography (OCT) parameters.

Methods: This retrospective study included 158 eyes (151 patients) with full-thickness MHs treated between 2017 and 2023 by the same surgeon and using the same intraoperative surgical technique. Data from electronic medical records and OCT scans were extracted, with OCT-derived qualitative and quantitative MH characteristics recorded.

Tyrosine-protein kinase Src plays a key role in cell proliferation and growth under favorable conditions, but its overexpression and genetic mutations can lead to the progression of various inflammatory diseases. Due to the specificity and selectivity problems of previously discovered inhibitors like dasatinib and bosutinib, we employed an integrated machine learning and structure-based drug repurposing strategy to find novel, targeted, and non-toxic Src kinase inhibitors. Different machine learning models, including random forest (RF), k-nearest neighbors (K-NN), decision tree, and support vector machine (SVM), were trained using already available bioactivity data of Src kinase targeting compounds.

Background And Purpose: Magnetic Resonance Imaging is widely used to assess disease burden in multiple sclerosis (MS). This study aimed to evaluate the effectiveness of a commercially available k-nearest neighbors (k-NN) software in quantifying white matter lesion (WML) burden in MS. We compared the software's WML quantification to expert radiologists' assessments.

Eutrophication is one of the most relevant environmental concerns because of the risk it poses to water supply and food security. The concentrations of nitrogen and phosphorus chemical species determine the risk and magnitude of eutrophication. These analyses are even more relevant in basins with intensive agriculture due to agrochemical discharges.

Depression is more than just feeling sad. It is a severe and multifaceted mental health condition that impacts millions of individuals around the globe. Regrettably, it can even be more prevalent in university students of underdeveloped and developing countries like Bangladesh because of academic pressure, family and societal expectations, financial limitations, stigmatized social and cultural norms, unemployment concerns, lack of mental health awareness, etc.
