Instance selection is becoming increasingly relevant because of the huge amount of data constantly produced in many fields of research. At the same time, most recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly harms classification or recognition tasks. There are efficiency issues, too, because many classification algorithms run considerably faster when the complexity of the data is reduced. One approach to problems with too many features or instances is feature or instance selection, respectively. Although most methods address instance and feature selection separately, the two problems are interwoven, and benefits are expected from addressing them jointly. This paper proposes a new memetic algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection. The proposed method performs four different local search procedures with the aim of obtaining the most relevant subsets of instances and features for accurate classification. A new fitness function is also proposed that enforces instance selection but avoids putting too much pressure on removing features. We show experimentally that this fitness function improves the results in terms of testing error. Regarding the scalability of the method, an extension of the stratification approach is developed for simultaneous instance and feature selection. This extension allows the proposed algorithm to be applied to large datasets. An extensive comparison using 55 medium to large datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 30 large problems, with very good results. The accuracy of the method on class-imbalanced problems is shown on a set of 40 datasets. The usefulness of the method is also tested using decision trees and support vector machines as classification methods.
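The abstract does not give the form of the fitness function, so the following is only a minimal sketch of the general idea: a candidate solution is a pair of bit masks (one over instances, one over features), and the fitness rewards accuracy and instance reduction while applying no direct penalty for keeping features. The 1-NN evaluator, the weight `alpha`, and the names `one_nn_accuracy` and `fitness` are all assumptions made for illustration, not the paper's definitions.

```python
def one_nn_accuracy(X, y, inst_mask, feat_mask):
    """Accuracy of a 1-NN rule over all training points, using only the
    selected instances as prototypes and only the selected features.
    (Illustrative evaluator; selected points match themselves at distance 0.)"""
    selected = [i for i, keep in enumerate(inst_mask) if keep]
    feats = [j for j, keep in enumerate(feat_mask) if keep]
    if not selected or not feats:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for q in range(len(X)):
        best_label, best_dist = None, float("inf")
        for i in selected:
            # squared Euclidean distance restricted to selected features
            d = sum((X[q][j] - X[i][j]) ** 2 for j in feats)
            if d < best_dist:
                best_dist, best_label = d, y[i]
        correct += best_label == y[q]
    return correct / len(X)


def fitness(X, y, inst_mask, feat_mask, alpha=0.75):
    """Hypothetical fitness: weighted sum of accuracy and instance reduction.
    There is deliberately no feature-reduction term, so features are dropped
    only when doing so helps accuracy. alpha is an assumed weight."""
    acc = one_nn_accuracy(X, y, inst_mask, feat_mask)
    inst_reduction = 1.0 - sum(inst_mask) / len(inst_mask)
    return alpha * acc + (1.0 - alpha) * inst_reduction


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [1.1, -2.0]]
y = [0, 0, 1, 1]
print(fitness(X, y, [1, 0, 1, 0], [1, 0]))  # half the instances, informative feature only
print(fitness(X, y, [1, 0, 1, 0], [1, 1]))  # same instances, noisy feature kept
```

In this toy case the noisy feature lowers the fitness only through its effect on accuracy, which is the kind of indirect pressure on features the abstract describes. A memetic algorithm would evolve such mask pairs and refine them with local search.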
DOI: http://dx.doi.org/10.1162/EVCO_a_00102