An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable.

Kristjan Korjus Martin N Hebart Raul Vicente

PLoS One

Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia.

Published: August 2017

Supervised machine learning typically divides data into training, validation, and test sets; when data is limited, holding out test data leaves less for finding the best model parameters.
A new method called "Cross-validation and cross-testing" reuses test data without biasing the estimate of classifier performance.
On simulated data and electrophysiological recordings, the approach was more likely to detect significant results than standard cross-validation and testing while maintaining the nominal alpha level, and, unlike nested cross-validation, it keeps individual parameters interpretable.

Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier's generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term "Cross-validation and cross-testing", which improves this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do.
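
The standard approach that the abstract uses as its baseline (cross-validation for parameter selection, followed by a single evaluation on a separate test set) can be sketched as below. This is a minimal illustration and not the authors' code or their proposed method: the simulated Gaussian data, the k-nearest-neighbour classifier, the candidate values of k, the one-sided binomial test against chance, and every function name are assumptions chosen for the example.

```python
import math
import random

def make_data(n, seed=0):
    """Simulate n labelled 1-D points (assumed toy data): class 0 ~ N(0, 1), class 1 ~ N(1.5, 1)."""
    rng = random.Random(seed)
    data = [(rng.gauss(1.5 * (i % 2), 1.0), i % 2) for i in range(n)]
    rng.shuffle(data)
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train, test, k):
    """Fraction of test points that k-NN, fitted on train, labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def cross_validate(data, k, n_folds=5):
    """Mean validation accuracy of k-NN over n_folds interleaved folds."""
    scores = []
    for f in range(n_folds):
        val = data[f::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != f]
        scores.append(accuracy(train, val, k))
    return sum(scores) / n_folds

def binomial_p_value(n_correct, n, chance=0.5):
    """One-sided P(X >= n_correct) for X ~ Binomial(n, chance)."""
    return sum(math.comb(n, i) * chance**i * (1 - chance)**(n - i)
               for i in range(n_correct, n + 1))

def cv_then_test(data, test_fraction=0.3, candidates=(1, 3, 5, 9)):
    """Pick k by cross-validation on the non-test data, then test once on the held-out set."""
    n_test = int(len(data) * test_fraction)
    test, dev = data[:n_test], data[n_test:]
    # Parameter selection never sees the test set.
    best_k = max(candidates, key=lambda k: cross_validate(dev, k))
    n_correct = sum(knn_predict(dev, x, best_k) == y for x, y in test)
    return best_k, n_correct / n_test, binomial_p_value(n_correct, n_test)

best_k, test_acc, p_value = cv_then_test(make_data(100))
print(best_k, test_acc, p_value)
```

The `test_fraction` argument makes the trade-off concrete: a larger test set gives the significance test more power, since `binomial_p_value(28, 40)` is smaller than `binomial_p_value(14, 20)` although both correspond to 70% accuracy, but it leaves fewer samples for selecting `best_k` and fitting the final classifier.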

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001642	PMC
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0161788	PLOS


