An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable.

PLoS One

Computational Neuroscience Lab, Institute of Computer Science, University of Tartu, Tartu, Estonia.

Published: August 2017

AI Article Synopsis

  • Supervised machine learning typically splits data into training, validation, and test sets. When data are limited, holding out a separate test set forces a trade-off between choosing better parameters and having more statistical power to estimate generalization performance.
  • A new method called "Cross-validation and cross-testing" re-uses test data without biasing the estimated classifier performance, which improves this trade-off.
  • On simulated data and electrophysiological recordings, the approach was more likely to yield significant results than standard cross-validation and testing while maintaining the nominal alpha level. Unlike nested cross-validation, it keeps individual parameters interpretable, which makes it a useful addition to existing machine learning techniques when parameters, but not model weights, need interpretation.
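The standard pipeline described here (cross-validate on the training portion to pick a parameter, then score the chosen model once on a held-out test set) can be sketched in plain Python as below. This is a minimal illustration under our own assumptions, not the authors' code. The simulated two-class Gaussian data, the k-nearest-neighbour classifier with k as the tuned parameter, the 5-fold split, and the one-sided binomial test against 50% chance (which assumes roughly balanced classes) are all hypothetical choices. Varying `test_fraction` exposes the trade-off: a larger test set gives the significance test more trials but leaves fewer samples for choosing k and fitting the model.

```python
import math
import random

def make_data(n, shift, seed):
    """Hypothetical simulated data: two 2-D Gaussian classes whose means differ by `shift` on the first axis."""
    rng = random.Random(seed)
    data = [((rng.gauss(shift * (i % 2), 1.0), rng.gauss(0.0, 1.0)), i % 2) for i in range(n)]
    rng.shuffle(data)
    return data

def knn_predict(train, x, k):
    """k-nearest-neighbour majority vote (k odd, so no ties) using squared Euclidean distance."""
    nearest = sorted(train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0

def kfold(data, n_folds):
    """Yield (train, validation) pairs; fold f holds every n_folds-th sample starting at index f."""
    for f in range(n_folds):
        train = [p for i, p in enumerate(data) if i % n_folds != f]
        yield train, data[f::n_folds]

def cv_select(data, ks, n_folds=5):
    """Return the k with the highest mean validation accuracy across folds."""
    def mean_acc(k):
        accs = [sum(knn_predict(tr, x, k) == y for x, y in va) / len(va)
                for tr, va in kfold(data, n_folds)]
        return sum(accs) / len(accs)
    return max(ks, key=mean_acc)

def binomial_p(correct, n, chance=0.5):
    """One-sided binomial p-value: P(at least `correct` hits in n trials at chance level)."""
    return sum(math.comb(n, i) * chance ** i * (1 - chance) ** (n - i)
               for i in range(correct, n + 1))

def standard_cv_then_test(data, test_fraction, ks):
    """Hold out a test set, pick k by cross-validation on the rest, then test once."""
    n_test = int(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    best_k = cv_select(train, ks)
    correct = sum(knn_predict(train, x, best_k) == y for x, y in test)
    return best_k, correct, n_test, binomial_p(correct, n_test)

if __name__ == "__main__":
    data = make_data(80, shift=1.0, seed=0)
    for frac in (0.2, 0.5):
        k, correct, n, p = standard_cv_then_test(data, frac, ks=(1, 3, 5, 7))
        print(f"test fraction {frac}: k={k}, {correct}/{n} correct, p={p:.3f}")
```

Here the test set is touched exactly once, after k has been fixed, which keeps the test-set estimate unbiased; the cost is that those test samples never contribute to choosing k or fitting the classifier.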

Article Abstract

Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier's generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach, which we term "Cross-validation and cross-testing", that improves this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do.
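Nested cross-validation, the comparison point in the abstract, can be sketched as follows under the same hypothetical assumptions as any toy example (simulated Gaussian data, a k-nearest-neighbour classifier, k as the tuned parameter; none of these come from the paper). Every sample is tested exactly once in the outer loop, which is what makes nested cross-validation data-efficient. However, the inner loop selects k independently in each outer fold, so the returned `chosen` list may contain different values and there is then no single parameter value to interpret. This only illustrates the interpretability issue the abstract raises; it is not an implementation of cross-validation and cross-testing, whose procedure is not described here.

```python
import random

def make_data(n, shift, seed):
    """Hypothetical simulated data: two 2-D Gaussian classes whose means differ by `shift` on the first axis."""
    rng = random.Random(seed)
    data = [((rng.gauss(shift * (i % 2), 1.0), rng.gauss(0.0, 1.0)), i % 2) for i in range(n)]
    rng.shuffle(data)
    return data

def knn_predict(train, x, k):
    """k-nearest-neighbour majority vote (k odd, so no ties) using squared Euclidean distance."""
    nearest = sorted(train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0

def kfold(data, n_folds):
    """Yield (train, held-out) pairs; fold f holds every n_folds-th sample starting at index f."""
    for f in range(n_folds):
        train = [p for i, p in enumerate(data) if i % n_folds != f]
        yield train, data[f::n_folds]

def cv_select(data, ks, n_folds):
    """Inner loop: return the k with the highest mean validation accuracy."""
    def mean_acc(k):
        accs = [sum(knn_predict(tr, x, k) == y for x, y in va) / len(va)
                for tr, va in kfold(data, n_folds)]
        return sum(accs) / len(accs)
    return max(ks, key=mean_acc)

def nested_cv(data, ks, outer_folds=5, inner_folds=5):
    """Outer loop estimates accuracy; k is re-selected inside every outer fold."""
    chosen, correct = [], 0
    for outer_train, outer_test in kfold(data, outer_folds):
        k = cv_select(outer_train, ks, inner_folds)  # may differ between outer folds
        chosen.append(k)
        correct += sum(knn_predict(outer_train, x, k) == y for x, y in outer_test)
    # The outer folds partition the data, so each sample is tested exactly once.
    return correct / len(data), chosen

if __name__ == "__main__":
    acc, chosen = nested_cv(make_data(80, shift=1.0, seed=0), ks=(1, 3, 5, 7))
    print(f"nested CV accuracy {acc:.3f}; k chosen per outer fold: {chosen}")
```

If `chosen` holds more than one distinct value, the accuracy estimate describes the whole selection procedure rather than any one setting of k.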


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001642 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0161788 (PLOS)
