Surrogate models are often used to replace costly engineering simulations. A single surrogate is typically chosen based on previous experience, or by fitting multiple surrogates and selecting one based on mean cross-validation error. This paper presents a novel stacking strategy that results from reinterpreting the model selection process in terms of the generalization error. For the first time, this problem is translated into a well-studied financial problem: portfolio management and optimization. In short, it is demonstrated that the individual residuals calculated by leave-one-out procedures are samples from a random variable ϵi, whose second non-central moment is the i-th model's generalization error. A stacking methodology based solely on evaluating the behavior of the linear combination of the random variables ϵi is therefore proposed. First, several surrogate models are calibrated. The Directed Bubble Hierarchical Tree (DBHT) clustering algorithm is then used to determine which models are worth stacking. The stacking weights can be calculated using any financial approach to the portfolio optimization problem. This alternative understanding of the problem enables practitioners to use established financial methodologies to calculate the models' weights, significantly improving the out-of-sample performance of the ensemble. A case study is carried out to demonstrate the applicability of the new methodology. In total, 124 models were trained on a specific dataset: 40 Machine Learning models and 84 Polynomial Chaos Expansion models (which considered 3 types of base random variables, 7 least-squares algorithms for fitting the coefficients, and expansions of up to fourth order). Of these, 99 models could be fitted without convergence or other numerical issues. The DBHT algorithm with Pearson correlation distance and generalization error similarity selected a subgroup of 23 models from the 99 fitted ones, a reduction of about 77% in the total number of models, which represents a good filtering scheme that still preserves diversity. Finally, it is demonstrated that the weights obtained by building a Hierarchical Risk Parity (HRP) portfolio perform better for various input random variables, indicating better out-of-sample performance. In this way, an economic stacking strategy has demonstrated its worth in improving the out-of-sample capabilities of stacked models, which illustrates how the new understanding of model stacking methodologies may be useful.
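
The core idea above can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. Three polynomial fits stand in for the surrogate models, the data are synthetic, and the weights come from the closed-form minimum-variance-style portfolio (minimise wᵀMw subject to Σw = 1, where M is the second-moment matrix of the leave-one-out residuals). This is a deliberately simpler substitute for the DBHT selection and HRP weighting described in the abstract, and all function and variable names are hypothetical.

```python
import numpy as np

def loo_residuals_polyfit(x, y, degree):
    """Leave-one-out residuals of a polynomial least-squares fit (toy surrogate)."""
    n = len(x)
    res = np.empty(n)
    for k in range(n):
        mask = np.arange(n) != k                      # drop the k-th sample
        coeffs = np.polyfit(x[mask], y[mask], degree)
        res[k] = y[k] - np.polyval(coeffs, x[k])      # residual at the held-out point
    return res

def min_second_moment_weights(residuals):
    """Weights minimising w^T M w subject to sum(w) == 1.

    residuals has shape (n_samples, n_models); M is the empirical
    second non-central moment matrix of the residuals. Weights may be
    negative, the analogue of short selling in a portfolio.
    """
    n = residuals.shape[0]
    M = residuals.T @ residuals / n
    z = np.linalg.solve(M, np.ones(M.shape[0]))
    return z / z.sum()

# Synthetic data (assumption: not the dataset used in the paper).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.size)

degrees = [1, 3, 5]
E = np.column_stack([loo_residuals_polyfit(x, y, d) for d in degrees])

gen_err = (E ** 2).mean(axis=0)        # per-model LOO estimate of generalization error
w = min_second_moment_weights(E)

# With weights summing to one, the stacked residual is the same
# linear combination of the individual residuals.
stacked_err = ((E @ w) ** 2).mean()

print("per-model LOO error:", gen_err)
print("weights:", w)
print("stacked LOO error:", stacked_err)
```

Because each single model corresponds to a feasible weight vector (all weight on one model), the in-sample stacked error of this sketch cannot exceed the best individual leave-one-out error. Whether that gain carries over out of sample is exactly the question the financial weighting schemes in the paper are meant to address.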


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470931 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0290331 (PLOS)
