Air pollution is a significant global concern that poses major challenges for public health and ecological sustainability. Forecasting and mitigating its impacts requires accurate algorithms, which has led to the development of many machine learning (ML)-based models for predicting air quality. However, overfitting is a prevalent issue with ML algorithms that reduces their efficacy and generalizability. Using an extensive dataset from 16 sensors in Tehran, Iran, covering 2013 to 2023, the present investigation applies the Least Absolute Shrinkage and Selection Operator (Lasso) regularisation technique to improve the forecasting accuracy of models of ambient air pollutant concentrations, including particulate matter (PM2.5 and PM10), CO, NO2, SO2, and O3, while reducing overfitting. The outputs were compared using the R-squared (R²), mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and normalised mean square error (NMSE) indices. Although the preliminary findings showed that Lasso substantially enhances model reliability by reducing overfitting and identifying key features, performance for the gaseous pollutants remained unsatisfactory compared with PM (R² = 0.80 for PM2.5, 0.75 for PM10, 0.45 for CO, 0.55 for NO2, 0.65 for SO2, and 0.35 for O3). The low proportion of missing data presumably explains the strong performance of the PM models, while the highly dynamic behaviour of gases and their chemical interactions, together with the inherent characteristics of the model, were the primary factors behind the poor performance for gaseous pollutants. Nevertheless, because Lasso regularisation successfully mitigated overfitting and selected the more important features, it is strongly recommended for use in air quality forecasting models.
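
The abstract does not describe the implementation, so the following is only a minimal sketch of the general technique it names: Lasso fitted by coordinate descent, followed by the five reported indices. Everything here is an assumption for illustration. The data are synthetic (not the Tehran sensor data), the function names are hypothetical, the objective is the common form (1/(2n))·||y − Xw||² + alpha·||w||₁ without an intercept, and NMSE is taken as MSE divided by the variance of the observations, which is one of several definitions in use.

```python
import math
import random


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; anything inside [-alpha, alpha] becomes 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept term)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero feature carries no information
            # Correlation of feature j with the residual, with j's own
            # current contribution added back in.
            rho = sum(v * (r + w[j] * v) for v, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, col)]
                w[j] = new_wj
    return w


def regression_metrics(y_true, y_pred):
    """R2, MAE, MSE, RMSE and NMSE (assumed here to be MSE / variance of y_true)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    if var_true == 0.0:
        raise ValueError("y_true must not be constant")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {
        "R2": 1.0 - mse / var_true,
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "NMSE": mse / var_true,
    }


# Synthetic example: the third feature has no effect on the target.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * a - 2.0 * b + random.gauss(0, 0.1) for a, b, _ in X]

# Fit on the first 150 rows and score on the held-out 50 rows.
weights = lasso_coordinate_descent(X[:150], y[:150], alpha=0.1)
predictions = [sum(wj * v for wj, v in zip(weights, row)) for row in X[150:]]
scores = regression_metrics(y[150:], predictions)
print(weights)
print(scores)
```

With this NMSE definition, R² is simply 1 − NMSE, so the two indices carry the same information; other NMSE definitions would not have that property. The L1 penalty is what drives the coefficient of an uninformative feature to exactly zero, which is the feature-selection behaviour the abstract refers to.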

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11696743
DOI: http://dx.doi.org/10.1038/s41598-024-84342-y

Publication Analysis

Top Keywords

lasso regularisation: 12
air quality: 12
square error: 12
regularisation technique: 8
mitigating overfitting: 8
decreasing overfitting: 8
performance model: 8
overfitting: 5
air: 5
application lasso: 4

Similar Publications

Objective: Secondary sclerosing cholangitis (SSC) represents a disease with a poor prognosis increasingly diagnosed in clinical settings. Notably, SSC in critically ill patients (SSC-CIP) is the most frequent cause. Variables associated with worse prognosis remain unclear.

Balancing accuracy and Interpretability: An R package assessing complex relationships beyond the Cox model and applications to clinical prediction.

Int J Med Inform

February 2025

Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.

Background: Accurate and interpretable models are essential for clinical decision-making, where predictions can directly impact patient care. Machine learning (ML) survival methods can handle complex multidimensional data and achieve high accuracy but require post-hoc explanations. Traditional models such as the Cox Proportional Hazards Model (Cox-PH) are less flexible, but fast, stable, and intrinsically transparent.

Machine learning methods for propensity and disease risk score estimation in high-dimensional data: a plasmode simulation and real-world data cohort analysis.

Front Pharmacol

October 2024

Pharmaco- and Device Epidemiology Group, Centre of Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), University of Oxford, Oxford, United Kingdom.

Introduction: Machine learning (ML) methods are promising and scalable alternatives for propensity score (PS) estimation, but their comparative performance in disease risk score (DRS) estimation remains unexplored.

Methods: We used real-world data comparing antihypertensive users to non-users with 69 negative control outcomes, and plasmode simulations to study the performance of ML methods in PS and DRS estimation. We conducted a cohort study using UK primary care records.

Predicting disease recurrence in patients with endometriosis: an observational study.

BMC Med

August 2024

Department of Obstetrics, Gynaecology and Newborn Health, University of Melbourne and Gynaecology Research Centre, Royal Women's Hospital, Grattan St & Flemington Rd, Parkville, VIC, 3052, Australia.

Background: Despite surgical and pharmacological interventions, endometriosis can recur. Reliable information regarding risk of recurrence following a first diagnosis is scant. The aim of this study was to examine clinical and survey data in the setting of disease recurrence to identify predictors of risk of endometriosis recurrence.
