The aim of data preprocessing is to remove data artifacts (such as a baseline, scatter effects, or noise) and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but selecting the method or combination of methods best suited to the data being analyzed is difficult. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames. In that approach, the focus was solely on improving the predictive performance of the chemometric model. This is, however, only one of the two relevant criteria in modeling: interpretation of the model results can be just as important. Variable selection is often used to achieve such interpretation. Data artifacts, however, may hamper proper variable selection by masking the truly relevant variables. The choice of preprocessing therefore strongly influences the outcome of variable selection methods and may thus hamper an objective interpretation of the final model. To enhance such objective interpretation, we here integrate variable selection into the DoE-based preprocessing selection approach. We show that combining preprocessing selection and variable selection improves not only the interpretation but also the predictive performance of the model. This is demonstrated by analyzing several experimental data sets for which the truly relevant variables are available as prior knowledge. The integrated approach selects variables that agree more closely with the truly informative variables than separate optimization of the two model aspects does. Importantly, the approach presented in this work is generic. Different types of models (e.g. PCR, PLS, …) can be incorporated into it, as well as different variable selection methods and different preprocessing methods, according to the taste and experience of the user. In this work, the approach is illustrated by using PLS as the model and PPRV-FCAM (Predictive Property Ranked Variable using Final Complexity Adapted Models) for variable selection.
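The general idea of evaluating preprocessing strategies and variable selection together can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a two-level full factorial design over three on/off preprocessing factors applied in a fixed order, uses a plain NIPALS PLS1 model, and replaces PPRV-FCAM with a simple ranking of variables by absolute PLS regression coefficient. All function names, the synthetic data, and the parameter values are hypothetical.

```python
import itertools
import numpy as np

def detrend(X):
    """Baseline step: subtract a fitted straight line from each spectrum (row)."""
    t = np.linspace(-1.0, 1.0, X.shape[1])
    A = np.vstack([np.ones_like(t), t]).T
    coef, *_ = np.linalg.lstsq(A, X.T, rcond=None)
    return X - (A @ coef).T

def snv(X):
    """Scatter step: standard normal variate, centring and scaling each row."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

def smooth(X, width=5):
    """Noise step: moving-average smoothing along each row."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, X)

# Assumed factors and assumed fixed order of application.
STEPS = {"baseline": detrend, "scatter": snv, "smoothing": smooth}

def pls1_fit(X, y, n_comp):
    """PLS1 via NIPALS; returns column means, response mean and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        c = yr @ t / tt
        Xr = Xr - np.outer(t, p)   # deflate X
        yr = yr - c * t            # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta

def cv_rmse(X, y, n_comp=3, k=5):
    """k-fold cross-validated RMSE of the PLS1 model (contiguous folds)."""
    idx = np.arange(len(y))
    errors = []
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        xm, ym, b = pls1_fit(X[train], y[train], n_comp)
        errors.append((y[test] - ((X[test] - xm) @ b + ym)) ** 2)
    return float(np.sqrt(np.mean(np.concatenate(errors))))

def evaluate_design(X, y, n_keep=10, n_comp=3):
    """Run every on/off combination, select variables, score the reduced model."""
    results = []
    for switches in itertools.product([False, True], repeat=len(STEPS)):
        Xp = X
        for fn, on in zip(STEPS.values(), switches):
            if on:
                Xp = fn(Xp)
        # Stand-in for PPRV-FCAM: keep the variables with the largest |coefficient|.
        # Selecting on all samples before cross-validation gives an optimistic
        # error estimate; a careful study would select inside each fold.
        _, _, beta = pls1_fit(Xp, y, n_comp)
        keep = np.sort(np.argsort(np.abs(beta))[-n_keep:])
        results.append((cv_rmse(Xp[:, keep], y, n_comp), switches, keep))
    return min(results, key=lambda r: r[0]), results

# Synthetic spectra (hypothetical): one informative peak near variable 40,
# plus a random sloping baseline, multiplicative scatter and noise.
rng = np.random.default_rng(0)
n, p = 60, 100
peak = np.exp(-0.5 * ((np.arange(p) - 40) / 3.0) ** 2)
y = rng.uniform(0.0, 1.0, n)
X = np.outer(y, peak) * rng.uniform(0.8, 1.2, (n, 1))
X = X + rng.uniform(0.0, 2.0, (n, 1)) * np.linspace(0.0, 1.0, p)
X = X + rng.normal(0.0, 0.01, (n, p))

best, results = evaluate_design(X, y)
print("best strategy:", dict(zip(STEPS, best[1])))
print("cross-validated RMSE:", best[0])
print("selected variables:", best[2])
```

Because the true informative variables are known for the synthetic data, the selected indices can be compared against the peak position for each strategy, which mirrors the kind of check the abstract describes for data sets with prior knowledge. A full factorial design is only practical for a handful of factors; with more preprocessing options, a fractional design would be the natural substitute.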
DOI: http://dx.doi.org/10.1016/j.aca.2016.08.022