The development of screening instruments for psychiatric disorders involves selecting items from a pool of items in existing questionnaires that assess clinical and behavioral phenotypes. A screening instrument should consist of only a few items and classify cases and non-cases accurately. Variable/item selection methods such as the Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, Classification and Regression Tree, Random Forest, and the two-sample t-test can be used in this context. Unlike the settings where variable selection methods are most commonly applied (e.g., ultra-high-dimensional genetic or imaging data), psychiatric data usually have lower dimensions and are characterized by the following factors: correlations and possible interactions among predictors, unobservability of important variables (i.e., true variables not measured by the available questionnaires), the amount and pattern of missing values in the predictors, and the prevalence of cases in the training data. We investigate via simulations how these factors affect the performance of several variable selection methods, and we compare the methods with respect to selection performance and prediction error rate. Our results demonstrated that: (1) for complete data, LASSO and Elastic Net outperformed other methods with respect to variable selection and future data prediction, and (2) for certain types of incomplete data, Random Forest induced bias in imputation, leading to incorrect ranking of variable importance. We propose the Imputed-LASSO, which combines Random Forest imputation with LASSO; this approach offsets the bias in Random Forest and offers a simple yet efficient item selection approach for missing data. As an illustration, we apply the methods to items from the standard Autism Diagnostic Interview-Revised version.
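To make the item selection step concrete, the following is a minimal sketch of how an L1 penalty drives uninformative item coefficients to exactly zero. It is not the authors' implementation. It assumes complete data (the Random Forest imputation stage of the Imputed-LASSO is omitted), a continuous outcome in place of binary case status, and a linear model fitted by coordinate descent. The function names, the simulated items, and the penalty value `lam=0.8` are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return 0.0 when |value| <= lam."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def standardize(columns):
    """Center each item column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        out.append([(v - mean) / sd for v in col])
    return out


def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    columns: list of item columns (assumed already standardized)
    y: outcome values (centered internally)
    """
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Covariance of item j with the partial residual (item j's own fit added back)
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, residual)) / n
            scale = sum(x * x for x in col) / n
            new_beta = soft_threshold(rho, lam) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                beta[j] = new_beta
    return beta


# Hypothetical toy data: 6 candidate items, of which only items 0 and 3
# actually influence the simulated outcome.
rng = random.Random(0)
n_subjects, n_items = 300, 6
items = [[rng.gauss(0, 1) for _ in range(n_subjects)] for _ in range(n_items)]
score = [
    2.0 * items[0][i] - 1.5 * items[3][i] + rng.gauss(0, 0.5)
    for i in range(n_subjects)
]

beta = lasso_coordinate_descent(standardize(items), score, lam=0.8)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected items:", selected)
```

The penalty here is deliberately large relative to the simulated noise, so only the two informative items are expected to keep nonzero coefficients. In practice the penalty would be tuned, for example by cross-validation, and a logistic loss would be the natural choice for a case/non-case outcome.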

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4026268
DOI: http://dx.doi.org/10.1002/sim.5937
