Enhancing data pipelines for forecasting student performance: integrating feature selection with cross-validation.

Int J Educ Technol High Educ

Department of Ecology and Evolution, Program in Science Education, Stony Brook University, 650 Life Sciences Building, Stony Brook, NY 11794-5233 USA.

Published: August 2021

Educators seek to harness knowledge from educational corpora to improve student performance outcomes. Although prior studies have compared the efficacy of data mining methods (DMMs) in pipelines for forecasting student success, less work has focused on identifying a set of relevant features prior to model development and quantifying the stability of feature selection techniques. Pinpointing a subset of pertinent features can (1) reduce the number of variables that need to be managed by stakeholders, (2) make "black-box" algorithms more interpretable, and (3) provide greater guidance for faculty to implement targeted interventions. To that end, we introduce a methodology integrating feature selection with cross-validation and rank each feature on subsets of the training corpus. This modified pipeline was applied to forecast the performance of 3225 students in a baccalaureate science course using a set of 57 features, four DMMs, and four filter feature selection techniques. Correlation Attribute Evaluation (CAE) and Fisher's Scoring Algorithm (FSA) achieved significantly higher Area Under the Curve (AUC) values for logistic regression (LR) and elastic net regression (GLMNET), compared to when this pipeline step was omitted. Relief Attribute Evaluation (RAE) was highly unstable and produced models with the poorest prediction performance. Borda's method identified grade point average, number of credits taken, and performance on concept inventory assessments as the primary factors impacting predictions of student performance. We discuss the benefits of this approach when developing data pipelines for predictive modeling in undergraduate settings that are more interpretable and actionable for faculty and stakeholders.
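
The idea of ranking features on subsets of the training data and then combining the per-subset rankings with Borda's method can be illustrated with a minimal sketch. This is not the authors' implementation: the filter used here is a plain absolute Pearson correlation standing in for a correlation-based technique such as CAE, the fold construction is a simple shuffled split, and the feature names (`gpa`, `credits`, `noise`) and the synthetic data are hypothetical, chosen only to make the example runnable.

```python
import random
from statistics import mean


def pearson_abs(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def rank_features(X, y, names):
    """Order feature names best-first by |correlation| with the target."""
    scores = {n: pearson_abs([row[j] for row in X], y)
              for j, n in enumerate(names)}
    return sorted(names, key=lambda n: scores[n], reverse=True)


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def borda(rankings):
    """Borda count: position p in a ranking of m items earns m - p points."""
    points = {}
    for ranking in rankings:
        m = len(ranking)
        for p, name in enumerate(ranking):
            points[name] = points.get(name, 0) + (m - p)
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(points, key=lambda n: (-points[n], n))


def cv_feature_ranking(X, y, names, k=5, seed=0):
    """Rank features on each training split, then aggregate with Borda."""
    rankings = []
    for held_out in kfold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        rankings.append(rank_features([X[i] for i in train],
                                      [y[i] for i in train], names))
    return borda(rankings), rankings


# Synthetic, hypothetical data: the outcome depends mainly on "gpa".
rng = random.Random(42)
names = ["gpa", "credits", "noise"]
X, y = [], []
for _ in range(200):
    gpa = rng.uniform(0.0, 4.0)
    X.append([gpa, rng.randint(6, 18), rng.random()])
    y.append(1 if gpa + rng.gauss(0, 0.3) > 2.5 else 0)

consensus, per_fold = cv_feature_ranking(X, y, names, k=5)
print(consensus)
```

Because each ranking is computed only on the training portion of a split, the held-out portion never influences which features are selected. Comparing the entries of `per_fold` also gives a rough sense of how stable a filter is across subsets.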

Supplementary Information: The online version contains supplementary material available at 10.1186/s41239-021-00279-6.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8591701
DOI: http://dx.doi.org/10.1186/s41239-021-00279-6

Publication Analysis

Top Keywords (term: occurrences)

feature selection: 16
student performance: 12
data pipelines: 8
pipelines forecasting: 8
forecasting student: 8
integrating feature: 8
selection cross-validation: 8
selection techniques: 8
attribute evaluation: 8
performance: 6

Similar Publications

Land Surface Temperature (LST) is widely recognized as a sensitive indicator of climate change, and it plays a significant role in ecological research. The ERA5-Land LST dataset, developed and managed by the European Centre for Medium-Range Weather Forecasts (ECMWF), is extensively used for global or regional LST studies. However, its fine-scale application is limited by its low spatial resolution.

Autism spectrum disorder (ASD) affects up to 1 in 59 children, and is one of the most common neurodevelopmental disorders. Recent genomic studies have highlighted the role of rare variants in ASD. This study aimed to identify genes affected by rare variants shared by siblings with ASD and validate the function of a candidate gene FRRS1L.

Development and Validation of an AI-Based Multimodal Model for Pathological Staging of Gastric Cancer Using CT and Endoscopic Images.

Acad Radiol

January 2025

Guangxi Medical University, Nanning, Guangxi 530021, China (C.Z., D.H., B.W., S.W., Y.S., X.W.); Guangxi Key Laboratory of Enhanced Recovery After Surgery for Gastrointestinal Cancer, Nanning, Guangxi 530021, China (C.Z., D.H., B.W., S.W., Y.S., X.W.); Department of Gastrointestinal Gland Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi 530021, China (D.H., X.W.).

Rationale And Objectives: Accurate preoperative pathological staging of gastric cancer is crucial for optimal treatment selection and improved patient outcomes. Traditional imaging methods such as CT and endoscopy have limitations in staging accuracy.

Methods: This retrospective study included 691 gastric cancer patients treated from March 2017 to March 2024.

Radiomics and Deep Learning Model for Benign and Malignant Soft Tissue Tumors Differentiation of Extremities and Trunk.

Acad Radiol

January 2025

Department of Radiology, Southeast University Zhongda Hospital, No. 87 Dingjiaqiao Road, Gulou District, Nanjing, Jiangsu Province, China (M.Y., J.J.).

Rationale And Objectives: To develop radiomics and deep learning models for differentiating malignant and benign soft tissue tumors (STTs) preoperatively based on fat saturation T2-weighted imaging (FS-T2WI) of patients.

Materials And Methods: Data of 115 patients with STTs of extremities and trunk were collected from our hospital as the training set, and data of other 70 patients were collected from another center as the external validation set. Outlined Regions of interest included the intratumor and the peritumor region extending outward by 5 mm, then the corresponding radiomics features were extracted respectively.

Background: Hemorrhage is a major complication of brain arteriovenous malformations (AVMs) embolization, which can be related to persistent arteriovenous shunts that were not completely occluded during the embolization. In transvenous embolization (TVE) this risk is deemed higher for AVMs larger than 3 cm featuring multiple veins of drainage. Herein, we will discuss a few selected cases where brain AVMs with more than one draining vein were deemed safe for curative embolization with advanced endovascular techniques after a careful anatomical study through the four dimensional-digital subtraction angiography (4D-DSA) imaging.
