An increasingly popular form of collaboration involves forming partnerships among researchers, educators, and community members to improve or transform education systems through research inquiry. However, not all partnerships are successful. The field needs valid, reliable, and useful measures to help with assessing progress toward partnership goals.
Educators have become increasingly committed to social and emotional learning in schools. However, we know too little about the typical growth trajectories of the competencies that schools are striving to improve. We leverage data from the California Office to Reform Education, a consortium of districts in California serving over 1.
Survey scores are often the basis for understanding how individuals grow psychologically and socio-emotionally. A known problem with many surveys is that the items are all "easy"; that is, individuals tend to use only the top one or two response categories on the Likert scale. Such an issue could be especially problematic, and lead to ceiling effects, when the same survey is administered repeatedly over time.
While a great deal of thought, planning, and money goes into the design of multisite randomized control trials (RCTs) that are used to evaluate the effectiveness of interventions in fields like education and psychology, relatively little thought is often given to the measurement choices made in such evaluations. In this study, we conduct a series of simulation studies that consider a wide range of options for producing scores from multiple administrations of assessments in the context of multisite RCTs. The scoring models considered range from the simple (sum scores) to the highly complex (multilevel two-tier item response theory [IRT] models with latent regression).
Psychol Assess
January 2023
Supporting students' social-emotional learning (SEL) is gaining emphasis in education. In particular, self-control is a construct that has been shown to predict academic outcomes, though much debate on this point exists. Although largely unexamined, inconsistent findings could stem from the fact that related surveys are often scored by multiple raters (e.
Though much effort is often put into designing psychological studies, the measurement model and scoring approach employed are often an afterthought, especially when short survey scales are used (Flake & Fried, 2020). One possible reason that measurement gets downplayed is that there is generally little understanding of how calibration/scoring approaches could impact common estimands of interest, including treatment effect estimates, beyond random noise due to measurement error. Another possible reason is that the process of scoring is complicated, involving selecting a suitable measurement model, calibrating its parameters, then deciding how to generate a score, all steps that occur before the score is even used to examine the desired psychological phenomenon.
When randomized control trials are not available, regression discontinuity (RD) designs are a viable quasi-experimental method shown to be capable of producing causal estimates of how a program or intervention affects an outcome. While the RD design and many related methodological innovations came from the field of psychology, RDs are underutilized among psychologists even though many interventions are assigned on the basis of scores from common psychological measures, a situation tailor-made for RDs. In this tutorial, we present a straightforward way to implement an RD model as a structural equation model (SEM).
This study provides empirical benchmarks that quantify typical changes in students' reports of social and emotional skills in a large, diverse sample. Data come from six cohorts of students (N = 361,815; 6% Asian, 8% Black, 68% White, 75% Latinx, 50% Female) who responded to the CORE survey from 2015 to 2018 and help quantify typical gains/declines in growth mindset, self-efficacy, self-management, and social awareness. Results show fluctuations in skills between 4th and 12th grade (changes ranging from -.
Considerable thought is often put into designing randomized control trials (RCTs). From power analyses and complex sampling designs implemented preintervention to nuanced quasi-experimental models used to estimate treatment effects postintervention, RCT design can be quite complicated. Yet when psychological constructs measured using survey scales are the outcome of interest, measurement is often an afterthought, even in RCTs.
Appl Psychol Meas
January 2022
Researchers in the social sciences often obtain ratings of a construct of interest provided by multiple raters. While using multiple raters provides a way to help avoid the subjectivity of any given person's responses, rater disagreement can be a problem. A variety of models exist to address rater disagreement in both structural equation modeling and item response theory frameworks.
Background: Research shows that successfully transitioning from intermediate school to secondary school is pivotal for students to remain on track to graduate. Studies also indicate that a successful transition is a function not only of how prepared the students are academically but also whether they have the social-emotional learning (SEL) skills to succeed in a more independent secondary school environment.
Aim: Yet, little is known about whether students' SEL skills are stable over time, and if they are not, whether a student's initial level of SEL skills at the start of intermediate school or change in SEL skills over time is a better indicator of whether the student will be off track academically in 9th grade.
Suboptimal effort is a major threat to valid score-based inferences. While the effects of such behavior have been frequently examined in the context of mean group comparisons, minimal research has considered its effects on individual score use (e.g.
Randomized control trials (RCTs) are considered the gold standard when evaluating the impact of psychological interventions, educational programs, and other treatments on outcomes of interest. However, few studies consider whether forms of measurement bias like noninvariance might impact estimated treatment effects from RCTs. Such bias may be more likely to occur when survey scales are utilized in studies and evaluations in ways not supported by validation evidence, which occurs in practice.
As low-stakes testing contexts increase, low test-taking effort may serve as a serious validity threat. One common solution to this problem is to identify noneffortful responses and treat them as missing during parameter estimation via the effort-moderated item response theory (EM-IRT) model. Although this model has been shown to outperform traditional IRT models (e.
To avoid the subjectivity of having a single person evaluate a construct of interest (e.g., a student's self-efficacy in school), multiple raters are often used.
A huge portion of what we know about how humans develop, learn, behave, and interact is based on survey data. Researchers use longitudinal growth modeling to understand the development of students on psychological and social-emotional learning constructs across elementary and middle school. In these designs, students are typically administered a consistent set of self-report survey items across multiple school years, and growth is measured either based on sum scores or scale scores produced based on item response theory (IRT) methods.
Survey respondents employ different response styles when they use the categories of the Likert scale differently despite having the same true score on the construct of interest. For example, respondents may be more likely to use the extremes of the response scale independent of their true score. Research already shows that differing response styles can create a construct-irrelevant source of bias that distorts fundamental inferences made based on survey data.