Differential item functioning (DIF) analysis is one of the most important applications of item response theory (IRT) in psychological assessment. This study examined the performance of two Bayesian DIF methods, the Bayes factor (BF) and the deviance information criterion (DIC), with the generalized graded unfolding model (GGUM). Type I error and power were investigated in a Monte Carlo simulation that manipulated sample size, DIF source, DIF size, DIF location, subpopulation trait distribution, and type of baseline model.
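For reference, the deviance information criterion used as a model-comparison index (e.g., between a DIF model and a no-DIF baseline) is conventionally defined, following Spiegelhalter et al. (2002), as
\[
\mathrm{DIC} = \bar{D} + p_D, \qquad \bar{D} = \mathbb{E}_{\vartheta \mid \mathbf{x}}\!\left[-2 \log L(\mathbf{x} \mid \vartheta)\right], \qquad p_D = \bar{D} - D(\bar{\vartheta}),
\]
with smaller values indicating better expected fit, whereas the Bayes factor compares marginal likelihoods, \(\mathrm{BF}_{10} = p(\mathbf{x} \mid M_1)/p(\mathbf{x} \mid M_0)\). These are the standard definitions of the two criteria, not a description of this study's specific implementation.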
Collateral information has been used to address subpopulation heterogeneity and increase estimation accuracy in some large-scale cognitive assessments. However, methodology that takes collateral information into account has not been developed or explored in published research with models designed specifically for noncognitive measurement. Because accurate noncognitive measurement is becoming increasingly important, we sought to examine the benefits of using collateral information in latent trait estimation with an item response theory model that has proven valuable for noncognitive testing, namely, the generalized graded unfolding model (GGUM).
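One common way collateral information enters trait estimation, sketched here only as background and not necessarily the specification adopted in this study, is through a latent regression prior in which background variables \(\mathbf{w}_j\) shift the prior mean used for expected a posteriori (EAP) scoring:
\[
\theta_j \sim N(\boldsymbol{\beta}^{\top}\mathbf{w}_j, \sigma^2), \qquad
\hat{\theta}_j^{\mathrm{EAP}} = \frac{\int \theta\, L(\mathbf{x}_j \mid \theta)\, \phi(\theta; \boldsymbol{\beta}^{\top}\mathbf{w}_j, \sigma^2)\, d\theta}{\int L(\mathbf{x}_j \mid \theta)\, \phi(\theta; \boldsymbol{\beta}^{\top}\mathbf{w}_j, \sigma^2)\, d\theta},
\]
where \(L(\mathbf{x}_j \mid \theta)\) would be the GGUM likelihood of respondent \(j\)'s item responses.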
This research developed a new ideal point-based item response theory (IRT) model for multidimensional forced choice (MFC) measures. We adapted the Zinnes and Griggs (ZG; 1974) IRT model and the multi-unidimensional pairwise preference (MUPP; Stark et al., 2005) model, henceforth referred to as ZG-MUPP.
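For orientation, the MUPP framework of Stark et al. (2005) combines independent, single-statement endorsement probabilities into the probability of preferring statement \(s\) over statement \(t\) in a pair:
\[
P\{s \succ t \mid \theta_{d_s}, \theta_{d_t}\} =
\frac{P_s(1 \mid \theta_{d_s})\, P_t(0 \mid \theta_{d_t})}
{P_s(1 \mid \theta_{d_s})\, P_t(0 \mid \theta_{d_t}) + P_s(0 \mid \theta_{d_s})\, P_t(1 \mid \theta_{d_t})},
\]
where \(d_s\) and \(d_t\) index the dimensions measured by the two statements; in a ZG-MUPP formulation, the endorsement probabilities would be supplied by the Zinnes and Griggs ideal point model.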
Likert-type measures have been criticized in psychological assessment because they are vulnerable to response biases, including central tendency, acquiescence, leniency, halo, and socially desirable responding. As an alternative, multidimensional forced choice (MFC) testing has been proposed to address these concerns. A number of researchers have developed item response theory (IRT) models for MFC data and have examined latent trait estimation with tests of different dimensionality and length.
Historically, multidimensional forced choice (MFC) measures have been criticized because conventional scoring methods can lead to ipsativity problems that render scores unsuitable for interindividual comparisons. However, with the recent advent of item response theory (IRT) scoring methods that yield normative information, MFC measures are surging in popularity and becoming important components in high-stakes evaluation settings. This article aims to add to burgeoning methodological advances in MFC measurement by focusing on statement and person parameter recovery for the GGUM-RANK (generalized graded unfolding-RANK) IRT model.
In single-case research, the multiple-baseline (MB) design provides the opportunity to estimate the treatment effect based not only on within-series comparisons of treatment-phase to baseline-phase observations, but also on time-specific between-series comparisons of observations from cases that have started treatment to those that are still in baseline. For analyzing MB studies, two types of linear mixed modeling methods have been proposed: the within- and between-series models. In principle, those models were developed under normality assumptions; however, normality may not always hold in practical settings.
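As a point of reference, a basic within-series two-level specification for observation \(i\) of case \(j\) in a multiple-baseline study (a simplified sketch, not the exact parameterization evaluated here) is
\[
y_{ij} = \beta_{0j} + \beta_{1j}\,\mathrm{Phase}_{ij} + e_{ij}, \qquad
\beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j},
\]
where \(\mathrm{Phase}_{ij}\) is a treatment indicator and \(\gamma_{10}\) is the average treatment effect; the usual formulation takes \(e_{ij}\) and \((u_{0j}, u_{1j})\) to be normally distributed, which is precisely the assumption at issue. Between-series formulations instead contrast, at each time point, cases that have entered treatment with cases still in baseline.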
Over the last decade, researchers have come to recognize the benefits of ideal point item response theory (IRT) models for noncognitive measurement. Although most applied studies have utilized the Generalized Graded Unfolding Model (GGUM), many other ideal point models have been developed. Most notably, David Andrich and colleagues published a series of papers comparing dominance and ideal point measurement perspectives, and they proposed ideal point models for dichotomous and polytomous single-stimulus responses, known as the Hyperbolic Cosine Model (HCM) and the General Hyperbolic Cosine Model (GHCM), respectively.
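For reference, the dichotomous HCM is commonly written (following Andrich and Luo) as
\[
P(X_{ij} = 1 \mid \theta_j) = \frac{\exp(\gamma_i)}{\exp(\gamma_i) + 2\cosh(\theta_j - \delta_i)},
\]
where \(\delta_i\) locates item \(i\) on the trait continuum and \(\gamma_i\) is a unit (latitude-of-acceptance) parameter; the GHCM generalizes this form to polytomous responses.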
Electronic learning systems have received increasing attention because they are easily accessible to many students and are capable of personalizing the learning environment in response to students' learning needs. To that end, fast and flexible algorithms that track students' ability change in real time are desirable. Recently, the Elo rating system (ERS) has been applied and studied in both research and practical settings (Brinkhuis & Maris, 2009; Klinkenberg, Straatemeier, & van der Maas, 2011).
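To make the updating mechanics concrete, the following is a minimal Python sketch of an Elo-style update after one scored response, a generic illustration with an assumed logistic expectation and fixed step size rather than the specific ERS variant examined in the cited work:

import math

def elo_update(theta, b, score, k=0.4):
    # theta: current ability estimate; b: current item difficulty estimate
    # score: observed result (1 = correct, 0 = incorrect); k: step size
    expected = 1.0 / (1.0 + math.exp(-(theta - b)))  # expected probability of a correct response
    theta_new = theta + k * (score - expected)       # ability moves toward the observed outcome
    b_new = b - k * (score - expected)               # item difficulty moves in the opposite direction
    return theta_new, b_new

# Example: a correct answer on a relatively hard item nudges the ability estimate upward.
theta, b = elo_update(theta=0.0, b=0.5, score=1)

Because each response triggers only this constant-time update, estimates can track ability change in real time without refitting a full IRT model.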
When multiple groups are compared, the error variance-covariance structure is not always invariant between groups. In this study, we investigated the impact of misspecified error structures on tests of measurement invariance and of latent factor mean differences between groups. A Monte Carlo study was conducted to examine how measurement invariance and latent mean difference tests were affected when heterogeneous error structures were misspecified as being invariant across groups.
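For orientation, in a multiple-group confirmatory factor model the implied covariance matrix for group \(g\) is
\[
\Sigma_g = \Lambda_g \Phi_g \Lambda_g^{\top} + \Theta_g,
\]
so constraining the error (unique-variance) matrices \(\Theta_g\) to be equal across groups when they in fact differ is the type of misspecification examined here, while invariance testing proceeds by successively constraining loadings, intercepts, and latent means.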
Background: When developmental disabilities researchers use multiple-baseline designs, they are encouraged to delay the start of an intervention until the baseline stabilizes or until preceding cases have responded to intervention. Using ongoing visual analyses to guide the timing of the start of the intervention can help to resolve potential ambiguities in the graphical display; however, these forms of response-guided experimentation have been criticized as a potential source of bias in treatment effect estimation and inference.
Aims and Methods: Monte Carlo simulations were used to examine the bias and precision of average treatment effect estimates obtained from multilevel models of four-case multiple-baseline studies with series lengths that varied from 19 to 49 observations per case.
We developed masked visual analysis (MVA) as a structured complement to traditional visual analysis. The purpose of the present investigation was to compare the effects of computer-simulated MVA of a four-case multiple-baseline (MB) design in which the phase lengths are determined by an ongoing visual analysis (i.e.
Forced-choice item response theory (IRT) models are being used more widely as a way of reducing response biases in noncognitive research and operational testing contexts. As applications have increased, there has been a growing need for methods to link parameters estimated in different examinee groups as a prelude to measurement equivalence testing. This study compared four linking methods for the Zinnes and Griggs (ZG) pairwise preference ideal point model.
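To illustrate what linking involves in general (the four methods compared in the study are not listed in this summary), the sketch below shows a simple moment-based, mean/sigma-style transformation that re-expresses one group's statement locations on another group's metric using common items; it is offered only as a generic example under that assumption, not as one of the procedures evaluated:

import numpy as np

def mean_sigma_link(mu_focal, mu_reference):
    # Slope A and intercept B such that A * mu_focal + B places the focal-group
    # locations for the common statements on the reference-group metric.
    a = np.std(mu_reference, ddof=1) / np.std(mu_focal, ddof=1)
    b = np.mean(mu_reference) - a * np.mean(mu_focal)
    return a, b

# Hypothetical location estimates for four common statements in each group.
mu_focal = np.array([-1.2, -0.4, 0.3, 1.1])
mu_reference = np.array([-1.0, -0.2, 0.5, 1.3])
a, b = mean_sigma_link(mu_focal, mu_reference)
linked = a * mu_focal + b  # focal estimates expressed on the reference metric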
Appl Psychol Meas, March 2017
Concurrent calibration using anchor items has proven to be an effective alternative to separate calibration and linking for developing large item banks, which are needed to support continuous testing. In principle, anchor-item designs and estimation methods that have proven effective with dominance item response theory (IRT) models, such as the 3PL model, should also lead to accurate parameter recovery with ideal point IRT models, but surprisingly little research has been devoted to this issue. This study, therefore, had two purposes: (a) to develop software for concurrent calibration with what is now the most widely used ideal point model, the generalized graded unfolding model (GGUM), and (b) to compare the efficacy of different GGUM anchor-item designs and develop empirically based guidelines for practitioners.
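To picture the anchor-item layout that concurrent calibration relies on (a schematic sketch with made-up dimensions, not the software developed in the study), responses from two groups are stacked into a single matrix in which anchor items are answered by everyone and form-unique items are missing for the other group, so a single estimation run places all items on a common metric:

import numpy as np

n1, n2 = 500, 500                      # hypothetical examinees per group
unique_per_form, n_anchor = 15, 5
n_items = 2 * unique_per_form + n_anchor

rng = np.random.default_rng(1)
resp = np.full((n1 + n2, n_items), np.nan)   # combined response matrix; NaN = not administered

# Group 1 takes its 15 unique items plus the 5 anchors; group 2 takes the anchors plus its own 15.
group1_items = list(range(0, unique_per_form)) + list(range(2 * unique_per_form, n_items))
group2_items = list(range(unique_per_form, 2 * unique_per_form)) + list(range(2 * unique_per_form, n_items))

resp[:n1, group1_items] = rng.integers(0, 4, size=(n1, len(group1_items)))   # placeholder graded responses
resp[n1:, group2_items] = rng.integers(0, 4, size=(n2, len(group2_items)))
# Calibrating resp in one run (e.g., with GGUM software) would link all 35 items through the shared anchors.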
Appl Psychol Meas, October 2016
In recent years, there has been a surge of interest in measuring noncognitive constructs in educational and managerial/organizational settings. For the most part, these noncognitive constructs have been, and continue to be, measured using Likert-type (ordinal response) scales, which are susceptible to several types of response distortion. To deal with these response biases, researchers have proposed using forced-choice formats, which require respondents or raters to evaluate cognitive, affective, or behavioral descriptors presented in blocks of two or more.