To reduce the chance of Heywood cases or nonconvergence in estimating the 2PL or the 3PL model in the marginal maximum likelihood with the expectation-maximization (MML-EM) estimation method, priors for the item slope parameter in the 2PL model or for the pseudo-guessing parameter in the 3PL model can be used and the marginal maximum a posteriori (MMAP) and posterior standard error (PSE) are estimated. Confidence intervals (CIs) for these parameters and other parameters which did not take any priors were investigated with popular prior distributions, different error covariance estimation methods, test lengths, and sample sizes. A seemingly paradoxical result was that, when priors were taken, the conditions of the error covariance estimation methods known to be better in the literature (Louis or Oakes method in this study) did not yield the best results for the CI performance, while the conditions of the cross-product method for the error covariance estimation which has the tendency of upward bias in estimating the standard errors exhibited better CI performance.
Background: Responsive infant feeding occurs when a parent recognizes the infant's cues of hunger or satiety and responds promptly to these cues. It is known to promote healthy dietary patterns and infant weight gain and is recommended as part of the Dietary Guidelines for Americans. However, the use of responsive infant feeding can be challenging for many parents.
Appl Psychol Meas
June 2021
Pseudo-guessing parameters are present in item response theory applications for many educational assessments. When the sample size is not sufficiently large, the guessing parameters may be omitted from the analysis. This study examines the impact of ignoring pseudo-guessing parameters on measurement invariance analysis, specifically, on item difficulty, item discrimination, and the mean and variance of the ability distribution.
Educ Psychol Meas
February 2020
A log-linear model (LLM) is a well-known statistical method to examine the relationship among categorical variables. This study investigated the performance of LLM in detecting differential item functioning (DIF) for polytomously scored items via simulations where various sample sizes, ability mean differences (impact), and DIF types were manipulated. Also, the performance of LLM was compared with that of other observed score-based DIF methods, namely ordinal logistic regression, logistic discriminant function analysis, Mantel, and generalized Mantel-Haenszel, regarding their Type I error (rejection rates) and power (DIF detection rates).
When considering the two-parameter or the three-parameter logistic model for item responses from a multiple-choice test, one may want to assess the need for the lower asymptote parameters in the item response function and justify the use of the three-parameter item response model. This study reports the degree of sensitivity of an overall model test M₂ to detecting the presence of nonzero asymptotes in the item response function under normal and nonnormal ability distribution conditions.
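As an illustration of the lower asymptote mentioned in the abstract above, the following is a minimal sketch of the standard three-parameter logistic item response function, P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))). The function name `irf_3pl` and its argument names are hypothetical and chosen here for illustration; they are not taken from the article. Setting c = 0 reduces the function to the two-parameter model.

```python
import math


def irf_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a:     item slope (discrimination)
    b:     item difficulty (location)
    c:     lower asymptote (pseudo-guessing); c = 0 gives the 2PL model
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Compare the 2PL curve with a 3PL curve that has a nonzero asymptote.
    for theta in (-3.0, 0.0, 3.0):
        print(theta, irf_3pl(theta, 1.2, 0.0), irf_3pl(theta, 1.2, 0.0, 0.2))
```

For very low ability the 3PL probability approaches c rather than 0, which is the feature a model test would need to detect.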
The purpose of this article is twofold. The first is to provide evaluative information on the recovery of model parameters and their standard errors for the two-parameter item response theory (IRT) model using different estimation methods by Mplus. The second is to provide easily accessible information for practitioners, instructors, and students about the relationships between IRT and item factor analysis (FA) parameterizations.
Appl Psychol Meas
November 2017
It has been widely known that the Type I error rates of goodness-of-fit tests using full information test statistics, such as Pearson's test statistic χ² and the likelihood ratio test statistic G², are problematic when data are sparse. Under such conditions, the limited information goodness-of-fit test statistic M₂ is recommended in model fit assessment for models with binary response data. A simulation study was conducted to investigate the power and Type I error rate of M₂ in fitting unidimensional models to many different types of multidimensional data.
The main purpose of the present study was to examine the validity and reliability of the Korean version of the Sport Anxiety Scale (SAS-2Kr) by evaluating its factorial invariance across gender. A total of 303 Korean collegiate athletes (198 males and 105 females) from 9 sports participated in the study, and they completed the demographic questionnaire and the SAS-2Kr containing 15 items to measure multidimensional trait anxiety and individual differences in the cognitive and somatic anxiety experienced by athletes. The results of this study indicated that the construct validity of the SAS-2Kr was well established in that the standardized factor loadings, composite reliability, and average variance extracted values were above the recommended cutoff points.
When categorical ordinal item response data are collected over multiple timepoints from a repeated measures design, an item response theory (IRT) modeling approach whose unit of analysis is an item response is suitable. This study proposes a few longitudinal IRT models and illustrates how a popular compensatory multidimensional IRT model can be utilized to formulate such longitudinal IRT models, which permits an investigation of ability growth at both individual and population levels. The equivalence of an existing multidimensional IRT model and those longitudinal IRT models is also elaborated so that one can make use of an existing multidimensional IRT model to implement the longitudinal IRT models.
The effect of guessing on the point estimate of coefficient alpha has been studied in the literature, but the impact of guessing and its interactions with other test characteristics on the interval estimators for coefficient alpha has not been fully investigated. This study examined the impact of guessing and its interactions with other test characteristics on four confidence interval (CI) procedures for coefficient alpha in terms of coverage rate (CR), length, and the degree of asymmetry of CI estimates. In addition, interval estimates of coefficient alpha when data follow the essentially tau-equivalent condition were investigated as a supplement to the case of dichotomous data with examinee guessing.
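For readers unfamiliar with the point estimate whose interval estimators are studied above, here is a minimal sketch of coefficient alpha computed from a persons-by-items score matrix, using the usual formula α = k/(k − 1) · (1 − Σ item variances / total-score variance). The function name `coefficient_alpha` and the toy data are assumptions made for illustration; the sketch does not implement any of the four CI procedures examined in the study.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of rows (persons), each a list of item scores.

    Uses sample variances (n - 1 denominator). Raises ZeroDivisionError
    if every person has the same total score.
    """
    k = len(scores[0])  # number of items
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical dichotomous responses: 5 persons by 3 items.
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1]]
    print(coefficient_alpha(data))
```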
This study explored the utility of logistic mixed models for the analysis of differential item functioning when item response data were testlet-based. Decomposition of differential item functioning (DIF) into item level and testlet level for the testlet-based data was introduced to separate possible sources of DIF: (1) an item, (2) a testlet, and (3) both the item and the testlet. A simulation study was conducted to investigate the performance of several logistic mixed models as well as the Mantel-Haenszel method under conditions in which the item-related DIF and testlet-related DIF were present simultaneously.
Appl Psychol Meas
June 2015
Guessing is known to influence the test reliability of multiple-choice tests. Although there are many studies that have examined the impact of guessing, they used rather restrictive assumptions (e.g.
Appl Psychol Meas
March 2015
The use of mixture item response theory modeling is exemplified typically by comparing item profiles across different latent groups. The comparisons of item profiles presuppose that all model parameter estimates across latent classes are on a common scale. This note discusses the conditions and the model constraint issues to establish a common scale across latent classes.
Behav Res Methods
September 2015
A differential item functioning (DIF) decomposition model separates a testlet item DIF into two sources: item-specific differential functioning and testlet-specific differential functioning. This article provides an alternative model-building framework and estimation approach for a DIF decomposition model that was proposed by Beretvas and Walker (2012). Although their model is formulated under multilevel modeling with the restricted pseudolikelihood estimation method, our approach illustrates DIF decomposition modeling that is directly built upon the random-weights linear logistic test model framework with the marginal maximum likelihood estimation method.
Br J Math Stat Psychol
February 2015
This study investigated differential item functioning (DIF) mechanisms in the context of differential testlet effects across subgroups. Specifically, we investigated DIF manifestations when the stochastic ordering assumption on the nuisance dimension in a testlet does not hold. DIF hypotheses were formulated analytically using a parametric marginal item response function approach and compared with empirical DIF results from a unidimensional item response theory approach.
When differential item functioning (DIF) is investigated, DIF classification is made using statistical test results and estimated DIF sizes in practice. One of the well-known DIF classifications is that of the Educational Testing Service (ETS) A (negligible DIF), B (medium DIF), and C (large DIF) rules. This article provides a clarifying note on (a) a sketch of the proof of the asymptotic normality of what is known as the Mantel-Haenszel (MH) delta, which provides the basis of a point and an interval null hypothesis test based on the MH delta, and (b) how to conduct an interval null hypothesis test using the MH delta, which is necessary for the C DIF classification.
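To make the MH delta referred to above concrete, the following is a minimal sketch of the conventional computation: the Mantel-Haenszel common odds ratio across matched score levels, transformed to the ETS delta scale as −2.35 · ln(α_MH). The function name `mh_delta` and the tuple layout are hypothetical choices for this illustration. The sketch covers only the point estimate; it does not implement the interval null hypothesis test that the article describes for the C classification.

```python
import math


def mh_delta(tables):
    """MH delta from 2x2 tables, one per matched score level.

    Each table is a tuple (A, B, C, D):
      A = reference group correct,  B = reference group incorrect,
      C = focal group correct,      D = focal group incorrect.
    Raises ZeroDivisionError if the denominator sum is zero.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den  # common odds ratio estimate
    return -2.35 * math.log(alpha_mh)


if __name__ == "__main__":
    # Hypothetical counts at three matched score levels.
    print(mh_delta([(30, 10, 20, 20), (40, 10, 35, 15), (45, 5, 40, 10)]))
```

Under this sign convention a negative value indicates that the item favors the reference group.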
IRT models have not been rigorously applied in studies of the relationship between test-takers' confidence and accuracy. This study applied the Rasch measurement models to investigate the relationship between test-takers' confidence and accuracy on English proficiency tests, proposing potentially useful measures of under- or overconfidence. The Rasch approach provided the scaffolding to formulate indices that can assess the discrepancy between confidence and accuracy at the item or total test level, as well as at particular ability levels locally.
This study is designed to investigate a multidimensional structure of academic achievement goal orientations from a diagnostic perspective, using the Rasch measurement models. A data set of Korean students who responded to the Patterns of Adaptive Learning Survey (PALS) was analyzed. Both consecutive unidimensional and multidimensional Rasch measurement models were applied for comparative purposes.
This paper proposes an emergency locking unit (ELU) for a seat belt retractor which is mounted on the back frame of a vehicle seat. The proposed unit uses a recliner sensor based on a MEMS acceleration sensor and solenoid mechanism. The seat has an upper frame supported to tilt on a lower frame.
The current Rasch testlet model (RT) assumes independence of the testlet effect and the target dimension. This article investigated the impact of the violation of that assumption on RT and the performance of an extended Rasch testlet model (ET) in which the random parameter variance-covariance matrix is estimated without any constraints. Our simulation results showed that ET performed as well as or better than RT.
This study applies two approaches in creating a single scale from two separate statewide exams (Golden State Math Exam and California Standard Math Test) and compares some aspects of the two statewide tests. The first analysis involves a sequence of unidimensional Rasch scalings, using anchored items to scale the two tests together. The second analysis employs a 2-dimensional Rasch scaling using previous unidimensional analysis results to link the scales.
Rasch model-based vertical scaling was evaluated in a simulation study with respect to recovery of item parameters, linking constant, population mean (grade-to-grade growth), population standard deviation (grade-to-grade variability), and separation of grade distributions by effect size. The simulated vertical scale had five different grades with five different test levels. Controlled factors were data collection design, linking methods, and sample size.
Since the 1970s, much attention has been devoted to the male advantage in standardized mathematics tests in the United States. Although girls are found to perform as well as boys in math classes, they are consistently outperformed on standardized math tests. This study compared males and females in the United States, all 15-year-olds, by their performance on the PISA 2003 mathematics assessment.
When a new set of mixed format items is added to an existing multiple-choice (MC) test, those mixed format items should be linked to the existing MC test. This study used simulation to investigate the effect of sample size on the recovery of known item parameters from concurrent calibration in the context of horizontal equating, where the new mixed format tests are equated to the existing MC test, which acts as the set of common linking items. In the partial credit model following the Andrich style parameterization, item location and item step parameters were differentially affected by the sample size.
J Acoust Soc Am
January 2005
The electroacoustic efficiency of high-power actuators used in thermoacoustic coolers may be estimated using a linear model involving a combination of six parameters. A method to identify these equivalent driver parameters from measured total electrical impedance and velocity-voltage transfer function data was developed. A commercially available, moving-magnet driver coupled to a functional thermoacoustic cooler was used to demonstrate the procedure experimentally.