The item wording (or keying) effect consists of logically inconsistent answers to positively and negatively worded items that tap into similar (but polarly opposite) content. Previous research has shown that this effect can be successfully modeled through the random intercept item factor analysis (RIIFA) model, as evidenced by the improvements in the model fit in comparison to models that only contain substantive factors. However, little is known regarding the capability of this model in recovering the uncontaminated person scores.
View Article and Find Full Text PDFCognitive diagnosis models (CDMs) allow classifying respondents into a set of discrete attribute profiles. The internal structure of the test is determined in a Q-matrix, whose correct specification is necessary to achieve an accurate attribute profile classification. Several empirical Q-matrix estimation and validation methods have been proposed with the aim of providing well-specified Q-matrices.
View Article and Find Full Text PDFDecisions on how to calibrate an item bank might have major implications in the subsequent performance of the adaptive algorithms. One of these decisions is model selection, which can become problematic in the context of cognitive diagnosis computerized adaptive testing, given the wide range of models available. This article aims to determine whether model selection indices can be used to improve the performance of adaptive tests.
View Article and Find Full Text PDFThe Q-matrix identifies the subset of attributes measured by each item in the cognitive diagnosis modelling framework. Usually constructed by domain experts, the Q-matrix might contain some misspecifications, disrupting classification accuracy. Empirical Q-matrix validation methods such as the general discrimination index (GDI) and Wald have shown promising results in addressing this problem.
View Article and Find Full Text PDFIn the context of cognitive diagnosis models (CDMs), a Q-matrix reflects the correspondence between attributes and items. The Q-matrix construction process is typically subjective in nature, which may lead to misspecifications. All this can negatively affect the attribute classification accuracy.
View Article and Find Full Text PDFCognitive diagnosis models (CDMs) are latent class multidimensional statistical models that help classify people accurately by using a set of discrete latent variables, commonly referred to as attributes. These models require a Q-matrix that indicates the attributes involved in each item. A potential problem is that the Q-matrix construction process, typically performed by domain experts, is subjective in nature.
View Article and Find Full Text PDFCurrently, there are two predominant approaches in adaptive testing. One, referred to as cognitive diagnosis computerized adaptive testing (CD-CAT), is based on cognitive diagnosis models, and the other, the traditional CAT, is based on item response theory. The present study evaluates the performance of two item selection rules (ISRs) originally developed in the CD-CAT framework, the double Kullback-Leibler information (DKL) and the generalized deterministic inputs, noisy "and" gate model discrimination index (GDI), in the context of traditional CAT.
View Article and Find Full Text PDFThis paper presents a new two-dimensional Multiple-Choice Model accounting for Omissions (MCMO). Based on Thissen and Steinberg multiple-choice models, the MCMO defines omitted responses as the result of the respondent not knowing the correct answer and deciding to omit rather than to guess given a latent propensity to omit. Firstly, using a Monte Carlo simulation, the accuracy of the parameters estimated from data with different sample sizes (500, 1,000, and 2,000 subjects), test lengths (20, 40, and 80 items) and percentages of omissions (5, 10, and 15%) were investigated.
View Article and Find Full Text PDFThis study analyses the extent to which cheating occurs in a real selection setting. A two-stage, unproctored and proctored, test administration was considered. Test score inconsistencies were concluded by applying a verification test (Guo and Drasgow Z-test).
View Article and Find Full Text PDFAn early step in the process of construct validation consists of establishing the fit of an unrestricted "exploratory" factorial model for a prespecified number of common factors. For this initial unrestricted model, researchers have often recommended and used fit indices to estimate the number of factors to retain. Despite the logical appeal of this approach, little is known about the actual accuracy of fit indices in the estimation of data dimensionality.
View Article and Find Full Text PDFBackground: The Exploratory Factor Analysis (EFA) procedure is one of the most commonly used in social and behavioral sciences. However, it is also one of the most criticized due to the poor management researchers usually display. The main goal is to examine the relationship between practices usually considered more appropriate and actual decisions made by researchers.
View Article and Find Full Text PDFTest security can be a major problem in computerized adaptive testing, as examinees can share information about the items they receive. Of the different item selection rules proposed to alleviate this risk, stratified methods are among those that have received most attention. In these methods, only low discriminative items can be presented at the beginning of the test and the mean information of the items increases as the test goes on.
View Article and Find Full Text PDFBackground: Criterion-referenced interpretations of tests are highly necessary, which usually involves the difficult task of establishing cut scores. Contrasting with other Item Response Theory (IRT)-based standard setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores.
Method: eCat-Listening, a computerized adaptive test for the evaluation of English Listening, was administered to 1,576 participants, and the proposed standard setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR).
Previous research evaluating the performance of Horn's parallel analysis (PA) factor retention method with ordinal variables has produced unexpected findings. Specifically, PA with Pearson correlations has performed as well as or better than PA with the more theoretically appropriate polychoric correlations. Seeking to clarify these findings, the current study employed a more comprehensive simulation study that included the systematic manipulation of 7 factors related to the data (sample size, factor loading, number of variables per factor, number of factors, factor correlation, number of response categories, and skewness) as well as 3 factors related to the PA method (type of correlation matrix, extraction method, and eigenvalue percentile).
View Article and Find Full Text PDFIn this study, eCAT-Listening, a new computerized adaptive test for the evaluation of English Listening, is described. Item bank development, anchor design for data collection, and the study of the psychometric properties of the item bank and the adaptive test are described. The calibration sample comprised 1.
View Article and Find Full Text PDFIn computerized adaptive testing, the most commonly used valuating function is the Fisher information function. When the goal is to keep item bank security at a maximum, the valuating function that seems most convenient is the matching criterion, valuating the distance between the estimated trait level and the point where the maximum of the information function is located. Recently, it has been proposed not to keep the same valuating function constant for all the items in the test.
View Article and Find Full Text PDFApplications of Item Response Theory require assessing the agreement between observations and model predictions at the item level. This paper compares approaches applied to polytomous scored items in a simulation study. Three fit-indexes are calculated: traditional chi-square index obtained by grouping examinees according to their estimated trait, an alternative that uses posterior distribution of trait and the third method, in which examinees are grouped according their observed total scores.
View Article and Find Full Text PDFThis paper has two objectives: (a) to provide a clear description of three methods for controlling the maximum exposure rate in computerized adaptive testing -the Symson-Hetter method, the restricted method, and the item-eligibility method- showing how all three can be interpreted as methods for constructing the variable sub-bank of items from which each examinee receives the items in his or her test; (b) to indicate the theoretical and empirical limitations of each method and to compare their performance. With the three methods, we obtained basically indistinguishable results in overlap rate and RMSE (differences in the third decimal place). The restricted method is the best method for controlling exposure rate, followed by the item-eligibility method.
View Article and Find Full Text PDFIf examinees were to know, beforehand, part of the content of a computerized adaptive test, their estimated trait levels would then have a marked positive bias. One of the strategies to avoid this consists of dividing a large item bank into several sub-banks and rotating the sub-bank employed (Ariel, Veldkamp & van der Linden, 2004). This strategy permits substantial improvements in exposure control at little cost to measurement accuracy, However, we do not know whether this option provides better results than using the master bank with greater restriction in the maximum exposure rates (Sympson & Hetter, 1985).
View Article and Find Full Text PDFBr J Math Stat Psychol
November 2008
The most commonly employed item selection rule in a computerized adaptive test (CAT) is that of selecting the item with the maximum Fisher information for the estimated trait level. This means a highly unbalanced distribution of item-exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness.
View Article and Find Full Text PDF