This study aims to investigate the performance of test equating methods extended to mixed-format tests within the framework of item response theory (IRT). Estimating IRT equating coefficients for mixed-format tests. Observed-score equating for mixed-format tests using a simple-structure multidimensional IRT framework. Test score equating is used to compare scores from different test forms. The R package plink has been developed to facilitate the linking of mixed-format tests for multiple groups under a common-item design using unidimensional and multidimensional IRT-based methods.
Simple-structure multidimensional item response theory. Evaluating equating properties for mixed-format tests, by Yi He. An abstract of a thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) in the Graduate College of the University of Iowa, May 2011. Combinations of different item formats often allow for the measurement of a broader set of skills than the use of a single format. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. Data were simulated according to a two-dimensional noncompensatory IRT model for both equivalent and nonequivalent groups designs. One of these issues in linking mixed-item-format tests is score comparability across test administration years.
Equating mixed-format tests with format-representative and non-representative common items, in Mixed-format tests. Comparison of IRT linking and equating methods with mixed-format tests. Several methods have been developed to conduct equating. The impact of equating method and format representation of anchor items on the adequacy of mixed-format test equating using nonequivalent groups (unpublished doctoral dissertation). In this article, the results of a simulation study comparing the performance of separate and concurrent estimation of a unidimensional item response theory (IRT) model applied to multidimensional noncompensatory data are reported. IRT true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups. A comparison of equating/linking using the Stocking-Lord method and concurrent calibration with mixed-format tests in the nonequivalent groups common-item design under IRT.
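The IRT true-score equating (TSE) mentioned above can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies or programs: the item representation, the function names, the scaling constant D = 1.7 and the use of bisection (instead of the Newton-Raphson iteration often described for this step) are all assumptions made for illustration. The idea is to find the ability at which the form X test characteristic curve equals a given true score, then read off the form Y true score at that ability. Dichotomous 3PL items and graded response items share one representation so that a mixed-format form is simply a list of items.

```python
import math

D = 1.7  # logistic scaling constant (assumption; some programs use D = 1)


def logistic(a, theta, b):
    """Logistic response function with slope a and location b."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def expected_score(theta, item):
    """Expected score on one item.

    item = (a, [b_1, ..., b_m], c). A 3PL item has a single b and a
    guessing parameter c. A graded response item has m ordered
    thresholds and c = 0, so its expected score is the sum of the
    cumulative probabilities P(X >= k).
    """
    a, bs, c = item
    return sum(c + (1.0 - c) * logistic(a, theta, b) for b in bs)


def tcc(theta, form):
    """Test characteristic curve: expected total score at ability theta."""
    return sum(expected_score(theta, item) for item in form)


def true_score_equate(tau_x, form_x, form_y, lo=-10.0, hi=10.0):
    """Form Y true-score equivalent of a form X true score tau_x."""
    if not tcc(lo, form_x) < tau_x < tcc(hi, form_x):
        raise ValueError("tau_x is outside the range the TCC can reach")
    # The TCC is increasing in theta, so bisection finds the matching ability.
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if tcc(mid, form_x) < tau_x:
            lo = mid
        else:
            hi = mid
    return tcc((lo + hi) / 2.0, form_y)


# Hypothetical item parameters, already on a common scale.
form_x = [(1.0, [-0.5], 0.20), (1.3, [0.2], 0.15),
          (0.9, [1.0], 0.25), (1.1, [-1.0, 0.0, 1.0], 0.0)]
# Form Y: the same items made uniformly harder by 0.3.
form_y = [(a, [b + 0.3 for b in bs], c) for a, bs, c in form_x]

equated = true_score_equate(3.0, form_x, form_y)
```

True scores at or below the sum of the guessing parameters have no corresponding ability, which is why the sketch raises an error there; operational programs handle that region with ad hoc rules that are not shown here.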
Equating of mixed-format tests under a CINEG design can be influenced by factors such as attributes of the test, the common-item set, and examinees. The effect of mini and midi anchor tests on test equating. The common-item nonequivalent groups equating design was used in this study. A mixed-format test is a test containing a mixture of different item formats. An item response theory-based equating method is proposed for the long-term scale maintenance of a mixed-format test consisting of constructed-response items and multiple-choice items. As with other standardized tests, these mixed-format tests must be equated to ensure equivalence of scores across test forms. Comparison of test equating methods based on item response. In addition to statistical procedures, successful equating, scaling and linking involves many aspects of testing, including procedures to develop tests, to administer and score tests and to interpret. The traditional linking method often applied to linking test forms. The proposed method is a modification of the traditional common-item nonequivalent groups design. There has been a steady increase in the use of mixed-format tests, that is, tests.
IRTEQ: Windows application that implements IRT scaling and equating [computer program]. Document resume: Li, Yuan H.; Lissitz, Robert W.; Yang, Yu Nu. Examining two strategies to link mixed-format tests using. Effect of noncompensatory multidimensionality on. Computer programs, College of Education, University of Iowa. Test Equating, Scaling, and Linking: Methods and Practices is a welcome update to a book which has become a classic in equating and linking. Chapter 5 examines the influence of IRT calibration programs on IRT equating results for mixed-format tests using some of the same data sets used in Chapter 4. New material includes model determination in log-linear smoothing, in-depth presentation of chained linear and equipercentile equating, equating criteria, test scoring, and a new section on scores for mixed-format tests. A comparison of IRT observed score kernel equating and. The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. In almost all high-stakes testing programs, test equating is necessary to ensure that test scores across multiple test administrations are equivalent and can. Mixed-format tests, University of Iowa College of Education.
Data sets from this book are included with some of the programs. Perhaps most often, equating occurs in the context of the nonequivalent groups with anchor test (NEAT) design, in which a set. Mixed-format tests often are considered to be superior to tests containing only MC items, although the use of multiple item formats leads to measurement challenges in the context of equating conducted under the common-item nonequivalent groups (CINEG) design. The unidimensional methods include the mean/mean, mean/sigma, Haebara, and Stocking-Lord methods. Items that appear in both forms of a test (such as forms A and B) are referred to as anchor or common items. One challenge for mixed-format test equating using IRT methods with the CINEG design is how to extend traditional IRT equating procedures that were originally developed for single-format tests to those appropriate for mixed-format tests. This option is particularly useful when a mixed-format test form is to be simulated. This introduction to the R package plink is a slightly modified version of Weeks (2010), published in the Journal of Statistical Software. Using data simulated with empirical parameters from a statewide testing program, Baldwin and Baldwin studied the effect of anchor test length on the recovery of item parameters and increase in ability across four administrations in a mixed. Windows software that generates IRT parameters and.
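Of the unidimensional methods named above, the two moment methods are simple enough to sketch in a few lines of Python. This is an illustration only and does not reproduce plink or any other program; the function names are hypothetical. Both methods estimate the slope A and intercept B of the linear transformation theta_base = A * theta_new + B from the common (anchor) items. For a mixed-format anchor set, one convention (assumed here) is to pool every location parameter, that is, the single b of each dichotomous item together with all category thresholds of each polytomous item.

```python
from statistics import mean, pstdev


def mean_sigma(b_new, b_base):
    """Mean/sigma: match the mean and SD of the common items' b parameters."""
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def mean_mean(a_new, a_base, b_new, b_base):
    """Mean/mean: slope from the mean a parameters, intercept from the mean b."""
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def to_base_scale(a_new, b_new, A, B):
    """Place new-form item parameters on the base scale."""
    return [a / A for a in a_new], [A * b + B for b in b_new]


# Hypothetical common-item estimates on the new-form scale.
a_new = [1.0, 1.4, 0.8, 1.2]
b_new = [-0.5, 0.3, 1.2, -1.0]
# The same items on the base scale, constructed here with A = 1.2, B = 0.5
# so that the coefficients the functions should recover are known.
a_base = [a / 1.2 for a in a_new]
b_base = [1.2 * b + 0.5 for b in b_new]

A_ms, B_ms = mean_sigma(b_new, b_base)
A_mm, B_mm = mean_mean(a_new, a_base, b_new, b_base)
```

With real estimates the two methods generally give different coefficients, because estimation error affects the a and b parameters differently; the characteristic curve methods (Haebara, Stocking-Lord) use the whole item response function instead of its moments.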
A test equating design frequently used in test equating studies is a. Jun 01, 2011. This paper illustrates that the psychometric properties of scores and scales that are used with mixed. This function conducts separate calibration of unidimensional or multidimensional IRT single-format or mixed-format item parameters for multiple groups. In this design, there are two different groups and two test forms, A and B. IRT scale linking methods for mixed-format tests (ACT Research Report 2004-5).
An example of generating item data, where 15 dichotomous items based on the 3PLM and 5 polytomous items based on the GRM (5 categories) are generated, is shown in Figure 3. This function supports all item response models available in plink with the exception of the multiple-choice model. Test equating secures the comparability of test scores across different test administrations/forms. This function conducts IRT true score and observed score equating for unidimensional single-format or mixed-format item parameters for two or more groups.
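The figure referred to above is not reproduced here. As a stand-in, the following Python sketch generates responses for the same layout, 15 dichotomous 3PL items followed by 5 graded response items with 5 categories each. It does not reproduce the program described in the source: the function names, the parameter distributions, the standard normal abilities and the scaling constant D = 1.7 are all assumptions chosen for illustration.

```python
import math
import random

D = 1.7  # logistic scaling constant (assumption)


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def grm_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    Cumulative probabilities P(X >= k) are bracketed by 1 and 0, and
    each category probability is the difference of adjacent ones.
    """
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-D * a * (theta - t))) for t in thresholds]
    cum.append(0.0)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def simulate(n_persons, seed=0):
    """Return (3PL items, GRM items, response matrix) for a mixed-format form."""
    rng = random.Random(seed)
    # Assumed generating distributions for the item parameters.
    dich = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0), rng.uniform(0.10, 0.25))
            for _ in range(15)]
    poly = [(rng.uniform(0.8, 2.0), sorted(rng.gauss(0.0, 1.0) for _ in range(4)))
            for _ in range(5)]
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # abilities assumed standard normal
        row = [int(rng.random() < p_3pl(theta, a, b, c)) for a, b, c in dich]
        for a, thresholds in poly:
            probs = grm_probs(theta, a, thresholds)
            row.append(rng.choices(range(5), weights=probs)[0])
        data.append(row)
    return dich, poly, data


dich_items, poly_items, responses = simulate(200, seed=1)
```

Each row of `responses` holds 15 scores of 0 or 1 followed by 5 scores from 0 to 4. Sorting the four thresholds keeps the cumulative probabilities ordered, so every category probability is non-negative.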
A mixed-format test is a test containing a mixture of different item formats. This changes output in the following tables, listed in Table 56. An alternative to the trend scoring method for adjusting scoring shifts in mixed-format tests. Test equating with constructed-response items and mixed-format tests. In the present study, such a test is referred to as a mixed-format test.
Misfit in the context of test equating with mixed-format test data. In the third edition, each chapter contains a reference list, rather than having a single reference list at the end of the volume. Few, if any, studies to date have focused on test-level misfit with mixed-format test data, which is a typical case in operational assessment programs nowadays. An evaluation of linking methods in the presence of year to.
The formats of items in a mixed-format test are usually categorized into two classes. Paper presented at the annual meeting of the National Council on Measurement in Education. The book is appealing to anyone interested in the topic of equating, scaling, and linking. As stated in the preface of the first volume, beginning in 2007 and continuing through 2011, with funding from the College Board. For practitioners, the book provides a splendid introduction to the topics considered. The use of multiple formats presents a number of measurement challenges, one of which is how to adequately.
Today, IRT-based linking is the most commonly used approach for developing vertical scales, and it is being used increasingly for equating, particularly in the development of calibrated item banks. Comparison of item response theory test equating methods. The impact of test dimensionality, common-item set format. Windows PC console and graphical user interface (GUI) versions and Macintosh OS9 console and OS10 GUI versions are available for at least some of the. Test scaling is the process of developing score scales that are used when scores on standardized tests are reported. When you specify the EMPIRICAL option, PROC MIXED adjusts all standard errors and test statistics involving the fixed-effects parameters. With the help of the IRTEQ (Han, 2007) test equating program, equation equity.
Contrast, CorrB, CovB, Diffs, Estimates, InvCovB, LSMeans, Slices, SolutionF, Tests1-Tests3. The results from the study conducted by Donoghue (1994) indicated that, on average. Linking mixed-format tests using IRT-based methods in R: this became possible. The new edition of Test Equating, Scaling, and Linking. This paper compares three methods of item calibration (concurrent calibration, separate calibration with linking, and fixed item parameter calibration) that are frequently used for linking item parameters to a base scale. IRT scale linking methods for mixed-format tests. Introduction: A test containing a mixture of different item formats is often used in both classroom and large-scale assessments.
The noncommercial software R is used throughout the book to illustrate how to perform different equating methods when score data are collected under different data collection designs, such as the equivalent groups design, single group design, counterbalanced design, and nonequivalent. Mixed-format test equating. DRUM, University of Maryland. The program adopts a matrix-sample external anchor equating design and employs mixed-format test data which contain dichotomously scored. Practical consequences of item response theory model misfit. This book provides an introduction to test equating, scaling and linking, including those concepts and practical issues that are critical for developers and all other testing professionals.
The Stocking and Lord (1983) characteristic curve method of parameter linking was used in conjunction. Thus, a demand for a computer program that is more generalized and powerful for various uses in research and test development has grown in the field, and as a result, a Windows application. Provides a simple common interface to the estimation of item parameters in IRT models for binary responses with three different programs (ICL, BILOG-MG, and ltm), and a variety of functions useful with IRT models. Simple interface to the estimation and plotting of IRT models. An R package for linking mixed-format tests using IRT. Chapter 3 is also a simulation study that compares the equating of mixed-format tests using common-item sets that consist solely of MC items to common-item sets that contain both MC and FR items. As mentioned above, IRT equating procedures have been well developed for single-format tests. Equating for long-term scale maintenance of mixed-format.
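The Stocking and Lord characteristic curve method referred to above chooses the linking coefficients A and B that make the test characteristic curve of the rescaled common items match the base-scale curve as closely as possible. The Python sketch below illustrates that criterion for a mixed-format anchor set. It is not the implementation in plink, IRTEQ or any cited study: the function names are hypothetical, the equally weighted ability grid from -4 to 4 is an assumption, and a crude compass (pattern) search stands in for the gradient-based optimizers that real programs use.

```python
import math

D = 1.7  # logistic scaling constant (assumption)


def expected_score(theta, item):
    """item = (a, [b_1..b_m], c): one b for a 3PL item, m thresholds and c = 0 for GRM."""
    a, bs, c = item
    return sum(c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b))) for b in bs)


def rescale(item, A, B):
    """Move an item from the new-form scale to the base scale."""
    a, bs, c = item
    return (a / A, [A * b + B for b in bs], c)


def sl_loss(A, B, new_items, base_items, thetas):
    """Sum of squared differences between the two characteristic curves."""
    if A <= 0:
        return float("inf")  # keep the slope positive
    loss = 0.0
    for theta in thetas:
        t_base = sum(expected_score(theta, it) for it in base_items)
        t_new = sum(expected_score(theta, rescale(it, A, B)) for it in new_items)
        loss += (t_base - t_new) ** 2
    return loss


def stocking_lord(new_items, base_items, start=(1.0, 0.0), step=0.5, tol=1e-7):
    """Minimize the criterion over (A, B) with a simple compass search."""
    thetas = [-4.0 + 0.2 * i for i in range(41)]  # assumed equally weighted grid
    x = list(start)
    fx = sl_loss(x[0], x[1], new_items, base_items, thetas)
    while step > tol:
        improved = False
        for i in range(2):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = sl_loss(cand[0], cand[1], new_items, base_items, thetas)
                if fc < fx:
                    x, fx, improved = cand, fc, True
        if not improved:
            step /= 2.0  # no axis move helps: refine the step
    return x[0], x[1]


# Hypothetical common items on the new-form scale (three 3PL, one 4-category GRM).
new_items = [(1.0, [-0.5], 0.20), (1.4, [0.3], 0.15),
             (0.8, [1.2], 0.20), (1.2, [-1.0, 0.0, 0.9], 0.0)]
# Base-scale versions built with known coefficients so recovery can be checked.
base_items = [rescale(it, 1.2, 0.5) for it in new_items]

A_hat, B_hat = stocking_lord(new_items, base_items)
```

The Haebara method differs only in the criterion: it sums squared differences item by item (and category by category) instead of differencing the summed test characteristic curves.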
For example, available software cannot handle all the popular IRT models being applied to test data, and cannot handle some of the popular equating designs. Center for Advanced Studies in Measurement and Assessment, University of. Laboratory of Psychometric and Evaluative Research Report 406. Frontiers: Practical consequences of item response theory. Bifactor MIRT observed-score equating for mixed-format tests. Psychometric properties of raw and scale scores on mixed. The plink package also includes functions for importing item and/or ability parameters from common IRT software, conducting IRT true score and observed score equating, and plotting item response curves/surfaces, vector plots, information plots, and comparison plots for examining parameter drift. This study provides new evidence on the performance of different IRT models in equating tests.
The flexMIRT IRT software package fits a variety of unidimensional and multidimensional item response theory models (also known as item factor analysis models) to single-level and multilevel data in any number of groups. Psychometric properties with a primary focus on equating (Volume 1). Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are used in many testing programs. Mixed-format tests, containing both multiple-choice and free-response items, are widely and increasingly used in many large-scale testing programs Kolen. Effects of test dimensionality and common-item sets, by Yi Cao. Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2008. Advisory committee. One common equating design used in linking or equating tests from year to year is item response theory (IRT) scaling using a nonequivalent, common-item equating design. Test equating methods are used with many standardized tests in education and psychology to ensure that scores from multiple test forms can be used interchangeably.