Are We Measuring Ability or Guessing?: CTT and IRT Evidence from a Multiple-Choice Assessment in Econometric Test

Authors

  • Ajeng Wahyuni, Universitas Islam Negeri Kiai Ageng Muhammad Besari Ponorogo, Indonesia
  • Yunaita Rahmawati, Universitas Islam Negeri Kiai Ageng Muhammad Besari Ponorogo, Indonesia
  • Maulida Nurhidayati, Universitas Islam Negeri Kiai Ageng Muhammad Besari Ponorogo, Indonesia
  • Muhtadin Amri, Universitas Islam Negeri Kiai Ageng Muhammad Besari Ponorogo, Indonesia

DOI:

https://doi.org/10.18326/hipotenusa.v8i1.7088

Keywords:

classical test theory, item response theory, pseudo-guessing, item discrimination, econometrics assessment

Abstract

Multiple-choice tests are widely used in mathematics-related higher education courses because they are practical for assessing broad learning outcomes. However, a correct response does not always indicate full conceptual mastery: students may answer correctly through partial knowledge, distractor elimination, unintended item cues, or pseudo-guessing. This study evaluates the quality of a 20-item, four-option multiple-choice econometrics assessment using Classical Test Theory (CTT) and Item Response Theory (IRT). The test was administered to 108 undergraduate students and was designed to measure econometrics competence as an applied mathematics construct, including quantitative and statistical reasoning, regression and model interpretation, hypothesis testing and inference, model assumptions and diagnostics, and data-based decision-making. CTT was used to examine item difficulty and item discrimination, while IRT was used to compare the 1PL, 2PL, and 3PL models and to diagnose pseudo-guessing. The results showed a mean score of 13.85 out of 20, a KR-20 of 0.681, and a Cronbach’s alpha of 0.677, indicating moderate but not strong internal consistency. CTT identified no difficult items, nine easy items, and three items with poor discrimination. The 1PL model had the lowest BIC and was therefore the best-fitting model, while the 3PL model was retained for diagnostic purposes because it estimates a pseudo-guessing parameter. Eight items (I01, I05, I06, I12, I15, I16, I18, and I20) had pseudo-guessing parameters above 0.25, the chance level for a four-option item. These findings suggest that some correct responses may have been influenced by non-mastery factors. This study contributes to mathematics education by demonstrating how integrated CTT and IRT diagnostics can improve the validity of an econometrics assessment as a measure of quantitative and statistical reasoning.
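
The quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the toy response matrix, and the choice of a corrected item-total correlation as the discrimination index are assumptions made for illustration only. The sketch uses the textbook forms of KR-20, the 3PL item response function, and BIC.

```python
import math
from statistics import mean, pvariance

def item_difficulty(responses):
    """CTT difficulty index: proportion of correct (1) answers per item."""
    n_items = len(responses[0])
    return [mean([row[j] for row in responses]) for j in range(n_items)]

def kr20(responses):
    """KR-20 for 0/1 items, using population variances throughout."""
    k = len(responses[0])
    p = item_difficulty(responses)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(pj * (1 - pj) for pj in p) / total_var)

def corrected_item_total(responses, j):
    """Correlation between item j and the total score without item j.
    Assumption: the abstract does not say which discrimination index was used."""
    x = [row[j] for row in responses]
    y = [sum(row) - row[j] for row in responses]
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def p_correct_3pl(theta, a, b, c):
    """3PL model: probability of a correct answer at ability theta, with
    discrimination a, difficulty b, and pseudo-guessing lower asymptote c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def bic(log_likelihood, n_params, n_persons):
    """Bayesian information criterion; lower values indicate better fit."""
    return -2 * log_likelihood + n_params * math.log(n_persons)

# Hypothetical toy data: 5 examinees x 4 items (not the study's data).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulties = item_difficulty(toy)
reliability = kr20(toy)
floor_probability = p_correct_3pl(-50.0, 1.0, 0.0, 0.25)
```

In the 3PL function, a very low-ability examinee still answers correctly with probability close to c, which is why an estimated c above 0.25 on a four-option item points to something beyond blind guessing. The BIC penalty grows with the number of estimated parameters, so with a sample of 108 a simpler model such as the 1PL can be preferred even when a richer model fits the data slightly more closely.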

Published

2026-06-30

How to Cite

Ajeng Wahyuni, Yunaita Rahmawati, Maulida Nurhidayati, & Muhtadin Amri. (2026). Are We Measuring Ability or Guessing?: CTT and IRT Evidence from a Multiple-Choice Assessment in Econometric Test. Hipotenusa: Journal of Mathematical Society, 8(1), 104–116. https://doi.org/10.18326/hipotenusa.v8i1.7088