Full Length Article
DOI: https://doi.org/10.54216/IJAIET.050106
ChatGPT as an Assessment Design Tool in Higher Education: Evaluating Item Quality, Bloom’s Taxonomy Coverage, and Faculty Acceptance Across Academic Disciplines
The emergence of large language models capable of generating coherent, contextually grounded text at scale has created a new and contested tool for higher education assessment design: instructors can now produce examination questions, assignment prompts, and feedback rubrics in seconds rather than hours. Whether the items these systems produce meet the quality standards required for valid, reliable, and pedagogically appropriate higher education assessment is an empirical question that the literature has only partially addressed. This paper reports a three-study investigation of ChatGPT as an assessment design tool in higher education, covering item quality, cognitive-level coverage, student performance, and faculty acceptance. Study 1 presents an expert-panel evaluation of 360 assessment items (180 generated by ChatGPT and 180 created by experienced instructors) spanning six academic disciplines and four item types, rated on seven quality dimensions including content accuracy, Bloom’s taxonomy alignment, linguistic clarity, and originality. Study 2 reports a faculty survey of 186 instructors examining adoption rates, perceived benefits, concerns, and predictors of acceptance. Study 3 compares the performance of 412 students on counterbalanced ChatGPT-generated and instructor-created assessment items. ChatGPT-generated items score significantly below instructor-created items on Bloom’s taxonomy alignment and originality but perform comparably or better on linguistic clarity and difficulty calibration. Student performance is modestly but significantly higher on ChatGPT-generated items, a finding that challenges simple assumptions about the difficulty of AI-generated assessments. Academic integrity and the coverage of higher-order cognitive levels are the dominant faculty concerns, while time savings, averaging a 77% reduction in item-writing time, are the most consistently cited benefit.
The paper contributes a validated multi-dimensional item quality framework, a faculty acceptance model, and eight evidence-based guidelines for the responsible integration of ChatGPT in assessment design workflows.
Nadia Iftikhar, Rabia Muslu