A Systematic Review of AI-Powered Uzbek Short-Answer Grading Using NLP and Teacher-Annotated Datasets

Sanjar Raximjonov¹,*, Eugene Q. Castro¹

¹ Department of Computer Science, Central Asian University, Tashkent, Uzbekistan

Emails: 220304@centralasian.uz · e.castro@centralasian.uz

Received: November 04, 2025 · Revised: December 11, 2025 · Accepted: January 18, 2026

* Corresponding author

ABSTRACT

This paper presents a Systematic Literature Review (SLR) of AI-powered automated short-answer grading, with a particular focus on low-resource languages such as Uzbek. The review follows the PRISMA 2020 guidelines to ensure transparency and methodological rigor. Peer-reviewed studies published between 2018 and 2025 were systematically identified across multiple academic databases, then screened and analyzed; 33 studies were included in the final synthesis. The reviewed literature indicates that transformer-based models, including mBERT and XLM-R, generally outperform traditional machine learning approaches, while recent large language models (LLMs) show potential in few-shot and zero-shot grading scenarios. The findings also highlight that the limited availability of teacher-annotated datasets remains a major obstacle to developing reliable automated grading systems in low-resource educational contexts.

Keywords: Automated Short-Answer Grading · Natural Language Processing · Transformer Models · Low-Resource Languages · Uzbek Language · Systematic Literature Review

1. INTRODUCTION

The rapid growth of digital education has increased the demand for scalable, consistent, and reliable assessment methods. In many educational systems, including that of Uzbekistan, short-answer questions are widely used to evaluate students’ conceptual understanding and reasoning skills. However, such responses are still graded predominantly by hand, a process that is time-consuming, subjective, and difficult to scale to large classes. These limitations often lead to inconsistent grading quality and delayed feedback, which can negatively affect the learning process [1, 2].

Recent advances in artificial intelligence (AI) and natural language processing (NLP) have enabled automated assessment systems that analyze and score textual responses. In high-resource languages such as English, automated short-answer grading systems based on machine learning and deep learning techniques have demonstrated strong performance and high agreement with human graders. In particular, transformer-based models, including BERT and its multilingual variants, have substantially improved semantic understanding by producing context-dependent representations of student responses [3].
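
To make the role of contextual representations concrete, one widely used family of grading methods compares a student response with a teacher's reference answer in embedding space. The following Python sketch assumes that fixed-length sentence vectors have already been produced by an encoder such as mBERT or XLM-R; the function names, the toy vectors, and the similarity thresholds that map cosine similarity to a 0 to 2 score are hypothetical and would need to be calibrated against teacher-annotated scores.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero (empty) embedding as unrelated
    return dot / (norm_u * norm_v)


# Hypothetical thresholds, ordered from highest to lowest: a similarity at or
# above a cut-off earns that score. Real values would be tuned on teacher scores.
SCORE_BANDS = ((0.85, 2), (0.60, 1))


def score_answer(student_vec, reference_vec, bands=SCORE_BANDS) -> int:
    """Map student/reference embedding similarity to an ordinal score (0, 1, 2)."""
    sim = cosine_similarity(student_vec, reference_vec)
    for threshold, score in bands:
        if sim >= threshold:
            return score
    return 0


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real encoder output.
    reference = [0.2, 0.7, 0.1]
    close = [0.25, 0.65, 0.1]
    unrelated = [0.9, -0.1, 0.4]
    print(score_answer(close, reference))      # 2
    print(score_answer(unrelated, reference))  # 0
```

Fixed thresholds are the simplest possible mapping; when enough annotated data is available, a regressor or classifier trained on teacher-scored answer pairs is a common alternative.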

Despite these advances, low-resource languages such as Uzbek remain largely underexplored in automated short-answer grading. One of the primary challenges is the limited availability of teacher-annotated datasets, which are essential for training and evaluating supervised learning models. Furthermore, Uzbek is morphologically rich: as an agglutinative Turkic language, it forms words by chaining suffixes onto a stem (for example, kitoblarimizda, "in our books", is kitob + -lar + -imiz + -da), so a single stem can appear in many surface forms. This morphological richness, together with the lack of standardized evaluation benchmarks, makes it difficult to transfer models developed for high-resource languages directly.
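
Teacher-annotated scores also serve as the reference against which automated graders are evaluated. For ordinal score scales, agreement between system and teacher scores is frequently reported as quadratic weighted kappa (QWK), which penalizes a disagreement by the square of its distance and corrects for chance agreement. The following Python sketch computes QWK from scratch; the function name is illustrative, and the score lists are invented for demonstration and are not taken from any reviewed study.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    min_score: int,
    max_score: int,
) -> float:
    """Quadratic weighted kappa between two raters on an integer score scale."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    for s in (*rater_a, *rater_b):
        if not min_score <= s <= max_score:
            raise ValueError(f"score {s} outside [{min_score}, {max_score}]")

    k = max_score - min_score + 1
    n = len(rater_a)

    # Observed matrix: observed[i][j] counts answers scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms of each rater.
    hist_a = [sum(observed[i]) for i in range(k)]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    numerator = 0.0
    denominator = 0.0
    for i in range(k):
        for j in range(k):
            # Quadratic penalty: larger disagreements cost more.
            weight = (i - j) ** 2 / (k - 1) ** 2
            # Expected count under chance agreement, from the two marginals.
            expected = hist_a[i] * hist_b[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # Both raters used one and the same score throughout; kappa is
        # undefined, and this sketch reports it as perfect agreement.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Invented scores on a 0-2 scale, for illustration only.
    teacher = [0, 1, 2, 2, 1]
    system = [0, 2, 2, 1, 1]
    print(round(quadratic_weighted_kappa(teacher, system, 0, 2), 3))  # 0.643
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement. Assuming scikit-learn is available, `sklearn.metrics.cohen_kappa_score` with `weights="quadratic"` and `labels` set to the full score range should yield the same statistic.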

Existing studies have proposed a wide range of automated grading approaches, including traditional machine learning