A Systematic Review of AI-Powered Uzbek Short-Answer
Grading Using NLP and Teacher-Annotated Datasets
Sanjar Raximjonov¹*, Eugene Q. Castro¹
¹ Department of Computer Science, Central Asian University, Tashkent, Uzbekistan
Emails: 220304@centralasian.uz · e.castro@centralasian.uz
Received: November 04, 2025 · Revised: December 11, 2025 · Accepted: January 18, 2026
* Corresponding author
ABSTRACT
This paper presents a Systematic Literature Review (SLR) of AI-powered automated short-answer grading, with
a particular focus on low-resource languages such as Uzbek. The review follows the PRISMA 2020 guidelines to
ensure transparency and methodological rigor. Peer-reviewed studies published between 2018 and 2025 were
identified across multiple academic databases, then systematically screened and analyzed. In total, 33 studies were
included in the final synthesis. The reviewed literature indicates that transformer-based models, including mBERT
and XLM-R, generally achieve stronger performance than traditional machine learning approaches, while recent
large language models show potential in few-shot and zero-shot grading scenarios. The findings also highlight that
the limited availability of teacher-annotated datasets remains a major challenge for developing reliable automated
grading systems in low-resource educational contexts.
Keywords: Automated Short-Answer Grading · Natural Language Processing · Transformer Models · Low-Resource
Languages · Uzbek Language · Systematic Literature Review
1. INTRODUCTION
The rapid growth of digital education has increased the demand
for scalable, consistent, and reliable assessment methods.
In many educational systems, including that of Uzbekistan,
short-answer questions are widely used to evaluate students’
conceptual understanding and reasoning skills. However,
grading such responses is still predominantly performed manually,
which is time-consuming, subjective, and difficult to
scale for large classes. These limitations often lead to inconsistencies
in grading quality and delays in feedback delivery,
which can negatively affect the learning process [1, 2].
Recent advances in artificial intelligence (AI) and natural
language processing (NLP) have enabled the development
of automated assessment systems capable of analyzing and
scoring textual responses. In high-resource languages such as
English, automated short-answer grading systems based on
machine learning and deep learning techniques have demonstrated
strong performance and high agreement with human
graders. In particular, transformer-based models, including
BERT and its multilingual variants, have substantially improved
how systems capture the meaning of student responses by encoding
each word in the context of the full answer [3].
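Agreement between automated and human grades is often reported as quadratic weighted kappa (QWK). This metric penalizes large score disagreements more heavily than small ones and corrects for agreement expected by chance. This section does not state which agreement metric the reviewed studies used. The sketch below therefore shows only one common choice. It assumes integer scores on a fixed scale, and the function name and parameters are illustrative rather than taken from any reviewed system.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Quadratic weighted kappa between two lists of integer scores.

    Hypothetical helper for illustration; all scores must lie in [min_score, max_score].
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_score - min_score + 1
    if k < 2:
        raise ValueError("the score scale needs at least two levels")
    if any(not (min_score <= s <= max_score) for s in list(human) + list(model)):
        raise ValueError("score outside the declared scale")
    n = len(human)

    # observed[i][j] counts answers the teacher scored i and the model scored j.
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1

    # Marginal score histograms define the chance-agreement (expected) matrix.
    hist_human = Counter(h - min_score for h in human)
    hist_model = Counter(m - min_score for m in model)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # larger gaps cost quadratically more
            expected = hist_human[i] * hist_model[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters gave the same single score to every item; treated as perfect agreement by convention.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Illustrative (made-up) teacher and model scores on a 0-3 scale.
teacher = [3, 2, 0, 1, 3, 2]
model = [3, 2, 1, 1, 2, 2]
print(round(quadratic_weighted_kappa(teacher, model, 0, 3), 3))
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement.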
Despite these advancements, low-resource languages such
as Uzbek remain largely underexplored in the context of
automated short-answer grading. One of the primary challenges
is the limited availability of teacher-annotated datasets,
which are essential for training and evaluating supervised
learning models. Furthermore, the morphological richness
of the Uzbek language and the lack of standardized evaluation
benchmarks make it difficult to directly transfer models
developed for high-resource languages.
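The morphological difficulty can be made concrete with a small example. Uzbek is agglutinative, so a single stem appears in many suffixed forms. For example, kitob "book" becomes kitoblarimizdan "from our books", and a grader that compares surface forms treats these as unrelated tokens. The following sketch is not drawn from the reviewed studies. It compares exact-token overlap with overlap after naive suffix stripping. The suffix list and helper names are hypothetical, and a practical system would rely on a proper morphological analyzer instead.

```python
# Hypothetical, deliberately tiny sample of Uzbek suffixes (illustration only); naive
# stripping like this over- and under-segments many real words.
SUFFIXES = sorted(["lar", "imiz", "ingiz", "ning", "dan", "ga", "da", "ni", "im", "ing"],
                  key=len, reverse=True)
MIN_STEM = 3  # never shorten a word below three characters


def strip_suffixes(word):
    """Repeatedly remove known suffixes from the end of a word, longest match first."""
    stem = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                stem = stem[: -len(suffix)]
                changed = True
                break
    return stem


def token_overlap(reference, answer, normalize=str.lower):
    """Share of reference tokens that also occur in the answer after normalization."""
    ref = {normalize(w) for w in reference.split()}
    ans = {normalize(w) for w in answer.split()}
    return len(ref & ans) / len(ref) if ref else 0.0


reference = "Suv daryoga oqadi"   # "Water flows into the river"
answer = "Daryolarga suv oqadi"   # "Water flows into the rivers"
print(token_overlap(reference, answer))                  # exact surface forms
print(token_overlap(reference, answer, strip_suffixes))  # after suffix stripping
print(strip_suffixes("kitoblarimizdan"))                 # -> kitob
```

On exact surface forms, only two of the three reference tokens match. After stripping, all three match. A model trained on high-resource languages has no built-in handling for this kind of variation.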
Existing studies have proposed a wide range of automated
grading approaches, including traditional machine learning