Automated Scoring Systems for Open-Ended Questions in Dutch Education
Publication date
Authors
DOI
Document Type
Master Thesis
License
CC-BY-NC-ND
Abstract
Automated Essay Scoring (AES) systems are increasingly used in educational settings, yet most are limited to assigning holistic scores and focus on English-language contexts. This thesis investigates the feasibility of building explainable, multi-aspect AES models that align with rubric-defined criteria across three aspects (content, language, and presentation) for Dutch open-ended responses.
We compare fine-tuned BERT-based transformer models with GPT-4o used in a prompt-based inference setting. RobBERT, a Dutch-specific model, performed best on the language aspect, while content and presentation were more difficult to model due to rubric variability and data limitations. SHAP analysis confirmed that RobBERT's predictions were aligned with the appropriate rubric elements. GPT-4o achieved competitive results, particularly on content and language, with few-shot prompting and justification requests improving alignment. Expert review revealed inconsistencies in human annotations, calling into question the reliability of the gold-standard labels and highlighting the need for explainable AES systems.
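To make the fine-tuned setup concrete, the sketch below shows how a Dutch RobBERT checkpoint could be wrapped as an aspect classifier and explained with SHAP at the token level. This is an illustrative sketch, not the thesis's code: the checkpoint name, label count, and example answer are assumptions.

```python
# Minimal sketch: rubric-aspect scoring with a Dutch transformer plus SHAP token attributions.
import shap
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Assumed base checkpoint; in the thesis setting this would be fine-tuned on rubric-scored responses.
model_name = "pdelobelle/robbert-v2-dutch-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels is an assumption (e.g. a 3-point scale for a single aspect such as "language").
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Wrap as a text-classification pipeline that returns scores for all labels.
clf = pipeline("text-classification", model=model, tokenizer=tokenizer, top_k=None)

# SHAP's text explainer attributes each predicted label score to input tokens,
# which is how alignment of a prediction with rubric elements can be inspected.
explainer = shap.Explainer(clf)
shap_values = explainer(["De hoofdpersoon verandert omdat hij leert zijn fouten toe te geven."])
print(shap_values)  # shap.plots.text(shap_values) renders token-level attributions in a notebook
```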
The results suggest that a hybrid approach, combining the robustness of fine-tuned models for well-defined aspects with the flexibility of prompt-based generative LLMs for more interpretive dimensions, may be best suited for practical deployment in Dutch educational settings.
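For the prompt-based side of such a hybrid, a minimal sketch of few-shot, rubric-grounded scoring with GPT-4o is shown below. The prompt wording, rubric aspect, and example answer are invented for illustration; the request for a short justification mirrors the justification-request setup described above.

```python
# Minimal sketch: few-shot aspect scoring with a generative LLM via the OpenAI SDK.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Illustrative few-shot prompt (invented, not the thesis prompt): one worked example
# plus a new answer to score on the "content" aspect, with a short justification.
prompt = (
    "Beoordeel het antwoord op het aspect 'inhoud' met een score van 1 tot 5 "
    "en geef een korte motivering.\n\n"
    "Voorbeeld\n"
    "Antwoord: De opwarming van de aarde komt vooral door de uitstoot van broeikasgassen.\n"
    "Score: 4\n"
    "Motivering: Het antwoord noemt de kernoorzaak, maar werkt deze niet uit.\n\n"
    "Nu beoordelen\n"
    "Antwoord: De industriële revolutie leidde tot urbanisatie omdat fabrieken arbeiders naar de stad trokken.\n"
)

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    messages=[
        {"role": "system", "content": "Je bent een beoordelaar die Nederlandse open antwoorden scoort volgens een rubric."},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)  # expected: a score plus a brief justification
```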
Keywords
NLP; Transformer models; LLMs; Automated Essay Scoring; Multi-aspect Scoring; SHAP