Exploring potential of large language models for automated essay scoring in education
2026 (English). In: Discover Artificial Intelligence, ISSN 2731-0809, Vol. 6, article id 166. Article in journal (Refereed), Published
Abstract [en]
The assessment of open-ended written work is vital to the student learning experience. Conventional essay grading depends heavily on manual assessment by experts, which makes it susceptible to errors arising from fatigue, bias, and subjectivity. To address this, recent research has introduced AI-based Automated Essay Scoring (AES) systems. While most studies have concentrated on predicting scores, only a few have integrated AES systems with the well-known Large Language Models (LLMs). This study explores the application of LLMs, including GPT and Gemini, to AES. The proposed approach was evaluated on two benchmark datasets: “Hewlett Foundation: Automated Essay Scoring (ASAP–AES)” and “Learning Agency Lab–Automated Essay Scoring 2.0 (LA–AES)”. It achieved promising results on both. Statistical analysis revealed that Gemini outperformed GPT, achieving an average Quadratic Weighted Kappa (QWK) score of 0.45 on ASAP–AES and 0.43 on LA–AES. To assess the generalizability and objectivity of the proposed approach, real-world data were collected from an O-Level classroom at Sukkur IBA Community College, Pakistan. Multiple human evaluators participated in the study to examine potential biases in human assessment. The findings indicate that LLM-based scoring is more objective and less biased than scoring by human assessors.
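The abstract reports agreement as Quadratic Weighted Kappa but does not say how it was computed. As a minimal sketch, assuming the standard definition of Cohen's kappa with quadratic weights, the metric can be written as below. The function name, the score range, and the example scores are hypothetical and are not taken from the article.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Cohen's kappa with quadratic weights for two lists of integer scores.

    Assumes every score lies in [min_score, max_score].
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")

    n = len(human)
    observed = Counter(zip(human, model))  # joint counts of (human, model)
    hist_h = Counter(human)                # marginal counts, human rater
    hist_m = Counter(model)                # marginal counts, model

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[(i, j)] / n
            den += w * hist_h[i] * hist_m[j] / (n * n)

    if den == 0.0:
        # Both raters gave one and the same score to every essay.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return 1.0 - num / den


# Hypothetical scores on a 1-6 scale, for illustration only.
human_scores = [1, 2, 3, 4, 5, 6, 3, 4]
model_scores = [1, 2, 3, 4, 5, 5, 4, 4]
print(quadratic_weighted_kappa(human_scores, model_scores, 1, 6))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. The quadratic weight penalizes a two-point miss four times as heavily as a one-point miss.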
Place, publisher, year, edition, pages
Springer Nature, 2026. Vol. 6, article id 166
Keywords [en]
LLMs, Education, Assessment, Transformers, Writing evaluation, Automated essay scoring
National Category
Computer and Information Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-145293
DOI: 10.1007/s44163-026-01002-y
Scopus ID: 2-s2.0-105030708166
OAI: oai:DiVA.org:lnu-145293
DiVA, id: diva2:2042123
2026-02-27, 2026-02-27, 2026-03-02. Bibliographically approved