lnu.se Publications
Exploring potential of large language models for automated essay scoring in education
Sukkur IBA University, Pakistan.
Norwegian University of Science and Technology, Norway.
Sukkur IBA University, Pakistan.
Linnaeus University, Faculty of Technology, Department of Informatics. ORCID iD: 0000-0002-0199-2377
2026 (English). In: Discover Artificial Intelligence, ISSN 2731-0809, Vol. 6, article id 166. Article in journal (Refereed), Published
Abstract [en]

The assessment of open-ended written work is vital to the student learning experience. Conventional essay grading depends heavily on manual assessment by experts, which makes it susceptible to errors caused by fatigue, bias, and subjectivity. To address this, recent research has introduced AI-based Automated Essay Scoring (AES) systems. While most studies have concentrated on predicting scores, only a few have integrated AES systems with Large Language Models (LLMs). This study explores the application of LLMs, including GPT and Gemini, to AES. The proposed approach was evaluated on two benchmark datasets: “Hewlett Foundation: Automated Essay Scoring (ASAP–AES)” and “Learning Agency Lab–Automated Essay Scoring 2.0 (LA–AES)”. The proposed method achieved promising results on both benchmark datasets. Statistical analysis showed that Gemini outperformed GPT, achieving an average Quadratic Weighted Kappa (QWK) score of 0.45 on ASAP–AES and 0.43 on LA–AES. To assess the generalizability and objectivity of the proposed approach, real-world data were collected from an O-Level classroom at Sukkur IBA Community College, Pakistan. Multiple human evaluators participated in the study so that potential biases in human assessment could be examined. The findings indicate that LLM-based scoring is more objective and less biased than scoring by human assessors.
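The abstract reports results as Quadratic Weighted Kappa (QWK) but this record does not spell out how the metric is computed. The following is a minimal sketch of the standard QWK definition, not the authors' implementation; the function name `quadratic_weighted_kappa`, its parameters, and the example scores are assumptions made for illustration only.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Standard QWK between two lists of integer scores (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n_cat = max_score - min_score + 1
    if n_cat < 2:
        raise ValueError("need at least two score categories")
    n = len(rater_a)

    # Observed agreement matrix: observed[i][j] counts essays that
    # rater A scored as category i and rater B scored as category j.
    observed = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_cat)) for j in range(n_cat)]

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n_cat):
        for j in range(n_cat):
            weight = (i - j) ** 2 / (n_cat - 1) ** 2  # quadratic penalty
            numerator += weight * observed[i][j]
            denominator += weight * hist_a[i] * hist_b[j] / n

    if denominator == 0.0:
        # Both raters gave every essay the same single score.
        return 1.0
    return 1.0 - numerator / denominator


# Hypothetical scores on a 1-4 scale, for illustration only.
human = [1, 2, 3, 4]
model_same = [1, 2, 3, 4]
model_reversed = [4, 3, 2, 1]
kappa_same = quadratic_weighted_kappa(human, model_same, 1, 4)
kappa_reversed = quadratic_weighted_kappa(human, model_reversed, 1, 4)
```

Under this definition a value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement; larger score gaps are penalized quadratically.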

Place, publisher, year, edition, pages
Springer Nature, 2026. Vol. 6, article id 166
Keywords [en]
LLMs, Education, Assessment, Transformers, Writing evaluation, Automated essay scoring
National Category
Computer and Information Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-145293
DOI: 10.1007/s44163-026-01002-y
Scopus ID: 2-s2.0-105030708166
OAI: oai:DiVA.org:lnu-145293
DiVA, id: diva2:2042123
Available from: 2026-02-27 Created: 2026-02-27 Last updated: 2026-03-02 Bibliographically approved

Open Access in DiVA

fulltext (4531 kB), 4 downloads
File information
File name: FULLTEXT01.pdf
File size: 4531 kB
Checksum (SHA-512): c97d8f6954b84efd527e5a0d8e3e83378e4fd3efec25862bbc68a40ef8a4656f4d8913075997639e66c0a074eda14cd07040b3df6a20abc8b4d8dcdb6f99fc28
Type: fulltext
Mimetype: application/pdf

Authority records

Kastrati, Zenun
