N-Grams as a Measure of Naturalness and Complexity
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

We live in a time when software is used everywhere. It is even used to create other software, helping developers write or generate new code. To do this properly, software quality metrics are used to evaluate the resulting code. However, they are sometimes too costly to compute or simply do not have the expected effect. Therefore, new and better ways of evaluating software are needed. In this research, we investigate the use of statistical approaches common in natural language processing (NLP). To introduce and evaluate new metrics, a Java N-gram language model is created from a large corpus of Java code. Naturalness, a method-level metric, is introduced and calculated for chosen projects. Its correlations with well-known software complexity metrics are calculated and discussed. The results, however, show that the metric, in the form in which we have defined it, is not suitable for software complexity evaluation, since it is highly correlated with a well-known metric (token count) that is much easier to compute. A different definition of the metric is suggested, which could be a target of future study and research.
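
The abstract does not state how the language model or the naturalness metric is defined, so the following is only a minimal sketch under stated assumptions: a bigram model with add-one smoothing stands in for the thesis's N-gram model, and the function names (train_bigram, neg_log2_prob, cross_entropy) are hypothetical. It illustrates one possible reason a naturalness-style score might track token count: an unnormalized sum of per-token surprisal grows with every token added, whereas a per-token average does not necessarily do so.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-method marker


def train_bigram(token_seqs):
    """Count bigrams over tokenized methods (assumed input: lists of tokens)."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for tokens in token_seqs:
        vocab.update(tokens)
        padded = [START] + list(tokens)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    # Reserve one extra vocabulary slot for tokens never seen in training.
    return context_counts, bigram_counts, len(vocab) + 1


def neg_log2_prob(model, tokens):
    """Unnormalized score: total surprisal in bits, with add-one smoothing."""
    context_counts, bigram_counts, vocab_size = model
    padded = [START] + list(tokens)
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        total -= math.log2(p)
    return total


def cross_entropy(model, tokens):
    """Length-normalized score: average surprisal in bits per token."""
    return neg_log2_prob(model, tokens) / max(len(tokens), 1)


# Toy corpus of already-tokenized Java statements (illustrative data only).
corpus = [
    ["int", "x", "=", "0", ";"],
    ["int", "y", "=", "1", ";"],
    ["return", "x", ";"],
]
model = train_bigram(corpus)
seen = ["int", "x", "=", "0", ";"]
scrambled = [";", "=", "int", "return", "0"]
```

In this sketch, cross_entropy(model, seen) is lower than cross_entropy(model, scrambled), while neg_log2_prob keeps increasing as a sequence gets longer regardless of how predictable its tokens are. Whether the thesis's definition behaves the same way is not something the abstract confirms.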

Place, publisher, year, edition, pages
2019. p. 34
Keywords [en]
language model, language processing, ngram, naturalness, java, code complexity, software quality, static analysis, code metrics
National Category
Software Engineering; Computer Sciences
Identifiers
URN: urn:nbn:se:lnu:diva-90006
OAI: oai:DiVA.org:lnu-90006
DiVA, id: diva2:1369387
Educational program
Software Technology Programme, Master Programme, 120 credits
Available from: 2019-11-12. Created: 2019-11-11. Last updated: 2019-11-12. Bibliographically approved.

Open Access in DiVA

Full text (1878 kB), 550 downloads
File information
File name: FULLTEXT01.pdf
File size: 1878 kB
Checksum (SHA-512): 1455f541ba3b1a5b4b89068b1751c1474b994762d18ef62b175ef8e3d596aa80025b58fdc85b8fa676222f993566f2fa30f1963410bb282addb72f72c1f4f6ec
Type: fulltext
Mimetype: application/pdf

Author
Randák, Richard