Validating the Quality of a Big Data Java Corpus
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
2018 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Recent research in the field of Software Engineering has used GitHub, the largest hub for open source projects with almost 20 million users and 57 million repositories, to mine large amounts of source code and thereby obtain more trustworthy results when developing machine learning and deep learning models. Mining GitHub poses many challenges, since the dataset is large and not all of the projects it contains are of high quality. In this project, we mine projects from GitHub following approaches from earlier research by others, and attempt to validate their quality by using software complexity metrics to compare them with a small subset of known high-quality projects.

Place, publisher, year, edition, pages
2018, p. 39
Keywords [en]
mining software repositories, GitHub, GHTorrent, Chidamber & Kemerer metrics, software complexity
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:lnu:diva-75410
OAI: oai:DiVA.org:lnu-75410
DiVA, id: diva2:1215731
Educational program
Computer Science, Bachelor's programme (Datavetenskap, kandidatprogram), 60 HE credits
Available from: 2018-06-11. Created: 2018-06-08. Last updated: 2018-06-11. Bibliographically approved.

Open Access in DiVA
Full text: FULLTEXT01.pdf (1102 kB, application/pdf)
Checksum (SHA-512): 4b68a821d5a9318dd4a4e4cea1902f33d30e5567c114bbf53bf4a47f245be750cbd3fdf099720c82b6735a870e85a8ea0da9d9137a63bb14a30f75386755cf88

Author
Palmqvist, Simon
