lnu.se Publications
Predicting Software Defectiveness by Mining Software Repositories
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
2018 (English). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

One of the important aims of the continuous software development process is to locate and remove all existing program bugs as quickly as possible. This goal is closely tied to software engineering practice and to defectiveness estimation. As software repositories grew in popularity, many large companies began storing their source code in them. These repositories usually contain the static source code as well as detailed data on defects in individual software units, which makes it possible to analyse all of this data without interrupting the programming process. The main problem with large, complex software is that it is impossible to control everything manually, while the cost of an error can be very high. As a result, developers may miss defects during the testing stage, and maintenance costs increase. The general research goal is to find a way of predicting future software defectiveness with high precision. Reducing maintenance and development costs will help shorten time-to-market and increase software quality.

To address the problem of estimating residual defects, an approach was devised that predicts the residual defectiveness of software by means of machine learning. A regression decision tree was chosen as the primary machine learning algorithm, being a simple and reliable solution. The data for this tree is extracted from a static source code repository and divided into two parts: software metrics and defect data. Software metrics are computed from the static code, and defect data is extracted from the issues reported in the repository. The already reported bugs are augmented with unreported bugs found in the “discussions” section of the repository, which are parsed by a natural language processor. Metrics that were not related to the defect data were filtered out by applying a correlation algorithm. The remaining metrics were weighted so that the most correlated combination could be used as the training set for the decision tree. The resulting decision tree model forecasts defectiveness with an 89% chance of being correct for the particular product. The experiment was conducted on a Java project hosted in a GitHub repository and predicted the number of possible bugs in a single file (Java class). It resulted in a method for predicting possible defectiveness from the static code of a single version of a large (more than 1000 files) software project.
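The two modelling steps described above (correlation-based metric filtering followed by a regression decision tree) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the correlation measure is the Pearson coefficient named in the keywords, while the |r| ≥ 0.3 cut-off, the tree depth limit, the metric names (`loc`, `complexity`, `comment_ratio`) and all data values are hypothetical, and the metric weighting step is reduced to ranking by |r|. The tree follows the CART idea of choosing, at each node, the split that minimises the summed squared error of the two children and predicting the mean defect count at each leaf.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_metrics(metrics, defects, threshold=0.3):
    """Drop metrics with |r| below the (assumed) threshold; rank the rest by |r|."""
    scores = {name: pearson(values, defects) for name, values in metrics.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: abs(scores[name]), reverse=True)


def sse(values):
    """Sum of squared errors around the mean (the regression-tree impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, depth=0, max_depth=3, min_leaf=2):
    """Grow a regression tree; a leaf is a mean, a node is (feature, threshold, left, right)."""
    if depth >= max_depth or len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return mean(targets)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, targets) if row[f] <= t]
            right = [y for row, y in zip(rows, targets) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None:  # no split leaves enough samples on both sides
        return mean(targets)
    _, f, t = best
    left_rows = [r for r in rows if r[f] <= t]
    left_y = [y for r, y in zip(rows, targets) if r[f] <= t]
    right_rows = [r for r in rows if r[f] > t]
    right_y = [y for r, y in zip(rows, targets) if r[f] > t]
    return (f, t,
            build_tree(left_rows, left_y, depth + 1, max_depth, min_leaf),
            build_tree(right_rows, right_y, depth + 1, max_depth, min_leaf))


def predict(tree, row):
    """Walk from the root to a leaf and return its mean defect count."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical per-class data: metric names and values are invented for illustration.
defects = [0, 0, 1, 1, 3, 4, 6, 7]
metrics = {
    "loc": [50, 80, 120, 150, 400, 450, 900, 1000],
    "complexity": [2, 3, 5, 6, 15, 18, 30, 35],
    "comment_ratio": [0.3, 0.1, 0.3, 0.1, 0.3, 0.1, 0.3, 0.1],
}
kept = select_metrics(metrics, defects)
rows = [tuple(metrics[name][i] for name in kept) for i in range(len(defects))]
tree = build_tree(rows, defects)

# Predict the defect count of a new (hypothetical) class from its kept metrics.
new_class = {"loc": 950, "complexity": 32, "comment_ratio": 0.2}
print(kept, predict(tree, tuple(new_class[name] for name in kept)))
```

In this toy data `comment_ratio` is only weakly correlated with the defect counts and is filtered out before training. Each internal node is stored as a tuple (feature index, threshold, left, right) and each leaf as a number; a real setup would more likely rely on a library implementation such as scikit-learn's `DecisionTreeRegressor`.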

Place, publisher, year, edition, pages
2018, p. 38
Keywords [en]
repository mining, software metric, correlation, defect, bug, natural language processing, Pearson coefficient, Breiman’s decision tree, machine learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:lnu:diva-78729
OAI: oai:DiVA.org:lnu-78729
DiVA, id: diva2:1261573
Subject / course
Computer Science
Educational program
Software Technology Programme, Master Programme, 60 credits
Presentation
2018-02-28, B3033V, Hus B, Videum, Vejdes plats 7, Växjö, 09:00 (English)
Supervisors
Examiners
Available from: 2018-11-19
Created: 2018-11-07
Last updated: 2018-11-19
Bibliographically approved

Open Access in DiVA

17HT-4DV50E-Thesis-Project-Report-Stanislav-Kasianenko (3557 kB)
File information
File name: FULLTEXT01.pdf
File size: 3557 kB
Checksum (SHA-512):
d22bcc9ed8638526f88e52bc0a5a344698031b0e160c441238ab0236976a83cb40e220cb48c23a0207190e25fbb1e5a09132589b4783c3a25e98da79ba830aac
Type: fulltext
Mimetype: application/pdf

Author
Kasianenko, Stanislav
Organisation
Department of computer science and media technology (CM)
Computer Sciences
