Using source code density to improve the accuracy of automatic commit classification into maintenance activities
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA;DISTA;DSIQ)
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA;DISTA;DSIQ) ORCID iD: 0000-0003-1173-5187
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA;DISTA;DSIQ) ORCID iD: 0000-0002-7565-3714
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA;DISTA;DSIQ) ORCID iD: 0000-0002-0835-823X
2020 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 168, p. 1-19, article id 110673. Article in journal (Refereed), Published
Abstract [en]

Source code is changed for a reason, e.g., to adapt, correct, or perfect it. This reason can provide valuable insight into the development process but is rarely explicitly documented when the change is committed to a source code repository. Automatic commit classification uses features extracted from commits to estimate this reason.

We introduce source code density, a measure of the net size of a commit, and show how it improves the accuracy of automatic commit classification compared to previous size-based classifications. We also investigate how preceding generations of commits affect the class of a commit, and whether taking the code density of previous commits into account can improve the accuracy further.
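
To make the idea of a "net size" concrete, here is a minimal sketch. It assumes, purely for illustration, that density is the share of changed lines that remain after dropping blank lines and comment-only lines; the abstract does not give the exact definition, and the function name `code_density` and the comment prefixes are hypothetical.

```python
def code_density(changed_lines, comment_prefixes=("//", "#")):
    """Illustrative ratio of net to gross size for one commit.

    Assumption (not taken from the abstract): gross size is the number of
    changed lines, and net size excludes blank lines and lines that contain
    only a line comment.
    """
    gross = len(changed_lines)
    if gross == 0:
        return 0.0  # an empty change has no density to speak of
    net = sum(
        1
        for line in changed_lines
        if line.strip() and not line.strip().startswith(comment_prefixes)
    )
    return net / gross


# Two of the four changed lines carry code, so the density is 0.5.
example = ["int x = 1;", "", "// counter", "x++;"]
print(code_density(example))
```

A value near 1 would indicate a change that is almost entirely functional code, while a value near 0 would indicate a change dominated by whitespace or comments.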

We achieve up to 89% accuracy and a Kappa of 0.82 for the cross-project commit classification where the model is trained on one project and applied to other projects. Models trained on single projects yield accuracies of up to 93% with a Kappa approaching 0.90. The accuracy of the automatic commit classification has a direct impact on software (process) quality analyses that exploit the classification, so our improvements to the accuracy will also improve the confidence in such analyses.
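
The Kappa reported alongside accuracy corrects for agreement expected by chance. The sketch below computes accuracy and Cohen's Kappa in the standard way, kappa = (p_o - p_e) / (1 - p_e); the labels `a`, `c`, `p` and the sample predictions are made up for illustration and are not results from the paper.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's Kappa) for two equally long label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement p_o is plain accuracy.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement p_e from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case: both lists use a single, identical label.
        # Returning 1.0 here is a convention chosen for this sketch.
        return observed, 1.0
    return observed, (observed - expected) / (1 - expected)


# Hypothetical labels: a = adaptive, c = corrective, p = perfective.
truth = ["a", "a", "c", "c", "p", "p"]
preds = ["a", "a", "c", "p", "p", "p"]
acc, kappa = accuracy_and_kappa(truth, preds)
print(round(acc, 3), round(kappa, 3))  # 0.833 0.75
```

Because Kappa discounts chance agreement, it is lower than accuracy whenever the classifier is imperfect, which is why both figures are informative together.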

Place, publisher, year, edition, pages
Elsevier, 2020. Vol. 168, p. 1-19, article id 110673
Keywords [en]
Software quality, Commit classification, Source code density, Maintenance activities, Software evolution
National Category
Software Engineering Computer Sciences
Research subject
Computer Science, Software Technology; Computer and Information Sciences Computer Science, Computer Science; Computer Science, Software Technology
Identifiers
URN: urn:nbn:se:lnu:diva-95751; DOI: 10.1016/j.jss.2020.110673; ISI: 000557871300021; Scopus ID: 2-s2.0-85085726544; OAI: oai:DiVA.org:lnu-95751; DiVA, id: diva2:1436576
Available from: 2020-06-08 Created: 2020-06-08 Last updated: 2023-09-27 Bibliographically approved
In thesis
1. Quantifying Process Quality: The Role of Effective Organizational Learning in Software Evolution
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Real-world software applications must constantly evolve to remain relevant. This evolution occurs when developing new applications or adapting existing ones to meet new requirements, make corrections, or incorporate future functionality. Traditional methods of software quality control involve software quality models and continuous code inspection tools. These measures focus on directly assessing the quality of the software. However, there is a strong correlation and causation between the quality of the development process and the resulting software product. Therefore, improving the development process indirectly improves the software product, too. To achieve this, effective learning from past processes is necessary, often embraced through post mortem organizational learning. While qualitative evaluation of large artifacts is common, smaller quantitative changes captured by application lifecycle management are often overlooked. In addition to software metrics, these smaller changes can reveal complex phenomena related to project culture and management. Leveraging these changes can help detect and address such complex issues.

Software evolution was previously measured by the size of changes, but the lack of consensus on a reliable and versatile quantification method prevents its use as a dependable metric. Different size classifications fail to reliably describe the nature of evolution. While application lifecycle management data is rich, identifying which artifacts can model detrimental managerial practices remains uncertain. Approaches such as simulation modeling, discrete events simulation, or Bayesian networks have only limited ability to exploit continuous-time process models of such phenomena. Even worse, the accessibility and mechanistic insight into such gray- or black-box models are typically very low. To address these challenges, we suggest leveraging objectively captured digital artifacts from application lifecycle management, combined with qualitative analysis, for efficient organizational learning. A new language-independent metric is proposed to robustly capture the size of changes, significantly improving the accuracy of change nature determination. The classified changes are then used to explore, visualize, and suggest maintenance activities, enabling solid prediction of malpractice presence and severity, even with limited data. Finally, parts of the automatic quantitative analysis are made accessible, potentially replacing expert-based qualitative analysis in parts.

Place, publisher, year, edition, pages
Växjö: Linnaeus University Press, 2023
Series
Linnaeus University Dissertations ; 504
Keywords
Software Size, Software Metrics, Commit Classification, Maintenance Activities, Software Quality, Process Quality, Project Management, Organizational Learning, Machine Learning, Visualization, Optimization
National Category
Computer and Information Sciences Software Engineering Mathematical Analysis Probability Theory and Statistics
Research subject
Computer Science, Software Technology; Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science; Statistics/Econometrics
Identifiers
URN: urn:nbn:se:lnu:diva-124916; DOI: 10.15626/LUD.504.2023; ISBN: 9789180820738; ISBN: 9789180820745
Public defence
2023-09-29, House D, D1136A, 351 95 Växjö, Växjö, 13:00 (English)
Available from: 2023-09-28 Created: 2023-09-27 Last updated: 2024-05-06 Bibliographically approved


Authority records

Hönel, Sebastian; Ericsson, Morgan; Löwe, Welf; Wingkvist, Anna
