Exploiting Relations, Sojourn-Times, and Joint Conditional Probabilities for Automated Commit Classification
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA;DSIQ;DISTA)ORCID iD: 0000-0001-7937-1645
2023 (English). In: Proceedings of the 18th International Conference on Software Technologies, July 10-12, 2023, in Rome, Italy / [ed] Hans-Georg Fill, Francisco José Domínguez-Mayo, Marten van Sinderen, and Leszek A. Maciaszek. SciTePress, 2023, p. 323-331. Conference paper, Published paper (Refereed)
Abstract [en]

The automatic classification of commits can be exploited for numerous applications, such as fault prediction or determining maintenance activities. Additional properties, such as parent-child relations or sojourn-times between commits, were not previously considered for this task. However, such data cannot be leveraged well using traditional machine learning models, such as random forests. Suitable models are, e.g., Conditional Random Fields or recurrent neural networks. We reason about the Markovian nature of the problem and propose models to address it. The first model is a generalized dependent mixture model that employs the Forward algorithm for 1st- and 2nd-order processes and is fitted using maximum likelihood estimation. We then propose a second, non-parametric model that uses Bayesian segmentation and kernel density estimation and can be effortlessly adapted to work with nth-order processes. Using an existing dataset with labeled commits as ground truth, we extend this dataset with the relations between commits and their sojourn-times, by first re-engineering the labeling rules and achieving high agreement between labelers. We show the strengths and weaknesses of either kind of model and demonstrate their ability to outperform the state of the art in automated commit classification.
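For readers unfamiliar with the Forward algorithm named in the abstract: given a sequence of commits, it computes the probability of each label given the commits observed so far, together with the likelihood of the whole sequence. The block below is a minimal, generic sketch for a first-order process. It is not the paper's generalized dependent mixture model. It assumes the commits form a linear chain (e.g., obtained by following parent links), and the labels, initial and transition probabilities, and per-commit emission likelihoods are hypothetical values chosen only for illustration. In a real model, the emission likelihoods would come from some model of each commit's features.

```python
import math


def forward(initial, transition, emission_lik):
    """Scaled Forward algorithm for a first-order, discrete-state process.

    initial[j]         -- P(z_1 = j), prior probability of the first label
    transition[i][j]   -- P(z_t = j | z_{t-1} = i)
    emission_lik[t][j] -- p(x_t | z_t = j), likelihood of commit t's observed
                          features under label j (hypothetical inputs here)

    Returns (filtered, log_lik): filtered[t][j] = P(z_t = j | x_1..x_t) and
    log_lik = log p(x_1..x_T). Normalizing at every step avoids underflow.
    """
    n = len(initial)
    filtered = []
    log_lik = 0.0
    for t, lik in enumerate(emission_lik):
        if t == 0:
            alpha = [initial[j] * lik[j] for j in range(n)]
        else:
            prev = filtered[-1]
            # Predict the next label from the previous filtered distribution,
            # then weight each label by how well it explains commit t.
            alpha = [
                sum(prev[i] * transition[i][j] for i in range(n)) * lik[j]
                for j in range(n)
            ]
        scale = sum(alpha)  # p(x_t | x_1..x_{t-1})
        log_lik += math.log(scale)
        filtered.append([a / scale for a in alpha])
    return filtered, log_lik


# Hypothetical example: two labels and a linear chain of three commits.
labels = ["corrective", "perfective"]
initial = [0.5, 0.5]
transition = [[0.7, 0.3],
              [0.4, 0.6]]
emission_lik = [[0.9, 0.2],
                [0.8, 0.3],
                [0.1, 0.7]]

filtered, log_lik = forward(initial, transition, emission_lik)
for t, probs in enumerate(filtered):
    best = max(range(len(labels)), key=lambda j: probs[j])
    print(f"commit {t}: {labels[best]} {[round(p, 3) for p in probs]}")
print(f"log-likelihood: {log_lik:.4f}")
```

Each row of `filtered` is a probability distribution over the labels. One common way to handle a 2nd-order process with the same routine is to treat pairs of consecutive labels as the states, which turns it into a first-order process over the expanded state space.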

Place, publisher, year, edition, pages
SciTePress, 2023. p. 323-331
Series
ICSOFT, ISSN 2184-2833
Keywords [en]
Software Maintenance, Repository Mining, Maintenance Activities
National Category
Software Engineering
Research subject
Computer Science, Software Technology
Identifiers
URN: urn:nbn:se:lnu:diva-124879
DOI: 10.5220/0012077300003538
ISBN: 9789897586651 (electronic)
OAI: oai:DiVA.org:lnu-124879
DiVA, id: diva2:1799930
Conference
18th International Conference on Software Technologies - ICSOFT 2023, Rome, Italy, July 10–12, 2023
Available from: 2023-09-25; Created: 2023-09-25; Last updated: 2024-05-06; Bibliographically approved
In thesis
1. Quantifying Process Quality: The Role of Effective Organizational Learning in Software Evolution
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Real-world software applications must constantly evolve to remain relevant. This evolution occurs when developing new applications or when adapting existing ones to meet new requirements, make corrections, or incorporate future functionality. Traditional methods of software quality control involve software quality models and continuous code inspection tools. These measures focus on directly assessing the quality of the software. However, there is a strong correlation, and a causal relationship, between the quality of the development process and the quality of the resulting software product. Therefore, improving the development process indirectly improves the software product, too. To achieve this, effective learning from past processes is necessary, often pursued through post-mortem organizational learning. While qualitative evaluation of large artifacts is common, smaller quantitative changes captured by application lifecycle management are often overlooked. In addition to software metrics, these smaller changes can reveal complex phenomena related to project culture and management. Leveraging these changes can help detect and address such complex issues.

Software evolution was previously measured by the size of changes, but the lack of consensus on a reliable and versatile quantification method prevents its use as a dependable metric. Different size classifications fail to reliably describe the nature of evolution. While application lifecycle management data is rich, it remains uncertain which artifacts can model detrimental managerial practices. Approaches such as simulation modeling, discrete-event simulation, or Bayesian networks have only a limited ability to exploit continuous-time process models of such phenomena. Worse still, such gray- or black-box models typically offer very low accessibility and little mechanistic insight. To address these challenges, we suggest leveraging objectively captured digital artifacts from application lifecycle management, combined with qualitative analysis, for efficient organizational learning. A new language-independent metric is proposed to robustly capture the size of changes, significantly improving the accuracy with which the nature of changes is determined. The classified changes are then used to explore, visualize, and suggest maintenance activities, enabling solid prediction of the presence and severity of malpractices, even with limited data. Finally, parts of the automatic quantitative analysis are made accessible, potentially replacing parts of expert-based qualitative analysis.

Place, publisher, year, edition, pages
Växjö: Linnaeus University Press, 2023
Series
Linnaeus University Dissertations ; 504
Keywords
Software Size, Software Metrics, Commit Classification, Maintenance Activities, Software Quality, Process Quality, Project Management, Organizational Learning, Machine Learning, Visualization, Optimization
National Category
Computer and Information Sciences; Software Engineering; Mathematical Analysis; Probability Theory and Statistics
Research subject
Computer Science, Software Technology; Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science; Statistics/Econometrics
Identifiers
urn:nbn:se:lnu:diva-124916 (URN), 10.15626/LUD.504.2023 (DOI), 9789180820738 (ISBN), 9789180820745 (ISBN)
Public defence
2023-09-29, House D, D1136A, 351 95 Växjö, Växjö, 13:00 (English)
Available from: 2023-09-28; Created: 2023-09-27; Last updated: 2024-05-06; Bibliographically approved

Open Access in DiVA
Full text: FULLTEXT01.pdf (application/pdf, 713 kB)
Checksum (SHA-512): bbba2df65cd9a04739daad25aadfb0022de5e749fa0053d81698e2579809f46e45ecce1701dc06610db592d52e09a4a0d1dd1e8fce23f454659d18ac97fa6637


Authority records

Hönel, Sebastian
