Towards Improved Initial Mapping in Semi Automatic Clustering
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA; DSIQ; DISTA) ORCID iD: 0000-0003-1154-5308
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA; DSIQ; DISTA) ORCID iD: 0000-0003-1173-5187
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA; DSIQ; DISTA) ORCID iD: 0000-0002-0835-823X
2018 (English). In: ECSA 2018: PROCEEDINGS OF THE 12TH EUROPEAN CONFERENCE ON SOFTWARE ARCHITECTURE: COMPANION PROCEEDINGS, Association for Computing Machinery (ACM), 2018. Conference paper, Published paper (Refereed)
Abstract [en]

An important step in Static Architecture Conformance Checking (SACC) is the mapping of source code entities to entities in the intended architecture. This step currently relies on manual work, which is one hindrance to more widespread adoption of SACC in industry. Semi-automatic clustering is a promising approach to improve this, and the HuGMe clustering algorithm is an example of such a technique for use in SACC. However, HuGMe relies on an initial set of clustered source code elements and on algorithm parameters. We investigate the automatic mapping performance of HuGMe in two experiments to gain insight into the influence of the starting set in a medium-sized open source system, JabRef, which contains a relatively large number of architectural violations. Our results show that the highest automatic mapping performance can be achieved with a low number of elements within the initial set. However, the variability of the performance is high. We find a benefit in favoring source code elements with a high fan-out in the initial set.
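
The finding about fan-out can be illustrated with a small sketch. The abstract does not say how the initial set was constructed in the experiments, so the following is only one plausible reading: rank source code elements by the number of distinct elements they depend on and take the top few as candidates for manual mapping. The function names (`fan_out`, `pick_initial_set`) and the element names in the example are hypothetical and are not taken from HuGMe or JabRef.

```python
from collections import defaultdict


def fan_out(dependencies):
    """Count the distinct outgoing dependencies of every element.

    `dependencies` is an iterable of (source, target) pairs.
    Self-dependencies are ignored; elements that only appear as
    targets are reported with a fan-out of 0.
    """
    targets = defaultdict(set)
    for source, target in dependencies:
        if source != target:
            targets[source].add(target)
        targets.setdefault(target, set())
    return {element: len(deps) for element, deps in targets.items()}


def pick_initial_set(dependencies, size):
    """Return the `size` elements with the highest fan-out.

    Ties are broken alphabetically so the result is deterministic.
    This is an assumed selection rule, not the paper's procedure.
    """
    counts = fan_out(dependencies)
    ranked = sorted(counts, key=lambda element: (-counts[element], element))
    return ranked[:size]


# Hypothetical dependency edges between four source code elements.
deps = [
    ("ui.MainFrame", "logic.Importer"),
    ("ui.MainFrame", "model.Entry"),
    ("ui.MainFrame", "prefs.Settings"),
    ("logic.Importer", "model.Entry"),
    ("logic.Importer", "prefs.Settings"),
    ("model.Entry", "prefs.Settings"),
]

initial = pick_initial_set(deps, 2)
print(initial)
```

A human expert would then map the selected elements to architectural modules by hand, and the clustering algorithm would attempt to map the remaining elements automatically.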

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2018.
Keywords [en]
Clustering, Software Architecture Conformance, HuGMe
National Category
Computer and Information Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-80159
DOI: 10.1145/3241403.3241456
ISI: 000455670400051
Scopus ID: 2-s2.0-85055708745
ISBN: 978-1-4503-6483-6 (print)
OAI: oai:DiVA.org:lnu-80159
DiVA, id: diva2:1284853
Conference
12th European Conference on Software Architecture (ECSA), Madrid, Spain, Sep 24-28, 2018
Available from: 2019-02-01 Created: 2019-02-01 Last updated: 2024-05-06 Bibliographically approved
In thesis
1. Incremental Clustering of Source Code: a Machine Learning Approach
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Technical debt at the architectural level is a severe threat to software development projects. Uncontrolled technical debt that is allowed to accumulate will undoubtedly hinder speedy development and maintenance, introduce bugs and problems in the software product, and may ultimately result in the abandonment of the source code. 

It is possible to detect debt accumulation by analyzing the source code and intended modules in the software architecture. However, this is seldom done in practice since it requires a correct and up-to-date mapping from source code to intended modules in the architecture. This mapping requires significant manual effort to create and maintain, something often considered too costly and laborious.

We investigate how to automate the mapping from source code to intended modules. The state of the art treats this as an incremental clustering problem, where source code entities are clustered to the intended modules based on some similarity measure. As the system evolves and source code entities are added or modified, the clustering needs to be updated.

The state-of-the-art techniques determine similarity based on either syntactic or semantic features, e.g., dependencies or identifier names. Large sets of parameters modify these features, e.g., weights for various types of dependencies. These parameters have a significant impact on how well the clustering performs. Unfortunately, we have not been able to identify any heuristics to help human experts determine a good set of parameters for a given system. Based on the parameters determined by, e.g., genetic optimization, it seems unlikely that general heuristics exist.

Instead, we compute the similarity using a multinomial naïve Bayes text classifier trained on tokens from the source code entities. We also include a novel feature that captures dependencies as text to add syntactic features. Our classifier, which relies on significantly fewer parameters, outperforms the state-of-the-art techniques, with their parameters set to near-optimal values.
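
As a rough illustration of the idea, the sketch below implements a small multinomial naive Bayes classifier with Laplace smoothing and uses it to map an unmapped entity to a module. It is not the thesis's implementation. In particular, the abstract does not describe how dependencies are captured as text, so the `dep:<module>` token encoding used here is an assumption, as are the helper names (`features`, `MultinomialNB`) and the toy entities.

```python
import math
import re
from collections import Counter, defaultdict


def features(name, depends_on):
    """Turn an entity into tokens: camelCase words plus dependency tokens.

    The "dep:<module>" encoding is an assumed way of expressing
    dependencies as text, chosen only for this sketch.
    """
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)
    return [w.lower() for w in words] + ["dep:" + m.lower() for m in depends_on]


class MultinomialNB:
    """Multinomial naive Bayes over token lists, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[t] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical entities that have already been mapped to modules.
mapped = [
    (features("EntryTableView", ["model"]), "gui"),
    (features("MainFrameView", ["logic", "model"]), "gui"),
    (features("BibEntryParser", ["model"]), "logic"),
    (features("CitationFormatter", ["model"]), "logic"),
    (features("BibEntry", []), "model"),
    (features("BibDatabase", []), "model"),
]
clf = MultinomialNB().fit([d for d, _ in mapped], [m for _, m in mapped])

# Map two unmapped entities to the most probable module.
print(clf.predict(features("PreviewPanelView", ["logic"])))
print(clf.predict(features("BibStringParser", ["model"])))
```

In an incremental setting, each newly mapped entity would be added to the training data and the classifier refitted before the next entity is mapped. The only tunable parameter in this sketch is the smoothing constant `alpha`.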

We find that machine learning provides better mapping performance with fewer required parameters. We can successfully combine syntactic information with semantic information without additional parameters. We provide an open-source tool suite with a reference implementation of different techniques and a curated set of systems that can act as a ground truth benchmark.

Place, publisher, year, edition, pages
Linnaeus University Press, 2022. p. 46
Series
Linnaeus University Dissertations ; 436
Keywords
Machine Learning, Naive Bayes, Source Code Clustering, Incremental Clustering, Software Architecture, Technical Debt
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-110142 (URN)
9789189460638 (ISBN)
9789189460645 (ISBN)
Public defence
2022-03-04, Ma135 (Fullriggaren), Hus Magna, Universitetskajen, Kalmar, 14:42 (English)
Available from: 2022-02-08 Created: 2022-02-04 Last updated: 2024-03-12 Bibliographically approved

Authority records

Olsson, Tobias; Ericsson, Morgan; Wingkvist, Anna