Copula-based software metrics aggregation
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA; Data Driven Software; Informat Qual Grp). ORCID iD: 0000-0002-3906-7611
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA; Data Driven Software; Informat Qual Grp). ORCID iD: 0000-0002-7565-3714
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA; Data Driven Software; Informat Qual Grp). ORCID iD: 0000-0003-1173-5187
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA; Data Driven Software; Informat Qual Grp). ORCID iD: 0000-0002-0835-823X
2021 (English). In: Software quality journal, ISSN 0963-9314, E-ISSN 1573-1367, Vol. 29, p. 863-899. Article in journal (Refereed). Published.
Abstract [en]

A quality model is a conceptual decomposition of an abstract notion of quality into relevant, possibly conflicting characteristics and further into measurable metrics. For quality assessment and decision making, metrics values are aggregated to characteristics and ultimately to quality scores. Aggregation has often been problematic as quality models do not provide the semantics of aggregation. This makes it hard to formally reason about metrics, characteristics, and quality. We argue that aggregation needs to be interpretable and mathematically well defined in order to assess, to compare, and to improve quality. To address this challenge, we propose a probabilistic approach to aggregation and define quality scores based on joint distributions of absolute metrics values. To evaluate the proposed approach and its implementation under realistic conditions, we conduct empirical studies on bug prediction of ca. 5000 software classes, maintainability of ca. 15000 open-source software systems, and on the information quality of ca. 100000 real-world technical documents. We found that our approach is feasible, accurate, and scalable in performance.
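To illustrate the general idea, the snippet below is a minimal sketch, not the authors' implementation: the Gaussian copula, the rank-based marginal estimates, and the assumption that larger metric values indicate worse quality are illustrative choices made here. It aggregates a matrix of metric values into one probabilistic score per software artifact, defined as the joint CDF of the artifact's metric vector:

```python
# Minimal, illustrative sketch of copula-based metrics aggregation.
# NOT the authors' implementation: the Gaussian copula, the rank-based
# marginal CDF estimate, and the larger-is-worse metric orientation are
# assumptions made here for illustration only.
import numpy as np
from scipy.stats import norm, multivariate_normal, rankdata


def empirical_cdf(column: np.ndarray) -> np.ndarray:
    """Rank-based estimate of a metric's marginal CDF, kept strictly in (0, 1)."""
    return rankdata(column, method="average") / (len(column) + 1)


def copula_quality_scores(metrics: np.ndarray) -> np.ndarray:
    """Aggregate an (artifacts x metrics) matrix into one score per artifact.

    The score of an artifact is the joint CDF of its metric vector, modelled
    as a Gaussian copula over empirically estimated marginals.  With metrics
    oriented so that larger values mean worse quality, lower scores indicate
    better artifacts.
    """
    # 1. Probability-integral transform of each metric (the marginals).
    u = np.column_stack([empirical_cdf(metrics[:, j])
                         for j in range(metrics.shape[1])])
    # 2. Gaussian copula: map to normal scores and estimate their correlation.
    z = norm.ppf(u)
    corr = np.corrcoef(z, rowvar=False)
    # 3. Evaluate the joint CDF at each artifact's (transformed) metric vector.
    mvn = multivariate_normal(mean=np.zeros(z.shape[1]), cov=corr)
    return np.array([mvn.cdf(row) for row in z])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 200 artifacts described by 3 correlated "badness" metrics.
    base = rng.normal(size=(200, 1))
    metrics = np.hstack([base + rng.normal(scale=0.5, size=(200, 1))
                         for _ in range(3)])
    print(copula_quality_scores(metrics)[:5])
```

Because each score is a probability with respect to the same observed population, artifacts can be compared and ranked directly.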

Place, publisher, year, edition, pages
Springer, 2021. Vol. 29, p. 863-899
Keywords [en]
Quality assessment, Quantitative methods, Software metrics, Aggregation, Multivariate statistical methods, Probabilistic models, Copula
National Category
Software Engineering
Research subject
Computer Science, Software Technology
Identifiers
URN: urn:nbn:se:lnu:diva-106779
DOI: 10.1007/s11219-021-09568-9
ISI: 000687914800001
Scopus ID: 2-s2.0-85113308308
Local ID: 2021
OAI: oai:DiVA.org:lnu-106779
DiVA, id: diva2:1590937
Available from: 2021-09-03. Created: 2021-09-03. Last updated: 2021-12-23. Bibliographically approved.
In thesis
1. Aggregation as Unsupervised Learning in Software Engineering and Beyond
2021 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Ranking alternatives is fundamental to effective decision making. However, creating an overall ranking is difficult if there are multiple criteria, and no single alternative performs best across all criteria. Software engineering is no exception.

Software quality is usually decomposed hierarchically into characteristics, and their quality can be assessed by various direct and indirect metrics. Although such quality models provide a basic understanding of what data to collect and which metrics to use, it is not clear how the metrics should be combined to assess the overall quality. Due to different approaches for aggregation of metrics, the same quality model and the same metrics for assessing the same software artifact could still lead to different assessment results and even to different interpretations.

The aggregation approach proposed in this thesis is well defined, interpretable, and applicable under realistic conditions. It can turn the quality-model- and metric-based assessment of (software) quality into a reliable and reproducible process. We express quality as the probability of detecting something with equal or worse quality among all observed software artifacts; good and bad quality are thus expressed as lower and higher probabilities, respectively.
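To make this probability interpretation concrete, the following sketch uses hypothetical toy data (not taken from the thesis) and scores each artifact by the share of observed artifacts whose metric vectors are component-wise less than or equal to its own, i.e., an empirical joint CDF; under the illustrative assumption that larger metric values mean worse quality, better artifacts receive lower probabilities:

```python
# Hypothetical illustration of the probability-based quality score: each
# artifact is scored by the fraction of all observed artifacts whose metric
# vectors are component-wise <= its own (an empirical joint CDF).  Metrics
# are assumed to grow with "badness", so lower scores indicate better quality.
import numpy as np

# Toy data: rows are artifacts, columns are metrics (e.g., complexity, size).
metrics = np.array([
    [ 2.0,  10.0],   # small, simple artifact
    [ 5.0,  40.0],
    [ 9.0,  80.0],
    [12.0, 150.0],   # large, complex artifact
])

def empirical_joint_cdf(values: np.ndarray) -> np.ndarray:
    """Score each row by the share of rows it dominates component-wise."""
    dominated = (values[None, :, :] <= values[:, None, :]).all(axis=2)
    return dominated.mean(axis=1)

print(empirical_joint_cdf(metrics))  # approx. [0.25 0.5 0.75 1.0]; lower is better
```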

We validated our approach theoretically and empirically. We conducted empirical studies on bug prediction, maintainability assessment, and information quality.

We used software visualization to study how well aggregation supports the analysis of multivariate data in general and to examine the effect of alternative aggregation approaches; to this end, we designed and implemented an exploratory multivariate data visualization tool.

Finally, to evaluate its transferability to other domains, we applied our approach to multi-criteria ranking and evaluated it on a real-world decision-making problem for assessing and ranking alternatives. Moreover, we applied our approach in the context of machine learning: we created a benchmark from a collection of regression problems and evaluated how well the aggregation output agrees with a ground truth and how well it represents the properties of the input variables.

The results showed that our approach is not only theoretically sound but also accurate and sensitive; it identifies anomalies, scales in performance, and can support multi-criteria decision making. Furthermore, our approach is transferable to other domains that require aggregation in hierarchically structured models, and it can be used as an agnostic unsupervised predictor in the absence of a ground truth.

Place, publisher, year, edition, pages
Växjö: Linnaeus University Press, 2021. p. 51
Series
Linnaeus University Dissertations ; 430
Keywords
quality assessment, quantitative methods, aggregation, multi-criteria decision making, unsupervised machine learning
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-108115 (URN)
9789189460409 (ISBN)
9789189460416 (ISBN)
Public defence
2021-12-17, Weber, building K, Växjö, 13:00 (English)
Available from: 2021-11-24. Created: 2021-11-19. Last updated: 2024-03-06. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus

Authority records

Ulan, Maria; Löwe, Welf; Ericsson, Morgan; Wingkvist, Anna
