lnu.se Publications
Chatzimparmpas, Angelos (ORCID iD: orcid.org/0000-0002-9079-2376)
Publications (10 of 21)
Chatzimparmpas, A., Martins, R. M., Telea, A. C. & Kerren, A. (2024). DeforestVis: Behavior Analysis of Machine Learning Models with Surrogate Decision Stumps. Computer graphics forum (Print)
2024 (English) In: Computer graphics forum (Print), ISSN 0167-7055, E-ISSN 1467-8659. Article in journal (Refereed), Epub ahead of print
Abstract [en]

As the complexity of Machine Learning (ML) models increases and their application in different (and critical) domains grows, there is a strong demand for more interpretable and trustworthy ML. A direct, model-agnostic way to interpret such models is to train surrogate models (such as rule sets and decision trees) that sufficiently approximate the original ones while being simpler and easier to explain. Yet, rule sets can become very lengthy, with many if-else statements, and decision tree depth grows rapidly when accurately emulating complex ML models. In such cases, both approaches can fail to meet their core goal of providing users with model interpretability. To tackle this, we propose DeforestVis, a visual analytics tool that summarizes the behavior of complex ML models by providing surrogate decision stumps (one-level decision trees) generated with the Adaptive Boosting (AdaBoost) technique. DeforestVis helps users explore the complexity vs. fidelity trade-off by incrementally generating more stumps, creating attribute-based explanations with weighted stumps to justify decision making, and analyzing the impact of rule overriding on training instance allocation between one or more stumps. An independent test set allows users to monitor the effectiveness of manual rule changes and form hypotheses based on case-by-case analyses. We show the applicability and usefulness of DeforestVis with two use cases and expert interviews with data analysts and model developers.
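The surrogate idea in the abstract can be sketched with scikit-learn (a minimal illustration, not DeforestVis itself; the dataset, model sizes, and fidelity measure below are assumptions): a complex model's predictions become the training labels for an AdaBoost ensemble of one-level decision stumps, and agreement with the original model quantifies the complexity vs. fidelity trade-off.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# A complex "black-box" model whose behavior we want to summarize.
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
bb_labels = black_box.predict(X)  # the surrogate is trained to mimic these

# AdaBoost's default base learner is a one-level decision tree (a stump).
# More stumps raise fidelity but lower interpretability, the trade-off
# DeforestVis lets users explore incrementally.
surrogate = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, bb_labels)

# Fidelity: fraction of instances where surrogate and black box agree.
fidelity = (surrogate.predict(X) == bb_labels).mean()
print(f"surrogate fidelity: {fidelity:.2f}")
```

Increasing `n_estimators` and re-measuring fidelity reproduces, in miniature, the incremental stump generation described above.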

Place, publisher, year, edition, pages
John Wiley & Sons, 2024
Keywords
Surrogate model, model understanding, adaptive boosting, machine learning, visual analytics
National Category
Computer Sciences
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-127909 (URN), 10.1111/cgf.15004 (DOI), 001174196500001 (), 2-s2.0-85185930256 (Scopus ID)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2024-02-20. Created: 2024-02-20. Last updated: 2024-07-09
Chatzimparmpas, A., Kucher, K. & Kerren, A. (2024). Visualization for Trust in Machine Learning Revisited: The State of the Field in 2023. IEEE Computer Graphics and Applications, 44(3), 99-113
2024 (English) In: IEEE Computer Graphics and Applications, ISSN 0272-1716, E-ISSN 1558-1756, Vol. 44, no 3, p. 99-113. Article in journal (Refereed), Published
Abstract [en]

Visualization for explainable and trustworthy machine learning remains one of the most important and heavily researched fields within information visualization and visual analytics with various application domains, such as medicine, finance, and bioinformatics. After our 2020 state-of-the-art report comprising 200 techniques, we have persistently collected peer-reviewed articles describing visualization techniques, categorized them based on the previously established categorization schema consisting of 119 categories, and provided the resulting collection of 542 techniques in an online survey browser. In this survey article, we present the updated findings of new analyses of this dataset as of fall 2023 and discuss trends, insights, and eight open challenges for using visualizations in machine learning. Our results corroborate the rapidly growing trend of visualization techniques for increasing trust in machine learning models in the past three years, with visualization found to help improve popular model explainability methods and check new deep learning architectures, for instance.

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
trustworthy machine learning, visualization, interpretable machine learning, explainable machine learning
National Category
Computer Sciences
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-127242 (URN), 10.1109/MCG.2024.3360881 (DOI), 001252800600004 (), 2-s2.0-85184317816 (Scopus ID)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Note

Bibliographically reviewed

Available from: 2024-01-29. Created: 2024-01-29. Last updated: 2024-07-09. Bibliographically approved
Chatzimparmpas, A., Paulovich, F. V. & Kerren, A. (2023). HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques. Computer graphics forum (Print), 42(1), 135-154
2023 (English) In: Computer graphics forum (Print), ISSN 0167-7055, E-ISSN 1467-8659, Vol. 42, no 1, p. 135-154. Article in journal (Refereed), Published
Abstract [en]

Despite the tremendous advances in machine learning (ML), training with imbalanced data still poses challenges in many real-world applications. Among a series of diverse techniques to solve this problem, sampling algorithms are regarded as an efficient solution. However, the problem is more fundamental, with many works emphasizing the importance of instance hardness: the significance of managing unsafe or potentially noisy instances that are more likely to be misclassified and serve as the root cause of poor classification performance. This paper introduces HardVis, a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios. Our proposed system assists users in visually comparing different distributions of data types, selecting types of instances based on local characteristics that will later be affected by the active sampling method, and validating which suggestions from undersampling or oversampling techniques are beneficial for the ML model. Additionally, rather than uniformly undersampling or oversampling a specific class, we allow users to find and sample easy- and difficult-to-classify training instances from all classes. Users can explore subsets of data from different perspectives to decide all those parameters, while HardVis keeps track of their steps and evaluates the model's predictive performance on a separate test set. The end result is a well-balanced data set that boosts the predictive power of the ML model. The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case. Finally, we also assess the usefulness of our system based on feedback received from ML experts.
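The notion of instance hardness described above can be illustrated with the k-Disagreeing Neighbors (kDN) measure (an assumption for illustration: HardVis combines several hardness measures and sampling algorithms, whereas this sketch uses only kDN and a naive undersampling rule):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

# Imbalanced two-class data: class 0 is the majority (~90%).
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=1)

# kDN: fraction of an instance's k nearest neighbors with a different label.
# High values mark unsafe/noisy instances likely to be misclassified.
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: first neighbor is self
_, idx = nn.kneighbors(X)
hardness = (y[idx[:, 1:]] != y[:, None]).mean(axis=1)

# Undersample the majority class by dropping its hardest instances,
# rather than removing instances uniformly at random.
majority = np.flatnonzero(y == 0)
drop = majority[np.argsort(hardness[majority])[-20:]]  # 20 hardest
keep = np.setdiff1d(np.arange(len(y)), drop)
X_bal, y_bal = X[keep], y[keep]
print(len(y), "->", len(y_bal))
```

In the tool itself, which instances count as "hard" and how they are resampled is an interactive, user-steered decision rather than a fixed threshold.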

Place, publisher, year, edition, pages
John Wiley & Sons, 2023
Keywords
visualization, visual analytics, interpretable machine learning, explainable machine learning, supervised machine learning, instance hardness, undersampling, oversampling
National Category
Computer Sciences; Human Computer Interaction
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-117898 (URN), 10.1111/cgf.14726 (DOI), 000903704000001 (), 2-s2.0-85144846767 (Scopus ID)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2022-12-13. Created: 2022-12-13. Last updated: 2023-05-25. Bibliographically approved
Ploshchik, I., Chatzimparmpas, A. & Kerren, A. (2023). MetaStackVis: Visually-Assisted Performance Evaluation of Metamodels. In: Proceedings of the 16th IEEE Pacific Visualization Symposium (PacificVis '23), visualization notes track, IEEE, 2023: . Paper presented at 16th IEEE Pacific Visualization Symposium (PacificVis '23), Seoul, Korea, April 18-21, 2023 (pp. 207-211). IEEE
2023 (English) In: Proceedings of the 16th IEEE Pacific Visualization Symposium (PacificVis '23), visualization notes track, IEEE, 2023, p. 207-211. Conference paper, Published paper (Refereed)
Abstract [en]

Stacking (or stacked generalization) is an ensemble learning method with one main distinction from the rest: even though several base models are trained on the original data set, their predictions are further used as input data for one or more metamodels arranged in at least one extra layer. Composing a stack of models can produce high-performance outcomes, but it usually involves a trial-and-error process. Therefore, our previously developed visual analytics system, StackGenVis, was mainly designed to assist users in choosing a set of top-performing and diverse models by measuring their predictive performance. However, it only employs a single logistic regression metamodel. In this paper, we investigate the impact of alternative metamodels on the performance of stacking ensembles using a novel visualization tool, called MetaStackVis. Our interactive tool helps users to visually explore single metamodels and pairs of metamodels according to their predictive probabilities and multiple validation metrics, as well as their ability to predict specific problematic data instances. MetaStackVis was evaluated with a usage scenario based on a medical data set and via expert interviews.
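The question the paper studies, how the choice of metamodel affects a stacking ensemble, can be sketched with scikit-learn's StackingClassifier (the base models, candidate metamodels, and dataset below are illustrative assumptions, not those used in MetaStackVis):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
base = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier())]

# Swap in alternative metamodels (final estimators) and compare them.
scores = {}
for name, meta in [("logreg", LogisticRegression(max_iter=1000)),
                   ("nb", GaussianNB()),
                   ("tree", DecisionTreeClassifier(max_depth=3, random_state=0))]:
    stack = StackingClassifier(estimators=base, final_estimator=meta, cv=3)
    scores[name] = cross_val_score(stack, X, y, cv=3).mean()
print(scores)
```

MetaStackVis extends this comparison visually, with multiple validation metrics and attention to individual problematic instances rather than a single accuracy number.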

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Visual analytics, information visualization, interaction, stacking, metamodels, ensemble learning
National Category
Computer Sciences; Human Computer Interaction
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-119860 (URN), 10.1109/PacificVis56936.2023.00030 (DOI), 2-s2.0-85163320910 (Scopus ID)
Conference
16th IEEE Pacific Visualization Symposium (PacificVis '23), Seoul, Korea, April 18-21, 2023
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2023-03-19. Created: 2023-03-19. Last updated: 2023-08-25. Bibliographically approved
Chatzimparmpas, A., Martins, R. M. & Kerren, A. (2023). VisRuler: Visual Analytics for Extracting Decision Rules from Bagged and Boosted Decision Trees. Information Visualization, 22(2), 115-139
2023 (English) In: Information Visualization, ISSN 1473-8716, E-ISSN 1473-8724, Vol. 22, no 2, p. 115-139. Article in journal (Refereed), Published
Abstract [en]

Bagging and boosting are two popular ensemble methods in machine learning (ML) that produce many individual decision trees. Due to the inherent ensemble characteristic of these methods, they typically outperform single decision trees or other ML models in predictive performance. However, numerous decision paths are generated for each decision tree, increasing the overall complexity of the model and hindering its use in domains that require trustworthy and explainable decisions, such as finance, social care, and health care. Thus, the interpretability of bagging and boosting algorithms, such as random forest and adaptive boosting, decreases as the number of decisions rises. In this paper, we propose VisRuler, a visual analytics tool that aims to assist users in extracting decisions from such ML models via a thorough visual inspection workflow that includes selecting a set of robust and diverse models (originating from different ensemble learning algorithms), choosing important features according to their global contribution, and deciding which decisions are essential for global explanation (or locally, for specific cases). The outcome is a final decision based on the class agreement of several models and the explored manual decisions exported by users. We evaluated the applicability and effectiveness of VisRuler via a use case, a usage scenario, and a user study. The evaluation revealed that most users managed to successfully use our system to explore decision rules visually, performing the proposed tasks and answering the given questions in a satisfying way.
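The raw material such a tool works with, the decision paths of the individual trees in a bagged ensemble, can be extracted with scikit-learn (a sketch only; VisRuler's own rule extraction, ranking, and visual comparison are richer):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

X, y = load_iris(return_X_y=True)
# Deliberately tiny ensemble so the printed rules stay readable.
forest = RandomForestClassifier(n_estimators=3, max_depth=2, random_state=0)
forest.fit(X, y)

# Each member tree yields a set of if-else decision paths.
rules = [export_text(tree, feature_names=load_iris().feature_names)
         for tree in forest.estimators_]
print(rules[0])            # the decision paths of the first tree
print(len(rules), "trees")
```

Even this toy ensemble produces several decision paths per tree, which is why rule volume quickly overwhelms manual inspection in realistic random forests.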

Place, publisher, year, edition, pages
Sage Publications, 2023
Keywords
Decisions evaluation, rules interpretation, ensemble learning, visual analytics, visualization
National Category
Computer Sciences; Human Computer Interaction
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-117897 (URN), 10.1177/14738716221142005 (DOI), 000916155000001 (), 2-s2.0-85144842887 (Scopus ID)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2022-12-13. Created: 2022-12-13. Last updated: 2023-03-06. Bibliographically approved
Chatzimparmpas, A. (2023). Visual Analytics for Explainable and Trustworthy Machine Learning. (Doctoral dissertation). Växjö, Sweden: Linnaeus University Press
2023 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

The deployment of artificial intelligence solutions and machine learning research has exploded in popularity in recent years, with numerous types of models proposed to interpret and predict patterns and trends in data from diverse disciplines. However, as the complexity of these models grows, it becomes increasingly difficult for users to evaluate and rely on the model results, since their inner workings are mostly hidden in black boxes, which are difficult to trust in critical decision-making scenarios. While automated methods can partly handle these problems, recent research findings suggest that their combination with innovative methods developed within information visualization and visual analytics can lead to further insights gained from models and, consequently, improve their predictive ability and enhance trustworthiness in the entire process. Visual analytics is the area of research that studies the analysis of vast and intricate information spaces by combining statistical and machine learning models with interactive visual interfaces. By following this methodology, human experts can better understand such spaces and apply their domain expertise in the process of building and improving the underlying models.

The primary goals of this dissertation are twofold, focusing on (1) methodological aspects, by conducting qualitative and quantitative meta-analyses to support the visualization research community in making sense of its literature and to highlight unsolved challenges, as well as (2) technical solutions, by developing visual analytics approaches for various machine learning models, such as dimensionality reduction and ensemble learning methods. Regarding the first goal, we define, categorize, and examine in depth the means for visual coverage of the different trust levels at each stage of a typical machine learning pipeline and establish a design space for novel visualizations in the area. Regarding the second goal, we discuss multiple visual analytics tools and systems implemented by us to facilitate the underlying research on the various stages of the machine learning pipeline, i.e., data processing, feature engineering, hyperparameter tuning, understanding, debugging, refining, and comparing models. Our approaches are data-agnostic, but mainly target tabular data with meaningful attributes in diverse domains, such as health care and finance. The applicability and effectiveness of this work were validated with case studies, usage scenarios, expert interviews, user studies, and critical discussions of limitations and alternative designs. The results of this dissertation provide new avenues for visual analytics research in explainable and trustworthy machine learning.


Place, publisher, year, edition, pages
Växjö, Sweden: Linnaeus University Press, 2023. p. 360
Series
Linnaeus University Dissertations ; 482
Keywords
visualization, interaction, visual analytics, explainable machine learning, XAI, trustworthy machine learning, ensemble learning, dimensionality reduction, supervised learning, unsupervised learning, ML, AI, tabular data
National Category
Computer Sciences; Human Computer Interaction
Research subject
Computer and Information Sciences Computer Science, Computer Science; Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-118794 (URN), 10.15626/LUD.482.2023 (DOI), 9789189709942 (ISBN), 9789189709959 (ISBN)
Public defence
2023-02-23, Weber, Hus K, Växjö, 09:00 (English)
Available from: 2023-01-30. Created: 2023-01-27. Last updated: 2024-03-13. Bibliographically approved
Chatzimparmpas, A., Park, V. & Kerren, A. (2022). Evaluating StackGenVis with a Comparative User Study. In: Proceedings of the 15th IEEE Pacific Visualization Symposium (PacificVis '22): . Paper presented at 15th IEEE Pacific Visualization Symposium (PacificVis '22), online conference, April 11-14, 2022 (pp. 161-165). IEEE
2022 (English) In: Proceedings of the 15th IEEE Pacific Visualization Symposium (PacificVis '22), IEEE, 2022, p. 161-165. Conference paper, Published paper (Refereed)
Abstract [en]

Stacked generalization (also called stacking) is an ensemble method in machine learning that deploys a metamodel to summarize the predictive results of heterogeneous base models organized into one or more layers. Despite being capable of producing high-performance results, building a stack of models can be a trial-and-error procedure. Thus, our previously developed visual analytics system, entitled StackGenVis, was designed to monitor and control the entire stacking process visually. In this work, we present the results of a comparative user study we performed for evaluating the StackGenVis system. We divided the study participants into two groups to test the usability and effectiveness of StackGenVis compared to Orange Visual Stacking (OVS) in an exploratory usage scenario using healthcare data. The results indicate that StackGenVis is significantly more powerful than OVS based on the qualitative feedback provided by the participants. However, the average completion time for all tasks was comparable between both tools.

Place, publisher, year, edition, pages
IEEE, 2022
Series
IEEE Pacific Visualization Symposium, ISSN 2165-8765, E-ISSN 2165-8773
Keywords
Visualization, evaluation, user study, visual analytics, machine learning, stacked generalization, stacking, ensemble learning
National Category
Human Computer Interaction; Other Computer and Information Science
Research subject
Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-109825 (URN), 10.1109/PacificVis53943.2022.00025 (DOI), 000850180500017 (), 2-s2.0-85132430186 (Scopus ID), 9781665423359 (ISBN), 9781665423366 (ISBN)
Conference
15th IEEE Pacific Visualization Symposium (PacificVis '22), online conference, April 11-14, 2022
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2022-01-25. Created: 2022-01-25. Last updated: 2023-01-04. Bibliographically approved
Chatzimparmpas, A., Martins, R. M., Kucher, K. & Kerren, A. (2022). FeatureEnVi: Visual Analytics for Feature Engineering Using Stepwise Selection and Semi-Automatic Extraction Approaches. IEEE Transactions on Visualization and Computer Graphics, 28(4), 1773-1791
2022 (English) In: IEEE Transactions on Visualization and Computer Graphics, ISSN 1077-2626, E-ISSN 1941-0506, Vol. 28, no 4, p. 1773-1791. Article in journal (Refereed), Published
Abstract [en]

The machine learning (ML) life cycle involves a series of iterative steps, from the effective gathering and preparation of the data (including complex feature engineering processes) to the presentation and improvement of results, with various algorithms to choose from in every step. Feature engineering in particular can be very beneficial for ML, leading to numerous improvements such as boosting the predictive results, decreasing computational times, reducing excessive noise, and increasing the transparency behind the decisions taken during the training. Despite that, while several visual analytics tools exist to monitor and control the different stages of the ML life cycle (especially those related to data and algorithms), feature engineering support remains inadequate. In this paper, we present FeatureEnVi, a visual analytics system specifically designed to assist with the feature engineering process. Our proposed system helps users to choose the most important features, to transform the original features into powerful alternatives, and to experiment with different feature generation combinations. Additionally, data space slicing allows users to explore the impact of features on both local and global scales. FeatureEnVi utilizes multiple automatic feature selection techniques; furthermore, it visually guides users with statistical evidence about the influence of each feature (or subsets of features). The final outcome is the extraction of heavily engineered features, evaluated by multiple validation metrics. The usefulness and applicability of FeatureEnVi are demonstrated with two use cases and a case study. We also report feedback from interviews with two ML experts and a visualization researcher who assessed the effectiveness of our system.
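The selection-and-transformation loop described above can be sketched as follows (assumptions: a mutual-information ranking, a log transformation, and a cross-validated accept/reject rule stand in for FeatureEnVi's richer set of techniques and visual guidance):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Step 1: rank features by mutual information with the target, keep top 5.
selector = SelectKBest(mutual_info_classif, k=5).fit(X, y)
X_sel = selector.transform(X)

model = LogisticRegression(max_iter=5000)
before = cross_val_score(model, X_sel, y, cv=3).mean()

# Step 2: try a candidate transformation (log-scaling; all features here are
# positive) and keep it only if the validation score does not degrade.
X_log = np.log1p(X_sel)
after = cross_val_score(model, X_log, y, cv=3).mean()
X_final = X_log if after >= before else X_sel
print(f"before={before:.3f} after={after:.3f}")
```

In the system itself this accept/reject decision is made by the user, guided by statistical evidence per feature, rather than by a fixed rule.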

Place, publisher, year, edition, pages
IEEE, 2022
Keywords
Feature selection, feature extraction, feature engineering, machine learning, visual analytics, visualization
National Category
Computer Sciences; Human Computer Interaction
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-108801 (URN), 10.1109/TVCG.2022.3141040 (DOI), 000761227900006 (), 34990365 (PubMedID), 2-s2.0-85122858225 (Scopus ID)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2022-01-05. Created: 2022-01-05. Last updated: 2022-03-29. Bibliographically approved
Musleh, M., Chatzimparmpas, A. & Jusufi, I. (2022). Visual analysis of blow molding machine multivariate time series data. Journal of Visualization, 25, 1329-1342
2022 (English) In: Journal of Visualization, ISSN 1343-8875, E-ISSN 1875-8975, Vol. 25, p. 1329-1342. Article in journal (Refereed), Published
Abstract [en]

The recent development in the data analytics field provides a boost in production for modern industries. Small-sized factories intend to take full advantage of the data collected by sensors used in their machinery. The ultimate goal is to minimize cost and maximize quality, resulting in an increase in profit. In collaboration with domain experts, we implemented a data visualization tool to enable decision-makers in a plastic factory to improve their production process. The tool is an interactive dashboard with multiple coordinated views supporting the exploration from both local and global perspectives. In summary, we investigate three different aspects: methods for preprocessing multivariate time series data, clustering approaches for the already refined data, and visualization techniques that aid domain experts in gaining insights into the different stages of the production process. Here we present our ongoing results grounded in a human-centered development process. We adopt a formative evaluation approach to continuously upgrade our dashboard design that eventually meets partners' requirements and follows the best practices within the field. We also conducted a case study with a domain expert to validate the potential application of the tool in the real-life context. Finally, we assessed the usability and usefulness of the tool with a two-layer summative evaluation that showed encouraging results.
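The preprocess-then-cluster pipeline described above can be sketched on synthetic sensor data (assumptions: the summary features, the clustering method, and the two-regime data are illustrative; the dashboard's real preprocessing and clustering choices are domain-specific):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# 40 production cycles, 3 sensors, 100 time steps each; two operating regimes.
slow = rng.normal(0.0, 1.0, size=(20, 3, 100))
fast = rng.normal(3.0, 1.0, size=(20, 3, 100))
cycles = np.vstack([slow, fast])

# Preprocessing: summarize each multivariate cycle by per-sensor mean and
# standard deviation, then standardize the resulting feature table.
feats = np.hstack([cycles.mean(axis=2), cycles.std(axis=2)])
feats = StandardScaler().fit_transform(feats)

# Clustering the refined data groups cycles into production regimes.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
print(np.bincount(labels))
```

The dashboard then lets domain experts inspect such clusters interactively, from both local (single cycle) and global (whole production) perspectives.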

Place, publisher, year, edition, pages
Springer, 2022
Keywords
Time series data, Unsupervised machine learning, Visualization
National Category
Computer and Information Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-115614 (URN), 10.1007/s12650-022-00857-4 (DOI), 000822967800001 (), 35845181 (PubMedID), 2-s2.0-85133821926 (Scopus ID)
Available from: 2022-08-03. Created: 2022-08-03. Last updated: 2022-11-22. Bibliographically approved
Chatzimparmpas, A., Martins, R. M., Kucher, K. & Kerren, A. (2021). Empirical Study: Visual Analytics for Comparing Stacking to Blending Ensemble Learning. In: Ioan Dumitrache, Adina Magda Florea, Mihnea-Alexandru Moisescu, Florin Pop, and Alexandru Dumitraşcu (Ed.), Proceedings of the 23rd International Conference on Control Systems and Computer Science (CSCS23), 26–28 May 2021, Bucharest, Romania: . Paper presented at The 23rd International Conference on Control Systems and Computer Science (CSCS23), online conference, 26-28 May, 2021 (pp. 1-8). IEEE
2021 (English) In: Proceedings of the 23rd International Conference on Control Systems and Computer Science (CSCS23), 26–28 May 2021, Bucharest, Romania / [ed] Ioan Dumitrache, Adina Magda Florea, Mihnea-Alexandru Moisescu, Florin Pop, and Alexandru Dumitraşcu, IEEE, 2021, p. 1-8. Conference paper, Published paper (Other academic)
Abstract [en]

Stacked generalization (also called stacking) is an ensemble method in machine learning that uses a metamodel to combine the predictive results of heterogeneous base models arranged in at least one layer. K-fold cross-validation is employed at the various stages of training in this method. Nonetheless, another validation strategy is to try out several splits of data leading to different train and test sets for the base models and then use only the latter to train the metamodel—this is known as blending. In this work, we present a modification of an existing visual analytics system, entitled StackGenVis, that now supports the process of composing robust and diverse ensembles of models with both aforementioned methods. We have built multiple ensembles using our system with the two respective methods, and we tested the performance with six small- to large-sized data sets. The results indicate that stacking is significantly more powerful than blending based on three performance metrics. However, the training times of the base models and the final ensembles are lower and more stable during various train/test splits in blending rather than stacking.
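The blending strategy described above can be sketched as follows (an illustration under assumed models and splits, not StackGenVis's implementation): the base models are fit on one split, and the metamodel is fit only on their predicted probabilities for a held-out split, in contrast to stacking's K-fold scheme.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.3, random_state=0)
X_hold, X_test, y_hold, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                                  random_state=0)

# Base models see only the training split.
bases = [RandomForestClassifier(n_estimators=50, random_state=0),
         KNeighborsClassifier()]
for b in bases:
    b.fit(X_tr, y_tr)

# Metamodel features = base-model class probabilities on the holdout split.
meta_X = np.column_stack([b.predict_proba(X_hold)[:, 1] for b in bases])
meta = LogisticRegression().fit(meta_X, y_hold)

test_X = np.column_stack([b.predict_proba(X_test)[:, 1] for b in bases])
accuracy = meta.score(test_X, y_test)
print(f"blending accuracy: {accuracy:.2f}")
```

Because every model is fit exactly once, blending's training cost is lower and more stable across splits than stacking's repeated K-fold fits, matching the trade-off reported in the abstract.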

Place, publisher, year, edition, pages
IEEE, 2021
Keywords
Stacking, blending, ensemble learning, machine learning, visual analytics, visualization, empirical study
National Category
Computer Sciences
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-106084 (URN), 10.1109/CSCS52396.2021.00008 (DOI), 2-s2.0-85112033363 (Scopus ID), 9781665439404 (ISBN), 9781665439398 (ISBN)
Conference
The 23rd International Conference on Control Systems and Computer Science (CSCS23), online conference, 26-28 May, 2021
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Note

Invited Paper

Available from: 2021-08-04. Created: 2021-08-04. Last updated: 2022-06-07. Bibliographically approved