Using Multiple Embeddings for Visually Guided Text Similarity Analysis
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM). (ISOVIS) ORCID iD: 0000-0001-6150-0787
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Making sense of large data sets is a general and important challenge that arises in many research fields and real-world scenarios. Many specific computational methods for data mining and analysis have therefore been developed, some of which are specific to certain data types and some of which are more general. Such methods often seek to reveal the intrinsic structure of the relations between data items in order to provide important insights beyond the individual data values. This can be done in many different ways, but, interestingly, several of the most prominent methods (such as clustering and dimensionality reduction) are based on similarity/closeness calculations. The concept of similarity may at first glance seem both intuitive and simple, but it poses several conceptual, visual and computational challenges due to its inherently subjective nature.

Given the prevalence of similarity-based analysis methods within visual analytics (VA), we argue that there is a need for a better understanding of the potential and limitations of such methods, not only in their own specific contexts but also on a more common and general level. With this in mind, we have identified a research gap: the lack of a comprehensive approach to evaluating, comparing and combining different models within the context of similarity calculations. In this thesis, we seek to fill this gap through a series of publications around the common thread of developing a coherent VA framework for similarity-based analysis of large textual data sets. Although we have founded our work on embedding-based similarity calculations on textual data, many of the general ideas and implications carry over to other computational approaches and data types as well.

Our work covers several important aspects of the problem area, each of which is needed in order to construct a comprehensive methodological framework. As a foundation for our work, and to position our contribution in the context of the current research frontier, we provide a comprehensive survey of the use of embeddings within VA applications. For a solid conceptual understanding of similarity, we provide an analysis of its inherently subjective nature and the challenges this entails. Computationally, we develop several new methods for evaluating, comparing and combining different models. As a direct result of this, we also uncover a surprisingly high level of model disagreement, even though only state-of-the-art models are used. Visually, we provide several new prototype VA tools aimed at including the analyst in the loop and promoting trust and deep understanding. All in all, our work provides several new and important insights into a previously under-researched problem area.

Place, publisher, year, edition, pages
Linnaeus University Press, 2025.
Keywords [en]
Embeddings, Similarity Calculations, Visual Analytics, Text Mining
National Category
Identifiers
URN: urn:nbn:se:lnu:diva-138916
DOI: 10.15626/LUD.571.2025
ISBN: 9789180822985 (print)
ISBN: 978-91-8082-299-2 (digital)
OAI: oai:DiVA.org:lnu-138916
DiVA, id: diva2:1962189
Public defence
2025-06-12, Newton, Building C, Växjö, 09:30 (English)
Opponent
Available from: 2025-06-02 Created: 2025-05-28 Last updated: 2025-06-02 (bibliographically approved)
List of papers
1. Visually Guided Network Reconstruction Using Multiple Embeddings
2023 (English). In: Proceedings of the 16th IEEE Pacific Visualization Symposium (PacificVis '23), visualization notes track, IEEE, 2023, p. 212-216. Conference paper, Published paper (Refereed)
Abstract [en]

Embeddings are powerful tools for transforming complex and unstructured data into numeric formats suitable for computational analysis tasks. In this paper, we extend our previous work on using multiple embeddings for text similarity calculations to the field of networks. The embedding ensemble approach improves network reconstruction performance compared to single-embedding strategies. Our visual analytics methodology is successful in handling both text and network data, which demonstrates its generalizability beyond its originally presented scope.
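
As a rough illustration of what an "embedding ensemble" for similarity calculations could look like, the sketch below averages the cosine similarity matrices of several embeddings after scaling each to [0, 1]. The abstract does not state how the paper combines its embeddings, so the averaging rule, the function names and the random placeholder data are all assumptions made for this example.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarities between the rows of an embedding matrix."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

def ensemble_similarity(embedding_list):
    """Average the per-model similarity matrices (assumed combination rule).

    Each matrix is min-max scaled to [0, 1] first so that models with
    different similarity ranges contribute on a comparable scale.
    """
    scaled = []
    for emb in embedding_list:
        sim = cosine_similarity_matrix(emb)
        lo, hi = sim.min(), sim.max()
        scaled.append((sim - lo) / (hi - lo))
    return np.mean(scaled, axis=0)

# Random vectors stand in for two embeddings of the same 30 items;
# the two models may use different dimensionalities.
rng = np.random.default_rng(0)
model_a = rng.normal(size=(30, 16))
model_b = rng.normal(size=(30, 8))
combined = ensemble_similarity([model_a, model_b])
```

Working on similarity matrices rather than on the raw vectors sidesteps the fact that different embedding models produce vectors of different dimensionality.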

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Graph embedding, network embedding, similarity calculations, visual analytics, visualization
National Category
Research subject
Computer Science, Information and Software Visualization
Identifiers
URN: urn:nbn:se:lnu:diva-119859
DOI: 10.1109/PacificVis56936.2023.00031
Scopus ID: 2-s2.0-85163367392
ISBN: 9798350321241
ISBN: 9798350321258
Conference
16th IEEE Pacific Visualization Symposium (PacificVis '23), Seoul, Korea, April 18-21, 2023
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2023-03-19 Created: 2023-03-19 Last updated: 2025-05-28 (bibliographically approved)
2. Exploring Similarity Patterns in a Large Scientific Corpus
2025 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 20, no. 4, article id e0321114. Article in journal (Refereed), Published
Abstract [en]

Similarity-based analysis is a common and intuitive tool for exploring large data sets. For instance, grouping data items by their level of similarity, regarding one or several chosen aspects, can reveal patterns and relations from the intrinsic structure of the data and thus provide important insights in the sense-making process. Existing analytical methods (such as clustering and dimensionality reduction) tend to target questions such as "Which objects are similar?". However, since they are not necessarily well suited to answering questions such as "How does the result change if we change the similarity criteria?" or "How are the items linked together by the similarity relations?", they do not unlock the full potential of similarity-based analysis, and here we see a gap to fill. In this paper, we propose that the concept of similarity could be regarded as both (1) a relation between items and (2) a property in its own right, with a specific distribution over the data set. Based on this approach, we developed an embedding-based computational pipeline together with a prototype visual analytics tool which allows the user to perform similarity-based exploration of a large set of scientific publications. To demonstrate the potential of our method, we present two different use cases, and we also discuss the strengths and limitations of our approach.
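
The two views of similarity named in the abstract can be made concrete with a small sketch: the same pairwise similarity values are read once as a relation (links between sufficiently similar items) and once as a property with a distribution (a histogram over all pairs). The cosine measure, the threshold of 0.3, the bin count and the random placeholder embeddings are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

# Random vectors stand in for embeddings of 100 documents.
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(100, 32))
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = normed @ normed.T

# Collect each unordered pair (i < j) exactly once.
i_idx, j_idx = np.triu_indices_from(similarity, k=1)
pair_values = similarity[i_idx, j_idx]

# View 1: similarity as a relation. Link the pairs whose similarity
# reaches an (assumed) threshold, giving the edge list of a network.
threshold = 0.3
edges = [
    (int(i), int(j))
    for i, j, s in zip(i_idx, j_idx, pair_values)
    if s >= threshold
]

# View 2: similarity as a property with its own distribution over the
# data set, summarised here as a histogram of all pairwise values.
counts, bin_edges = np.histogram(pair_values, bins=20, range=(-1.0, 1.0))
```

Changing `threshold` answers the "how does the result change" type of question for the relation view, while the histogram shows how much of the data set sits at each similarity level.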

Place, publisher, year, edition, pages
Public Library of Science (PLoS), 2025
Keywords
Visual Text Analytics, Text Mining, Text Embedding, Network Embedding, Similarity Calculations
National Category
Research subject
Computer Science, Information and Software Visualization
Identifiers
URN: urn:nbn:se:lnu:diva-137304
DOI: 10.1371/journal.pone.0321114
ISI: 001488705600008
Scopus ID: 2-s2.0-105003254126
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Note

This work was partially supported through the ELLIIT environment for strategic research in Sweden. The work of Ilir Jusufi was supported in part by the Knowledge Foundation, Sweden, through the project "Rekryteringar 21, Universitetslektor i spelteknik" under Contract 20210077.

Available from: 2025-03-20 Created: 2025-03-20 Last updated: 2025-05-28 (bibliographically approved)
3. VA + Embeddings STAR: A State-of-the-Art Report on the Use of Embeddings in Visual Analytics
2023 (English). In: Computer graphics forum (Print), ISSN 0167-7055, E-ISSN 1467-8659, Vol. 42, no. 3, p. 539-571. Article in journal (Refereed), Published
Abstract [en]

Over the past years, an increasing number of publications in information visualization, especially within the field of visual analytics, have mentioned the term “embedding” when describing the computational approach. Within this context, embeddings are usually (relatively) low-dimensional, distributed representations of various data types (such as texts or graphs), and since they have proven to be extremely useful for a variety of data analysis tasks across various disciplines and fields, they have become widely used. Existing visualization approaches aim to either support exploration and interpretation of the embedding space through visual representation and interaction, or aim to use embeddings as part of the computational pipeline for addressing downstream analytical tasks. To the best of our knowledge, this is the first survey that takes a detailed look at embedding methods through the lens of visual analytics, and the purpose of our survey article is to provide a systematic overview of the state of the art within the emerging field of embedding visualization. We design a categorization scheme for our approach, analyze the current research frontier based on peer-reviewed publications, and discuss existing trends, challenges, and potential research directions for using embeddings in the context of visual analytics. Furthermore, we provide an interactive survey browser for the collected and categorized survey data, which currently includes 122 entries that appeared between 2007 and 2023.

Place, publisher, year, edition, pages
John Wiley & Sons, 2023
Keywords
embedding techniques, distributed representations, visual analytics, visualization
National Category
Research subject
Computer Science, Information and Software Visualization
Identifiers
URN: urn:nbn:se:lnu:diva-120749
DOI: 10.1111/cgf.14859
ISI: 001020716600041
Scopus ID: 2-s2.0-85163625612
Conference
25th EG Conference on Visualization (EuroVis '23), STAR track, 12-16 June 2023, Leipzig, Germany
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-05-16 Created: 2023-05-16 Last updated: 2025-05-28 (bibliographically approved)
4. Using Similarity Network Analysis to Improve Text Similarity Calculations
2025 (English). In: Applied Network Science, E-ISSN 2364-8228, Vol. 10, article id 8. Article in journal (Refereed), Published
Abstract [en]

Similarity-based analysis is a powerful and intuitive tool for exploring large data sets, for instance, for revealing patterns by grouping items by similarity or for recommending items based on selected samples. However, similarity is an abstract and subjective property, which makes it hard to evaluate by a purely computational approach. Furthermore, there are usually several possible computational models that could be applied to the data, each with its own strengths and weaknesses. With this in mind, we aim to extend the research frontier regarding what impact the choice of a computational model may have on the results. In this paper, we target the scope of embedding-based similarity calculations on text documents and seek to answer the research question: "How can a better understanding of the continuous similarity distribution captured by different models lead to better similarity calculations on document sets?" We propose a new and generic methodology based on similarity network comparison, and based on this approach, we have developed a computational pipeline together with a prototype visual analytics tool that allows the user to easily assess the level of model agreement/disagreement. To demonstrate the potential of our method, as well as its applicability to real-world scenarios, we apply it in an experimental setup using three state-of-the-art text embedding models and three different text corpora. In view of the surprisingly low level of model agreement regarding the data, we also discuss strategies for handling model disagreement.
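
One simple way to quantify model agreement through similarity network comparison is sketched below: each model's embeddings are turned into a k-nearest-neighbour network, and the two edge sets are compared with the Jaccard index. This is an illustrative stand-in, not the methodology of the paper; the k-NN construction, the Jaccard measure, the value k=5 and the random placeholder embeddings are all assumptions.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarities between the rows of an embedding matrix."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

def knn_edges(similarity, k):
    """Undirected edge set linking each item to its k most similar items."""
    sim = similarity.copy()
    np.fill_diagonal(sim, -np.inf)  # an item is never its own neighbour
    edges = set()
    for i, row in enumerate(sim):
        for j in np.argsort(row)[-k:]:  # indices of the k largest values
            edges.add((min(i, int(j)), max(i, int(j))))
    return edges

def edge_agreement(edges_a, edges_b):
    """Jaccard overlap of two edge sets: 1.0 if identical, 0.0 if disjoint."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0

# Random vectors stand in for two models' embeddings of the same 50 documents.
rng = np.random.default_rng(0)
model_a = rng.normal(size=(50, 16))
model_b = rng.normal(size=(50, 8))

net_a = knn_edges(cosine_similarity_matrix(model_a), k=5)
net_b = knn_edges(cosine_similarity_matrix(model_b), k=5)
agreement = edge_agreement(net_a, net_b)
```

With real embedding models, a low `agreement` value would indicate that the models link the documents in substantially different ways, which is the kind of disagreement the abstract reports.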

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Embeddings, Text Similarity Calculations, Similarity Networks, Visual Analytics
National Category
Research subject
Computer Science, Information and Software Visualization
Identifiers
URN: urn:nbn:se:lnu:diva-137305
DOI: 10.1007/s41109-025-00699-7
ISI: 001467943200001
Scopus ID: 2-s2.0-105000480934
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Note

This work was partially supported through the ELLIIT environment for strategic research in Sweden. The work of Ilir Jusufi was supported in part by the Knowledge Foundation, Sweden, through the project "Rekryteringar 21, Universitetslektor i spelteknik" under Contract 20210077.

Available from: 2025-03-20 Created: 2025-03-20 Last updated: 2025-05-28 (bibliographically approved)
5. Visually Guided Extraction of Prevalent Topics
2025 (English). In: Information Visualization, ISSN 1473-8716, E-ISSN 1473-8724, Vol. 42, no. 2, p. 179-198. Article in journal (Refereed), Published
Abstract [en]

Making sense of large sets of text documents is highly challenging for tasks such as obtaining a comprehensive overview or keeping up with the most important trends and topics. Even though several established methods for condensation and summarization of large text corpora exist, many of them lack the ability to account for differences in prevalence between identified topics, which in turn impedes quantitative analysis. In this paper, we therefore propose a novel prevalence-aware method for topic extraction, and show how it can be used to obtain important insights from two text corpora with very different content. We also implemented a prototype visual analytics tool which guides the user in the search for relevant insights and promotes trust in the yielded results. We have verified our application by a user study, as well as by a validation run on a data set with previously known topic structure. The results clearly show that our approach is suitable for text mining, that it can be used by non-experts, and that it offers features which make it an interesting candidate for use in several different analysis scenarios.

Place, publisher, year, edition, pages
SAGE Publications, 2025
Keywords
Visual Analytics, Text Mining, Text Embedding, Topic Modelling, Similarity Calculations
National Category
Research subject
Computer Science, Information and Software Visualization
Identifiers
URN: urn:nbn:se:lnu:diva-136101
DOI: 10.1177/14738716241312400
ISI: 001408697200001
Scopus ID: 2-s2.0-85216198128
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Note

This work was partially supported through the ELLIIT environment for strategic research in Sweden. The work of Ilir Jusufi was supported in part by the Knowledge Foundation, Sweden, through the project "Rekryteringar 21, Universitetslektor i spelteknik" under Contract 20210077.

Available from: 2025-02-09 Created: 2025-02-09 Last updated: 2025-05-28 (bibliographically approved)

Author/editor
Daniel Witschard