An Investigation on the Impact of Non-Uniform Random Sampling Techniques for t-SNE
KTH Royal Institute of Technology, Sweden.
Linnaeus University, Faculty of Technology, Department of Mathematics. ORCID iD: 0000-0002-0510-6782
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). ORCID iD: 0000-0002-2901-935X
2020 (English). In: 2020 Swedish Workshop on Data Science (SweDS), Luleå, 2020, IEEE, 2020, p. 1-8. Conference paper, Published paper (Refereed)
Abstract [en]

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique that has gained much popularity for its ability to create low-dimensional embeddings that preserve well-separated clusters from high-dimensional spaces. Despite its strengths, t-SNE usually has long running times that do not scale well with the size of the dataset, which limits its applicability to scenarios that involve, for example, Big Data and interactive visualization. Downsampling the dataset to a more manageable size is a straightforward workaround, but the literature does not make clear how much the quality of the embedding suffers from the downsampling, or whether uniform random sampling is indeed the best possible solution. In this paper, we report on a thorough series of experiments performed to draw conclusions about the quality of embeddings obtained by running t-SNE on samples of data drawn with different sampling techniques: uniform random sampling, random walk sampling, our proposed affinity-based random walk sampling, and the so-called hubness sampling. Throughout our testing, the affinity-based variant of random walk sampling distinguished itself as a promising alternative to uniform random sampling.
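For readers unfamiliar with the two baseline techniques named in the abstract, the following is a minimal Python sketch of uniform random sampling and a generic random walk sampling, written under assumptions that are not taken from the paper. Here the walk runs on a brute-force k-nearest-neighbour graph, and it restarts at a random point with a fixed probability. The function names `knn_graph`, `uniform_sample` and `random_walk_sample`, the value of `k` and the `restart_prob` parameter are all hypothetical. The sketch does not reproduce the authors' affinity-based variant or hubness sampling, since their details are not given in this record.

```python
import math
import random


def knn_graph(points, k):
    """Brute-force k-nearest-neighbour lists (O(n^2); illustrative only)."""
    neighbours = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to point i.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def uniform_sample(n, m, rng):
    """Uniform random sampling: every subset of size m is equally likely."""
    return sorted(rng.sample(range(n), m))


def random_walk_sample(neighbours, m, rng, restart_prob=0.15):
    """Generic random-walk sampling on a kNN graph (hypothetical parameters).

    At each step the walk moves to a uniformly chosen neighbour of the
    current point, or jumps to a uniformly random point with probability
    `restart_prob`, so it cannot stay trapped in one component of the graph.
    Distinct visited points are collected until m points have been gathered.
    """
    n = len(neighbours)
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= n")
    current = rng.randrange(n)
    visited = {current}
    while len(visited) < m:
        if rng.random() < restart_prob or not neighbours[current]:
            current = rng.randrange(n)  # restart (teleport) step
        else:
            current = rng.choice(neighbours[current])  # walk step
        visited.add(current)
    return sorted(visited)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic toy data: two well-separated 2-D Gaussian blobs of 50 points.
    points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
    graph = knn_graph(points, k=5)
    print("uniform:", uniform_sample(len(points), 10, rng))
    print("random walk:", random_walk_sample(graph, 10, rng))
```

The returned indices would then be used to subset the data before embedding it with any t-SNE implementation. The restart step is one simple design choice. In the toy data above, the kNN graph will typically have no edges between the two blobs, so a walk without restarts could not leave the blob it started in.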

Place, publisher, year, edition, pages
IEEE, 2020. p. 1-8
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-93051
DOI: 10.1109/SweDS51247.2020.9275586
Scopus ID: 2-s2.0-85099096458
ISBN: 978-1-7281-9204-8 (electronic)
ISBN: 978-1-7281-9205-5 (print)
OAI: oai:DiVA.org:lnu-93051
DiVA, id: diva2:1416203
Conference
2020 Swedish Workshop on Data Science (SweDS), 29-30 Oct. 2020, Luleå
Available from: 2020-03-22. Created: 2020-03-22. Last updated: 2021-05-06. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text, Scopus

Authority records

Nordqvist, Jonas; Martins, Rafael Messias
