WET: Word embedding-topic distribution vectors for MOOC video lectures dataset
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). ORCID iD: 0000-0002-0199-2377
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). ORCID iD: 0000-0003-0512-6350
Norwegian University of Science and Technology, Norway.
2020 (English). In: Data in Brief, E-ISSN 2352-3409, Vol. 28, p. 1-6, article id 105090. Article in journal (Refereed), Published
Abstract [en]

In this article, we present a dataset containing word embeddings and document topic distribution vectors generated from MOOC video lecture transcripts. Transcripts of 12,032 video lectures from 200 courses were collected from the Coursera learning platform. This large corpus of transcripts was used as input to two well-known NLP techniques, Word2Vec and Latent Dirichlet Allocation (LDA), to generate word embeddings and topic vectors, respectively. We used the Word2Vec and LDA implementations in the Gensim package for Python. The data presented in this article are related to the research article entitled “Integrating word embeddings and document topics with deep learning in a video classification framework” [1]. The dataset is hosted in the Mendeley Data repository [2].

Place, publisher, year, edition, pages
Elsevier, 2020. Vol. 28, p. 1-6, article id 105090
Keywords [en]
Word embedding, Document topics, Video lecture transcript, MOOC, LDA, Word2Vec
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-90820
DOI: 10.1016/j.dib.2019.105090
ISI: 000520402100267
PubMedID: 31921958
Scopus ID: 2-s2.0-85077356594
OAI: oai:DiVA.org:lnu-90820
DiVA, id: diva2:1384454
Available from: 2020-01-09. Created: 2020-01-09. Last updated: 2021-05-07. Bibliographically approved

Open Access in DiVA

Full text (990 kB), 519 downloads
File information
File name: FULLTEXT01.pdf
File size: 990 kB
Checksum (SHA-512):
691c82d8d1991892d774915a0c6c5292cd8a0c0a25fabf3642961c6f1d614d714f47417cf0f6396b15756bc7df2ff2bcdf03f34421d9fa3dd002dde41f492983
Type: fulltext
Mimetype: application/pdf

Authority records

Kastrati, Zenun; Kurti, Arianit
