The Nordic Tweet Stream: A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data
(DISA-DH). ORCID iD: 0000-0003-3123-6932
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). (DISA-DH). ORCID iD: 0000-0001-9775-4594
Linnaeus University, Faculty of Arts and Humanities, Department of Languages. (DISA-DH). ORCID iD: 0000-0002-5613-7618
Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM). ORCID iD: 0000-0002-2901-935X
2018 (English). In: DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018 / [ed] Eetu Mäkelä, Mikko Tolonen, Jouni Tuominen. CEUR-WS.org, 2018, p. 349-362. Conference paper, Published paper (Refereed)
Abstract [en]

This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: we rely not only on traditional structured corpus data but also on unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects of creating a dynamic real-time monitor corpus, and a subsequent case study illustrates how the corpus could be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings. The results show that English is the most frequently used language, accounting for almost a third of the tweets. These results can be used to assess how widespread English use is in the Nordic region, and they offer a big data perspective that complements previous small-scale studies. Future objectives include annotating the material, making it available to the scholarly community, and expanding the geographic scope of the data stream beyond the Nordic region.

Place, publisher, year, edition, pages
CEUR-WS.org, 2018, p. 349-362
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 2084
Keywords [en]
Real-time language data, Nordic Tweet Stream, Twitter
National Category
General Language Studies and Linguistics; Specific Languages
Research subject
Humanities, English
Identifiers
URN: urn:nbn:se:lnu:diva-78277
Scopus ID: 2-s2.0-85045342911
OAI: oai:DiVA.org:lnu-78277
DiVA, id: diva2:1255220
Conference
Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018
Projects
DISA
Available from: 2018-10-11. Created: 2018-10-11. Last updated: 2019-05-24. Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 503 kB)
Checksum (SHA-512): 2437c5a72838da2e0f74e4035df832eee3305a0d52a5fa14c5a40db2ffe84cd67258c6751a567458a2759ad6bba8c4f8b0961f375caa4d53adf1b334ca78c177


Authority records

Laitinen, Mikko; Lundberg, Jonas; Levin, Magnus; Martins, Rafael Messias
