The Nordic Tweet Stream: A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data
(DISA-DH) ORCID iD: 0000-0003-3123-6932
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM). (DISA-DH) ORCID iD: 0000-0001-9775-4594
Linnaeus University, Faculty of Arts and Humanities (FKH), Department of Languages (SPR). (DISA-DH) ORCID iD: 0000-0002-5613-7618
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM). ORCID iD: 0000-0002-2901-935X
2018 (English). In: DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018 / [ed] Eetu Mäkelä, Mikko Tolonen, Jouni Tuominen, CEUR-WS.org, 2018, p. 349-362. Conference paper, Published paper (Refereed)
Abstract [en]

This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: we not only rely on traditional structured corpus data but also use unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects of creating a dynamic real-time monitor corpus, and the following case study illustrates how the corpus could be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings. The results show that English is the most frequently used language, accounting for almost a third. These results can be used to assess how widespread English use is in the Nordic region and offer a big data perspective that complements previous small-scale studies. The future objectives include annotating the material, making it available to the scholarly community, and expanding the geographic scope of the data stream outside the Nordic region.

Place, publisher, year, edition, pages
CEUR-WS.org, 2018. p. 349-362
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 2084
Keywords [en]
Real-time language data, Nordic Tweet Stream, Twitter
National subject category
Research subject
Humanities, English with a specialization in linguistics
Identifiers
URN: urn:nbn:se:lnu:diva-78277
Scopus ID: 2-s2.0-85045342911
OAI: oai:DiVA.org:lnu-78277
DiVA, id: diva2:1255220
Conference
Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018
Projects
DISA
Available from: 2018-10-11 Created: 2018-10-11 Last updated: 2019-05-24. Bibliographically approved

Open Access in DiVA

fulltext (503 kB), 27 downloads
File information
File: FULLTEXT01.pdf
File size: 503 kB
Checksum SHA-512:
2437c5a72838da2e0f74e4035df832eee3305a0d52a5fa14c5a40db2ffe84cd67258c6751a567458a2759ad6bba8c4f8b0961f375caa4d53adf1b334ca78c177
Type: fulltext
Mimetype: application/pdf


Person records
Laitinen, Mikko; Lundberg, Jonas; Levin, Magnus; Martins, Rafael Messias
