Towards a language independent Twitter bot detector
DISA. (DISA-DH) ORCID iD: 0000-0001-9775-4594
Linnaeus University, Faculty of Technology (FTK), Department of Mathematics (MA). DISA. (DISA-DH) ORCID iD: 0000-0002-0510-6782
University of Eastern Finland, Finland. (DISA-DH) ORCID iD: 0000-0003-3123-6932
2019 (English). In: Proceedings of 4th Conference of The Association Digital Humanities in the Nordic Countries: Copenhagen, March 6-8 2019 / [ed] Navarretta Costanza et al., Copenhagen: Digital Humanities in the Nordic countries, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

This article describes our work in developing an application that recognizes automatically generated tweets. The objective of this machine learning application is to increase data accuracy in sociolinguistic studies that utilize Twitter by reducing skewed sampling and inaccuracies in linguistic data. Most previous machine learning attempts to exclude bot material have been language dependent, since they use monolingual Twitter text in their training phase. In this paper, we present a language-independent approach that classifies each individual tweet as either autogenerated (AGT) or human-generated (HGT). We define an AGT as a tweet in which all or part of the natural language content is generated automatically by a bot or another type of program. In other words, while AGT/HGT refer to an individual message, the term bot refers to non-personal and automated accounts that post content to online social networks. Our approach classifies a tweet using only the metadata that comes with every tweet, and we use only those metadata parameters that are both language and country independent. The empirical part shows good success rates: using a bilingual training set of Finnish and Swedish tweets, we correctly classified about 98.2% of all tweets in a test set in a third language (English).
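
The idea of classifying a tweet from metadata alone, training on some languages and testing on another, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the metadata parameters or the learning algorithm, so the four features (counts of hashtags, URLs and user mentions, plus a reply flag), the nearest-centroid classifier, the helper `make_tweet`, and the toy tweets are all hypothetical and are not the authors' method or data. The dictionary keys follow the general shape of a Twitter API v1.1 tweet object.

```python
from statistics import mean


def extract_features(tweet):
    """Map a tweet dict to numeric metadata features.

    The tweet text is never read, so the features do not depend on language.
    The chosen fields are illustrative assumptions, not the paper's feature set.
    """
    entities = tweet.get("entities", {})
    return [
        float(len(entities.get("hashtags", []))),
        float(len(entities.get("urls", []))),
        float(len(entities.get("user_mentions", []))),
        1.0 if tweet.get("in_reply_to_status_id") is not None else 0.0,
    ]


def train_centroids(tweets, labels):
    """Compute one mean feature vector (centroid) per class label."""
    by_label = {}
    for tweet, label in zip(tweets, labels):
        by_label.setdefault(label, []).append(extract_features(tweet))
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def classify(tweet, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    features = extract_features(tweet)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def make_tweet(lang, n_hashtags, n_urls, n_mentions, is_reply):
    """Build a minimal tweet-like dict (toy data, invented for this sketch)."""
    return {
        "lang": lang,
        "text": "ignored by the classifier",
        "entities": {
            "hashtags": [{}] * n_hashtags,
            "urls": [{}] * n_urls,
            "user_mentions": [{}] * n_mentions,
        },
        "in_reply_to_status_id": 1 if is_reply else None,
    }


# Toy bilingual training set (Finnish and Swedish), labels invented.
train_tweets = [
    make_tweet("fi", 3, 1, 0, False),
    make_tweet("sv", 2, 1, 0, False),
    make_tweet("fi", 0, 0, 1, True),
    make_tweet("sv", 1, 0, 2, True),
]
train_labels = ["AGT", "AGT", "HGT", "HGT"]

centroids = train_centroids(train_tweets, train_labels)

# Classify a tweet in a third language (English) without reading its text.
prediction = classify(make_tweet("en", 2, 1, 0, False), centroids)
```

Because `extract_features` ignores the `text` and `lang` fields, the same trained centroids can be applied to tweets in any language, which is the property the abstract relies on. A real system would need a far richer feature set and a properly evaluated classifier.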

Place, publisher, year, edition, pages
Copenhagen: Digital Humanities in the Nordic countries, 2019.
Keywords [en]
Twitter, bots, bot detection, supervised machine learning
National Category
Research subject
Humanities, English with a linguistic focus; Computer and Information Sciences, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-81663
OAI: oai:DiVA.org:lnu-81663
DiVA, id: diva2:1302270
Conference
4th Conference of The Association Digital Humanities in the Nordic Countries, Copenhagen, March 6-8 2019
Available from: 2019-04-04 Created: 2019-04-04 Last updated: 2019-04-29 Bibliographically approved

Authors
Lundberg, Jonas; Nordqvist, Jonas; Laitinen, Mikko