Towards a language independent Twitter bot detector
Lundberg, Jonas: DISA (DISA-DH). ORCID iD: 0000-0001-9775-4594
Nordqvist, Jonas: Linnaeus University, Faculty of Technology (FTK), Department of Mathematics (MA). DISA (DISA-DH). ORCID iD: 0000-0002-0510-6782
Laitinen, Mikko: University of Eastern Finland, Finland (DISA-DH). ORCID iD: 0000-0003-3123-6932
2019 (English). In: Proceedings of 4th Conference of The Association Digital Humanities in the Nordic Countries: Copenhagen, March 6-8 2019 / [ed] Navarretta Costanza et al., Copenhagen: Digital Humanities in the Nordic countries, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

This article describes our work in developing an application that recognizes automatically generated tweets. The objective of this machine learning application is to increase data accuracy in sociolinguistic studies that use Twitter by reducing skewed sampling and inaccuracies in linguistic data. Most previous machine learning attempts to exclude bot material have been language dependent, because they use monolingual Twitter text in their training phase. In this paper, we present a language-independent approach that classifies each individual tweet as either autogenerated (AGT) or human-generated (HGT). We define an AGT as a tweet in which all or part of the natural-language content is generated automatically by a bot or another type of program. Note that while AGT/HGT refer to an individual message, the term bot refers to a non-personal, automated account that posts content to online social networks. Our approach classifies a tweet using only the metadata that accompanies every tweet, restricted to those metadata parameters that are both language- and country-independent. The empirical evaluation shows high success rates: using a bilingual training set of Finnish and Swedish tweets, we correctly classified about 98.2% of the tweets in a test set in a third language (English).
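The abstract names neither the metadata fields nor the learning algorithm, so the following Python sketch only illustrates the general idea under stated assumptions. It reads a handful of language-neutral numeric and boolean fields from a tweet object as delivered by the Twitter API v1.1 (this field selection is our assumption, not the authors' feature set) and fits a plain logistic-regression classifier, which is a stand-in for whatever supervised learner the paper uses. The helper `toy_tweet` and all data are invented; they merely mimic training on Finnish and Swedish tweets and testing on English ones.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

# ASSUMPTION: a hypothetical, language-neutral feature set drawn from the
# Twitter API v1.1 tweet object. "text", "lang", "place", "user.lang" and
# "user.time_zone" are deliberately never read, since they encode language
# or country.
def extract_features(tweet: Dict[str, Any]) -> List[float]:
    """Map one parsed tweet (JSON dict) to a fixed-length numeric vector."""
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    return [
        math.log1p(user.get("followers_count", 0)),  # counts on a log scale
        math.log1p(user.get("friends_count", 0)),
        math.log1p(user.get("statuses_count", 0)),
        math.log1p(user.get("favourites_count", 0)),
        float(user.get("verified", False)),
        float(user.get("default_profile", False)),
        float(user.get("default_profile_image", False)),
        float(len(entities.get("hashtags", []))),
        float(len(entities.get("urls", []))),
        float(len(entities.get("user_mentions", []))),
        float(tweet.get("is_quote_status", False)),
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.02, epochs: int = 1000) -> Tuple[List[float], float]:
    """Logistic regression fitted by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # d(log-loss)/d(score)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, tweet: Dict[str, Any]) -> int:
    """1 = AGT (autogenerated), 0 = HGT (human-generated)."""
    return int(sigmoid(score(w, b, extract_features(tweet))) >= 0.5)


def accuracy(w: Sequence[float], b: float, tweets, labels) -> float:
    hits = sum(predict(w, b, t) == label for t, label in zip(tweets, labels))
    return hits / len(labels)


def toy_tweet(text: str, lang: str, bot: bool) -> Dict[str, Any]:
    """Hypothetical helper building an invented tweet with a fixed metadata profile."""
    return {
        "text": text, "lang": lang, "is_quote_status": False,
        "user": {"followers_count": 10 if bot else 300,
                 "friends_count": 0 if bot else 250,
                 "statuses_count": 200_000 if bot else 2_000,
                 "favourites_count": 0, "verified": False,
                 "default_profile": bot, "default_profile_image": False},
        "entities": {"hashtags": [], "user_mentions": [],
                     "urls": [{"url": "https://example.com"}] if bot else []},
    }


# Invented data: train on "Finnish"/"Swedish" tweets, test on "English" ones.
train = [toy_tweet("Sää tänään: pilvistä", "fi", True),
         toy_tweet("Väder idag: molnigt", "sv", True),
         toy_tweet("Kiva ilta kavereiden kanssa", "fi", False),
         toy_tweet("Trevlig kväll med vänner", "sv", False)]
test = [toy_tweet("Weather today: cloudy", "en", True),
        toy_tweet("Nice evening with friends", "en", False)]

w, b = train_logistic([extract_features(t) for t in train], [1, 1, 0, 0])
print(accuracy(w, b, test, [1, 0]))  # → 1.0
```

Because `extract_features` never reads `text` or `lang`, a model trained on one set of languages can be applied unchanged to tweets in another. The toy run only shows this mechanism; how well such a model transfers on real data is what the reported 98.2% figure measures.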

Place, publisher, year, edition, pages
Copenhagen: Digital Humanities in the Nordic countries, 2019.
Keywords [en]
Twitter, bots, bot detection, supervised machine learning
National Category
General Language Studies and Linguistics; Computer and Information Sciences
Research subject
Humanities, English (linguistics); Computer and Information Sciences, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-81663
OAI: oai:DiVA.org:lnu-81663
DiVA, id: diva2:1302270
Conference
4th Conference of The Association Digital Humanities in the Nordic Countries, Copenhagen, March 6-8 2019
Available from: 2019-04-04; Created: 2019-04-04; Last updated: 2019-04-29; Bibliographically approved

Open Access in DiVA

Full text not available in DiVA

Person records

Lundberg, Jonas; Nordqvist, Jonas; Laitinen, Mikko
