Identifying the Authors' National Variety of English in Social Media Texts
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM), Department of Computer Science (DV). Lund University, Sweden. (ISOVIS) ORCID iD: 0000-0002-8998-3618
XPLAIN, Greece.
Lund University, Sweden. ORCID iD: 0000-0002-7240-9003
Linnaeus University, Faculty of Technology (FTK), Department of Computer Science and Media Technology (DM), Department of Computer Science (DV). (ISOVIS) ORCID iD: 0000-0002-0519-2537
2017 (English). In: Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2017 / [ed] Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Ivelina Nikolova, and Irina Temnikova, Stroudsburg, PA: Association for Computational Linguistics, 2017, pp. 671-678. Conference paper, published paper (peer-reviewed)
Abstract [en]

In this paper, we present a study for the identification of authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about the author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. The classification accuracy achieved, when the 31 highest ranked features were used, was up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features is discussed.
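
As an illustration of the feature selection step mentioned in the abstract, the following is a minimal sketch of ranking features and keeping the top k. The abstract does not state which ranking criterion was used, so information gain is an assumption here. The feature names (`has_colour_spelling`, `has_hashtag`, `has_gotten`), the toy data, and the helper functions are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction in the labels after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def top_k_features(rows, labels, k):
    """Return the names of the k features with the highest information gain."""
    names = list(rows[0].keys())
    scores = {name: information_gain([r[name] for r in rows], labels) for name in names}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: binary indicators per text, NOT features from the paper.
rows = [
    {"has_colour_spelling": 1, "has_hashtag": 1, "has_gotten": 0},
    {"has_colour_spelling": 1, "has_hashtag": 0, "has_gotten": 0},
    {"has_colour_spelling": 0, "has_hashtag": 1, "has_gotten": 1},
    {"has_colour_spelling": 0, "has_hashtag": 0, "has_gotten": 1},
]
labels = ["UK", "UK", "US", "US"]

# Keep the two most informative features (the study reports using the top 31).
selected = top_k_features(rows, labels, k=2)
print(selected)
```

In this toy setup the hashtag indicator is spread evenly across both varieties, so it carries no information about the label and is dropped; a classifier would then be trained on the selected features only.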

Place, publisher, year, edition, pages
Stroudsburg, PA: Association for Computational Linguistics, 2017. pp. 671-678
Keywords [en]
NLP, social media texts, national variety, English, annotations, classification
HSV category
Research subject
Computer Science, Information and Software Visualization; Computer and Information Sciences, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-66856
DOI: 10.26615/978-954-452-049-6_086
Scopus ID: 2-s2.0-85045752980
ISBN: 978-954-452-048-9 (print)
ISBN: 978-954-452-049-6 (digital)
OAI: oai:DiVA.org:lnu-66856
DiVA, id: diva2:1120992
Conference
The 11th Biennial Conference on Recent Advances In Natural Language Processing (RANLP '17), 2-6 September 2017, Varna, Bulgaria
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2017-07-07 Created: 2017-07-07 Last updated: 2019-06-11. Bibliographically approved

By author/editor
Simaki, Vasiliki; Paradis, Carita; Kerren, Andreas