2021 (English). In: Proceedings of the 2021 Swedish Workshop on Data Science (SweDS) / [ed] Rafael M. Martins, Morgan Ericsson, Danny Weyns, Kostiantyn Kucher, IEEE, 2021, p. 1-8. Conference paper, Published paper (Refereed)
Abstract [en]
In this paper, we present our methodology for supervised stance classification of sparse and imbalanced social media data. We test our framework on a manually labeled dataset of 5700 messages about immigration in the Swedish language posted on the Flashback forum, a controversial online discussion platform. Our proposed approach currently achieves a macro-averaged F1-score of 0.72 for test data on a two-class problem compared against 0.27 for a baseline four-class model. Since effective classification of imbalanced and sparse textual data in under-resourced languages presents certain methodological challenges, our study contributes to a discussion on the best pathways to achieve the highest model performance given the character of the data and the unavailability of large training datasets for this task. Moreover, this work exemplifies the application of ML methodology to social media data, which can be particularly relevant for social scientists working in this area and interested in leveraging the possibilities of machine learning in their research field. This methodology and the obtained results provide a foundation for further in-depth analyses of social media texts in the Swedish language following a data-driven approach.
Place, publisher, year, edition, pages
IEEE, 2021
Keywords
social media, sentiment classification, stance classification, supervised learning, Swedish text data classification
National subject category
Language Processing and Computational Linguistics; Peace and Conflict Research; Other Social Sciences
Research subject
Social Sciences; Computer and Information Sciences, Computer Science
Identifiers
urn:nbn:se:lnu:diva-108362 (URN); 10.1109/SweDS53855.2021.9637718 (DOI); 000833296400001 (); 2-s2.0-85123826996 (Scopus ID); 9781665418300 (ISBN)
Conference
2021 Swedish Workshop on Data Science (SweDS), Växjö, Sweden, December 2-3, 2021
Project
DISA
2021-12-03; 2021-12-03; 2025-02-20. Bibliographically approved