2021 (English). In: Proceedings of the 2021 Swedish Workshop on Data Science (SweDS) / [ed] Rafael M. Martins, Morgan Ericsson, Danny Weyns, Kostiantyn Kucher, IEEE, 2021, p. 1-8. Conference paper, Published paper (Refereed)
Abstract [en]
In this paper, we present our methodology for supervised stance classification of sparse and imbalanced social media data. We test our framework on a manually labeled dataset of 5700 messages about immigration in the Swedish language posted on the Flashback forum, a controversial online discussion platform. Our proposed approach currently achieves a macro-averaged F1-score of 0.72 for test data on a two-class problem compared against 0.27 for a baseline four-class model. Since effective classification of imbalanced and sparse textual data in under-resourced languages presents certain methodological challenges, our study contributes to a discussion on the best pathways to achieve highest model performance given the character of the data and unavailability of large training datasets for this task. Moreover, this work exemplifies the application of ML methodology to social media data, which can be particularly relevant for social scientists working in this area and interested in leveraging the possibilities of machine learning in their research field. This methodology and the obtained results provide a foundation for further in-depth analyses of social media texts in the Swedish language following a data-driven approach.
Place, publisher, year, edition, pages
IEEE, 2021
Keywords
social media, sentiment classification, stance classification, supervised learning, Swedish text data classification
National Category
Research subject
Social Sciences; Computer and Information Sciences, Computer Science
Identifiers
urn:nbn:se:lnu:diva-108362 (URN); 10.1109/SweDS53855.2021.9637718 (DOI); 000833296400001 (); 2-s2.0-85123826996 (Scopus ID); 9781665418300 (ISBN)
Conference
2021 Swedish Workshop on Data Science (SweDS), Växjö, Sweden, December 2-3, 2021
Projects
DISA
2021-12-03; 2021-12-03; 2025-02-20; Bibliographically approved