Stance Classification of Social Media Texts for Under-Resourced Scenarios in Social Sciences
2022 (English). In: Data, E-ISSN 2306-5729, Vol. 7, no 11, article id 159. Article in journal (Refereed). Published
Abstract [en]
In this work, we explore the performance of supervised stance classification methods for social media texts in under-resourced languages, using limited amounts of labeled data. In particular, we focus on the possibilities and limitations of applying classic machine learning versus deep learning in social sciences. To achieve this goal, we use a training dataset of 5.7K messages posted on Flashback Forum, a Swedish discussion platform, further supplemented with the previously published ABSAbank-Imm annotated dataset, and evaluate the performance of various model parameters and configurations to achieve the best training results given the character of the data. Our experiments indicate that classic machine learning models achieve results that are on par with or even outperform those of neural networks and, thus, could be given priority when considering machine learning approaches for similar knowledge domains, tasks, and data. At the same time, modern pre-trained language models provide useful and convenient pipelines for obtaining vectorized data representations that can be combined with classic machine learning algorithms. We discuss the implications of their use in such scenarios and outline directions for further research.
Place, publisher, year, edition, pages
MDPI, 2022. Vol. 7, no 11, article id 159
Keywords [en]
text mining, machine learning, deep learning, neural networks, stance classification, computational social science, social media, supervised learning, sentiment classification, Swedish language data
National Category
Language Technology (Computational Linguistics); Social Sciences Interdisciplinary
Research subject
Social Sciences; Computer and Information Sciences Computer Science; Computer and Information Sciences Computer Science, Computer Science
Identifiers
URN: urn:nbn:se:lnu:diva-117570
DOI: 10.3390/data7110159
ISI: 000895323600001
Scopus ID: 2-s2.0-85149444003
OAI: oai:DiVA.org:lnu-117570
DiVA, id: diva2:1711488
Funder
European Commission, INEA/CEF/ICT/A2020/2394203
Note
This paper is an extended version of our paper published in the Proceedings of the Swedish Workshop on Data Science (SweDS ’21), Växjö, Sweden, December 2–3, 2021.
2022-11-17 2022-11-17 2023-05-11 Bibliographically approved