Publications (8 of 8)
Simaki, V., Paradis, C. & Kerren, A. (2018). Evaluating stance-annotated sentences from the Brexit Blog Corpus: A quantitative linguistic analysis. ICAME Journal/International Computer Archive of Modern English, 42(1), 133-166
2018 (English). In: ICAME Journal/International Computer Archive of Modern English, ISSN 0801-5775, E-ISSN 1502-5462, Vol. 42, no 1, p. 133-166. Article in journal (Refereed). Published
Abstract [en]

This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to highlight linguistic features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts: firstly, it examines a large number of formal linguistic features that occur in the sentences in order to describe the specific characteristics of each category, and secondly, it compares characteristics in the entire data set in order to determine linguistic similarities throughout the data set. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative ones, with the former using longer sentences, more conjunctions, more repetitions and shorter forms than the sentences expressing other stances. The latter has longer lexical forms but shorter sentences, which are syntactically more complex. We show that stance in our data set is expressed in sentences with around 21 words per sentence. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.

Place, publisher, year, edition, pages
De Gruyter Open, 2018
Keywords
stance-taking, corpus annotation, political blog text, statistical analysis, formal features
National Category
Language Technology (Computational Linguistics); Specific Languages
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-70768 (URN); 10.1515/icame-2018-0007 (DOI)
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2018-02-12 Created: 2018-02-12 Last updated: 2018-10-17 Bibliographically approved
Simaki, V., Paradis, C., Skeppstedt, M., Sahlgren, M., Kucher, K. & Kerren, A. (2017). Annotating speaker stance in discourse: the Brexit Blog Corpus. Corpus linguistics and linguistic theory
2017 (English). In: Corpus linguistics and linguistic theory, ISSN 1613-7027, E-ISSN 1613-7035. Article in journal (Refereed). Epub ahead of print
Abstract [en]

The aim of this study is to explore the possibility of identifying speaker stance in discourse, provide an analytical resource for it and an evaluation of the level of agreement across speakers. We also explore to what extent language users agree about what kind of stances are expressed in natural language use or whether their interpretations diverge. In order to perform this task, a comprehensive cognitive-functional framework of ten stance categories was developed based on previous work on speaker stance in the literature. A corpus of opinionated texts was compiled, the Brexit Blog Corpus (BBC). An analytical protocol and interface (ALVA) for the annotations was set up and the data were independently annotated by two annotators. The annotation procedure, the annotation agreements and the co-occurrence of more than one stance in the utterances are described and discussed. The careful, analytical annotation process has returned satisfactory inter- and intra-annotation agreement scores, resulting in a gold standard corpus, the final version of the BBC. 

Keywords
text annotation, blog post texts, modality, evaluation, positioning
National Category
Language Technology (Computational Linguistics); General Language Studies and Linguistics
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-67319 (URN); 10.1515/cllt-2016-0060 (DOI)
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Note

TO BE PUBLISHED!

Available from: 2017-08-21 Created: 2017-08-21 Last updated: 2019-08-28
Skeppstedt, M., Simaki, V., Paradis, C. & Kerren, A. (2017). Detection of Stance and Sentiment Modifiers in Political Blogs. In: Alexey Karpov, Rodmonga Potapova, and Iosif Mporas (Ed.), Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings. Paper presented at 19th International Conference on Speech and Computer (SPECOM '17), 12-16 September 2017, Hatfield, Hertfordshire, UK (pp. 302-311). Springer International Publishing
2017 (English). In: Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings / [ed] Alexey Karpov, Rodmonga Potapova, and Iosif Mporas, Springer International Publishing, 2017, p. 302-311. Conference paper, Published paper (Refereed)
Abstract [en]

The automatic detection of seven types of modifiers was studied: Certainty, Uncertainty, Hypotheticality, Prediction, Recommendation, Concession/Contrast and Source. A classifier aimed at detecting local cue words that signal the categories was the most successful method for five of the categories. For Prediction and Hypotheticality, however, better results were obtained with a classifier trained on tokens and bi-grams present in the entire sentence. Unsupervised cluster features were shown useful for the categories Source and Uncertainty, when a subset of the training data available was used. However, when all of the 2,095 sentences that had been actively selected and manually annotated were used as training data, the cluster features had a very limited effect. Some of the classification errors made by the models would be possible to avoid by extending the training data set, while other features and feature representations, as well as the incorporation of pragmatic knowledge, would be required for other error types. 

Place, publisher, year, edition, pages
Springer International Publishing, 2017
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 10458
Keywords
stance modifiers, sentiment modifiers, active learning, unsupervised features, resource-aware natural language processing
National Category
Language Technology (Computational Linguistics); Computer Sciences
Research subject
Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-64582 (URN); 10.1007/978-3-319-66429-3_29 (DOI); 2-s2.0-85029498983 (Scopus ID); 978-3-319-66428-6 (ISBN); 978-3-319-66429-3 (ISBN)
Conference
19th International Conference on Speech and Computer (SPECOM '17), 12-16 September 2017, Hatfield, Hertfordshire, UK
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2017-05-31 Created: 2017-05-31 Last updated: 2019-08-29 Bibliographically approved
Simaki, V., Simakis, P., Paradis, C. & Kerren, A. (2017). Identifying the Authors' National Variety of English in Social Media Texts. In: Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Ivelina Nikolova, and Irina Temnikova (Ed.), Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2017. Paper presented at The 11th Biennial Conference on Recent Advances In Natural Language Processing (RANLP '17), 2-6 September 2017, Varna, Bulgaria (pp. 671-678). Stroudsburg, PA: Association for Computational Linguistics
2017 (English). In: Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2017 / [ed] Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Ivelina Nikolova, and Irina Temnikova, Stroudsburg, PA: Association for Computational Linguistics, 2017, p. 671-678. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we present a study for the identification of authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about the author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. The classification accuracy achieved, when the 31 highest ranked features were used, was up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features discussed.

Place, publisher, year, edition, pages
Stroudsburg, PA: Association for Computational Linguistics, 2017
Keywords
NLP, social media texts, national variety, English, annotations, classification
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-66856 (URN); 10.26615/978-954-452-049-6_086 (DOI); 2-s2.0-85045752980 (Scopus ID); 978-954-452-048-9 (ISBN); 978-954-452-049-6 (ISBN)
Conference
The 11th Biennial Conference on Recent Advances In Natural Language Processing (RANLP '17), 2-6 September 2017, Varna, Bulgaria
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2017-07-07 Created: 2017-07-07 Last updated: 2019-06-11 Bibliographically approved
Simaki, V., Aravantinou, C., Mporas, I., Kondyli, M. & Megalooikonomou, V. (2017). Sociolinguistic Features for Author Gender Identification: From Qualitative Evidence to Quantitative Analysis. Journal of Quantitative Linguistics, 24(1), 65-84
2017 (English). In: Journal of Quantitative Linguistics, ISSN 0929-6174, E-ISSN 1744-5035, Vol. 24, no 1, p. 65-84. Article in journal (Refereed). Published
Abstract [en]

Theoretical and empirical studies prove the strong relationship between social factors and individual linguistic attitudes. Different social categories, such as gender, age, education, profession and social status, are strongly related to the linguistic diversity of people's everyday spoken and written interaction. In this paper, sociolinguistic studies addressing gender differentiation are reviewed in order to identify how various linguistic characteristics differ between women and men. Thereafter, it is examined if and how these qualitative features can become quantitative metrics for the task of gender identification from texts on web blogs. The evaluation results showed that the "syntactic complexity", the "tag questions", the "period length", the "adjectives" and the "vocabulary richness" characteristics seem to be significantly distinctive with respect to the author's gender.

Place, publisher, year, edition, pages
Routledge, 2017
National Category
Language Technology (Computational Linguistics); Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-62067 (URN); 10.1080/09296174.2016.1226430 (DOI); 000396571200004 (); 2-s2.0-84990196911 (Scopus ID)
Available from: 2017-04-03 Created: 2017-04-03 Last updated: 2019-08-29 Bibliographically approved
Simaki, V., Paradis, C. & Kerren, A. (2017). Stance Classification in Texts from Blogs on the 2016 British Referendum. In: Alexey Karpov, Rodmonga Potapova, and Iosif Mporas (Ed.), Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings. Paper presented at 19th International Conference on Speech and Computer (SPECOM '17), 12-16 September 2017, Hatfield, Hertfordshire, UK (pp. 700-709). Springer International Publishing
2017 (English). In: Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings / [ed] Alexey Karpov, Rodmonga Potapova, and Iosif Mporas, Springer International Publishing, 2017, p. 700-709. Conference paper, Published paper (Refereed)
Abstract [en]

The problem of identifying and correctly attributing speaker stance in human communication is addressed in this paper. The data set consists of political blogs dealing with the 2016 British referendum. A cognitive-functional framework is adopted with data annotated for six notional stance categories: concession/contrariness, hypotheticality, need/requirement, prediction, source of knowledge, and uncertainty. We show that these categories can be implemented in a text classification task and automatically detected. To this end, we propose a large set of lexical and syntactic linguistic features. These features were tested and classification experiments were implemented using different algorithms. We achieved accuracy of up to 30% for the six-class experiments, which is not fully satisfactory. As a second step, we calculated the pair-wise combinations of the stance categories. The concession/contrariness and need/requirement binary classification achieved the best results with up to 71% accuracy.

Place, publisher, year, edition, pages
Springer International Publishing, 2017
Series
Lecture Notes in Artificial Intelligence, ISSN 0302-9743 ; 10458
Keywords
stance-taking, text classification, political blogs, BREXIT
National Category
Language Technology (Computational Linguistics); Specific Languages
Research subject
Computer and Information Sciences Computer Science, Computer Science; Humanities, Linguistics
Identifiers
urn:nbn:se:lnu:diva-64580 (URN); 10.1007/978-3-319-66429-3_70 (DOI); 2-s2.0-85029468464 (Scopus ID); 978-3-319-66428-6 (ISBN); 978-3-319-66429-3 (ISBN)
Conference
19th International Conference on Speech and Computer (SPECOM '17), 12-16 September 2017, Hatfield, Hertfordshire, UK
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2017-05-31 Created: 2017-05-31 Last updated: 2019-08-29 Bibliographically approved
Martins, R. M., Simaki, V., Kucher, K., Paradis, C. & Kerren, A. (2017). StanceXplore: Visualization for the Interactive Exploration of Stance in Social Media. Paper presented at 2nd Workshop on Visualization for the Digital Humanities (VIS4DH '17) at IEEE VIS '17, October 2017, Phoenix, Arizona, USA.
2017 (English). Conference paper, Published paper (Refereed)
Abstract [en]

The use of interactive visualization techniques in Digital Humanities research can be a useful addition when traditional automated machine learning techniques face difficulties, as is often the case with the exploration of large volumes of dynamic—and in many cases, noisy and conflicting—textual data from social media. Recently, the field of stance analysis has been moving from a predominantly binary approach—either pro or con—to a multifaceted one, where each unit of text may be classified as one (or more) of multiple possible stance categories. This change adds more layers of complexity to an already hard problem, but also opens up new opportunities for obtaining richer and more relevant results from the analysis of stancetaking in social media. In this paper we propose StanceXplore, a new visualization for the interactive exploration of stance in social media. Our goal is to offer DH researchers the chance to explore stance-classified text corpora from different perspectives at the same time, using coordinated multiple views including user-defined topics, content similarity and dissimilarity, and geographical and temporal distribution. As a case study, we explore the activity of Twitter users in Sweden, analyzing their behavior in terms of topics discussed and the stances taken. Each textual unit (tweet) is labeled with one of eleven stance categories from a cognitive-functional stance framework based on recent work. We illustrate how StanceXplore can be used effectively to investigate multidimensional patterns and trends in stance-taking related to cultural events, their geographical distribution, and the confidence of the stance classifier. 

Keywords
Stance Visualization, Sentiment Analysis, Digital Humanities, Visual Analytics, Social Media Text
National Category
Human Computer Interaction; Computer Sciences; Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-67320 (URN)
Conference
2nd Workshop on Visualization for the Digital Humanities (VIS4DH '17) at IEEE VIS '17, October 2017, Phoenix, Arizona, USA
Projects
StaViCTA; DISA-DH
Funder
Swedish Research Council, 2012-5659
Available from: 2017-08-21 Created: 2017-08-21 Last updated: 2019-04-08 Bibliographically approved
Simaki, V., Mporas, I. & Megalooikonomou, V. (2016). Evaluation and sociolinguistic analysis of text features for gender and age identification. American Journal of Engineering and Applied Sciences, 9(4), 868-876
2016 (English). In: American Journal of Engineering and Applied Sciences, ISSN 1941-7020, E-ISSN 1941-7039, Vol. 9, no 4, p. 868-876. Article in journal (Refereed). Published
Abstract [en]

The paper presents an interdisciplinary study in the field of automatic gender and age identification, under the scope of sociolinguistic knowledge on gendered and age linguistic choices that social media users make. The authors investigated and gathered standard and novel text features used in text mining approaches on the author’s demographic information and profiling and they examined their efficacy in gender and age detection tasks on a corpus consisting of social media texts. An analysis of the most informative features is attempted according to the nature of each feature and the information derived after the characteristics’ score of importance is discussed. © 2016 Vasiliki Simaki, Iosif Mporas and Vasileios Megalooikonomou.

Place, publisher, year, edition, pages
Science Publications, 2016
Keywords
Age identification, Feature ranking, Gender detection, ReliefF algorithm, Sociolinguistics, Text mining
National Category
General Language Studies and Linguistics; Information Systems, Social aspects
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-87446 (URN); 10.3844/ajeassp.2016.868.876 (DOI); 2-s2.0-85008701961 (Scopus ID)
Note

Export Date: 10 May 2017; Article; Correspondence Address: Simaki, V.; Centre for Languages and Literature, Lund University, Sweden; email: vasiliki.simaki@englund.lu.se

Available from: 2019-08-09 Created: 2019-08-09 Last updated: 2019-09-04 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-8998-3618
