Publications (10 of 45)
Simaki, V., Paradis, C. & Kerren, A. (2018). Evaluating stance-annotated sentences from the Brexit Blog Corpus: A quantitative linguistic analysis. ICAME Journal/International Computer Archive of Modern English, 42(1), 133-166
2018 (English). In: ICAME Journal/International Computer Archive of Modern English, ISSN 0801-5775, E-ISSN 1502-5462, Vol. 42, no 1, p. 133-166. Article in journal (Refereed). Published
Abstract [en]

This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to highlight linguistic features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts: firstly, it examines a large number of formal linguistic features that occur in the sentences in order to describe the specific characteristics of each category, and secondly, it compares characteristics in the entire data set in order to determine linguistic similarities throughout the data set. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative ones, with the former using longer sentences, more conjunctions, more repetitions and shorter forms than the sentences expressing other stances. The latter has longer lexical forms but shorter sentences, which are syntactically more complex. We show that stance in our data set is expressed in sentences with around 21 words per sentence. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.

Place, publisher, year, edition, pages
De Gruyter Open, 2018
Keywords
stance-taking, corpus annotation, political blog text, statistical analysis, formal features
National Category
Language Technology (Computational Linguistics) Specific Languages
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-70768 (URN), 10.1515/icame-2018-0007 (DOI)
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2018-02-12 Created: 2018-02-12 Last updated: 2018-10-17 Bibliographically approved
Simaki, V., Paradis, C., Skeppstedt, M., Sahlgren, M., Kucher, K. & Kerren, A. (2017). Annotating speaker stance in discourse: the Brexit Blog Corpus. Corpus linguistics and linguistic theory
2017 (English). In: Corpus linguistics and linguistic theory, ISSN 1613-7027, E-ISSN 1613-7035. Article in journal (Refereed). Epub ahead of print
Abstract [en]

The aim of this study is to explore the possibility of identifying speaker stance in discourse, provide an analytical resource for it and an evaluation of the level of agreement across speakers. We also explore to what extent language users agree about what kind of stances are expressed in natural language use or whether their interpretations diverge. In order to perform this task, a comprehensive cognitive-functional framework of ten stance categories was developed based on previous work on speaker stance in the literature. A corpus of opinionated texts was compiled, the Brexit Blog Corpus (BBC). An analytical protocol and interface (ALVA) for the annotations was set up and the data were independently annotated by two annotators. The annotation procedure, the annotation agreements and the co-occurrence of more than one stance in the utterances are described and discussed. The careful, analytical annotation process has returned satisfactory inter- and intra-annotation agreement scores, resulting in a gold standard corpus, the final version of the BBC. 

Keywords
text annotation, blog post texts, modality, evaluation, positioning
National Category
Language Technology (Computational Linguistics) General Language Studies and Linguistics
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-67319 (URN), 10.1515/cllt-2016-0060 (DOI)
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Note

TO BE PUBLISHED!

Available from: 2017-08-21 Created: 2017-08-21 Last updated: 2019-04-17
Skeppstedt, M., Sahlgren, M., Paradis, C. & Kerren, A. (2016). Active Learning for Detection of Stance Components. In: Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16) at COLING '16. Paper presented at Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16), Osaka, Japan, December 12, 2016 (pp. 50-59). Association for Computational Linguistics
2016 (English). In: Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16) at COLING '16, Association for Computational Linguistics, 2016, p. 50-59. Conference paper, Published paper (Refereed)
Abstract [en]

Automatic detection of five language components, which are all relevant for expressing opinions and for stance taking, was studied: positive sentiment, negative sentiment, speculation, contrast and condition. A resource-aware approach was taken, which included manual annotation of 500 training samples and the use of limited lexical resources. Active learning was compared to random selection of training data, as well as to a lexicon-based method. Active learning was successful for the categories speculation, contrast and condition, but not for the two sentiment categories, for which results achieved when using active learning were similar to those achieved when applying a random selection of training data. This difference is likely due to a larger variation in how sentiment is expressed than in how speakers express the other three categories. This larger variation was also shown by the lower recall results achieved by the lexicon-based approach for sentiment than for the categories speculation, contrast and condition. 

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2016
Keywords
active learning, stance, sentiment, annotation, classifier
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-57761 (URN), 978-4-87974-723-5 (ISBN)
Conference
Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16), Osaka, Japan, December 12, 2016
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2018-01-13 Bibliographically approved
Skeppstedt, M., Paradis, C. & Kerren, A. (2016). Marker Words for Negation and Speculation in Health Records and Consumer Reviews. In: Mariana Neves, Fabio Rinaldi, Goran Nenadic, and Dietrich Rebholz-Schuhmann (Ed.), Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine (SMBM '16). Paper presented at 7th International Symposium on Semantic Mining in Biomedicine (SMBM '16), Potsdam, Germany, August 4-5, 2016 (pp. 64-69). CEUR-WS.org, 1650
2016 (English). In: Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine (SMBM '16) / [ed] Mariana Neves, Fabio Rinaldi, Goran Nenadic, and Dietrich Rebholz-Schuhmann, CEUR-WS.org, 2016, Vol. 1650, p. 64-69. Conference paper, Published paper (Refereed)
Abstract [en]

Conditional random fields were trained to detect marker words for negation and speculation in two corpora belonging to two very different domains: clinical text and consumer review text. For the corpus of clinical text, marker words for speculation and negation were detected with results in line with previously reported interannotator agreement scores. This was also the case for speculation markers in the consumer review corpus, while detection of negation markers was unsuccessful in this genre. Also a setup in which models were trained on markers in consumer reviews, and applied on the clinical text genre, yielded low results. This shows that neither the trained models, nor the choice of appropriate machine learning algorithms and features, were transferable across the two text genres.

Place, publisher, year, edition, pages
CEUR-WS.org, 2016
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 1650
Keywords
marker words, health records, consumer reviews, corpus, machine learning, natural language processing
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-55120 (URN)
Conference
7th International Symposium on Semantic Mining in Biomedicine (SMBM '16), Potsdam, Germany, August 4-5, 2016
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2016-08-03 Created: 2016-08-03 Last updated: 2018-01-10 Bibliographically approved
Kucher, K., Kerren, A., Paradis, C. & Sahlgren, M. (2016). Methodology and Applications of Visual Stance Analysis: An Interactive Demo. In: International Symposium on Digital Humanities, Växjö 7-8 November 2016: Book of Abstracts. Paper presented at International Symposium on Digital Humanities, Växjö, Sweden, November 7-8, 2016 (pp. 56-57). Linnaeus University
2016 (English). In: International Symposium on Digital Humanities, Växjö 7-8 November 2016: Book of Abstracts, Linnaeus University, 2016, p. 56-57. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Analysis of stance in textual data can reveal the attitudes of speakers, ranging from general agreement/disagreement with other speakers to fine-grained indications of wishes and emotions. The implementation of an automatic stance classifier and corresponding visualization techniques facilitates the analysis of human communication and social media texts. Furthermore, scholars in Digital Humanities could also benefit from such an approach by applying it for literature studies. For example, a researcher could explore the usage of such stance categories as certainty or prediction in a novel. Analysis of such abstract categories in longer texts would be complicated or even impossible with simpler tools such as regular expression search.

Our research on automatic and visual stance analysis is concerned with multiple theoretical and practical challenges in linguistics, computational linguistics, and information visualization. In this interactive demo, we demonstrate our web-based visual analytics system called ALVA, which is designed to support the text data annotation and stance classifier training stages. 

Place, publisher, year, edition, pages
Linnaeus University, 2016
Keywords
Digital humanities, Stance, Visualization, Interaction, NLP, Visual analytics, Annotation, Classifier training
National Category
Computer Sciences Human Computer Interaction Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-57763 (URN)
Conference
International Symposium on Digital Humanities, Växjö, Sweden, November 7-8, 2016
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2018-01-13
Skeppstedt, M., Sahlgren, M., Paradis, C. & Kerren, A. (2016). Unshared Task: (Dis)agreement in Online Debates. In: Proceedings of the 3rd Workshop on Argument Mining (ArgMining '16) at ACL '16. Paper presented at 3rd Workshop on Argument Mining (ArgMining '16), Berlin, Germany, August 7-12, 2016 (pp. 154-159). Association for Computational Linguistics, Article ID W16-2818.
2016 (English). In: Proceedings of the 3rd Workshop on Argument Mining (ArgMining '16) at ACL '16, Association for Computational Linguistics, 2016, p. 154-159, article id W16-2818. Conference paper, Published paper (Refereed)
Abstract [en]

Topic-independent expressions for conveying agreement and disagreement were annotated in a corpus of web forum debates, in order to evaluate a classifier trained to detect these two categories. Among the 175 expressions annotated in the evaluation set, 163 were unique, which shows that there is large variation in expressions used. This variation might be one of the reasons why the task of automatically detecting the categories was difficult. F-scores of 0.44 and 0.37 were achieved by a classifier trained on 2,000 debate sentences for detecting sentence-level agreement and disagreement.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2016
Keywords
argumentation mining, online debates, classifier, agreement, disagreement, stance, corpus, annotation
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-55116 (URN)
Conference
3rd Workshop on Argument Mining (ArgMining '16), Berlin, Germany, August 7-12, 2016
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2016-08-02 Created: 2016-08-02 Last updated: 2018-01-10 Bibliographically approved
Kucher, K., Schamp-Bjerede, T., Kerren, A., Paradis, C. & Sahlgren, M. (2016). Visual Analysis of Online Social Media to Open Up the Investigation of Stance Phenomena. Information Visualization, 15(2), 93-116
2016 (English). In: Information Visualization, ISSN 1473-8716, E-ISSN 1473-8724, Vol. 15, no 2, p. 93-116. Article in journal (Refereed). Published
Abstract [en]

Online social media are a perfect text source for stance analysis. Stance in human communication is concerned with speaker attitudes, beliefs, feelings and opinions. Expressions of stance are associated with the speakers' view of what they are talking about and what is up for discussion and negotiation in the intersubjective exchange. Taking stance is thus crucial for the social construction of meaning. Increased knowledge of stance can be useful for many application fields such as business intelligence, security analytics, or social media monitoring. In order to process large amounts of text data for stance analyses, linguists need interactive tools to explore the textual sources as well as the processed data based on computational linguistics techniques. Both original texts and derived data are important for refining the analyses iteratively. In this work, we present a visual analytics tool for online social media text data that can be used to open up the investigation of stance phenomena. Our approach complements traditional linguistic analysis techniques and is based on the analysis of utterances associated with two stance categories: sentiment and certainty. Our contributions include (1) the description of a novel web-based solution for analyzing the use and patterns of stance meanings and expressions in human communication over time; and (2) specialized techniques used for visualizing analysis provenance and corpus overview/navigation. We demonstrate our approach by means of text media on a highly controversial scandal with regard to expressions of anger and provide an expert review from linguists who have been using our tool.

Place, publisher, year, edition, pages
Sage Publications, 2016
Keywords
Visual analytics, visualization, text visualization, interaction, time-series, stance analysis, sentiment analysis, text analytics, visual linguistics, online social media, text and document data
National Category
Computer Sciences Human Computer Interaction Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-41676 (URN), 10.1177/1473871615575079 (DOI), 000371645100001 (), 2-s2.0-84964050221 (Scopus ID)
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2015-04-02 Created: 2015-04-02 Last updated: 2018-02-06 Bibliographically approved
Kucher, K., Kerren, A., Paradis, C. & Sahlgren, M. (2016). Visual Analysis of Text Annotations for Stance Classification with ALVA. In: Tobias Isenberg & Filip Sadlo (Ed.), EuroVis Posters 2016. Paper presented at The 18th EG/VGTC Conference on Visualization (EuroVis '16), Groningen, The Netherlands, 6-10 June, 2016 (pp. 49-51). Eurographics - European Association for Computer Graphics
2016 (English). In: EuroVis Posters 2016 / [ed] Tobias Isenberg & Filip Sadlo, Eurographics - European Association for Computer Graphics, 2016, p. 49-51. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

The automatic detection and classification of stance taking in text data using natural language processing and machine learning methods create an opportunity to gain insight about the writers’ feelings and attitudes towards their own and other people’s utterances. However, this task presents multiple challenges related to the training data collection as well as the actual classifier training. In order to facilitate the process of training a stance classifier, we propose a visual analytics approach called ALVA for text data annotation and visualization. Our approach supports the annotation process management and supplies annotators with a clean user interface for labeling utterances with several stance categories. The analysts are provided with a visualization of stance annotations which facilitates the analysis of categories used by the annotators. ALVA is already being used by our domain experts in linguistics and computational linguistics in order to improve the understanding of stance phenomena and to build a stance classifier for applications such as social media monitoring. 

Place, publisher, year, edition, pages
Eurographics - European Association for Computer Graphics, 2016
Keywords
Visualization, Text visualization, Interaction, Text annotations, Stance analysis, NLP, Text analytics
National Category
Computer Sciences Human Computer Interaction Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science; Humanities, Linguistics
Identifiers
urn:nbn:se:lnu:diva-52287 (URN), 10.2312/eurp.20161139 (DOI), 9783038680154 (ISBN)
Conference
The 18th EG/VGTC Conference on Visualization (EuroVis '16), Groningen, The Netherlands, 6-10 June, 2016
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2016-04-28 Created: 2016-04-28 Last updated: 2018-01-10 Bibliographically approved
Paradis, C. & Hommerberg, C. (2016). We drink with our eyes first: The web of sensory perceptions, aesthetic experiences and mixed imagery in wine reviews. In: Raymond W. Gibbs Jr. (Ed.), Mixing metaphor (pp. 179-201). Amsterdam & Philadelphia: John Benjamins Publishing Company
2016 (English). In: Mixing metaphor / [ed] Raymond W. Gibbs Jr., Amsterdam & Philadelphia: John Benjamins Publishing Company, 2016, p. 179-201. Chapter in book (Refereed)
Abstract [en]

This chapter analyzes the language resources that writers have at their disposal to describe their experience of the web of sensory perceptions that are evoked in the wine tasting practice. The task of the writer is to provide a mental understanding of the sensations as well as a prehension of the experiences. We show that this involves the weaving together of the senses, starting with the sight of the wine, followed by a description that is iconic with the wine tasting procedure. The descriptors are systematically used cross-modally both through ontological crossovers and through longer stretches of mixed imagery. We also show how the socio-cultural context of wine consumption correlates with the types of imagery used in wine descriptions.

Place, publisher, year, edition, pages
Amsterdam & Philadelphia: John Benjamins Publishing Company, 2016
Series
Metaphor in Language, Cognition & Communication, ISSN 2210-4836 ; 6
National Category
Specific Languages
Research subject
Humanities, English
Identifiers
urn:nbn:se:lnu:diva-34013 (URN), 10.1075/milcc.6.09par (DOI), 9789027202109 (ISBN)
Available from: 2014-04-27 Created: 2014-04-27 Last updated: 2018-01-11 Bibliographically approved
Skeppstedt, M., Schamp-Bjerede, T., Sahlgren, M., Paradis, C. & Kerren, A. (2015). Detecting Speculations, Contrasts and Conditionals in Consumer Reviews. In: Alexandra Balahur, Erik van der Goot, Piek Vossen, and Andrés Montoyo (Ed.), Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA '15): Short Paper Track. Paper presented at 6th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA '15), Lisbon, Portugal, 2015 (pp. 162-168). Association for Computational Linguistics
2015 (English). In: Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA '15): Short Paper Track / [ed] Alexandra Balahur, Erik van der Goot, Piek Vossen, and Andrés Montoyo, Association for Computational Linguistics, 2015, p. 162-168. Conference paper, Published paper (Refereed)
Abstract [en]

A support vector classifier was compared to a lexicon-based approach for the task of detecting the stance categories speculation, contrast and conditional in English consumer reviews. Around 3,000 training instances were required to achieve a stable performance of an F-score of 90 for speculation. This outperformed the lexicon-based approach, for which an F-score of just above 80 was achieved. The machine learning results for the other two categories showed a lower average (an approximate F-score of 60 for contrast and 70 for conditional), as well as a larger variance, and were only slightly better than lexicon matching. Therefore, while machine learning was successful for detecting speculation, a well-curated lexicon might be a more suitable approach for detecting contrast and conditional. 

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2015
Keywords
consumer reviews, support vector classifier, stance
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-45649 (URN), 978-1-941643-32-7 (ISBN)
Conference
6th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA '15), Lisbon, Portugal, 2015
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2015-08-10 Created: 2015-08-10 Last updated: 2018-01-11 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-7240-9003