Publications (10 of 15)
Skeppstedt, M., Kucher, K., Stede, M. & Kerren, A. (2018). Topics2Themes: Computer-Assisted Argument Extraction by Visual Analysis of Important Topics. Paper presented at 3rd Workshop on Visualization as Added Value in the Development, Use and Evaluation of Language Resources (VisLR III) at LREC '18, 12 May, 2018, Miyazaki, Japan.
2018 (English). Conference paper, Published paper (Refereed)
Abstract [en]

The large collections of opinionated text that are continuously being created online, e.g., in the form of forum posts or tweets, contain arguments that might help us to better understand why opinions are held. While the task of manually extracting arguments from these large collections is an intractable one, a tool for computer-assisted extraction can (i) automatically select a subset of the text collection that contains re-occurring arguments to minimise the amount of text that the human coder has to read, and (ii) present the selected texts in a way that facilitates manual coding of arguments. We propose a tool called Topics2Themes that uses topic modelling to automatically extract important topics as well as the terms and texts most closely associated with each topic. We also provide a graphical user interface for manual argument coding, in which the user can search for arguments in the texts selected, create a theme for each type of argument detected and connect it to the texts in which it is found. Topics, terms, texts and themes are displayed as elements in four separate lists, and associations between the elements are visualised through connecting links. It is also possible to focus on one particular element through the sorting functionality provided, e.g., when a topic is selected, the terms, texts and themes associated with this topic are sorted as the top-ranked elements in their respective lists. The text collection can thereby be explored from different angles, which can be used to facilitate the argument coding and gain an overview and understanding of the arguments found in the texts. 

Keyword
argument extraction, topic modelling, text analysis, argument visualization, stance visualization, text visualization, information visualization, interaction
National Category
Language Technology (Computational Linguistics); Computer Sciences
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-70911 (URN)
Conference
3rd Workshop on Visualization as Added Value in the Development, Use and Evaluation of Language Resources (VisLR III) at LREC '18, 12 May, 2018, Miyazaki, Japan
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659; Swedish Research Council, 2016-06681
Note

TO BE PUBLISHED!

Available from: 2018-02-14 Created: 2018-02-14 Last updated: 2018-02-15
Skeppstedt, M., Kerren, A. & Stede, M. (2018). Vaccine Hesitancy in Discussion Forums: Computer-Assisted Argument Mining with Topic Models. In: Adrien Ugon, Daniel Karlsson, Gunnar O. Klein, and Anne Moen (Ed.), Building Continents of Knowledge in Oceans of Data: The Future of Co-Created eHealth. Paper presented at 29th Medical Informatics Europe Conference (MIE '18), April 24-26, 2018, Gothenburg, Sweden (pp. 366-370). IOS Press
2018 (English). In: Building Continents of Knowledge in Oceans of Data: The Future of Co-Created eHealth / [ed] Adrien Ugon, Daniel Karlsson, Gunnar O. Klein, and Anne Moen, IOS Press, 2018, p. 366-370. Conference paper, Published paper (Refereed)
Abstract [en]

Arguments used when vaccination is debated on Internet discussion forums might give us valuable insights into reasons behind vaccine hesitancy. In this study, we applied automatic topic modelling on a collection of 943 discussion posts in which vaccine was debated, and six distinct discussion topics were detected by the algorithm. When manually coding the posts ranked as most typical for these six topics, a set of semantically coherent arguments were identified for each extracted topic. This indicates that topic modelling is a useful method for automatically identifying vaccine-related discussion topics and for identifying debate posts where these topics are discussed. This functionality could facilitate manual coding of salient arguments, and thereby form an important component in a system for computer-assisted coding of vaccine-related discussions. 

Place, publisher, year, edition, pages
IOS Press, 2018
Series
Studies in Health Technology and Informatics, ISSN 0926-9630, E-ISSN 1879-8365 ; 247
Keyword
vaccine hesitancy, topic modelling, argument mining
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-70919 (URN); 10.3233/978-1-61499-852-5-366 (DOI); 978-1-61499-851-8 (ISBN); 978-1-61499-852-5 (ISBN)
Conference
29th Medical Informatics Europe Conference (MIE '18), April 24-26, 2018, Gothenburg, Sweden
Projects
StaViCTA
Funder
Swedish Research Council, 2016-06681; Swedish Research Council, 2012-5659
Available from: 2018-02-15 Created: 2018-02-15 Last updated: 2018-04-24
Simaki, V., Paradis, C., Skeppstedt, M., Sahlgren, M., Kucher, K. & Kerren, A. (2017). Annotating speaker stance in discourse: the Brexit Blog Corpus. Corpus linguistics and linguistic theory
2017 (English). In: Corpus linguistics and linguistic theory, ISSN 1613-7027, E-ISSN 1613-7035. Article in journal (Refereed). Epub ahead of print
Abstract [en]

The aim of this study is to explore the possibility of identifying speaker stance in discourse, provide an analytical resource for it and an evaluation of the level of agreement across speakers. We also explore to what extent language users agree about what kind of stances are expressed in natural language use or whether their interpretations diverge. In order to perform this task, a comprehensive cognitive-functional framework of ten stance categories was developed based on previous work on speaker stance in the literature. A corpus of opinionated texts was compiled, the Brexit Blog Corpus (BBC). An analytical protocol and interface (ALVA) for the annotations was set up and the data were independently annotated by two annotators. The annotation procedure, the annotation agreements and the co-occurrence of more than one stance in the utterances are described and discussed. The careful, analytical annotation process has returned satisfactory inter- and intra-annotation agreement scores, resulting in a gold standard corpus, the final version of the BBC. 

Keyword
text annotation, blog post texts, modality, evaluation, positioning
National Category
Language Technology (Computational Linguistics); General Language Studies and Linguistics
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-67319 (URN); 10.1515/cllt-2016-0060 (DOI)
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Note

TO BE PUBLISHED!

Available from: 2017-08-21 Created: 2017-08-21 Last updated: 2018-02-27
Skeppstedt, M., Kerren, A. & Stede, M. (2017). Automatic detection of stance towards vaccination in online discussion forums. In: Jitendra Jonnagaddala, Hong-Jie Dai, and Yung-Chun Chang (Ed.), Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017). Paper presented at 1st International Workshop on Digital Disease Detection using Social Media (DDDSM), Taipei, Taiwan, 27 November, 2017 (pp. 1-8). Association for Computational Linguistics
2017 (English). In: Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) / [ed] Jitendra Jonnagaddala, Hong-Jie Dai, and Yung-Chun Chang, Association for Computational Linguistics, 2017, p. 1-8. Conference paper, Published paper (Refereed)
Abstract [en]

A classifier for automatic detection of stance towards vaccination in online forums was trained and evaluated. Debate posts from six discussion threads on the British parental website Mumsnet were manually annotated for stance against or for vaccination, or as undecided. A support vector machine, trained to detect the three classes, achieved a macro F-score of 0.44, while a macro F-score of 0.62 was obtained by the same type of classifier on the binary classification task of distinguishing stance against vaccination from stance for vaccination. These results show that vaccine stance detection in online forums is a difficult task, at least for the type of model investigated and for the relatively small training corpus that was used. Future work will therefore include an expansion of the training data and an evaluation of other types of classifiers and features.
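
For readers unfamiliar with the metric reported in this abstract, the following is a minimal sketch of how a macro F-score is commonly computed: the unweighted mean of the per-class F1 scores. The function name `macro_f_score` and the toy labels are illustrative assumptions and are not taken from the paper, which may have used a different implementation.

```python
def macro_f_score(gold, predicted):
    """Return the unweighted mean of per-class F1 over the classes in `gold`."""
    classes = sorted(set(gold))
    f_scores = []
    for c in classes:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == c and p == c)  # true positives
        fp = sum(1 for g, p in pairs if g != c and p == c)  # false positives
        fn = sum(1 for g, p in pairs if g == c and p != c)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_scores.append(2 * precision * recall / (precision + recall))
        else:
            f_scores.append(0.0)
    return sum(f_scores) / len(f_scores)


# Toy labels (hypothetical, for illustration only).
gold = ["against", "against", "for", "for", "undecided", "undecided"]
predicted = ["against", "for", "for", "for", "against", "undecided"]
score = macro_f_score(gold, predicted)
```

Because every class contributes equally, a class that is hard to detect lowers the macro score even when it is rare in the data.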

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2017
Keyword
stance, online forums, classifier, support vector machine, vaccination
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-68982 (URN); 978-1-948087-07-0 (ISBN)
Conference
1st International Workshop on Digital Disease Detection using Social Media (DDDSM), Taipei, Taiwan, 27 November, 2017
Projects
StaViCTA; Navigating in streams of opinions
Funder
Swedish Research Council, 2016-06681; Swedish Research Council, 2012-5659
Available from: 2017-11-24 Created: 2017-11-24 Last updated: 2018-02-09. Bibliographically approved
Skeppstedt, M., Simaki, V., Paradis, C. & Kerren, A. (2017). Detection of Stance and Sentiment Modifiers in Political Blogs. In: Alexey Karpov, Rodmonga Potapova, and Iosif Mporas (Ed.), Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings. Paper presented at 19th International Conference on Speech and Computer (SPECOM '17), 12-16 September 2017, Hatfield, Hertfordshire, UK (pp. 302-311). Springer International Publishing
2017 (English). In: Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings / [ed] Alexey Karpov, Rodmonga Potapova, and Iosif Mporas, Springer International Publishing, 2017, p. 302-311. Conference paper, Published paper (Refereed)
Abstract [en]

The automatic detection of seven types of modifiers was studied: Certainty, Uncertainty, Hypotheticality, Prediction, Recommendation, Concession/Contrast and Source. A classifier aimed at detecting local cue words that signal the categories was the most successful method for five of the categories. For Prediction and Hypotheticality, however, better results were obtained with a classifier trained on tokens and bi-grams present in the entire sentence. Unsupervised cluster features were shown useful for the categories Source and Uncertainty, when a subset of the training data available was used. However, when all of the 2,095 sentences that had been actively selected and manually annotated were used as training data, the cluster features had a very limited effect. Some of the classification errors made by the models would be possible to avoid by extending the training data set, while other features and feature representations, as well as the incorporation of pragmatic knowledge, would be required for other error types. 

Place, publisher, year, edition, pages
Springer International Publishing, 2017
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 10458
Keyword
stance modifiers, sentiment modifiers, active learning, unsupervised features, resource-aware natural language processing
National Category
Language Technology (Computational Linguistics); Computer Sciences
Research subject
Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-64582 (URN); 10.1007/978-3-319-66429-3_29 (DOI); 978-3-319-66428-6 (ISBN); 978-3-319-66429-3 (ISBN)
Conference
19th International Conference on Speech and Computer (SPECOM '17), 12-16 September 2017, Hatfield, Hertfordshire, UK
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2017-05-31 Created: 2017-05-31 Last updated: 2018-01-13. Bibliographically approved
Skeppstedt, M., Kucher, K., Paradis, C. & Kerren, A. (2017). Language Processing Components of the StaViCTA Project. In: Roussanka Loukanova and Kristina Liefke (Ed.), Proceedings of the Workshop on Logic and Algorithms in Computational Linguistics 2017 (LACompLing 2017). Paper presented at Workshop on Logic and Algorithms in Computational Linguistics (LACompLing '17), 16–19 August 2017, Stockholm, Sweden (pp. 137-138). Stockholm University; KTH
2017 (English). In: Proceedings of the Workshop on Logic and Algorithms in Computational Linguistics 2017 (LACompLing 2017) / [ed] Roussanka Loukanova and Kristina Liefke, Stockholm University; KTH, 2017, p. 137-138. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

The StaViCTA project is concerned with visualising the expression of stance in written text, and is therefore dependent on components for stance detection. These components are to (i) download and extract text from any HTML page and segment it into sentences, (ii) classify each sentence with respect to twelve different, notionally motivated, stance categories, and (iii) provide a RESTful HTTP API for communication with the visualisation components. The stance categories are certainty, uncertainty, contrast, recommendation, volition, prediction, agreement, disagreement, tact, rudeness, hypotheticality, and source of knowledge. 

Place, publisher, year, edition, pages
Stockholm University ; KTH, 2017
Keyword
Annotation, stance, visualization, visual analytics, NLP, machine learning, classifier, tools
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization
Identifiers
urn:nbn:se:lnu:diva-66071 (URN)
Conference
Workshop on Logic and Algorithms in Computational Linguistics (LACompLing '17), 16–19 August 2017, Stockholm, Sweden
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2017-07-03 Created: 2017-07-03 Last updated: 2018-01-13. Bibliographically approved
Skeppstedt, M., Sahlgren, M., Paradis, C. & Kerren, A. (2016). Active Learning for Detection of Stance Components. In: Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16) at COLING '16. Paper presented at Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16), Osaka, Japan, December 12, 2016 (pp. 50-59). Association for Computational Linguistics
2016 (English). In: Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16) at COLING '16, Association for Computational Linguistics, 2016, p. 50-59. Conference paper, Published paper (Refereed)
Abstract [en]

Automatic detection of five language components, which are all relevant for expressing opinions and for stance taking, was studied: positive sentiment, negative sentiment, speculation, contrast and condition. A resource-aware approach was taken, which included manual annotation of 500 training samples and the use of limited lexical resources. Active learning was compared to random selection of training data, as well as to a lexicon-based method. Active learning was successful for the categories speculation, contrast and condition, but not for the two sentiment categories, for which results achieved when using active learning were similar to those achieved when applying a random selection of training data. This difference is likely due to a larger variation in how sentiment is expressed than in how speakers express the other three categories. This larger variation was also shown by the lower recall results achieved by the lexicon-based approach for sentiment than for the categories speculation, contrast and condition. 
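
The comparison between active learning and random selection described in this abstract can be illustrated with a small sketch. It assumes uncertainty sampling, a common active learning strategy in which the samples the current classifier is least sure about are sent for annotation. The abstract does not state which selection strategy was used, so the strategy, the function names and the probability values below are all illustrative assumptions.

```python
import random


def select_uncertain(probabilities, k):
    """Pick the k unlabelled samples whose predicted positive-class
    probability is closest to 0.5 (assumed uncertainty-sampling strategy)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


def select_random(n_samples, k, seed=0):
    """Baseline: pick k unlabelled samples uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(range(n_samples), k)


# Hypothetical classifier outputs for five unlabelled sentences.
probabilities = [0.95, 0.52, 0.10, 0.45, 0.70]
to_annotate_active = select_uncertain(probabilities, k=2)
to_annotate_random = select_random(len(probabilities), k=2)
```

In each round the selected samples would be annotated manually, added to the training data, and the classifier retrained before the next selection.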

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2016
Keyword
active learning, stance, sentiment, annotation, classifier
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-57761 (URN); 978-4-87974-723-5 (ISBN)
Conference
Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16), Osaka, Japan, December 12, 2016
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2018-01-13. Bibliographically approved
Ahltorp, M., Skeppstedt, M., Kitajima, S., Henriksson, A., Rzepka, R. & Araki, K. (2016). Expansion of medical vocabularies using distributional semantics on Japanese patient blogs. Journal of Biomedical Semantics, 7, Article ID 58.
2016 (English). In: Journal of Biomedical Semantics, ISSN 2041-1480, E-ISSN 2041-1480, Vol. 7, article id 58. Article in journal (Refereed). Published
Abstract [en]

Background: Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to a limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, essential also for text mining from corpora written in other languages than English and belonging to a variety of medical genres. The aim of this study was therefore to evaluate medical vocabulary expansion using a corpus very different from those previously used, in terms of grammar and orthographics, as well as in terms of text genre. This was carried out by applying a method based on distributional semantics to the task of extracting medical vocabulary terms from a large corpus of Japanese patient blogs. Methods: Distributional properties of terms were modelled with random indexing, followed by agglomerative hierarchical clustering of 3x100 seed terms from existing vocabularies, belonging to three semantic categories: Medical Finding, Pharmaceutical Drug and Body Part. By automatically extracting unknown terms close to the centroids of the created clusters, candidates for new terms to include in the vocabulary were suggested. The method was evaluated for its ability to retrieve the remaining n terms in existing medical vocabularies. Results: Removing case particles and using a context window size of 1 + 1 was a successful strategy for Medical Finding and Pharmaceutical Drug, while retaining case particles and using a window size of 8 + 8 was better for Body Part. For a 10n long candidate list, the use of different cluster sizes affected the result for Pharmaceutical Drug, while the effect was only marginal for the other two categories. For a list of top n candidates for Body Part, however, clusters with a size of up to two terms were slightly more useful than larger clusters. 
For Pharmaceutical Drug, the best settings resulted in a recall of 25 % for a candidate list of top n terms and a recall of 68 % for top 10n. For a candidate list of top 10n candidates, the second best results were obtained for Medical Finding: a recall of 58 %, compared to 46 % for Body Part. Only taking the top n candidates into account, however, resulted in a recall of 23 % for Body Part, compared to 16 % for Medical Finding. Conclusions: Different settings for corpus pre-processing, window sizes and cluster sizes were suitable for different semantic categories and for different lengths of candidate lists, showing the need to adapt parameters, not only to the language and text genre used, but also to the semantic category for which the vocabulary is to be expanded. The results show, however, that the investigated choices for pre-processing and parameter settings were successful, and that a Japanese blog corpus, which in many ways differs from those used in previous studies, can be a useful resource for medical vocabulary expansion.
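
The recall figures for candidate lists of length n and 10n can be read as follows, where n is the number of held-out vocabulary terms the method should retrieve. The sketch below is an illustration under that reading; the function name `recall_at` and the placeholder terms are assumptions and do not come from the study.

```python
def recall_at(candidates, held_out, cutoff):
    """Fraction of held-out terms found among the first `cutoff` candidates."""
    top = set(candidates[:cutoff])
    return len(top & set(held_out)) / len(held_out)


# Placeholder data: four held-out terms (n = 4) and a ranked candidate list.
held_out = ["t1", "t2", "t3", "t4"]
candidates = ["t1", "x1", "x2", "x3", "t3", "x4", "t2"] + [
    "y%d" % i for i in range(33)
]
n = len(held_out)
recall_top_n = recall_at(candidates, held_out, n)        # top n candidates
recall_top_10n = recall_at(candidates, held_out, 10 * n)  # top 10n candidates
```

A longer candidate list can only keep or raise recall, which is why the top 10n figures in the abstract are higher than the top n figures for every category.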

Keyword
Japanese language processing, Medical vocabulary expansion, Distributional semantics, Random indexing, Agglomerative hierarchical clustering
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-57652 (URN); 10.1186/s13326-016-0093-x (DOI); 000384592300001 (); 2-s2.0-84992035077 (Scopus ID)
Available from: 2016-10-27 Created: 2016-10-27 Last updated: 2018-01-14. Bibliographically approved
Skeppstedt, M., Paradis, C. & Kerren, A. (2016). Marker Words for Negation and Speculation in Health Records and Consumer Reviews. In: Mariana Neves, Fabio Rinaldi, Goran Nenadic, and Dietrich Rebholz-Schuhmann (Ed.), Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine (SMBM '16). Paper presented at 7th International Symposium on Semantic Mining in Biomedicine (SMBM '16), Potsdam, Germany, August 4-5, 2016 (pp. 64-69). CEUR-WS.org, 1650
2016 (English). In: Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine (SMBM '16) / [ed] Mariana Neves, Fabio Rinaldi, Goran Nenadic, and Dietrich Rebholz-Schuhmann, CEUR-WS.org, 2016, Vol. 1650, p. 64-69. Conference paper, Published paper (Refereed)
Abstract [en]

Conditional random fields were trained to detect marker words for negation and speculation in two corpora belonging to two very different domains: clinical text and consumer review text. For the corpus of clinical text, marker words for speculation and negation were detected with results in line with previously reported interannotator agreement scores. This was also the case for speculation markers in the consumer review corpus, while detection of negation markers was unsuccessful in this genre. Also a setup in which models were trained on markers in consumer reviews, and applied on the clinical text genre, yielded low results. This shows that neither the trained models, nor the choice of appropriate machine learning algorithms and features, were transferable across the two text genres.

Place, publisher, year, edition, pages
CEUR-WS.org, 2016
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 1650
Keyword
marker words, health records, consumer reviews, corpus, machine learning, natural language processing
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-55120 (URN)
Conference
7th International Symposium on Semantic Mining in Biomedicine (SMBM '16), Potsdam, Germany, August 4-5, 2016
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2016-08-03 Created: 2016-08-03 Last updated: 2018-01-10. Bibliographically approved
Skeppstedt, M., Paradis, C. & Kerren, A. (2016). PAL, a tool for Pre-annotation and Active Learning. Journal for Language Technology and Computational Linguistics, 31(1), 81-100
2016 (English). In: Journal for Language Technology and Computational Linguistics, ISSN 0175-1336, E-ISSN 2190-6858, Vol. 31, no 1, p. 81-100. Article in journal (Refereed). Published
Abstract [en]

Many natural language processing systems rely on machine learning models that are trained on large amounts of manually annotated text data. The lack of sufficient amounts of annotated data is, however, a common obstacle for such systems, since manual annotation of text is often expensive and time-consuming.

The aim of “PAL, a tool for Pre-annotation and Active Learning” is to provide a ready-made package that can be used to simplify annotation and to reduce the amount of annotated data required to train a machine learning classifier. The package provides support for two techniques that have been shown to be successful in previous studies, namely active learning and pre-annotation.

The output of the pre-annotation is provided in the annotation format of the annotation tool BRAT, but PAL is a stand-alone package that can be adapted to other formats. 

Place, publisher, year, edition, pages
GSCL, 2016
Keyword
NLP, annotation, pre-annotation, active learning, machine learning, text data
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-63836 (URN)
Projects
StaViCTA
Funder
Swedish Research Council, 2012-5659
Available from: 2017-05-15 Created: 2017-05-15 Last updated: 2018-01-13. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-6164-7762