lnu.se Publications
1 - 15 of 15
  • 1.
    Ahltorp, Magnus
    Stockholm.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai, Stockholm.
    Kitajima, Shiho
    Hokkaido Univ, Japan.
    Henriksson, Aron
    Stockholm University.
    Rzepka, Rafal
    Hokkaido Univ, Japan.
    Araki, Kenji
    Hokkaido Univ, Japan.
    Expansion of medical vocabularies using distributional semantics on Japanese patient blogs. 2016. In: Journal of Biomedical Semantics, ISSN 2041-1480, E-ISSN 2041-1480, Vol. 7, article id 58. Article in journal (Refereed)
    Abstract [en]

    Background: Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to a limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, also essential for text mining from corpora written in languages other than English and belonging to a variety of medical genres. The aim of this study was therefore to evaluate medical vocabulary expansion using a corpus very different from those previously used, in terms of grammar and orthography, as well as in terms of text genre. This was carried out by applying a method based on distributional semantics to the task of extracting medical vocabulary terms from a large corpus of Japanese patient blogs.

    Methods: Distributional properties of terms were modelled with random indexing, followed by agglomerative hierarchical clustering of 3x100 seed terms from existing vocabularies, belonging to three semantic categories: Medical Finding, Pharmaceutical Drug and Body Part. By automatically extracting unknown terms close to the centroids of the created clusters, candidates for new terms to include in the vocabulary were suggested. The method was evaluated for its ability to retrieve the remaining n terms in existing medical vocabularies.

    Results: Removing case particles and using a context window size of 1 + 1 was a successful strategy for Medical Finding and Pharmaceutical Drug, while retaining case particles and using a window size of 8 + 8 was better for Body Part. For a 10n long candidate list, the use of different cluster sizes affected the result for Pharmaceutical Drug, while the effect was only marginal for the other two categories. For a list of top n candidates for Body Part, however, clusters with a size of up to two terms were slightly more useful than larger clusters. For Pharmaceutical Drug, the best settings resulted in a recall of 25 % for a candidate list of top n terms and a recall of 68 % for top 10n. For a candidate list of top 10n candidates, the second best results were obtained for Medical Finding: a recall of 58 %, compared to 46 % for Body Part. Only taking the top n candidates into account, however, resulted in a recall of 23 % for Body Part, compared to 16 % for Medical Finding.

    Conclusions: Different settings for corpus pre-processing, window sizes and cluster sizes were suitable for different semantic categories and for different lengths of candidate lists, showing the need to adapt parameters, not only to the language and text genre used, but also to the semantic category for which the vocabulary is to be expanded. The results show, however, that the investigated choices for pre-processing and parameter settings were successful, and that a Japanese blog corpus, which in many ways differs from those used in previous studies, can be a useful resource for medical vocabulary expansion.

  • 2.
    Alfalahi, Alyaa
    Stockholm University.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB, Sweden.
    Ahlblom, Rickard
    Stockholm University.
    Baskalayci, Roza
    Stockholm University.
    Henriksson, Aron
    Stockholm University.
    Asker, Lars
    Stockholm University.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Expanding a Dictionary of Marker Words for Uncertainty and Negation Using Distributional Semantics. 2015. In: Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis (Louhi '15): Short Paper Track / [ed] Cyril Grouin, Thierry Hamon, Aurélie Névéol, and Pierre Zweigenbaum, Association for Computational Linguistics, 2015, p. 90-96. Conference paper (Refereed)
    Abstract [en]

    Approaches to determining the factuality of diagnoses and findings in clinical text tend to rely on dictionaries of marker words for uncertainty and negation. Here, a method for semi-automatically expanding a dictionary of marker words using distributional semantics is presented and evaluated. It is shown that ranking candidates for inclusion according to their proximity to cluster centroids of semantically similar seed words is more successful than ranking them according to proximity to each individual seed word. 

  • 3.
    Rahman, Mofizur
    Stockholm University.
    Asker, Lars
    Stockholm University.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Proposing distributional semantics as a tool for medical vocabulary expansion. 2015. In: International Workshop on Embeddings and Semantics (IWES '15) / [ed] Parth Gupta, Rafael E. Banchs, and Paolo Rosso, 2015. Conference paper (Refereed)
    Abstract [en]

    A tool that extends a given vocabulary by automatically extracting new term candidates from a corpus could facilitate vocabulary expansion, as well as ensure that extracted terms correspond to those actually used in a specific text genre. We here propose a user interface for such a tool, and evaluate the feasibility of using Random Indexing for positioning new term candidates in a given taxonomy. 

  • 4.
    Simaki, Vasiliki
    Linnaeus University, Faculty of Technology, Department of Computer Science. Lund University.
    Paradis, Carita
    Lund University.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Sahlgren, Magnus
    Swedish Research Institute (RISE SICS).
    Kucher, Kostiantyn
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Annotating speaker stance in discourse: the Brexit Blog Corpus. 2017. In: Corpus Linguistics and Linguistic Theory, ISSN 1613-7027, E-ISSN 1613-7035. Article in journal (Refereed)
    Abstract [en]

    The aim of this study is to explore the possibility of identifying speaker stance in discourse, provide an analytical resource for it and an evaluation of the level of agreement across speakers. We also explore to what extent language users agree about what kind of stances are expressed in natural language use or whether their interpretations diverge. In order to perform this task, a comprehensive cognitive-functional framework of ten stance categories was developed based on previous work on speaker stance in the literature. A corpus of opinionated texts was compiled, the Brexit Blog Corpus (BBC). An analytical protocol and interface (ALVA) for the annotations was set up and the data were independently annotated by two annotators. The annotation procedure, the annotation agreements and the co-occurrence of more than one stance in the utterances are described and discussed. The careful, analytical annotation process has returned satisfactory inter- and intra-annotation agreement scores, resulting in a gold standard corpus, the final version of the BBC. 

  • 5.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. University of Potsdam, Germany.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Stede, Manfred
    University of Potsdam, Germany.
    Automatic detection of stance towards vaccination in online discussion forums. 2017. In: Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) / [ed] Jitendra Jonnagaddala, Hong-Jie Dai, and Yung-Chun Chang, Association for Computational Linguistics, 2017, p. 1-8. Conference paper (Refereed)
    Abstract [en]

    A classifier for automatic detection of stance towards vaccination in online forums was trained and evaluated. Debate posts from six discussion threads on the British parental website Mumsnet were manually annotated for stance against or for vaccination, or as undecided. A support vector machine, trained to detect the three classes, achieved a macro F-score of 0.44, while a macro F-score of 0.62 was obtained by the same type of classifier on the binary classification task of distinguishing stance against vaccination from stance for vaccination. These results show that vaccine stance detection in online forums is a difficult task, at least for the type of model investigated and for the relatively small training corpus that was used. Future work will therefore include an expansion of the training data and an evaluation of other types of classifiers and features.

  • 6.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science. Potsdam University, Germany.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Stede, Manfred
    Potsdam University, Germany.
    Vaccine Hesitancy in Discussion Forums: Computer-Assisted Argument Mining with Topic Models. 2018. In: Building Continents of Knowledge in Oceans of Data: The Future of Co-Created eHealth / [ed] Adrien Ugon, Daniel Karlsson, Gunnar O. Klein, and Anne Moen, IOS Press, 2018, p. 366-370. Conference paper (Refereed)
    Abstract [en]

    Arguments used when vaccination is debated on Internet discussion forums might give us valuable insights into reasons behind vaccine hesitancy. In this study, we applied automatic topic modelling to a collection of 943 discussion posts in which vaccination was debated, and six distinct discussion topics were detected by the algorithm. When the posts ranked as most typical for these six topics were manually coded, a set of semantically coherent arguments was identified for each extracted topic. This indicates that topic modelling is a useful method for automatically identifying vaccine-related discussion topics and for identifying debate posts where these topics are discussed. This functionality could facilitate manual coding of salient arguments, and thereby form an important component in a system for computer-assisted coding of vaccine-related discussions.

  • 7.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Kucher, Kostiantyn
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Language Processing Components of the StaViCTA Project. 2017. In: Proceedings of the Workshop on Logic and Algorithms in Computational Linguistics 2017 (LACompLing 2017) / [ed] Roussanka Loukanova and Kristina Liefke, Stockholm University; KTH, 2017, p. 137-138. Conference paper (Refereed)
    Abstract [en]

    The StaViCTA project is concerned with visualising the expression of stance in written text, and is therefore dependent on components for stance detection. These components are to (i) download and extract text from any HTML page and segment it into sentences, (ii) classify each sentence with respect to twelve different, notionally motivated, stance categories, and (iii) provide a RESTful HTTP API for communication with the visualisation components. The stance categories are certainty, uncertainty, contrast, recommendation, volition, prediction, agreement, disagreement, tact, rudeness, hypotheticality, and source of knowledge. 

  • 8.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Potsdam University, Germany.
    Kucher, Kostiantyn
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Stede, Manfred
    Potsdam University, Germany.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Topics2Themes: Computer-Assisted Argument Extraction by Visual Analysis of Important Topics. 2018. Conference paper (Refereed)
    Abstract [en]

    The large collections of opinionated text that are continuously being created online, e.g., in the form of forum posts or tweets, contain arguments that might help us to better understand why opinions are held. While the task of manually extracting arguments from these large collections is an intractable one, a tool for computer-assisted extraction can (i) automatically select a subset of the text collection that contains re-occurring arguments to minimise the amount of text that the human coder has to read, and (ii) present the selected texts in a way that facilitates manual coding of arguments. We propose a tool called Topics2Themes that uses topic modelling to automatically extract important topics as well as the terms and texts most closely associated with each topic. We also provide a graphical user interface for manual argument coding, in which the user can search for arguments in the texts selected, create a theme for each type of argument detected and connect it to the texts in which it is found. Topics, terms, texts and themes are displayed as elements in four separate lists, and associations between the elements are visualised through connecting links. It is also possible to focus on one particular element through the sorting functionality provided, e.g., when a topic is selected, the terms, texts and themes associated with this topic are sorted as the top-ranked elements in their respective lists. The text collection can thereby be explored from different angles, which can be used to facilitate the argument coding and gain an overview and understanding of the arguments found in the texts. 

  • 9.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Marker Words for Negation and Speculation in Health Records and Consumer Reviews. 2016. In: Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine (SMBM '16) / [ed] Mariana Neves, Fabio Rinaldi, Goran Nenadic, and Dietrich Rebholz-Schuhmann, CEUR-WS.org, 2016, Vol. 1650, p. 64-69. Conference paper (Refereed)
    Abstract [en]

    Conditional random fields were trained to detect marker words for negation and speculation in two corpora belonging to two very different domains: clinical text and consumer review text. For the corpus of clinical text, marker words for speculation and negation were detected with results in line with previously reported inter-annotator agreement scores. This was also the case for speculation markers in the consumer review corpus, while detection of negation markers was unsuccessful in this genre. A setup in which models were trained on markers in consumer reviews and applied to the clinical text genre also yielded low results. This shows that neither the trained models, nor the choice of appropriate machine learning algorithms and features, were transferable across the two text genres.

  • 10.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    PAL, a tool for Pre-annotation and Active Learning. 2016. In: Journal for Language Technology and Computational Linguistics, ISSN 0175-1336, E-ISSN 2190-6858, Vol. 31, no 1, p. 81-100. Article in journal (Refereed)
    Abstract [en]

    Many natural language processing systems rely on machine learning models that are trained on large amounts of manually annotated text data. The lack of sufficient amounts of annotated data is, however, a common obstacle for such systems, since manual annotation of text is often expensive and time-consuming.

    The aim of “PAL, a tool for Pre-annotation and Active Learning” is to provide a ready-made package that can be used to simplify annotation and to reduce the amount of annotated data required to train a machine learning classifier. The package provides support for two techniques that have been shown to be successful in previous studies, namely active learning and pre-annotation.

    The output of the pre-annotation is provided in the annotation format of the annotation tool BRAT, but PAL is a stand-alone package that can be adapted to other formats. 

  • 11.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Sahlgren, Magnus
    Gavagai AB, Sweden.
    Finding Infrequent Phenomena in Large Corpora Using Distributional Semantics. 2015. In: Symposium on Methods and Linguistic Theories (MaLT '15), Bamberg, Germany, 27-28 November 2015, 2015. Conference paper (Refereed)
  • 12.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Sahlgren, Magnus
    Swedish Institute of Computer Science.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Active Learning for Detection of Stance Components. 2016. In: Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16) at COLING '16, Association for Computational Linguistics, 2016, p. 50-59. Conference paper (Refereed)
    Abstract [en]

    Automatic detection of five language components, which are all relevant for expressing opinions and for stance taking, was studied: positive sentiment, negative sentiment, speculation, contrast and condition. A resource-aware approach was taken, which included manual annotation of 500 training samples and the use of limited lexical resources. Active learning was compared to random selection of training data, as well as to a lexicon-based method. Active learning was successful for the categories speculation, contrast and condition, but not for the two sentiment categories, for which results achieved when using active learning were similar to those achieved when applying a random selection of training data. This difference is likely due to a larger variation in how sentiment is expressed than in how speakers express the other three categories. This larger variation was also shown by the lower recall results achieved by the lexicon-based approach for sentiment than for the categories speculation, contrast and condition. 

  • 13.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB.
    Sahlgren, Magnus
    Gavagai AB.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Unshared Task: (Dis)agreement in Online Debates. 2016. In: Proceedings of the 3rd Workshop on Argument Mining (ArgMining '16) at ACL '16, Association for Computational Linguistics, 2016, p. 154-159, article id W16-2818. Conference paper (Refereed)
    Abstract [en]

    Topic-independent expressions for conveying agreement and disagreement were annotated in a corpus of web forum debates, in order to evaluate a classifier trained to detect these two categories. Among the 175 expressions annotated in the evaluation set, 163 were unique, which shows that there is large variation in expressions used. This variation might be one of the reasons why the task of automatically detecting the categories was difficult. F-scores of 0.44 and 0.37 were achieved by a classifier trained on 2,000 debate sentences for detecting sentence-level agreement and disagreement.

  • 14.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB.
    Schamp-Bjerede, Teri
    Lund University.
    Sahlgren, Magnus
    Gavagai AB, Sweden.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Detecting Speculations, Contrasts and Conditionals in Consumer Reviews. 2015. In: Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA '15): Short Paper Track / [ed] Alexandra Balahur, Erik van der Goot, Piek Vossen, and Andrés Montoyo, Association for Computational Linguistics, 2015, p. 162-168. Conference paper (Refereed)
    Abstract [en]

    A support vector classifier was compared to a lexicon-based approach for the task of detecting the stance categories speculation, contrast and conditional in English consumer reviews. Around 3,000 training instances were required to achieve a stable performance of an F-score of 90 for speculation. This outperformed the lexicon-based approach, for which an F-score of just above 80 was achieved. The machine learning results for the other two categories showed a lower average (an approximate F-score of 60 for contrast and 70 for conditional), as well as a larger variance, and were only slightly better than lexicon matching. Therefore, while machine learning was successful for detecting speculation, a well-curated lexicon might be a more suitable approach for detecting contrast and conditional. 

  • 15.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Simaki, Vasiliki
    Linnaeus University, Faculty of Technology, Department of Computer Science. Lund University.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Detection of Stance and Sentiment Modifiers in Political Blogs. 2017. In: Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings / [ed] Alexey Karpov, Rodmonga Potapova, and Iosif Mporas, Springer International Publishing, 2017, p. 302-311. Conference paper (Refereed)
    Abstract [en]

    The automatic detection of seven types of modifiers was studied: Certainty, Uncertainty, Hypotheticality, Prediction, Recommendation, Concession/Contrast and Source. A classifier aimed at detecting local cue words that signal the categories was the most successful method for five of the categories. For Prediction and Hypotheticality, however, better results were obtained with a classifier trained on tokens and bi-grams present in the entire sentence. Unsupervised cluster features were shown to be useful for the categories Source and Uncertainty, when a subset of the available training data was used. However, when all of the 2,095 sentences that had been actively selected and manually annotated were used as training data, the cluster features had a very limited effect. Some of the classification errors made by the models could be avoided by extending the training data set, while other features and feature representations, as well as the incorporation of pragmatic knowledge, would be required for other error types.
