lnu.sePublications
1 - 44 of 44
  • 1.
    Ahltorp, Magnus
    et al.
    Stockholm.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai, Stockholm.
    Kitajima, Shiho
    Hokkaido Univ, Japan.
    Henriksson, Aron
    Stockholm University.
    Rzepka, Rafal
    Hokkaido Univ, Japan.
    Araki, Kenji
    Hokkaido Univ, Japan.
    Expansion of medical vocabularies using distributional semantics on Japanese patient blogs. 2016. In: Journal of Biomedical Semantics, ISSN 2041-1480, E-ISSN 2041-1480, Vol. 7, article id 58. Article in journal (Refereed)
    Abstract [en]

    Background: Research on medical vocabulary expansion from large corpora has primarily been conducted using text written in English or similar languages, due to a limited availability of large biomedical corpora in most languages. Medical vocabularies are, however, essential also for text mining from corpora written in other languages than English and belonging to a variety of medical genres. The aim of this study was therefore to evaluate medical vocabulary expansion using a corpus very different from those previously used, in terms of grammar and orthographics, as well as in terms of text genre. This was carried out by applying a method based on distributional semantics to the task of extracting medical vocabulary terms from a large corpus of Japanese patient blogs.

    Methods: Distributional properties of terms were modelled with random indexing, followed by agglomerative hierarchical clustering of 3x100 seed terms from existing vocabularies, belonging to three semantic categories: Medical Finding, Pharmaceutical Drug and Body Part. By automatically extracting unknown terms close to the centroids of the created clusters, candidates for new terms to include in the vocabulary were suggested. The method was evaluated for its ability to retrieve the remaining n terms in existing medical vocabularies.

    Results: Removing case particles and using a context window size of 1 + 1 was a successful strategy for Medical Finding and Pharmaceutical Drug, while retaining case particles and using a window size of 8 + 8 was better for Body Part. For a 10n long candidate list, the use of different cluster sizes affected the result for Pharmaceutical Drug, while the effect was only marginal for the other two categories. For a list of top n candidates for Body Part, however, clusters with a size of up to two terms were slightly more useful than larger clusters. For Pharmaceutical Drug, the best settings resulted in a recall of 25 % for a candidate list of top n terms and a recall of 68 % for top 10n. For a candidate list of top 10n candidates, the second best results were obtained for Medical Finding: a recall of 58 %, compared to 46 % for Body Part. Only taking the top n candidates into account, however, resulted in a recall of 23 % for Body Part, compared to 16 % for Medical Finding.

    Conclusions: Different settings for corpus pre-processing, window sizes and cluster sizes were suitable for different semantic categories and for different lengths of candidate lists, showing the need to adapt parameters, not only to the language and text genre used, but also to the semantic category for which the vocabulary is to be expanded. The results show, however, that the investigated choices for pre-processing and parameter settings were successful, and that a Japanese blog corpus, which in many ways differs from those used in previous studies, can be a useful resource for medical vocabulary expansion.

  • 2.
    Alfalahi, Alyaa
    et al.
    Stockholm University.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB, Sweden.
    Ahlblom, Rickard
    Stockholm University.
    Baskalayci, Roza
    Stockholm University.
    Henriksson, Aron
    Stockholm University.
    Asker, Lars
    Stockholm University.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Expanding a Dictionary of Marker Words for Uncertainty and Negation Using Distributional Semantics. 2015. In: Proceedings of the 6th International Workshop on Health Text Mining and Information Analysis (Louhi '15): Short Paper Track / [ed] Cyril Grouin, Thierry Hamon, Aurélie Névéol, and Pierre Zweigenbaum, Association for Computational Linguistics, 2015, p. 90-96. Conference paper (Refereed)
    Abstract [en]

    Approaches to determining the factuality of diagnoses and findings in clinical text tend to rely on dictionaries of marker words for uncertainty and negation. Here, a method for semi-automatically expanding a dictionary of marker words using distributional semantics is presented and evaluated. It is shown that ranking candidates for inclusion according to their proximity to cluster centroids of semantically similar seed words is more successful than ranking them according to proximity to each individual seed word. 
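
The two ranking strategies compared in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional vectors, the function names, the use of cosine similarity, and the choice to score a candidate by its best-matching centroid (or seed) are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_by_centroid(candidates, clusters):
    """Rank candidate words by similarity to the closest cluster centroid (assumed scoring)."""
    centroids = [centroid(cluster) for cluster in clusters]
    scores = {w: max(cosine(v, c) for c in centroids) for w, v in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


def rank_by_nearest_seed(candidates, clusters):
    """Rank candidate words by similarity to the closest individual seed word."""
    seeds = [vector for cluster in clusters for vector in cluster]
    scores = {w: max(cosine(v, s) for s in seeds) for w, v in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two clusters of seed-word vectors and three candidates.
clusters = [[[1.0, 0.0], [0.8, 0.2]], [[0.0, 1.0]]]
candidates = {"a": [0.9, 0.1], "b": [0.5, 0.5], "c": [-1.0, 0.0]}

by_centroid = rank_by_centroid(candidates, clusters)
by_seed = rank_by_nearest_seed(candidates, clusters)
```

On data this small both strategies give the same order; the abstract's finding concerns real distributional models, where averaging over a cluster can smooth out idiosyncrasies of individual seed words.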

  • 3. Cerrato, Loredana
    et al.
    Ekeklint, Susanne
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    Evaluating users reactions to human-like interfaces: Prosodic and paralinguistic features as new evaluation measures for users’ satisfaction. 2002. In: From brows to trust: evaluating embodied conversational agents, Kluwer academic publishers, 2002, p. 101-124. Chapter in book (Other academic)
    Abstract [en]

    An increasing number of dialogue systems are deployed to provide public services in our everyday lives. They are becoming more service-minded and several of them provide different channels for interaction. The rationale is to make automatic services available in new environments and more attractive to use. From a developer perspective, this affects the complexity of the requirements elicitation activity, as new combinations and variations in end-user interaction need to be considered. The aim of our investigation is to propose new parameters and metrics to evaluate multimodal dialogue systems endowed with embodied conversational agents (ECAs). These new metrics focus on the users, rather than on the system. Our assumption is that the intentional use of prosodic variation and the production of communicative non-verbal behaviour by users can give an indication of their attitude towards the system and might also help to evaluate the users' overall experience of the interaction. To test our hypothesis we carried out analyses on different Swedish corpora of interactions between users and multimodal dialogue systems. We analysed the prosodic variation in the way the users ended their interactions with the system and we observed the production of non-verbal communicative expressions by users. Our study supports the idea that the observation of users' prosodic variation and production of communicative non-verbal behaviour during the interaction with dialogue systems could be used as an indication of whether or not the users are satisfied with the system performance.

  • 4. Eryigit, Gülsen
    et al.
    Nivre, Joakim
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering. Datalogi.
    Oflazer, Kemal
    Dependency Parsing of Turkish. 2008. In: Computational Linguistics, ISSN 0891-2017, Vol. 34, no 3, p. 357-389. Article in journal (Refereed)
  • 5.
    Galvao, Gabriela
    Växjö University, Faculty of Humanities and Social Sciences, School of Humanities.
    Linguistic interference in translated academic texts: A case study of Portuguese interference in abstracts translated into English. 2009. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    This study deals with linguistic interference in abstracts of scientific papers translated from Portuguese into English collected from the online scientific database SciELO. The aim of this study is to analyze linguistic interference phenomena in 50 abstracts from the field of humanities, history, social sciences, technology and natural sciences. The types of interference discussed are syntactic/grammatical, lexical/semantic and pragmatic interference. This study is mainly qualitative. Therefore, the qualitative method was used, in order to find out what kinds of interference phenomena occur in the abstracts, analyze the possible reasons for their occurrence and present some suggestions to avoid the problems discussed. Besides, a quantitative analysis was carried out to interpret the results (figures and percentages) of the study. The analysis is aimed at providing some guidance for future translations. This study concluded that translations from a Romance language (in this case Portuguese) into a Germanic language (English) tend to be more objective and/or sometimes lose original meanings attributed in the source text. Another important finding was that abstracts from the humanities, history and social sciences present more cases of interference phenomena than the ones belonging to technology and natural sciences. These findings imply that many abstracts within these areas have high probability to be subject to the phenomena discussed and, consequently, have parts of their original meaning lost or misinterpreted in the target texts.

    Keywords: abstracts, bilingualism, cross-linguistic influence, linguistic interference, linguistic transfer, non-native speakers of English, Portuguese-English interference, source text, target text, translation.

  • 6.
    Hall, Johan
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    MaltParser -- An Architecture for Inductive Labeled Dependency Parsing. 2006. Licentiate thesis, monograph (Other academic)
    Abstract [en]

    This licentiate thesis presents a software architecture for inductive labeled dependency parsing of unrestricted natural language text, which achieves a strict modularization of parsing algorithm, feature model and learning method such that these parameters can be varied independently. The architecture is based on the theoretical framework of inductive dependency parsing by Nivre (2006) and has been realized in MaltParser, a system that supports several parsing algorithms and learning methods, for which complex feature models can be defined in a special description language. Special attention is given in this thesis to learning methods based on support vector machines (SVM).

    The implementation is validated in three sets of experiments using data from three languages (Chinese, English and Swedish). First, we check if the implementation realizes the underlying architecture. The experiments show that the MaltParser system outperforms the baseline and satisfies the basic constraints of well-formedness. Furthermore, the experiments show that it is possible to vary parsing algorithm, feature model and learning method independently. Secondly, we focus on the special properties of the SVM interface. It is possible to reduce the learning and parsing time without sacrificing accuracy by dividing the training data into smaller sets, according to the part-of-speech of the next token in the current parser configuration. Thirdly, the last set of experiments presents a broad empirical study that compares SVM to memory-based learning (MBL) with five different feature models, where all combinations have gone through parameter optimization for both learning methods. The study shows that SVM outperforms MBL for more complex and lexicalized feature models with respect to parsing accuracy. There are also indications that SVM, with a splitting strategy, can achieve faster parsing than MBL. The parsing accuracy achieved is the highest reported for the Swedish data set and very close to the state of the art for Chinese and English.

  • 7.
    Hall, Johan
    et al.
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    Nilsson, Jens
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    CoNLL-X SharedTask: Multi-lingual Dependency Parsing. 2006. Report (Other academic)
    Abstract [en]

    The goal of this report is to summarize our experiments and present the final result of our participation in the CoNLL-X Shared Task 2006. The topic of this year's shared task was multi-lingual dependency parsing.

    The organizers have prepared 13 existing dependency treebanks so that they all comply with the same markup format. The training and test data for the languages differ in size, granularity and quality, but the organizers have tried to even out differences in the markup format. No additional information is allowed to be used besides the provided training data, forcing the parser to be fully automatic and data-driven. Ideally, the same parser should be trainable for all languages, possibly by adjusting parameters.

    The main goal is to assign labeled dependency structures for all languages on held-out test data, approximately 5 000 tokens for each language. The main metric for comparison of the different parsers of the participants is therefore labeled attachment score.
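
As an illustration of the metric named in this abstract, labeled attachment score is the share of tokens that receive both the correct head and the correct dependency label. The sketch below is a minimal version under stated assumptions: the function name and the `(head, label)` tuple representation are hypothetical, every token is counted, and the official shared-task scorer may differ in details such as which tokens are scored.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) for two aligned lists of (head_index, label) pairs.

    LAS counts tokens whose head and label are both correct;
    UAS counts tokens whose head is correct regardless of label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    total = len(gold)
    if total == 0:
        return 0.0, 0.0
    las_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    uas_hits = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    return las_hits / total, uas_hits / total


# Hypothetical three-token sentence; head index 0 denotes the artificial root.
gold = [(2, "SBJ"), (0, "ROOT"), (2, "OBJ")]
predicted = [(2, "SBJ"), (0, "ROOT"), (2, "ADV")]

las, uas = attachment_scores(gold, predicted)
```

Here the third token has the right head but the wrong label, so it counts towards the unlabeled score only.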

  • 8.
    Hall, Johan
    et al.
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    Nilsson, Jens
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    Converting Dependency Treebanks to MALT-XML. 2005. Report (Other academic)
    Abstract [en]

    In data-driven approaches to natural language processing, a common problem is the lack of data for many languages. Within the project Stochastic Dependency Grammars for Natural Language Parsing at Växjö University, we (Joakim Nivre, Johan Hall and Jens Nilsson) are developing a deterministic data-driven dependency parser, which is language independent. In this project we intend to enlarge the data resources for our parser. For the moment, we have only tested our parser on a small Swedish treebank converted to dependency structure, and on English using the Penn Treebank converted to dependency trees. Since we do not have more Swedish dependency treebanks at hand, we want to broaden our view towards treebanks for other languages, especially the bigger ones, to investigate the influence of data size. Primarily, we are focusing on the Danish Dependency Treebank (DDT) and the Prague Dependency Treebank (PDT). These treebanks are not in a format that we can use for our parser and therefore we have to convert them to MALT-XML, a format which our parser can handle.

  • 9.
    Kucher, Kostiantyn
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    Sahlgren, Magnus
    Gavagai AB.
    Methodology and Applications of Visual Stance Analysis: An Interactive Demo. 2016. In: International Symposium on Digital Humanities, Växjö 7-8 November 2016: Book of Abstracts, Linnaeus University, 2016, p. 56-57. Conference paper (Refereed)
    Abstract [en]

    Analysis of stance in textual data can reveal the attitudes of speakers, ranging from general agreement/disagreement with other speakers to fine-grained indications of wishes and emotions. The implementation of an automatic stance classifier and corresponding visualization techniques facilitates the analysis of human communication and social media texts. Furthermore, scholars in Digital Humanities could also benefit from such an approach by applying it for literature studies. For example, a researcher could explore the usage of such stance categories as certainty or prediction in a novel. Analysis of such abstract categories in longer texts would be complicated or even impossible with simpler tools such as regular expression search.

    Our research on automatic and visual stance analysis is concerned with multiple theoretical and practical challenges in linguistics, computational linguistics, and information visualization. In this interactive demo, we demonstrate our web-based visual analytics system called ALVA, which is designed to support the text data annotation and stance classifier training stages. 

  • 10.
    Kucher, Kostiantyn
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    Sahlgren, Magnus
    Gavagai AB.
    Visual Analysis of Stance Markers in Online Social Media. 2014. In: Poster Abstracts of IEEE VIS 2014, 2014. Conference paper (Refereed)
    Abstract [en]

    Stance in human communication is a linguistic concept relating to expressions of subjectivity such as the speakers’ attitudes and emotions. Taking stance is crucial for the social construction of meaning and can be useful for many application fields such as business intelligence, security analytics, or social media monitoring. In order to process large amounts of text data for stance analyses, linguists need interactive tools to explore the textual sources as well as the results of computational linguistics techniques. Both aspects are important for refining the analyses iteratively. In this work, we present a visual analytics tool for online social media text data and corresponding time-series that can be used to investigate stance phenomena and to refine the so-called stance markers collection. 

  • 11.
    Kucher, Kostiantyn
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    Sahlgren, Magnus
    Gavagai AB.
    Visual Analysis of Text Annotations for Stance Classification with ALVA. 2016. In: EuroVis Posters 2016 / [ed] Tobias Isenberg & Filip Sadlo, Eurographics - European Association for Computer Graphics, 2016, p. 49-51. Conference paper (Refereed)
    Abstract [en]

    The automatic detection and classification of stance taking in text data using natural language processing and machine learning methods create an opportunity to gain insight about the writers’ feelings and attitudes towards their own and other people’s utterances. However, this task presents multiple challenges related to the training data collection as well as the actual classifier training. In order to facilitate the process of training a stance classifier, we propose a visual analytics approach called ALVA for text data annotation and visualization. Our approach supports the annotation process management and supplies annotators with a clean user interface for labeling utterances with several stance categories. The analysts are provided with a visualization of stance annotations which facilitates the analysis of categories used by the annotators. ALVA is already being used by our domain experts in linguistics and computational linguistics in order to improve the understanding of stance phenomena and to build a stance classifier for applications such as social media monitoring. 

  • 12.
    Kucher, Kostiantyn
    et al.
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    DoSVis: Document Stance Visualization. 2018. In: Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP '18) / [ed] Alexandru C. Telea, Andreas Kerren, and José Braz, SciTePress, 2018, Vol. 3, p. 168-175. Conference paper (Refereed)
    Abstract [en]

    Text visualization techniques often make use of automatic text classification methods. One such method is stance analysis, which is concerned with detecting various aspects of the writer’s attitude towards utterances expressed in the text. Existing text visualization approaches for stance classification results are usually adapted to textual data consisting of individual utterances or short messages, and they are often designed for social media or debate monitoring tasks. In this paper, we propose a visualization approach called DoSVis (Document Stance Visualization) that focuses instead on longer individual text documents. DoSVis provides an overview of multiple stance categories detected by our classifier at the utterance level as well as a detailed text view annotated with classification results, thus supporting both distant and close reading tasks. We describe our approach by discussing several application scenarios involving business reports and works of literature.

  • 13.
    Kucher, Kostiantyn
    et al.
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Paradis, Carita
    Lund University.
    Sahlgren, Magnus
    Swedish Research Institute (RISE SICS).
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Active Learning and Visual Analytics for Stance Classification with ALVA. 2017. In: ACM Transactions on Interactive Intelligent Systems (TiiS), ISSN 2160-6455, Vol. 7, no 3, article id 14. Article in journal (Refereed)
    Abstract [en]

    The automatic detection and classification of stance (e.g., certainty or agreement) in text data using natural language processing and machine learning methods create an opportunity to gain insight into the speakers' attitudes towards their own and other people's utterances. However, identifying stance in text presents many challenges related to training data collection and classifier training. In order to facilitate the entire process of training a stance classifier, we propose a visual analytics approach, called ALVA, for text data annotation and visualization. ALVA's interplay with the stance classifier follows an active learning strategy in order to select suitable candidate utterances for manual annotation. Our approach supports annotation process management and provides the annotators with a clean user interface for labeling utterances with multiple stance categories. ALVA also contains a visualization method to help analysts of the annotation and training process gain a better understanding of the categories used by the annotators. The visualization uses a novel visual representation, called CatCombos, which groups individual annotation items by the combination of stance categories. Additionally, our system makes a visualization of a vector space model available that is itself based on utterances. ALVA is already being used by our domain experts in linguistics and computational linguistics in order to improve the understanding of stance phenomena and to build a stance classifier for applications such as social media monitoring.

  • 14.
    Kucher, Kostiantyn
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Schamp-Bjerede, Teri
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    Sahlgren, Magnus
    Gavagai AB.
    Visual Analysis of Online Social Media to Open Up the Investigation of Stance Phenomena. 2016. In: Information Visualization, ISSN 1473-8716, E-ISSN 1473-8724, Vol. 15, no 2, p. 93-116. Article in journal (Refereed)
    Abstract [en]

    Online social media are a perfect text source for stance analysis. Stance in human communication is concerned with speaker attitudes, beliefs, feelings and opinions. Expressions of stance are associated with the speakers' view of what they are talking about and what is up for discussion and negotiation in the intersubjective exchange. Taking stance is thus crucial for the social construction of meaning. Increased knowledge of stance can be useful for many application fields such as business intelligence, security analytics, or social media monitoring. In order to process large amounts of text data for stance analyses, linguists need interactive tools to explore the textual sources as well as the processed data based on computational linguistics techniques. Both original texts and derived data are important for refining the analyses iteratively. In this work, we present a visual analytics tool for online social media text data that can be used to open up the investigation of stance phenomena. Our approach complements traditional linguistic analysis techniques and is based on the analysis of utterances associated with two stance categories: sentiment and certainty. Our contributions include (1) the description of a novel web-based solution for analyzing the use and patterns of stance meanings and expressions in human communication over time; and (2) specialized techniques used for visualizing analysis provenance and corpus overview/navigation. We demonstrate our approach by means of text media on a highly controversial scandal with regard to expressions of anger and provide an expert review from linguists who have been using our tool.

  • 15.
    Martins, Rafael Messias
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Simaki, Vasiliki
    Linnaeus University, Faculty of Technology, Department of Computer Science. Lund University.
    Kucher, Kostiantyn
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    StanceXplore: Visualization for the Interactive Exploration of Stance in Social Media. 2017. Conference paper (Refereed)
    Abstract [en]

    The use of interactive visualization techniques in Digital Humanities research can be a useful addition when traditional automated machine learning techniques face difficulties, as is often the case with the exploration of large volumes of dynamic—and in many cases, noisy and conflicting—textual data from social media. Recently, the field of stance analysis has been moving from a predominantly binary approach—either pro or con—to a multifaceted one, where each unit of text may be classified as one (or more) of multiple possible stance categories. This change adds more layers of complexity to an already hard problem, but also opens up new opportunities for obtaining richer and more relevant results from the analysis of stancetaking in social media. In this paper we propose StanceXplore, a new visualization for the interactive exploration of stance in social media. Our goal is to offer DH researchers the chance to explore stance-classified text corpora from different perspectives at the same time, using coordinated multiple views including user-defined topics, content similarity and dissimilarity, and geographical and temporal distribution. As a case study, we explore the activity of Twitter users in Sweden, analyzing their behavior in terms of topics discussed and the stances taken. Each textual unit (tweet) is labeled with one of eleven stance categories from a cognitive-functional stance framework based on recent work. We illustrate how StanceXplore can be used effectively to investigate multidimensional patterns and trends in stance-taking related to cultural events, their geographical distribution, and the confidence of the stance classifier. 

  • 16. Megyesi, Beata
    et al.
    Dahlqvist, Bengt
    Pettersson, Eva
    Gustafson-Capkova, Sofia
    Nivre, Joakim
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering. Datalogi.
    Supporting Research Environment for Less Explored Languages: A Case Study of Swedish and Turkish. 2008. In: Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein, Acta Universitatis Upsaliensis, Uppsala, 2008, p. 96-110. Chapter in book (Other (popular science, discussion, etc.))
  • 17.
    Memeti, Suejb
    Linnaeus University, Faculty of Science and Engineering, School of Computer Science, Physics and Mathematics.
    Automatic Java Code Generator for Regular Expressions and Finite Automata. 2012. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
  • 18.
    Nilsson, Jens
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    Tree Transformations in Inductive Dependency Parsing. 2007. Licentiate thesis, monograph (Other academic)
    Abstract [en]

    This licentiate thesis deals with automatic syntactic analysis, or parsing, of natural languages. A parser constructs the syntactic analysis, which it learns by looking at correctly analyzed sentences, known as training data. The general topic concerns manipulations of the training data in order to improve the parsing accuracy.

    Several studies using constituency-based theories for natural languages in such automatic and data-driven syntactic parsing have shown that training data, annotated according to a linguistic theory, often needs to be adapted in various ways in order to achieve an adequate, automatic analysis. A linguistically sound constituent structure is not necessarily well-suited for learning and parsing using existing data-driven methods. Modifications to the constituency-based trees in the training data, and corresponding modifications to the parser output, have successfully been applied to increase the parser accuracy. The topic of this thesis is to investigate whether similar modifications in the form of tree transformations to training data, annotated with dependency-based structures, can improve accuracy for data-driven dependency parsers. In order to do this, two types of tree transformations are in focus in this thesis.

    The first one concerns non-projectivity. The full potential of dependency parsing can only be realized if non-projective constructions are allowed, which pose a problem for projective dependency parsers. On the other hand, non-projective parsers tend, among other things, to be slower. In order to maintain the benefits of projective parsing, a tree transformation technique to recover non-projectivity while using a projective parser is presented here.

    The second type of transformation concerns linguistic phenomena that are possible but hard for a parser to learn, given a certain choice of dependency analysis. This study has concentrated on two such phenomena, coordination and verb groups, for which tree transformations are applied in order to improve parsing accuracy, in case the original structure does not coincide with a structure that is easy to learn.

    Empirical evaluations are performed using treebank data from various languages, and using more than one dependency parser. The results show that the benefit of these tree transformations used in preprocessing and postprocessing to a large extent is language, treebank and parser independent.

  • 19.
    Nilsson, Jens
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    Tree Transformations in Inductive Dependency Parsing. 2007. Licentiate thesis, monograph (Other academic)
    Abstract [en]

    This licentiate thesis deals with automatic syntactic analysis, or parsing, of natural languages. A parser constructs the syntactic analysis, which it learns by looking at correctly analyzed sentences, known as training data. The general topic concerns manipulations of the training data in order to improve the parsing accuracy.

    Several studies using constituency-based theories for natural languages in such automatic and data-driven syntactic parsing have shown that training data, annotated according to a linguistic theory, often needs to be adapted in various ways in order to achieve an adequate, automatic analysis. A linguistically sound constituent structure is not necessarily well-suited for learning and parsing using existing data-driven methods. Modifications to the constituency-based trees in the training data, and corresponding modifications to the parser output, have successfully been applied to increase the parser accuracy. The topic of this thesis is to investigate whether similar modifications in the form of tree transformations to training data, annotated with dependency-based structures, can improve accuracy for data-driven dependency parsers. In order to do this, two types of tree transformations are in focus in this thesis.

    The first one concerns non-projectivity. The full potential of dependency parsing can only be realized if non-projective constructions are allowed, which pose a problem for projective dependency parsers. On the other hand, non-projective parsers tend, among other things, to be slower. In order to maintain the benefits of projective parsing, a tree transformation technique to recover non-projectivity while using a projective parser is presented here.

    The second type of transformation concerns linguistic phenomena that are possible but hard for a parser to learn, given a certain choice of dependency analysis. This study has concentrated on two such phenomena, coordination and verb groups, for which tree transformations are applied in order to improve parsing accuracy, in case the original structure does not coincide with a structure that is easy to learn.

    Empirical evaluations are performed using treebank data from various languages, and using more than one dependency parser. The results show that the benefit of these tree transformations used in preprocessing and postprocessing to a large extent is language, treebank and parser independent.
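
As an illustration of the notion of projectivity mentioned in the abstract above (not the thesis's transformation technique itself), the following minimal Python sketch checks whether a dependency tree is projective. The `heads` encoding (the head of token i stored at index i-1, with 0 as an artificial root) and both function names are assumptions made for this example, and the input is assumed to be a well-formed tree.

```python
def dominates(heads, ancestor, node):
    """Return True if `ancestor` is reached from `node` by following head links."""
    # In a well-formed tree the root (0) is reached in at most len(heads) steps.
    for _ in range(len(heads)):
        node = heads[node - 1]
        if node == ancestor:
            return True
        if node == 0:
            return False
    return False


def is_projective(heads):
    """heads[i - 1] is the head of token i; 0 denotes the artificial root."""
    for dependent, head in enumerate(heads, start=1):
        left, right = sorted((head, dependent))
        # Every token strictly between head and dependent must descend from head.
        for between in range(left + 1, right):
            if not dominates(heads, head, between):
                return False
    return True
```

For instance, `is_projective([2, 0, 2])` holds for a three-token tree whose middle token is the root, while `[3, 4, 0, 3]` contains the crossing arcs 3→1 and 4→2 and is therefore non-projective.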

  • 20.
    Nilsson, Jens
    et al.
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    Hall, Johan
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    Reconstruction of the Swedish Treebank Talbanken. 2005. Report (Other academic)
    Abstract [en]

    Data-driven parsing techniques have a number of advantages over rule-based parsing techniques, such as fast development time, broad coverage and robustness. Treebanks, collections of syntactically annotated sentences, are important resources for data-driven parsers. When developing a parser for Swedish one needs a treebank containing Swedish sentences, but currently there is a lack of Swedish treebanks of substantial size. This holds for the other Nordic languages too, with Danish as an exception. The absence of Swedish treebanks is remarkable considering that two corpora of Swedish text augmented with syntactic annotation have been created, one as early as 1974 named Talbanken (Einarsson 1976), and another in the 1980s named Syntag (Järborg 1980). Unfortunately, the annotation formats of these resources make them cumbersome to use with modern treebank tools and parsers. In a way, Sweden can be regarded as a pioneer in this area, but since then the work on creating new treebanks has decreased considerably.

  • 21.
    Nivre, Joakim
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering. Datalogi.
    Algorithms for Deterministic Incremental Dependency Parsing. 2008. In: Computational Linguistics, ISSN 0891-2017, Vol. 34, no 4, p. 513-553. Article in journal (Refereed)
  • 22.
    Nivre, Joakim
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering. Datalogi.
    Inductive Dependency Parsing. 2006. Book (Other (popular science, discussion, etc.))
    Abstract [en]

    This book provides an in-depth description of the framework of inductive dependency parsing, a methodology for robust and efficient syntactic analysis of unrestricted natural language text. This methodology is based on two essential components: dependency-based syntactic representations and a data-driven approach to syntactic parsing. More precisely, it is based on a deterministic parsing algorithm in combination with inductive machine learning to predict the next parser action.

    The book includes a theoretical analysis of all central models and algorithms, as well as a thorough empirical evaluation of memory-based dependency parsing, using data from Swedish and English. Offering the reader a one-stop reference to dependency-based parsing of natural language, it is intended for researchers and system developers in the language technology field, and is also suited for graduate or advanced undergraduate education.

  • 23.
    Nivre, Joakim
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering. Datalogi.
    Treebanks. 2008. In: Corpus Linguistics: An International Handbook, Mouton de Gruyter, Berlin, 2008, p. 225-241. Chapter in book (Other (popular science, discussion, etc.))
  • 24.
    Nivre, Joakim
    et al.
    Växjö University, Faculty of Mathematics/Science/Technology, School of Mathematics and Systems Engineering.
    Megyesi, Beata
    Gustafson-Capkova, Sofia
    Salomonsson, Filip
    Dahlqvist, Bengt
    Cultivating a Swedish Treebank. 2008. In: Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein, Acta Universitatis Upsaliensis, Uppsala, 2008, p. 111-120. Chapter in book (Other (popular science, discussion, etc.))
  • 25.
    Rahimi, Afshin
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Sahlgren, Magnus
    Gavagai AB.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    The StaViCTA Group Report for RepLab 2014: Reputation Dimensions Task. 2014. In: Working Notes for CLEF 2014 Conference: Sheffield, UK, September 15-18, 2014 / [ed] Linda Cappellato, Nicola Ferro, Martin Halvey, Wessel Kraaij, CEUR-WS.org, 2014, p. 1519-1527. Conference paper (Refereed)
    Abstract [en]

    In this paper we present our experiments on the RepLab 2014 Reputation Dimension task. RepLab is a competitive challenge for Reputation Management Systems. RepLab 2014’s reputation dimensions task focuses on categorization of Twitter messages with regard to standard reputation dimensions (such as performance, leadership, or innovation). Our approach relies only on the textual content of tweets and ignores both metadata and the content of URLs within tweets. We carried out several experiments focusing on different feature sets, including bag of n-grams, distributional semantics features, and deep neural network representations. The results show that bag-of-bigram features with minimum frequency thresholding work quite well in the reputation dimension task, especially with regard to the average F1 measure over all dimensions, where two of our four submitted runs achieve the highest and second-highest scores. Our experiments also show that semi-supervised recursive autoencoders outperform the other feature sets used in our experiments with regard to accuracy, and are a promising subject of future research.

  • 26.
    Rahman, Mofizur
    et al.
    Stockholm University.
    Asker, Lars
    Stockholm University.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Proposing distributional semantics as a tool for medical vocabulary expansion. 2015. In: International Workshop on Embeddings and Semantics (IWES '15) / [ed] Parth Gupta, Rafael E. Banchs, and Paolo Rosso, 2015. Conference paper (Refereed)
    Abstract [en]

    A tool that extends a given vocabulary by automatically extracting new term candidates from a corpus could facilitate vocabulary expansion, as well as ensure that extracted terms correspond to those actually used in a specific text genre. We here propose a user interface for such a tool, and evaluate the feasibility of using Random Indexing for positioning new term candidates in a given taxonomy. 
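
The Random Indexing technique named in the abstract above can be sketched roughly as follows. This is a generic, minimal illustration and not the authors' implementation: the dimensionality, the number of non-zero elements, the window size, the seed and all function names are assumptions chosen for the example.

```python
import math
import random
from collections import defaultdict


def index_vector(rng, dim, nonzero):
    """Sparse ternary index vector: a few random positions set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def random_indexing(sentences, dim=100, nonzero=4, window=1, seed=0):
    """Build context vectors by summing the index vectors of neighbouring tokens."""
    rng = random.Random(seed)
    index = {}
    for tokens in sentences:
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(rng, dim, nonzero)
    context = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    target = context[tok]
                    for k, value in enumerate(index[tokens[j]]):
                        target[k] += value
    return dict(context)


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

With a toy corpus such as `[["fever", "causes", "pain"], ["cough", "causes", "pain"]]` and a 1 + 1 window, "fever" and "cough" share exactly the same context and therefore receive identical context vectors. That is the property a vocabulary-expansion tool could exploit to place an unknown term near known ones.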

  • 27.
    Simaki, Vasiliki
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Lund University.
    Aravantinou, Christina
    Univ Patras, Greece.
    Mporas, Iosif
    Univ Hertfordshire, England.
    Kondyli, Marianna
    Univ Patras, Greece.
    Megalooikonomou, Vasileios
    Univ Patras, Greece.
    Sociolinguistic Features for Author Gender Identification: From Qualitative Evidence to Quantitative Analysis. 2017. In: Journal of Quantitative Linguistics, ISSN 0929-6174, E-ISSN 1744-5035, Vol. 24, no 1, p. 65-84. Article in journal (Refereed)
    Abstract [en]

    Theoretical and empirical studies show a strong relationship between social factors and individual linguistic attitudes. Different social categories, such as gender, age, education, profession and social status, are strongly related to the linguistic diversity of people's everyday spoken and written interaction. In this paper, sociolinguistic studies addressing gender differentiation are reviewed in order to identify how various linguistic characteristics differ between women and men. Thereafter, it is examined whether and how these qualitative features can become quantitative metrics for the task of gender identification from texts on web blogs. The evaluation results showed that the "syntactic complexity", "tag questions", "period length", "adjectives" and "vocabulary richness" characteristics seem to be significantly distinctive with respect to the author's gender.

  • 28.
    Simaki, Vasiliki
    et al.
    Lancaster University, UK.
    Simakis, Panagiotis
    XPLAIN, Greece.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM).
    Detection of Stance-Related Characteristics in Social Media Text. 2018. Conference paper (Refereed)
  • 29.
    Simaki, Vasiliki
    et al.
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science. Lund University.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Evaluating stance-annotated sentences from the Brexit Blog Corpus: A quantitative linguistic analysis. 2018. In: ICAME Journal/International Computer Archive of Modern English, ISSN 0801-5775, E-ISSN 1502-5462, Vol. 42, no 1, p. 133-166. Article in journal (Refereed)
    Abstract [en]

    This paper offers a formally driven quantitative analysis of stance-annotated sentences in the Brexit Blog Corpus (BBC). Our goal is to highlight linguistic features that determine the formal profiles of six stance categories (contrariety, hypotheticality, necessity, prediction, source of knowledge and uncertainty) in a subset of the BBC. The study has two parts: firstly, it examines a large number of formal linguistic features that occur in the sentences in order to describe the specific characteristics of each category, and secondly, it compares characteristics in the entire data set in order to determine linguistic similarities throughout the data set. We show that among the six stance categories in the corpus, contrariety and necessity are the most discriminative ones, with the former using longer sentences, more conjunctions, more repetitions and shorter forms than the sentences expressing other stances. The latter has longer lexical forms but shorter sentences, which are syntactically more complex. We show that stance in our data set is expressed in sentences with around 21 words per sentence. The sentences consist mainly of alphabetical characters forming a varied vocabulary without special forms, such as digits or special characters.

  • 30.
    Simaki, Vasiliki
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Lund University.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Stance Classification in Texts from Blogs on the 2016 British Referendum. 2017. In: Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings / [ed] Alexey Karpov, Rodmonga Potapova, and Iosif Mporas, Springer International Publishing, 2017, p. 700-709. Conference paper (Refereed)
    Abstract [en]

    The problem of identifying and correctly attributing speaker stance in human communication is addressed in this paper. The data set consists of political blogs dealing with the 2016 British referendum. A cognitive-functional framework is adopted with data annotated for six notional stance categories: concession/contrariness, hypotheticality, need/requirement, prediction, source of knowledge, and uncertainty. We show that these categories can be implemented in a text classification task and automatically detected. To this end, we propose a large set of lexical and syntactic linguistic features. These features were tested and classification experiments were implemented using different algorithms. We achieved accuracy of up to 30% for the six-class experiments, which is not fully satisfactory. As a second step, we calculated the pair-wise combinations of the stance categories. The concession/contrariness and need/requirement binary classification achieved the best results with up to 71% accuracy.

  • 31.
    Simaki, Vasiliki
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Lund University.
    Paradis, Carita
    Lund University.
    Skeppstedt, Maria
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Sahlgren, Magnus
    Swedish Research Institute (RISE SICS).
    Kucher, Kostiantyn
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Annotating speaker stance in discourse: the Brexit Blog Corpus. 2017. In: Corpus linguistics and linguistic theory, ISSN 1613-7027, E-ISSN 1613-7035. Article in journal (Refereed)
    Abstract [en]

    The aim of this study is to explore the possibility of identifying speaker stance in discourse, provide an analytical resource for it and an evaluation of the level of agreement across speakers. We also explore to what extent language users agree about what kind of stances are expressed in natural language use or whether their interpretations diverge. In order to perform this task, a comprehensive cognitive-functional framework of ten stance categories was developed based on previous work on speaker stance in the literature. A corpus of opinionated texts was compiled, the Brexit Blog Corpus (BBC). An analytical protocol and interface (ALVA) for the annotations was set up and the data were independently annotated by two annotators. The annotation procedure, the annotation agreements and the co-occurrence of more than one stance in the utterances are described and discussed. The careful, analytical annotation process has returned satisfactory inter- and intra-annotation agreement scores, resulting in a gold standard corpus, the final version of the BBC. 

  • 32.
    Simaki, Vasiliki
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Lund University.
    Simakis, Panagiotis
    XPLAIN, Greece.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Identifying the Authors' National Variety of English in Social Media Texts. 2017. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP 2017 / [ed] Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Ivelina Nikolova, and Irina Temnikova, Association for Computational Linguistics, 2017, p. 671-678. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a study for the identification of authors’ national variety of English in texts from social media. In data from Facebook and Twitter, information about the author’s social profile is annotated, and the national English variety (US, UK, AUS, CAN, NNS) that each author uses is attributed. We tested four feature types: formal linguistic features, POS features, lexicon-based features related to the different varieties, and data-based features from each English variety. We used various machine learning algorithms for the classification experiments, and we implemented a feature selection process. The classification accuracy achieved, when the 31 highest ranked features were used, was up to 77.32%. The experimental results are evaluated, and the efficacy of the ranked features is discussed.

  • 33.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. University of Potsdam, Germany.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Stede, Manfred
    University of Potsdam, Germany.
    Automatic detection of stance towards vaccination in online discussion forums. 2017. In: Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017) / [ed] Jitendra Jonnagaddala, Hong-Jie Dai, and Yung-Chun Chang, Association for Computational Linguistics, 2017, p. 1-8. Conference paper (Refereed)
    Abstract [en]

    A classifier for automatic detection of stance towards vaccination in online forums was trained and evaluated. Debate posts from six discussion threads on the British parental website Mumsnet were manually annotated for stance against or for vaccination, or as undecided. A support vector machine, trained to detect the three classes, achieved a macro F-score of 0.44, while a macro F-score of 0.62 was obtained by the same type of classifier on the binary classification task of distinguishing stance against vaccination from stance for vaccination. These results show that vaccine stance detection in online forums is a difficult task, at least for the type of model investigated and for the relatively small training corpus that was used. Future work will therefore include an expansion of the training data and an evaluation of other types of classifiers and features.
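
For readers unfamiliar with the macro F-score reported in the abstract above, the sketch below shows one common way to compute it: as the unweighted mean of the per-class F1 scores. The function name and the toy labels are assumptions made for this example. The paper may use a slightly different variant, such as the F-score of macro-averaged precision and recall.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all labels seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    pairs = list(zip(gold, predicted))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)  # class never predicted correctly
    return sum(scores) / len(scores)


# Toy three-class example (labels are illustrative only).
gold = ["for", "against", "undecided", "for"]
predicted = ["for", "for", "undecided", "against"]
score = macro_f1(gold, predicted)
```

Because every class contributes equally, a class that is rarely predicted correctly pulls the macro average down regardless of how frequent it is, which makes the measure a natural fit for small, unbalanced stance corpora.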

  • 34.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Potsdam University, Germany.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Stede, Manfred
    Potsdam University, Germany.
    Vaccine Hesitancy in Discussion Forums: Computer-Assisted Argument Mining with Topic Models. 2018. Conference paper (Refereed)
    Abstract [en]

    Arguments used when vaccination is debated on Internet discussion forums might give us valuable insights into reasons behind vaccine hesitancy. In this study, we applied automatic topic modelling on a collection of 943 discussion posts in which vaccine was debated, and six distinct discussion topics were detected by the algorithm. When manually coding the posts ranked as most typical for these six topics, a set of semantically coherent arguments were identified for each extracted topic. This indicates that topic modelling is a useful method for automatically identifying vaccine-related discussion topics and for identifying debate posts where these topics are discussed. This functionality could facilitate manual coding of salient arguments, and thereby form an important component in a system for computer-assisted coding of vaccine-related discussions. 

  • 35.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Kucher, Kostiantyn
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Language Processing Components of the StaViCTA Project. 2017. In: Proceedings of the Workshop on Logic and Algorithms in Computational Linguistics 2017 (LACompLing 2017) / [ed] Roussanka Loukanova and Kristina Liefke, Stockholm University; KTH, 2017, p. 137-138. Conference paper (Refereed)
    Abstract [en]

    The StaViCTA project is concerned with visualising the expression of stance in written text, and is therefore dependent on components for stance detection. These components are to (i) download and extract text from any HTML page and segment it into sentences, (ii) classify each sentence with respect to twelve different, notionally motivated, stance categories, and (iii) provide a RESTful HTTP API for communication with the visualisation components. The stance categories are certainty, uncertainty, contrast, recommendation, volition, prediction, agreement, disagreement, tact, rudeness, hypotheticality, and source of knowledge. 

  • 36.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Potsdam University, Germany.
    Kucher, Kostiantyn
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Stede, Manfred
    Potsdam University, Germany.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Topics2Themes: Computer-Assisted Argument Extraction by Visual Analysis of Important Topics. 2018. Conference paper (Refereed)
    Abstract [en]

    The large collections of opinionated text that are continuously being created online, e.g., in the form of forum posts or tweets, contain arguments that might help us to better understand why opinions are held. While the task of manually extracting arguments from these large collections is an intractable one, a tool for computer-assisted extraction can (i) automatically select a subset of the text collection that contains re-occurring arguments to minimise the amount of text that the human coder has to read, and (ii) present the selected texts in a way that facilitates manual coding of arguments. We propose a tool called Topics2Themes that uses topic modelling to automatically extract important topics as well as the terms and texts most closely associated with each topic. We also provide a graphical user interface for manual argument coding, in which the user can search for arguments in the texts selected, create a theme for each type of argument detected and connect it to the texts in which it is found. Topics, terms, texts and themes are displayed as elements in four separate lists, and associations between the elements are visualised through connecting links. It is also possible to focus on one particular element through the sorting functionality provided, e.g., when a topic is selected, the terms, texts and themes associated with this topic are sorted as the top-ranked elements in their respective lists. The text collection can thereby be explored from different angles, which can be used to facilitate the argument coding and gain an overview and understanding of the arguments found in the texts. 

  • 37.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Marker Words for Negation and Speculation in Health Records and Consumer Reviews. 2016. In: Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine (SMBM '16) / [ed] Mariana Neves, Fabio Rinaldi, Goran Nenadic, and Dietrich Rebholz-Schuhmann, CEUR-WS.org, 2016, Vol. 1650, p. 64-69. Conference paper (Refereed)
    Abstract [en]

    Conditional random fields were trained to detect marker words for negation and speculation in two corpora belonging to two very different domains: clinical text and consumer review text. For the corpus of clinical text, marker words for speculation and negation were detected with results in line with previously reported interannotator agreement scores. This was also the case for speculation markers in the consumer review corpus, while detection of negation markers was unsuccessful in this genre. Also a setup in which models were trained on markers in consumer reviews, and applied on the clinical text genre, yielded low results. This shows that neither the trained models, nor the choice of appropriate machine learning algorithms and features, were transferable across the two text genres.

  • 38.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    PAL, a tool for Pre-annotation and Active Learning. 2016. In: Journal for Language Technology and Computational Linguistics, ISSN 0175-1336, E-ISSN 2190-6858, Vol. 31, no 1, p. 81-100. Article in journal (Refereed)
    Abstract [en]

    Many natural language processing systems rely on machine learning models that are trained on large amounts of manually annotated text data. The lack of sufficient amounts of annotated data is, however, a common obstacle for such systems, since manual annotation of text is often expensive and time-consuming.

    The aim of “PAL, a tool for Pre-annotation and Active Learning” is to provide a ready-made package that can be used to simplify annotation and to reduce the amount of annotated data required to train a machine learning classifier. The package provides support for two techniques that have been shown to be successful in previous studies, namely active learning and pre-annotation.

    The output of the pre-annotation is provided in the annotation format of the annotation tool BRAT, but PAL is a stand-alone package that can be adapted to other formats. 

  • 39.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Sahlgren, Magnus
    Gavagai AB, Sweden.
    Finding Infrequent Phenomena in Large Corpora Using Distributional Semantics. 2015. In: Symposium on Methods and Linguistic Theories (MaLT '15), Bamberg, Germany, 27-28 November 2015, 2015. Conference paper (Refereed)
  • 40.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Sahlgren, Magnus
    Swedish Institute of Computer Science.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Active Learning for Detection of Stance Components. 2016. In: Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES '16) at COLING '16, Association for Computational Linguistics, 2016, p. 50-59. Conference paper (Refereed)
    Abstract [en]

    Automatic detection of five language components, which are all relevant for expressing opinions and for stance taking, was studied: positive sentiment, negative sentiment, speculation, contrast and condition. A resource-aware approach was taken, which included manual annotation of 500 training samples and the use of limited lexical resources. Active learning was compared to random selection of training data, as well as to a lexicon-based method. Active learning was successful for the categories speculation, contrast and condition, but not for the two sentiment categories, for which results achieved when using active learning were similar to those achieved when applying a random selection of training data. This difference is likely due to a larger variation in how sentiment is expressed than in how speakers express the other three categories. This larger variation was also shown by the lower recall results achieved by the lexicon-based approach for sentiment than for the categories speculation, contrast and condition. 

  • 41.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB.
    Sahlgren, Magnus
    Gavagai AB.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Unshared Task: (Dis)agreement in Online Debates. 2016. In: Proceedings of the 3rd Workshop on Argument Mining (ArgMining '16) at ACL '16, Association for Computational Linguistics, 2016, p. 154-159, article id W16-2818. Conference paper (Refereed)
    Abstract [en]

    Topic-independent expressions for conveying agreement and disagreement were annotated in a corpus of web forum debates, in order to evaluate a classifier trained to detect these two categories. Among the 175 expressions annotated in the evaluation set, 163 were unique, which shows that there is large variation in expressions used. This variation might be one of the reasons why the task of automatically detecting the categories was difficult. F-scores of 0.44 and 0.37 were achieved by a classifier trained on 2,000 debate sentences for detecting sentence-level agreement and disagreement.

  • 42.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science. Gavagai AB.
    Schamp-Bjerede, Teri
    Lund University.
    Sahlgren, Magnus
    Gavagai AB, Sweden.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Detecting Speculations, Contrasts and Conditionals in Consumer Reviews. 2015. In: Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA '15): Short Paper Track / [ed] Alexandra Balahur, Erik van der Goot, Piek Vossen, and Andrés Montoyo, Association for Computational Linguistics, 2015, p. 162-168. Conference paper (Refereed)
    Abstract [en]

    A support vector classifier was compared to a lexicon-based approach for the task of detecting the stance categories speculation, contrast and conditional in English consumer reviews. Around 3,000 training instances were required to achieve a stable performance of an F-score of 90 for speculation. This outperformed the lexicon-based approach, for which an F-score of just above 80 was achieved. The machine learning results for the other two categories showed a lower average (an approximate F-score of 60 for contrast and 70 for conditional), as well as a larger variance, and were only slightly better than lexicon matching. Therefore, while machine learning was successful for detecting speculation, a well-curated lexicon might be a more suitable approach for detecting contrast and conditional. 

  • 43.
    Skeppstedt, Maria
    et al.
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Simaki, Vasiliki
    Linnaeus University, Faculty of Technology, Department of Computer Science. Lund University.
    Paradis, Carita
    Lund University.
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of Computer Science.
    Detection of Stance and Sentiment Modifiers in Political Blogs. 2017. In: Speech and Computer: 19th International Conference, SPECOM 2017, Hatfield, UK, September 12-16, 2017, Proceedings / [ed] Alexey Karpov, Rodmonga Potapova, and Iosif Mporas, Springer International Publishing, 2017, p. 302-311. Conference paper (Refereed)
    Abstract [en]

    The automatic detection of seven types of modifiers was studied: Certainty, Uncertainty, Hypotheticality, Prediction, Recommendation, Concession/Contrast and Source. A classifier aimed at detecting local cue words that signal the categories was the most successful method for five of the categories. For Prediction and Hypotheticality, however, better results were obtained with a classifier trained on tokens and bi-grams present in the entire sentence. Unsupervised cluster features were shown useful for the categories Source and Uncertainty, when a subset of the training data available was used. However, when all of the 2,095 sentences that had been actively selected and manually annotated were used as training data, the cluster features had a very limited effect. Some of the classification errors made by the models would be possible to avoid by extending the training data set, while other features and feature representations, as well as the incorporation of pragmatic knowledge, would be required for other error types. 

  • 44.
    Zimmer, Björn
    et al.
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Sahlgren, Magnus
    Swedish Research Institute (RISE SICS).
    Kerren, Andreas
    Linnaeus University, Faculty of Technology, Department of computer science and media technology (CM), Department of Computer Science.
    Visual Analysis of Relationships between Heterogeneous Networks and Texts: An Application on the IEEE VIS Publication Dataset. 2017. In: Informatics, ISSN 2227-9709, Vol. 4, no 2, article id 11. Article in journal (Refereed)
    Abstract [en]

    The visual exploration of large and complex network structures remains a challenge for many application fields. Moreover, a growing number of real world networks are multivariate and often interconnected with each other. Entities in a network may have relationships with elements of other related data sets, which do not necessarily have to be networks themselves, and these relationships may be defined by attributes that can vary greatly. In this work, we propose a comprehensive visual analytics approach that supports researchers to specify and subsequently explore attribute-based relationships across networks, text documents, and derived secondary data. Our approach provides an individual search functionality based on keywords and semantically similar terms over the entire text corpus to find related network nodes. For examining these nodes in the interconnected network views, we introduce a new interaction technique, called Hub2Go, which facilitates the navigation by guiding the user to the information of interest. To showcase our system, we use a large text corpus collected from research papers listed in the IEEE VIS publications dataset that consists of 2752 documents over a period of 25 years. Here, we analyze relationships between various heterogeneous networks, a Bag-of-Words index, and a word similarity matrix, all derived from the initial corpus and metadata. 
