lnu.se Publications (10 of 30)
Lundberg, J., Nordqvist, J. & Laitinen, M. (2019). Towards a language independent Twitter bot detector. In: Navarretta Costanza et al. (Ed.), Proceedings of 4th Conference of The Association Digital Humanities in the Nordic Countries: Copenhagen, March 6-8 2019. Paper presented at 4th Conference of The Association Digital Humanities in the Nordic Countries, Copenhagen, March 6-8 2019. Copenhagen: Digital Humanities in the Nordic countries
Towards a language independent Twitter bot detector
2019 (English). In: Proceedings of 4th Conference of The Association Digital Humanities in the Nordic Countries: Copenhagen, March 6-8 2019 / [ed] Navarretta Costanza et al., Copenhagen: Digital Humanities in the Nordic countries, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

This article describes our work in developing an application that recognizes automatically generated tweets. The objective of this machine learning application is to increase data accuracy in sociolinguistic studies that use Twitter by reducing skewed sampling and inaccuracies in linguistic data. Most previous machine learning attempts to exclude bot material have been language dependent because they use monolingual Twitter text in their training phase. In this paper, we present a language independent approach that classifies each individual tweet as either autogenerated (AGT) or human-generated (HGT). We define an AGT as a tweet in which all or part of the natural language content is generated automatically by a bot or other type of program. Note that AGT/HGT refer to an individual message, whereas the term bot refers to non-personal, automated accounts that post content to online social networks. Our approach classifies a tweet using only the metadata that comes with every tweet, and we use those metadata parameters that are both language- and country-independent. The empirical results are promising: using a bilingual training set of Finnish and Swedish tweets, we correctly classified about 98.2% of all tweets in a test set in a third language (English).
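The abstract does not name the classifier or the exact metadata fields, so the following is only a minimal sketch of the general idea: map each tweet to a vector of language-independent metadata features and label it AGT or HGT. It assumes Twitter API v1.1 JSON (`user.statuses_count`, `user.followers_count`, `entities.urls`, `entities.hashtags`, `source`); the feature choice, the hypothetical `OFFICIAL_CLIENTS` list, and the nearest-centroid classifier are stand-ins chosen for illustration, not the authors' method.

```python
from math import dist

# Hypothetical list of official client names; any other posting client sets
# the "third-party source" feature to 1. Not taken from the paper.
OFFICIAL_CLIENTS = ("Twitter for iPhone", "Twitter for Android", "Twitter Web App")


def features(tweet: dict) -> tuple[float, ...]:
    """Map a v1.1 tweet object to language-independent metadata features (assumed set)."""
    user = tweet["user"]
    entities = tweet.get("entities", {})
    source = tweet.get("source", "")
    return (
        float(len(entities.get("urls", []))),      # number of links
        float(len(entities.get("hashtags", []))),  # number of hashtags
        user["statuses_count"] / max(1, user["followers_count"]),  # activity vs. audience
        float(not any(name in source for name in OFFICIAL_CLIENTS)),  # third-party client
    )


def centroid(rows: list[tuple[float, ...]]) -> tuple[float, ...]:
    """Component-wise mean of equally long feature vectors."""
    return tuple(sum(column) / len(rows) for column in zip(*rows))


def train(labelled: list[tuple[dict, str]]) -> dict[str, tuple[float, ...]]:
    """Nearest-centroid training: one mean feature vector per label ("AGT" or "HGT")."""
    groups: dict[str, list[tuple[float, ...]]] = {}
    for tweet, label in labelled:
        groups.setdefault(label, []).append(features(tweet))
    return {label: centroid(rows) for label, rows in groups.items()}


def classify(model: dict[str, tuple[float, ...]], tweet: dict) -> str:
    """Return the label whose centroid is closest (Euclidean distance) to the tweet."""
    x = features(tweet)
    return min(model, key=lambda label: dist(model[label], x))
```

Because no feature inspects the tweet text, a model trained on Finnish and Swedish tweets can be applied unchanged to English ones, which is the property the abstract relies on. In practice the features would need scaling, since the ratio feature dominates the unscaled distance.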

Place, publisher, year, edition, pages
Copenhagen: Digital Humanities in the Nordic countries, 2019
Keywords
Twitter, bots, bot detection, supervised machine learning
National Category
General Language Studies and Linguistics; Computer and Information Sciences
Research subject
Humanities, English; Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-81663 (URN)
Conference
4th Conference of The Association Digital Humanities in the Nordic Countries, Copenhagen, March 6-8 2019
Available from: 2019-04-04. Created: 2019-04-04. Last updated: 2019-04-29. Bibliographically approved
Lincke, A., Lundberg, J., Thunander, M., Milrad, M., Lundberg, J. & Jusufi, I. (2018). Diabetes Information in Social Media. In: Karsten Klein, Yi-Na Li, and Andreas Kerren (Ed.), Proceedings of the 11th International Symposium on Visual Information Communication and Interaction (VINCI '18). Paper presented at 11th International Symposium on Visual Information Communication and Interaction (VINCI '18), 13-15 August 2018, Växjö, Sweden (pp. 104-105). ACM Publications
Diabetes Information in Social Media
2018 (English). In: Proceedings of the 11th International Symposium on Visual Information Communication and Interaction (VINCI '18) / [ed] Karsten Klein, Yi-Na Li, and Andreas Kerren, ACM Publications, 2018, p. 104-105. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Social media platforms have created new ways for people to communicate and express themselves. It is therefore important to explore how e-health-related information is generated and disseminated on these platforms. The aim of our current efforts is to investigate the content and flow of information when people in Sweden use Twitter to talk about diabetes-related issues. To this end, we have used data mining and visualization techniques to explore, analyze and cluster Twitter data collected over a period of 10 months. Our initial results indicate that patients use Twitter to share diabetes-related information and to communicate about their disease, as an alternative that complements the traditional channels used by health care professionals.

Place, publisher, year, edition, pages
ACM Publications, 2018
Keywords
Social media, Twitter data analysis, diabetes, visualization
National Category
Computer Sciences; Human Computer Interaction
Research subject
Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science; Computer and Information Sciences Computer Science, Media Technology; Health and Caring Sciences, Health Informatics
Identifiers
urn:nbn:se:lnu:diva-78214 (URN); 10.1145/3231622.3232508 (DOI); 978-1-4503-6501-7 (ISBN)
Conference
11th International Symposium on Visual Information Communication and Interaction (VINCI '18), 13-15 August 2018, Växjö, Sweden
Available from: 2018-10-09. Created: 2018-10-09. Last updated: 2019-06-05. Bibliographically approved
Laitinen, M., Lundberg, J., Levin, M. & Martins, R. M. (2018). The Nordic Tweet Stream: A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data. In: Eetu Mäkelä, Mikko Tolonen, Jouni Tuominen (Ed.), DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, March 7-9, 2018. Paper presented at Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018 (pp. 349-362). CEUR-WS.org
The Nordic Tweet Stream: A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data
2018 (English). In: DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, March 7-9, 2018 / [ed] Eetu Mäkelä, Mikko Tolonen, Jouni Tuominen, CEUR-WS.org, 2018, p. 349-362. Conference paper, Published paper (Refereed)
Abstract [en]

This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: we not only rely on traditional structured corpus data but also use unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects of creating a dynamic real-time monitor corpus, and the following case study illustrates how the corpus could be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings. The results show that English is the most frequently used language, accounting for almost a third of all tweets. These results can be used to assess how widespread English use is in the Nordic region and offer a big data perspective that complements previous small-scale studies. Future objectives include annotating the material, making it available to the scholarly community, and expanding the geographic scope of the data stream beyond the Nordic region.
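A share such as "English accounts for almost a third" is a relative frequency over the language tags of the collected tweets. The sketch below shows that computation under the assumption that each tweet is a Twitter API v1.1 object whose `lang` field holds the platform's machine-detected language code (`und` when undetermined); whether the NTS uses this field or its own language identifier is not stated here.

```python
from collections import Counter


def language_shares(tweets: list[dict]) -> dict[str, float]:
    """Return each language's share of all tweets, most frequent language first."""
    counts = Counter(t.get("lang", "und") for t in tweets)  # "und" = undetermined
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.most_common()}


if __name__ == "__main__":
    sample = [{"lang": "en"}] * 3 + [{"lang": "sv"}] * 5 + [{"lang": "fi"}] * 2
    print(language_shares(sample))
```

An empty input yields an empty dictionary rather than a division error, which matters for a monitor corpus that is queried while it is still growing.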

Place, publisher, year, edition, pages
CEUR-WS.org, 2018
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 2084
Keywords
Real-time language data, Nordic Tweet Stream, Twitter
National Category
General Language Studies and Linguistics; Specific Languages
Research subject
Humanities, English
Identifiers
urn:nbn:se:lnu:diva-78277 (URN); 2-s2.0-85045342911 (Scopus ID)
Conference
Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018
Projects
DISA
Available from: 2018-10-11. Created: 2018-10-11. Last updated: 2019-05-24. Bibliographically approved
Alissandrakis, A., Reski, N., Laitinen, M., Tyrkkö, J., Levin, M. & Lundberg, J. (2018). Visualizing dynamic text corpora using Virtual Reality. In: ICAME 39 : Tampere, 30 May – 3 June, 2018: Corpus Linguistics and Changing Society : Book of Abstracts. Paper presented at The 39th Annual Conference of the International Computer Archive for Modern and Medieval English (ICAME39): Corpus Linguistics and Changing Society. Tampere, 30 May - 3 June, 2018 (pp. 205-205). Tampere: University of Tampere
Visualizing dynamic text corpora using Virtual Reality
2018 (English). In: ICAME 39 : Tampere, 30 May – 3 June, 2018: Corpus Linguistics and Changing Society : Book of Abstracts, Tampere: University of Tampere, 2018, p. 205-205. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

In recent years, data visualization has become a major area of Digital Humanities research, and the same holds true in linguistics. The rapidly increasing size of corpora, the emergence of dynamic real-time streams, and the availability of complex and enriched metadata have made it increasingly important to facilitate new and innovative approaches to presenting and exploring primary data. This demonstration showcases the uses of Virtual Reality (VR) in the visualization of geospatial linguistic data using data from the Nordic Tweet Stream (NTS) project (see Laitinen et al. 2017). The NTS data for this demonstration comprises a full year of geotagged tweets (12,443,696 tweets from 273,648 user accounts) posted within the Nordic region (Denmark, Finland, Iceland, Norway, and Sweden). The dataset includes over 50 metadata parameters in addition to the tweets themselves.

We demonstrate the potential of using VR to efficiently find meaningful patterns in vast streams of data. The VR environment allows an easy overview of any of the features (textual or metadata) in a text corpus. Our focus will be on the language identification data, which provides a previously unexplored perspective into the use of English and other non-indigenous languages in the Nordic countries alongside the native languages of the region.

Our VR prototype uses the HTC Vive headset for a room-scale VR scenario, and it is being developed using the Unity3D game development engine. Each node in the VR space is displayed as a stacked cuboid, the equivalent of a bar chart in three-dimensional space, summarizing all tweets at one geographic location for a given point in time (see: https://tinyurl.com/nts-vr). Each stacked cuboid represents information on the three most frequently used languages, appropriately color coded, giving the user an overview of the language distribution at each location. The VR prototype further encourages users to move between different locations and inspect points of interest in more detail (overall location-related information, a detailed list of all languages detected, the most frequently used hashtags). An underlying map outlines country borders and facilitates orientation. In addition to spatial movement through the Nordic area, the VR system provides an interface to explore the Twitter data based on time (days, weeks, months, or the time of predefined special events), which enables users to explore the data over time (see: https://tinyurl.com/nts-vr-time).

In addition to demonstrating how the VR methods aid data visualization and exploration, we will also briefly discuss the pedagogical implications of using VR to showcase linguistic diversity.
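Each stacked cuboid summarizes the tweets at one location and point in time by their three most frequent languages. The aggregation behind such a node can be sketched as follows, assuming hypothetical simplified records with `location`, `time` (a `datetime`) and `lang` keys, daily time bins, and an extra "other" slice for the remaining languages; the prototype's actual data model and binning are not described in the abstract.

```python
from collections import Counter, defaultdict
from datetime import date, datetime


def cuboid_stacks(tweets: list[dict], top_n: int = 3) -> dict[tuple[str, date], list[tuple[str, int]]]:
    """Summarize tweets per (location, day) as the top_n languages plus an 'other' slice."""
    groups = defaultdict(Counter)  # (location, day) -> language counts
    for t in tweets:
        groups[(t["location"], t["time"].date())][t["lang"]] += 1
    stacks = {}
    for key, counts in groups.items():
        top = counts.most_common(top_n)                      # layers of the cuboid
        other = sum(counts.values()) - sum(n for _, n in top)
        stacks[key] = top + ([("other", other)] if other else [])
    return stacks


if __name__ == "__main__":
    langs = ["sv"] * 4 + ["en"] * 3 + ["fi"] * 2 + ["de"]
    demo = [{"location": "Stockholm", "time": datetime(2017, 5, 13, 21, 0), "lang": l} for l in langs]
    print(cuboid_stacks(demo))
```

Changing the bin from `.date()` to a week or month key would give the coarser time navigation the abstract mentions.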

Place, publisher, year, edition, pages
Tampere: University of Tampere, 2018
Keywords
virtual reality, Nordic Tweet Stream, digital humanities
National Category
General Language Studies and Linguistics; Human Computer Interaction; Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization; Humanities, Linguistics
Identifiers
urn:nbn:se:lnu:diva-75064 (URN)
Conference
The 39th Annual Conference of the International Computer Archive for Modern and Medieval English (ICAME39): Corpus Linguistics and Changing Society. Tampere, 30 May - 3 June, 2018
Projects
DISA-DH; Open Data Exploration in Virtual Reality (ODxVR)
Available from: 2018-06-05. Created: 2018-06-05. Last updated: 2018-07-23. Bibliographically approved
Laitinen, M., Lundberg, J., Levin, M. & Lakaw, A. (2017). Revisiting weak ties: Using present-day social media data in variationist studies. In: Tanja Säily, Minna Palander-Collin, Arja Nurmi, Anita Auer (Ed.), Exploring Future Paths for Historical Sociolinguistics (pp. 303-325). Amsterdam: John Benjamins Publishing Company
Revisiting weak ties: Using present-day social media data in variationist studies
2017 (English). In: Exploring Future Paths for Historical Sociolinguistics / [ed] Tanja Säily, Minna Palander-Collin, Arja Nurmi, Anita Auer, Amsterdam: John Benjamins Publishing Company, 2017, p. 303-325. Chapter in book (Refereed)
Abstract [en]

This article uses big and rich present-day data to revisit the social network model in sociolinguistics. This model predicts that mobile individuals with ties outside their home community, and the resulting loose-knit networks, tend to promote the diffusion of linguistic innovations. The model has been applied to a range of small ethnographic networks. We use a database of nearly 200,000 informants who post micro-blog messages on Twitter. We operationalize networks using two ratio variables: one represents a truly weak tie and the other a slightly stronger one. The results show a straightforward increase in innovative behavior in the truly weak tie network, but the data indicate that innovations also spread under conditions of stronger networks, provided that the network is large enough. On the methodological level, our approach opens up new horizons for using big and often freely available data in sociolinguistics, both past and present.

Place, publisher, year, edition, pages
Amsterdam: John Benjamins Publishing Company, 2017
Series
Advances in historical sociolinguistics, ISSN 2214-1057 ; 7
Keywords
Big data, social networks, weak tie model
National Category
Specific Languages
Research subject
Humanities, English
Identifiers
urn:nbn:se:lnu:diva-68501 (URN); 10.1075/ahs.7.12lai (DOI); 9789027200860 (ISBN)
Projects
DISA-DH
Available from: 2017-10-30. Created: 2017-10-30. Last updated: 2018-05-17. Bibliographically approved
Laitinen, M., Lundberg, J., Levin, M. & Lakaw, A. (2017). Utilizing Multilingual Language Data in (Nearly) Real Time: The Case of the Nordic Tweet Stream. Journal of universal computer science (Online), 23(11), 1038-1056
Utilizing Multilingual Language Data in (Nearly) Real Time: The Case of the Nordic Tweet Stream
2017 (English). In: Journal of universal computer science (Online), ISSN 0948-695X, E-ISSN 0948-6968, Vol. 23, no 11, p. 1038-1056. Article in journal (Refereed) Published
Abstract [en]

This paper presents the Nordic Tweet Stream, a cross-disciplinary digital humanities project that downloads Twitter messages from Denmark, Finland, Iceland, Norway and Sweden. The paper first introduces some of the technical aspects of creating a real-time monitor corpus that grows every day, and then two case studies illustrate how the corpus could be used as empirical evidence in studies focusing on the global spread of English. Our approach in the case studies is sociolinguistic: we are interested in how widespread multilingualism involving English is in the region, and in what happens to ongoing grammatical change in digital environments. The results are based on 6.6 million tweets collected during the first four months of data streaming. They show that English was the most frequently used language, accounting for almost a third of all tweets. This indicates that Nordic Twitter users choose English as a means of reaching wider audiences. The preference for English is strongest in Denmark and weakest in Finland. Tweeting mostly occurs late in the evening, and high-profile media events such as the Eurovision Song Contest produce considerable peaks in Twitter activity. The prevalent use of informal features such as univerbated verb forms (e.g., gotta for (HAVE) got to) supports previous findings on the speech-like nature of written Twitter data, but the results indicate that tweeters are pushing the limits even further.
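The univerbation finding rests on comparing how often a fused form such as gotta occurs relative to its full counterpart (HAVE) got to. A minimal sketch of that ratio, assuming plain tweet texts and simple case-insensitive regular expressions (an assumption; the paper's extraction method is not described here), could look like this. Note that `\bgot to\b` also matches non-modal uses such as "got to the station", so a real study would need manual filtering or a tagger.

```python
import re

# Univerbated form -> (pattern for the fused form, pattern for the full form).
# Only the gotta / got to pair comes from the abstract; the regexes are assumptions.
PAIRS = {
    "gotta": (re.compile(r"\bgotta\b", re.IGNORECASE),
              re.compile(r"\bgot to\b", re.IGNORECASE)),
}


def univerbation_rate(texts: list[str], form: str = "gotta") -> float:
    """Share of fused forms among all fused + full occurrences (0.0 if none occur)."""
    fused_re, full_re = PAIRS[form]
    fused = sum(len(fused_re.findall(t)) for t in texts)
    full = sum(len(full_re.findall(t)) for t in texts)
    total = fused + full
    return fused / total if total else 0.0


if __name__ == "__main__":
    tweets = ["I gotta go", "We've got to leave now", "Gotta love Eurovision!"]
    print(univerbation_rate(tweets))
```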

Keywords
Twitter, corpus linguistics, language choice, oral discourse style
National Category
Computer and Information Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-73133 (URN); 000429070900004; 2-s2.0-85045033557 (Scopus ID)
Available from: 2018-04-20. Created: 2018-04-20. Last updated: 2019-05-28. Bibliographically approved
Iftikhar, M. U., Lundberg, J. & Weyns, D. (2016). A Model Interpreter for Timed Automata. In: Leveraging Applications of Formal Methods, Verification and Validation: Foundational Techniques, PT I. Paper presented at 7th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA), October 10-14, 2016, Corfu, Greece (pp. 243-258). Springer
A Model Interpreter for Timed Automata
2016 (English). In: Leveraging Applications of Formal Methods, Verification and Validation: Foundational Techniques, PT I, Springer, 2016, p. 243-258. Conference paper, Published paper (Refereed)
Abstract [en]

In the model-centric approach to model-driven development, the models used are detailed enough to be executed. Executing the model directly, without any intermediate model-to-code translation, has a number of advantages: the model is always up to date, and runtime updates of the model are possible. This paper presents a model interpreter for timed automata, a formalism often used for modeling and verification of real-time systems. The model interpreter supports real-time system features such as simultaneous execution, system-wide signals, a ticking clock, and time constraints. Many existing formal representations can be verified, and many existing domain-specific modeling languages (DSMLs) can be executed. It is the combination of being both verifiable and executable that makes our approach rather unique.
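To make "executing the model directly" concrete, here is a deliberately small sketch of a tick-based interpreter for a single timed automaton with one clock, integer guards, clock resets, location invariants and emitted signals. The class names `Edge`, `TimedAutomaton` and `Interpreter` are hypothetical, and the sketch omits the parallel composition and system-wide signal synchronization the paper's interpreter supports; it only illustrates that the model itself, not generated code, drives execution.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Edge:
    source: str
    target: str
    guard: tuple[int, int]        # edge is enabled while lo <= clock <= hi
    reset: bool = False           # reset the clock when the edge fires
    signal: Optional[str] = None  # signal emitted when the edge fires


@dataclass
class TimedAutomaton:
    initial: str
    edges: list[Edge]
    invariants: dict[str, int] = field(default_factory=dict)  # location -> max clock value


class Interpreter:
    """Executes a TimedAutomaton directly, one action per step, with no code generation."""

    def __init__(self, model: TimedAutomaton) -> None:
        self.model = model            # held by reference, so model edits apply at runtime
        self.location = model.initial
        self.clock = 0
        self.trace: list[str] = []    # signals emitted so far

    def step(self) -> None:
        # Discrete step: fire the first enabled edge leaving the current location.
        for edge in self.model.edges:
            lo, hi = edge.guard
            if edge.source == self.location and lo <= self.clock <= hi:
                self.location = edge.target
                if edge.reset:
                    self.clock = 0
                if edge.signal is not None:
                    self.trace.append(edge.signal)
                return
        # Time step: let one tick pass unless that would violate the location invariant.
        limit = self.model.invariants.get(self.location)
        if limit is not None and self.clock >= limit:
            raise RuntimeError(f"time cannot advance in {self.location!r}")
        self.clock += 1
```

Because the interpreter reads `self.model.edges` on every step, appending an edge to the running model changes behavior immediately, which mirrors the runtime-update advantage named in the abstract.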

Place, publisher, year, edition, pages
Springer, 2016
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9952
Keywords
Model-driven development, Model interpretation, Timed automata, Virtual machine
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-59817 (URN); 10.1007/978-3-319-47166-2_17 (DOI); 000389939100017; 2-s2.0-84993972025 (Scopus ID); 978-3-319-47166-2 (ISBN); 978-3-319-47165-5 (ISBN)
Conference
7th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA), October 10-14, 2016, Corfu, Greece
Available from: 2017-01-16. Created: 2017-01-13. Last updated: 2018-05-17. Bibliographically approved
Schordan, M., Beyer, D. & Lundberg, J. (2016). Evaluation and Reproducibility of Program Analysis and Verification (Track Introduction). In: Leveraging applications of formal methods, verification and validation: foundational techniques, pt I. Paper presented at 7th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, ISoLA, Corfu, Greece, 10-14 October 2016 (pp. 191-194). Springer
Evaluation and Reproducibility of Program Analysis and Verification (Track Introduction)
2016 (English). In: Leveraging applications of formal methods, verification and validation: foundational techniques, pt I, Springer, 2016, p. 191-194. Conference paper, Published paper (Refereed)
Abstract [en]

Manual inspection of complex software is costly and error prone. Techniques and tools that do not require manual inspection are in dire need as our software systems grow at a rapid rate. This track is concerned with methods for the comparative evaluation of program analyses and the tools that implement them. It also addresses the question of how verified program properties can be represented so that they remain reproducible and reusable as intermediate results for other analyses and verification phases. Of particular interest is how different tools can be combined to achieve better results than any one of those tools alone.

Place, publisher, year, edition, pages
Springer, 2016
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9952
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-67651 (URN); 10.1007/978-3-319-47166-2_13 (DOI); 000389939100013; 978-3-319-47166-2 (ISBN); 978-3-319-47165-5 (ISBN)
Conference
7th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, ISoLA, Corfu, Greece, 10-14 October 2016
Available from: 2017-09-01. Created: 2017-09-01. Last updated: 2018-01-13. Bibliographically approved
Hedenborg, M., Lundberg, J., Löwe, W. & Trapp, M. (2015). Approximating Context-Sensitive Program Information. In: Jens Knoop (Ed.), Proceedings Kolloquium Programmiersprachen (KPS 2015). Paper presented at Programmiersprachen und Grundlagen der Programmierung KPS 2015.
Approximating Context-Sensitive Program Information
2015 (English). In: Proceedings Kolloquium Programmiersprachen (KPS 2015) / [ed] Jens Knoop, 2015. Conference paper, Published paper (Other academic)
Abstract [en]

Static program analysis is in general more precise if it is sensitive to execution contexts (execution paths). In this paper, we propose χ-terms as a means to capture and manipulate context-sensitive program information in a data-flow analysis. We introduce finite k-approximation and loop approximation, which limit the size of the context-sensitive information. These approximated χ-terms form a lattice of finite depth, thus guaranteeing that every data-flow analysis reaches a fixed point.
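The effect of k-approximation can be illustrated on a simplified model of χ-terms, which is an assumption rather than the paper's formal definition: a term is either a leaf holding a set of abstract values or a node χ_c(t, e) that selects t when branch condition c holds and e otherwise. Cutting the term at depth k and merging everything below into one context-insensitive leaf bounds the term size, which is the ingredient that keeps the lattice height finite. Loop approximation is not shown.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    values: frozenset   # abstract values reaching this point along the path


@dataclass(frozen=True)
class Chi:
    cond: str           # branch condition the result depends on
    then: "Term"        # result if cond holds
    orelse: "Term"      # result otherwise


Term = Union[Leaf, Chi]


def leaves(t: Term) -> frozenset:
    """Context-insensitive summary: the union of all values in the term."""
    if isinstance(t, Leaf):
        return t.values
    return leaves(t.then) | leaves(t.orelse)


def k_approx(t: Term, k: int) -> Term:
    """Keep at most k levels of χ-nodes; merge everything deeper into a single leaf."""
    if isinstance(t, Leaf):
        return t
    if k == 0:
        return Leaf(leaves(t))
    return Chi(t.cond, k_approx(t.then, k - 1), k_approx(t.orelse, k - 1))


def depth(t: Term) -> int:
    """Number of χ-levels on the longest path."""
    return 0 if isinstance(t, Leaf) else 1 + max(depth(t.then), depth(t.orelse))
```

With k = 0 the analysis degenerates to an ordinary context-insensitive one; larger k trades memory for precision.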

Keywords
Static program analysis, Data-flow analysis, Context-sensitive, χ-term
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-50722 (URN)
Conference
Programmiersprachen und Grundlagen der Programmierung KPS 2015
Available from: 2016-03-15. Created: 2016-03-15. Last updated: 2019-02-27. Bibliographically approved
Trapp, M., Hedenborg, M., Lundberg, J. & Löwe, W. (2015). Capturing and Manipulating Context-sensitive Program Information. In: Wolf Zimmermann, Martin-Luther-Universität Halle-Wittenberg (Ed.), Software Engineering Workshops 2015: Gemeinsamer Tagungsband der Workshops der Tagung Software Engineering 2015, Dresden, 17.-18. März 2015. Paper presented at Software Engineering Workshops, 17-18 March 2015, Dresden (pp. 154-163). CEUR-WS.org, 1337
Capturing and Manipulating Context-sensitive Program Information
2015 (English). In: Software Engineering Workshops 2015: Gemeinsamer Tagungsband der Workshops der Tagung Software Engineering 2015, Dresden, 17.-18. März 2015 / [ed] Wolf Zimmermann, Martin-Luther-Universität Halle-Wittenberg, CEUR-WS.org, 2015, Vol. 1337, p. 154-163. Conference paper, Published paper (Refereed)
Abstract [en]

Designers of context-sensitive program analyses need to take special care of the memory consumption of the analysis results. In general, they need to sacrifice accuracy to cope with restricted memory resources. We introduce χ-terms as a general data structure to capture and manipulate context-sensitive analysis results. A χ-term is a compact representation of the results of an arbitrary forward program analysis that distinguishes the effects of different control-flow paths. While χ-terms can be represented by trees, we propose a memory-efficient representation that generalizes ordered binary decision diagrams (OBDDs).
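The memory saving of an OBDD-style representation comes from two classic rules: identical subterms are stored once (a unique table, also called hash-consing), and a node whose two branches are the same is redundant and replaced by that branch (the Shannon expansion f = c·f|c + ¬c·f|¬c collapses when both cofactors agree). The hypothetical `ChiStore` below sketches only these two rules for χ-nodes over arbitrary leaf values; it does not enforce a variable order and is not the data structure from the paper.

```python
class ChiStore:
    """Shared χ-term nodes with BDD-style reduction (hypothetical sketch)."""

    def __init__(self) -> None:
        self._nodes: list[tuple] = []               # id -> ("leaf", v) or ("chi", c, then_id, else_id)
        self._leaves: dict[object, int] = {}        # leaf value -> id
        self._unique: dict[tuple, int] = {}         # (c, then_id, else_id) -> id

    def leaf(self, value: object) -> int:
        if value not in self._leaves:
            self._leaves[value] = len(self._nodes)
            self._nodes.append(("leaf", value))
        return self._leaves[value]

    def chi(self, cond: str, then_id: int, else_id: int) -> int:
        if then_id == else_id:                      # both paths agree: node is redundant
            return then_id
        key = (cond, then_id, else_id)
        if key not in self._unique:                 # share structurally equal nodes
            self._unique[key] = len(self._nodes)
            self._nodes.append(("chi",) + key)
        return self._unique[key]

    def evaluate(self, node: int, assignment: dict[str, bool]) -> object:
        """Follow the path selected by a truth assignment to the branch conditions."""
        kind, *rest = self._nodes[node]
        while kind == "chi":
            cond, then_id, else_id = rest
            node = then_id if assignment[cond] else else_id
            kind, *rest = self._nodes[node]
        return rest[0]

    def __len__(self) -> int:
        return len(self._nodes)
```

Building the same χ-node twice returns the same id, so a tree with many repeated subterms is stored as a directed acyclic graph whose size counts only distinct subterms.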

Place, publisher, year, edition, pages
CEUR-WS.org, 2015
Series
CEUR workshop proceedings, ISSN 1613-0073
Keywords
Context-sensitive, SSA-graph, BDD, χ-term, Shannon-expansion
National Category
Computer Sciences
Identifiers
urn:nbn:se:lnu:diva-41040 (URN); 2-s2.0-84924326216 (Scopus ID)
Conference
Software Engineering Workshops, 17-18 March 2015, Dresden
Available from: 2015-03-20. Created: 2015-03-20. Last updated: 2019-08-15. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-9775-4594