Publications (10 of 36)
Hedenborg, M., Lundberg, J., Löwe, W. & Trapp, M. (2022). A Framework for Memory Efficient Context-Sensitive Program Analysis. Theory of Computing Systems, 66, 911-956
2022 (English). In: Theory of Computing Systems, ISSN 1432-4350, E-ISSN 1433-0490, Vol. 66, p. 911-956. Article in journal (Refereed) Published
Abstract [en]

Static program analysis is in general more precise if it is sensitive to execution contexts (execution paths). But then it is also more expensive in terms of memory consumption. For languages with conditions and iterations, the number of contexts grows exponentially with the program size. This problem is not just a theoretical issue. Several papers evaluating inter-procedural context-sensitive data-flow analysis report severe memory problems, and the path-explosion problem is a major issue in program verification and model checking.

In this paper we propose χ-terms as a means to capture and manipulate context-sensitive program information in a data-flow analysis. χ-terms are implemented as directed acyclic graphs without any redundant subgraphs. We introduce the k-approximation and the l-loop-approximation that limit the size of the context-sensitive information at the cost of analysis precision. We prove that every context-insensitive data-flow analysis has a corresponding k, l-approximated context-sensitive analysis, and that these analyses are sound and guaranteed to reach a fixed point.

We also present detailed algorithms outlining a compact, redundancy-free, and DAG-based implementation of χ-terms.
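The idea of a DAG "without any redundant subgraphs" can be illustrated with hash-consing, where structurally equal nodes are created only once and then shared. The following is a minimal sketch under that assumption, not the algorithm from the paper: the names `Node`, `TermTable`, and `make`, as well as the labels used below, are hypothetical, and the k- and l-loop-approximations are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# A child is either a shared inner node or a leaf value (a plain string here).
Child = Union["Node", str]


@dataclass(frozen=True, eq=False)
class Node:
    """Inner node of the DAG; eq=False keeps identity-based equality and hashing."""
    label: str
    children: Tuple[Child, ...]


class TermTable:
    """Hypothetical intern table: each distinct (label, children) pair exists once."""

    def __init__(self) -> None:
        self._nodes: Dict[Tuple[str, Tuple[Child, ...]], Node] = {}

    def make(self, label: str, *children: Child) -> Node:
        # Children are already interned, so identity comparison of child nodes
        # is enough to detect a structurally equal subgraph.
        key = (label, children)
        node = self._nodes.get(key)
        if node is None:
            node = Node(label, children)
            self._nodes[key] = node
        return node

    def __len__(self) -> int:
        return len(self._nodes)


table = TermTable()
left = table.make("branch1", "x", "y")
right = table.make("branch1", "x", "y")   # reuses the node created above
outer = table.make("branch2", left, right)
print(left is right, len(table))
```

Because equal subterms collapse to one object, the table stores two nodes here instead of three, and the saving grows with the amount of repeated structure.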

Place, publisher, year, edition, pages
Springer, 2022
Keywords
Static program analysis, Data-flow analysis, Context-sensitivity
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science
Identifiers
urn:nbn:se:lnu:diva-115519 (URN); 10.1007/s00224-022-10093-w (DOI); 000826845200001 (); 2-s2.0-85134543665 (Scopus ID)
Funder
Linnaeus University
Available from: 2022-07-18; Created: 2022-07-18; Last updated: 2023-04-11; Bibliographically approved
Hedenborg, M., Lundberg, J. & Löwe, W. (2021). Memory efficient context-sensitive program analysis. Journal of Systems and Software, 177, Article ID 110952.
2021 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 177, article id 110952. Article in journal (Refereed) Published
Abstract [en]

Static program analysis is in general more precise if it is sensitive to execution contexts (execution paths). But then it is also more expensive in terms of memory consumption. For languages with conditions and iterations, the number of contexts grows exponentially with the program size. This problem is not just a theoretical issue. Several papers evaluating inter-procedural context-sensitive data-flow analysis report severe memory problems, and the path-explosion problem is a major issue in program verification and model checking.

In this paper we propose χ-terms as a means to capture and manipulate context-sensitive program information in a data-flow analysis. χ-terms are implemented as directed acyclic graphs without any redundant subgraphs.

To show the efficiency of our approach we run experiments comparing the memory usage of χ-terms with four alternative data structures. Our experiments show that χ-terms clearly outperform all the alternatives in terms of memory efficiency.

Place, publisher, year, edition, pages
Elsevier, 2021
Keywords
Static program analysis, Data-flow analysis, Context-sensitivity
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-102450 (URN); 10.1016/j.jss.2021.110952 (DOI); 000641355800001 (); 2-s2.0-85103928999 (Scopus ID); 2021 (Local ID); 2021 (Archive number); 2021 (OAI)
Available from: 2021-04-26; Created: 2021-04-26; Last updated: 2021-05-27; Bibliographically approved
Laitinen, M. & Lundberg, J. (2020). ELF, language change and social networks: Evidence from real-time social media data. In: Anna Mauranen, Svetlana Vetchinnikova (Ed.), Language Change: The Impact of English as a Lingua Franca (pp. 179-204). Cambridge: Cambridge University Press
2020 (English). In: Language Change: The Impact of English as a Lingua Franca / [ed] Anna Mauranen, Svetlana Vetchinnikova, Cambridge: Cambridge University Press, 2020, p. 179-204. Chapter in book (Refereed)
Abstract [en]

This article extends ELF studies towards variationist and computational sociolinguistics. It uses social network theory to explore how ELF is embedded in the social structures in which it is used and explores the size and nature of social networks in ELF. The empirical part investigates if multilingual and often mobile ELF users have larger networks and more weak ties than others, and if they therefore could be more likely to act as innovators or early adopters of change than the other speaker groups. Our empirical material consists of real-time social media data from Twitter. The results show that, statistically speaking, social embedding of ELF creates conditions that favor change. ELF users have larger networks and more weak ties than the other groups examined here. With regard to methods, social embedding needs to be taken into account in future studies, and we illustrate that variationist and computational sociolinguistics offers a useful theoretical and methodological toolbox for this task.

Place, publisher, year, edition, pages
Cambridge: Cambridge University Press, 2020
Keywords
Social networks, English as a lingua franca, multilingualism, big data, Twitter, social embedding
National Category
Specific Languages
Research subject
Humanities, English
Identifiers
urn:nbn:se:lnu:diva-98925 (URN); 10.1017/9781108675000.011 (DOI); 2-s2.0-85195934244 (Scopus ID); 9781108729819 (ISBN)
Available from: 2020-11-13; Created: 2020-11-13; Last updated: 2024-09-03; Bibliographically approved
Laitinen, M., Fatemi, M. & Lundberg, J. (2020). Size matters: digital social networks and language change. Frontiers in Artificial Intelligence, 3, 1-15, Article ID 46.
2020 (English). In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 3, p. 1-15, article id 46. Article in journal (Refereed) Published
Abstract [en]

Social networks play a role in language variation and change, and the social network theory has offered a powerful tool in modeling innovation diffusion. Networks are characterized by ties of varying strength which influence how novel information is accessed. It is widely held that weak-ties promote change, whereas strong ties lead to norm-enforcing communities that resist change. However, the model is primarily suited to investigate small ego networks, and its predictive power remains to be tested in large digital networks of mobile individuals. This article revisits the social network model in sociolinguistics and investigates network size as a crucial component in the theory. We specifically concentrate on whether the distinction between weak and strong ties levels in large networks over 100 nodes. The article presents two computational methods that can handle large and messy social media data and render them usable for analyzing networks, thus expanding the empirical and methodological basis from small-scale ethnographic observations. The first method aims to uncover broad quantitative patterns in data and utilizes a cohort-based approach to network size. The second is an algorithm-based approach that uses mutual interaction parameters on Twitter. Our results gained from both methods suggest that network size plays a role, and that the distinction between weak ties and slightly stronger ties levels out once the network size grows beyond roughly 120 nodes. This finding is closely similar to the findings in other fields of the study of social networks and calls for new research avenues in computational sociolinguistics.

Place, publisher, year, edition, pages
Frontiers Media S.A., 2020
Keywords
Social networks, Twitter, bot exclusion, data mining, weak ties, social network size
National Category
Computer Sciences
Research subject
Computer and Information Sciences Computer Science, Computer Science; Humanities, English
Identifiers
urn:nbn:se:lnu:diva-97468 (URN); 10.3389/frai.2020.00046 (DOI); 000751673300045 (); 33733163 (PubMedID); 2-s2.0-85102966150 (Scopus ID)
Projects
DISA
Available from: 2020-08-04; Created: 2020-08-04; Last updated: 2022-05-12; Bibliographically approved
Lundberg, J. & Laitinen, M. (2020). Twitter trolls: A linguistic profile of anti-democratic discourse. Language sciences (Oxford), 79, 1-14, Article ID 101268.
2020 (English). In: Language sciences (Oxford), ISSN 0388-0001, E-ISSN 1873-5746, Vol. 79, p. 1-14, article id 101268. Article in journal (Refereed) Published
Abstract [en]

This article focuses on anti-democratic discourse and investigates the linguistic profile of Twitter trolls. The troll data consist of some 3.5 million messages in English obtained through Twitter in late 2018. These data originate from potentially state-backed information operations aimed at sowing discord in Western societies. The baseline data, against which the troll data are compared, contain circa 4.4 million messages in English drawn from the Nordic Tweet Stream corpus. A machine learning application that enables us to select genuine personal messages in this corpus is used to prune the data. The empirical part investigates frequency-based characteristics of the two datasets. We utilize a set of automatically-extracted word-list information and the observed frequencies of personal pronouns. Our empirical findings show considerable quantitative differences so that the troll data are shorter, make use of a smaller number of lexical types and tokens, and resemble more formal registers, while the personal messages are more spoken-like. The results could be used to improve automated detection systems whose purpose is to identify troll accounts.

Place, publisher, year, edition, pages
Elsevier, 2020
Keywords
Social media trolls, Twitter, anti-democratization, discourse style, personal pronouns, English as a lingua franca
National Category
Specific Languages Computer Sciences
Research subject
Humanities, English
Identifiers
urn:nbn:se:lnu:diva-93507 (URN); 10.1016/j.langsci.2019.101268 (DOI); 000534360300008 (); 2-s2.0-85077746472 (Scopus ID)
Projects
DISA
Available from: 2020-04-16; Created: 2020-04-16; Last updated: 2021-05-07; Bibliographically approved
Lundberg, J., Nordqvist, J. & Laitinen, M. (2019). Towards a language independent Twitter bot detector. In: Navarretta Costanza et al. (Ed.), Proceedings of 4th Conference of The Association Digital Humanities in the Nordic Countries: Copenhagen, March 6-8 2019. Paper presented at 4th Conference of The Association Digital Humanities in the Nordic Countries, Copenhagen, March 6-8, 2019 (pp. 308-319). Copenhagen: University of Copenhagen, 2364
2019 (English). In: Proceedings of 4th Conference of The Association Digital Humanities in the Nordic Countries: Copenhagen, March 6-8 2019 / [ed] Navarretta Costanza et al., Copenhagen: University of Copenhagen, 2019, Vol. 2364, p. 308-319. Conference paper, Published paper (Refereed)
Abstract [en]

This article describes our work in developing an application that recognizes automatically generated tweets. The objective of this machine learning application is to increase data accuracy in sociolinguistic studies that utilize Twitter by reducing skewed sampling and inaccuracies in linguistic data. Most previous machine learning attempts to exclude bot material have been language dependent since they make use of monolingual Twitter text in their training phase. In this paper, we present a language independent approach which classifies each single tweet to be either autogenerated (AGT) or human-generated (HGT). We define an AGT as a tweet where all or parts of the natural language content is generated automatically by a bot or other type of program. In other words, while AGT/HGT refer to an individual message, the term bot refers to non-personal and automated accounts that post content to online social networks. Our approach classifies a tweet using only metadata that comes with every tweet, and we utilize those metadata parameters that are both language and country independent. The empirical part shows good success rates. Using a bilingual training set of Finnish and Swedish tweets, we correctly classified about 98.2% of all tweets in a test set using a third language (English).

Place, publisher, year, edition, pages
Copenhagen: University of Copenhagen, 2019
Series
CEUR Workshop Proceedings, E-ISSN 1613-0073
Keywords
Twitter, bots, bot detection, supervised machine learning
National Category
General Language Studies and Linguistics Computer and Information Sciences
Research subject
Humanities, English; Computer and Information Sciences Computer Science, Computer Science
Identifiers
urn:nbn:se:lnu:diva-81663 (URN); 2-s2.0-85066050393 (Scopus ID)
Conference
4th Conference of The Association Digital Humanities in the Nordic Countries, Copenhagen, March 6-8, 2019
Available from: 2019-04-04; Created: 2019-04-04; Last updated: 2020-10-23; Bibliographically approved
Alissandrakis, A., Reski, N., Laitinen, M., Tyrkkö, J., Lundberg, J. & Levin, M. (2019). Visualizing rich corpus data using virtual reality. Studies in Variation, Contacts and Change in English, 20
2019 (English). In: Studies in Variation, Contacts and Change in English, E-ISSN 1797-4453, Vol. 20. Article in journal (Refereed) Published
Abstract [en]

We demonstrate an approach that utilizes immersive virtual reality (VR) to explore and interact with corpus linguistics data. Our case study focuses on the language identification parameter in the Nordic Tweet Stream corpus, a dynamic corpus of Twitter data where each tweet originated within the Nordic countries. We demonstrate how VR can provide previously unexplored perspectives into the use of English and other non-indigenous languages in the Nordic countries alongside the native languages of the region and showcase its geospatial variation. We utilize a head-mounted display (HMD) for a room-scale VR scenario that allows 3D interaction by using hand gestures. In addition to spatial movement through the Nordic areas, the interface enables exploration of the Twitter data based on time (days, weeks, months, or time of predefined special events), making it particularly useful for diachronic investigations.

In addition to demonstrating how the VR methods aid data visualization and exploration, we briefly discuss the pedagogical implications of using VR to showcase linguistic diversity. Our empirical results detail students’ reactions to working in this environment. The discussion part examines the benefits, prospects and limitations of using VR in visualizing corpus data.

Place, publisher, year, edition, pages
Helsinki: VARIENG, 2019
Keywords
virtual reality, Nordic Tweet Stream, digital humanities, immersive analytics
National Category
Human Computer Interaction Language Technology (Computational Linguistics) General Language Studies and Linguistics
Research subject
Computer and Information Sciences Computer Science; Computer and Information Sciences Computer Science, Computer Science; Computer Science, Information and software visualization; Humanities, Linguistics
Identifiers
urn:nbn:se:lnu:diva-90516 (URN)
Projects
DISA-DH; Open Data Exploration in Virtual Reality (ODxVR)
Available from: 2019-12-12; Created: 2019-12-12; Last updated: 2023-04-26; Bibliographically approved
Lincke, A., Lundberg, J., Thunander, M., Milrad, M., Lundberg, J. & Jusufi, I. (2018). Diabetes Information in Social Media. In: Karsten Klein, Yi-Na Li, and Andreas Kerren (Ed.), Proceedings of the 11th International Symposium on Visual Information Communication and Interaction (VINCI '18): . Paper presented at 11th International Symposium on Visual Information Communication and Interaction (VINCI '18), 13-15 August 2018, Växjö, Sweden (pp. 104-105). ACM Publications
2018 (English). In: Proceedings of the 11th International Symposium on Visual Information Communication and Interaction (VINCI '18) / [ed] Karsten Klein, Yi-Na Li, and Andreas Kerren, ACM Publications, 2018, p. 104-105. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Social media platforms have created new ways for people to communicate and express themselves. Thus, it is important to explore how e-health related information is generated and disseminated in these platforms. The aim of our current efforts is to investigate the content and flow of information when people in Sweden use Twitter to talk about diabetes related issues. To achieve our goals, we have used data mining and visualization techniques in order to explore, analyze and cluster Twitter data we have collected during a period of 10 months. Our initial results indicate that patients use Twitter to share diabetes related information and to communicate about their disease as an alternative way that complements the traditional channels used by health care professionals.

Place, publisher, year, edition, pages
ACM Publications, 2018
Keywords
Social media, Twitter data analysis, diabetes, visualization
National Category
Computer Sciences Human Computer Interaction
Research subject
Computer Science, Information and software visualization; Computer and Information Sciences Computer Science, Computer Science; Computer and Information Sciences Computer Science, Media Technology; Health and Caring Sciences, Health Informatics
Identifiers
urn:nbn:se:lnu:diva-78214 (URN); 10.1145/3231622.3232508 (DOI); 2-s2.0-85055512544 (Scopus ID); 978-1-4503-6501-7 (ISBN)
Conference
11th International Symposium on Visual Information Communication and Interaction (VINCI '18), 13-15 August 2018, Växjö, Sweden
Available from: 2018-10-09; Created: 2018-10-09; Last updated: 2020-10-26; Bibliographically approved
Laitinen, M., Lundberg, J., Levin, M. & Martins, R. M. (2018). The Nordic Tweet Stream: A Dynamic Real-Time Monitor Corpus of Big and Rich Language Data. In: Eetu Mäkelä, Mikko Tolonen, Jouni Tuominen (Ed.), DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, March 7-9, 2018. Paper presented at Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018 (pp. 349-362). CEUR-WS.org
2018 (English). In: DHN 2018 Digital Humanities in the Nordic Countries 3rd Conference: Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, March 7-9, 2018 / [ed] Eetu Mäkelä, Mikko Tolonen, Jouni Tuominen, CEUR-WS.org, 2018, p. 349-362. Conference paper, Published paper (Refereed)
Abstract [en]

This article presents the Nordic Tweet Stream (NTS), a cross-disciplinary corpus project of computer scientists and a group of sociolinguists interested in language variability and in the global spread of English. Our research integrates two types of empirical data: We not only rely on traditional structured corpus data but also use unstructured data sources that are often big and rich in metadata, such as Twitter streams. The NTS downloads tweets and associated metadata from Denmark, Finland, Iceland, Norway and Sweden. We first introduce some technical aspects in creating a dynamic real-time monitor corpus, and the following case study illustrates how the corpus could be used as empirical evidence in sociolinguistic studies focusing on the global spread of English to multilingual settings. The results show that English is the most frequently used language, accounting for almost a third. These results can be used to assess how widespread English use is in the Nordic region and offer a big data perspective that complement previous small-scale studies. The future objectives include annotating the material, making it available for the scholarly community, and expanding the geographic scope of the data stream outside Nordic region.

Place, publisher, year, edition, pages
CEUR-WS.org, 2018
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 2084
Keywords
Real-time language data, Nordic Tweet Stream, Twitter
National Category
General Language Studies and Linguistics Specific Languages
Research subject
Humanities, English
Identifiers
urn:nbn:se:lnu:diva-78277 (URN); 2-s2.0-85045342911 (Scopus ID)
Conference
Digital Humanities in the Nordic Countries 3rd Conference, Helsinki, Finland, March 7-9, 2018
Projects
DISA
Available from: 2018-10-11; Created: 2018-10-11; Last updated: 2021-05-27; Bibliographically approved
Alissandrakis, A., Reski, N., Laitinen, M., Tyrkkö, J., Levin, M. & Lundberg, J. (2018). Visualizing dynamic text corpora using Virtual Reality. In: ICAME 39 : Tampere, 30 May – 3 June, 2018: Corpus Linguistics and Changing Society : Book of Abstracts. Paper presented at The 39th Annual Conference of the International Computer Archive for Modern and Medieval English (ICAME39): Corpus Linguistics and Changing Society. Tampere, 30 May - 3 June, 2018 (pp. 205-205). Tampere: University of Tampere
2018 (English). In: ICAME 39 : Tampere, 30 May – 3 June, 2018: Corpus Linguistics and Changing Society : Book of Abstracts, Tampere: University of Tampere, 2018, p. 205-205. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

In recent years, data visualization has become a major area in Digital Humanities research, and the same holds true also in linguistics. The rapidly increasing size of corpora, the emergence of dynamic real-time streams, and the availability of complex and enriched metadata have made it increasingly important to facilitate new and innovative approaches to presenting and exploring primary data. This demonstration showcases the uses of Virtual Reality (VR) in the visualization of geospatial linguistic data using data from the Nordic Tweet Stream (NTS) project (see Laitinen et al 2017). The NTS data for this demonstration comprises a full year of geotagged tweets (12,443,696 tweets from 273,648 user accounts) posted within the Nordic region (Denmark, Finland, Iceland, Norway, and Sweden). The dataset includes over 50 metadata parameters in addition to the tweets themselves.

We demonstrate the potential of using VR to efficiently find meaningful patterns in vast streams of data. The VR environment allows an easy overview of any of the features (textual or metadata) in a text corpus. Our focus will be on the language identification data, which provides a previously unexplored perspective into the use of English and other non-indigenous languages in the Nordic countries alongside the native languages of the region.

Our VR prototype utilizes the HTC Vive headset for a room-scale VR scenario, and it is being developed using the Unity3D game development engine. Each node in the VR space is displayed as a stacked cuboid, the equivalent of a bar chart in a three-dimensional space, summarizing all tweets at one geographic location for a given point in time (see: https://tinyurl.com/nts-vr). Each stacked cuboid represents information of the three most frequently used languages, appropriately color coded, enabling the user to get an overview of the language distribution at each location. The VR prototype further encourages users to move between different locations and inspect points of interest in more detail (overall location-related information, a detailed list of all languages detected, the most frequently used hashtags). An underlying map outlines country borders and facilitates orientation. In addition to spatial movement through the Nordic areas, the VR system provides an interface to explore the Twitter data based on time (days, weeks, months, or time of predefined special events), which enables users to explore data over time (see: https://tinyurl.com/nts-vr-time).

In addition to demonstrating how the VR methods aid data visualization and exploration, we will also briefly discuss the pedagogical implications of using VR to showcase linguistic diversity.
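
The per-location summary behind each stacked cuboid (the three most frequently used languages at one location) amounts to a grouped frequency count. The sketch below shows one way to compute it and is not the project's implementation: the function name `top_languages`, the `(location, language)` input shape, and the sample values are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_languages(
    tweets: Iterable[Tuple[str, str]], n: int = 3
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most frequent languages per location with their counts."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for location, language in tweets:
        counts[location][language] += 1
    # most_common(n) yields (language, count) pairs, highest count first.
    return {loc: c.most_common(n) for loc, c in counts.items()}


# Hypothetical sample of (location, language) pairs.
sample = [
    ("Helsinki", "fi"), ("Helsinki", "fi"), ("Helsinki", "fi"),
    ("Helsinki", "en"), ("Helsinki", "en"), ("Helsinki", "sv"),
    ("Oslo", "no"), ("Oslo", "no"), ("Oslo", "en"),
]
summary = top_languages(sample)
print(summary["Helsinki"])
```

A time dimension could be added by grouping on a `(location, day)` key instead of the location alone.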

Place, publisher, year, edition, pages
Tampere: University of Tampere, 2018
Keywords
virtual reality, Nordic Tweet Stream, digital humanities
National Category
General Language Studies and Linguistics Human Computer Interaction Language Technology (Computational Linguistics)
Research subject
Computer Science, Information and software visualization; Humanities, Linguistics
Identifiers
urn:nbn:se:lnu:diva-75064 (URN)
Conference
The 39th Annual Conference of the International Computer Archive for Modern and Medieval English (ICAME39): Corpus Linguistics and Changing Society. Tampere, 30 May - 3 June, 2018
Projects
DISA-DH; Open Data Exploration in Virtual Reality (ODxVR)
Available from: 2018-06-05; Created: 2018-06-05; Last updated: 2018-07-23; Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-9775-4594
