In recent years, data visualization has become a major area of Digital Humanities research, and the same holds true for linguistics. The rapidly increasing size of corpora, the emergence of dynamic real-time streams, and the availability of complex, enriched metadata have made new approaches to presenting and exploring primary data increasingly important. This demonstration showcases the use of Virtual Reality (VR) to visualize geospatial linguistic data from the Nordic Tweet Stream (NTS) project (see Laitinen et al. 2017). The NTS data for this demonstration comprises a full year of geotagged tweets (12,443,696 tweets from 273,648 user accounts) posted within the Nordic region (Denmark, Finland, Iceland, Norway, and Sweden). In addition to the tweets themselves, the dataset includes over 50 metadata parameters.
We demonstrate the potential of VR for efficiently finding meaningful patterns in vast streams of data. The VR environment provides an overview of any feature of a text corpus, whether textual or metadata. Our focus is on the language identification data, which offers a previously unexplored perspective on the use of English and other non-indigenous languages in the Nordic countries alongside the native languages of the region.
Our VR prototype uses the HTC Vive headset for a room-scale VR scenario and is being developed in the Unity3D game engine. Each node in the VR space is displayed as a stacked cuboid, the three-dimensional equivalent of a stacked bar chart, which summarizes all tweets at one geographic location for a given point in time (see: https://tinyurl.com/nts-vr). Each stacked cuboid shows the three most frequently used languages, each in its own color, giving the user an overview of the language distribution at that location. The prototype further encourages users to move between locations and inspect points of interest in more detail: overall information about the location, a full list of all detected languages, and the most frequently used hashtags. An underlying map outlines country borders and aids orientation. Beyond spatial movement through the Nordic region, the VR system provides an interface for exploring the Twitter data over time, by day, week, or month, or by the time of predefined special events (see: https://tinyurl.com/nts-vr-time).
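The abstract does not describe how the data behind each cuboid is prepared. The following is only a minimal Python sketch of one plausible aggregation step, not the authors' pipeline (the prototype itself runs in Unity3D). It assumes that each tweet has already been reduced to a discrete location identifier, a timestamp, and a language code from language identification. The names `Tweet`, `time_bucket`, and `build_nodes` are hypothetical. The sketch groups tweets by location and by day, ISO week, or month, then keeps the three most frequent languages per group as the segments of one stacked cuboid.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    # Hypothetical minimal record; the real NTS data has 50+ metadata fields.
    location: str        # assumed discrete location ID (e.g. a place name)
    created_at: datetime
    lang: str            # language code from language identification


def time_bucket(ts: datetime, granularity: str) -> str:
    """Map a timestamp to a day, ISO-week, or month label."""
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if granularity == "month":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown granularity: {granularity}")


def build_nodes(tweets, granularity="day", top_n=3):
    """Return {(location, bucket): [(lang, count), ...]}, where the list holds
    the top_n most frequent languages, largest first: the stacked segments
    of one cuboid, with segment height proportional to the count."""
    counts = defaultdict(Counter)
    for t in tweets:
        key = (t.location, time_bucket(t.created_at, granularity))
        counts[key][t.lang] += 1
    return {key: c.most_common(top_n) for key, c in counts.items()}


# Usage: ten tweets in Helsinki on one day (four languages) and one in Oslo.
langs = ["fi"] * 4 + ["en"] * 3 + ["sv"] * 2 + ["ru"]
tweets = [Tweet("Helsinki", datetime(2017, 3, 1, h), lang)
          for h, lang in enumerate(langs)]
tweets.append(Tweet("Oslo", datetime(2017, 3, 2, 10), "no"))

daily = build_nodes(tweets, "day")
print(daily[("Helsinki", "2017-03-01")])  # [('fi', 4), ('en', 3), ('sv', 2)]
```

Switching `granularity` to `"week"` or `"month"` regroups the same tweets into coarser cuboids, which loosely mirrors the day/week/month time navigation described above. Because only the top three languages are kept, the fourth language (`ru` here) is not shown in the cuboid and would appear only in a detailed per-location view.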
In addition to demonstrating how VR methods aid data visualization and exploration, we will briefly discuss the pedagogical implications of using VR to showcase linguistic diversity.