Multimodal prominence ratings: Effects of screen size and audio device
Linnaeus University, Faculty of Arts and Humanities, Department of Swedish Language. (IMS) ORCID iD: 0000-0001-5324-3071
Lund University, Sweden.
KTH Royal Institute of Technology, Sweden.
2019 (English). In: Book of Abstracts MMSYM 2019, University of Leuven, 2019, p. 2-3. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Prosodic prominence is a multimodal phenomenon involving both acoustic and kinematic dimensions. In order to study the multimodal nature of prominence, we need to collect prominence ratings based on audio-visual speech material from large groups of speakers. This is feasible by means of a web-based crowdsourcing set-up, allowing volunteers to participate using a private computer or mobile phone. However, this freedom also implies a certain reduction of experimental control due to variation in hardware used by the raters. 

In this pilot study, we explore potential effects of two hardware features – screen size and audio device (headphones vs. loudspeakers) – on multimodal prominence ratings. To this end, 16 brief clips from Swedish television news (218 words in total) were rated by 31 native Swedish volunteers using a web-based set-up. In our GUI, orthographic representations of the text were displayed below the video player. Each word was to be rated as either non-prominent, moderately prominent, or strongly prominent by clicking on the word until the desired prominence level was encoded through a specific color (yellow: moderate; red: strong). Participants were free to use a mobile phone, a tablet, or a computer, with either headphones or loudspeakers, and we collected information about their hardware using a questionnaire. In addition, we automatically logged the screen size of each participant's device.
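The click-based rating scheme described above can be illustrated with a minimal sketch. This is not the authors' GUI code: the numeric coding (0, 1, 2), the function names, and the wrap-around from strong back to non-prominent are all assumptions made for illustration. Only the three levels and the two colors come from the abstract.

```python
# Hypothetical sketch of click-to-cycle rating; names and coding are assumptions.
LEVELS = ["non-prominent", "moderately prominent", "strongly prominent"]  # coded 0, 1, 2
COLORS = {0: None, 1: "yellow", 2: "red"}  # colors as stated in the abstract


def click(level: int) -> int:
    """Advance a word's rating by one click.

    Assumption: a click on a strongly prominent word wraps back to non-prominent.
    """
    return (level + 1) % len(LEVELS)


def rate_clip(words, clicked_indices):
    """Apply a sequence of clicks (word indices) to an initially unrated clip."""
    ratings = [0] * len(words)
    for i in clicked_indices:
        ratings[i] = click(ratings[i])
    return ratings


if __name__ == "__main__":
    words = ["a", "b", "c"]
    # "a" is clicked three times, "b" once, "c" twice.
    ratings = rate_clip(words, [1, 2, 2, 0, 0, 0])
    for word, level in zip(words, ratings):
        print(word, LEVELS[level], COLORS[level])
```

Storing each word's rating as a small integer keeps the later aggregation steps simple, since sums and proportions can be computed directly on the list.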

We applied two different approaches to analyze the participants' rating behavior as a function of the hardware features under discussion. First, we calculated a selection of five variables from the raw prominence ratings: (i) the sum of all ratings (over all 218 words), (ii) the percentage of words rated as (moderately or strongly) prominent, (iii) among prominent words, the proportion of words rated as strongly prominent, and (iv-v) the relative prominence rating of two selected words. Effects of screen size and audio device on these variables were analyzed using linear regression models. Second, we calculated inter-rater reliability for multiple raters using Fleiss' kappa, both for all raters as a reference and for subgroups concerning audio device and screen size.
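As a rough illustration of variables (i) to (iii), the following sketch computes them for a single rater. It assumes that ratings are coded 0 (non-prominent), 1 (moderate), and 2 (strong); the abstract does not state the numeric coding, and the function name `summarize` is hypothetical. Variables (iv-v) are omitted because their exact definition is not given here.

```python
import math


def summarize(ratings):
    """Return (sum of ratings, % words rated prominent, proportion strong among prominent).

    Assumes one rater's ratings, coded 0 = non-prominent, 1 = moderate, 2 = strong.
    """
    total = sum(ratings)                                   # variable (i)
    prominent = [r for r in ratings if r > 0]
    pct_prominent = 100.0 * len(prominent) / len(ratings)  # variable (ii)
    if prominent:
        prop_strong = sum(1 for r in prominent if r == 2) / len(prominent)  # variable (iii)
    else:
        prop_strong = math.nan  # undefined when no word was rated prominent
    return total, pct_prominent, prop_strong


if __name__ == "__main__":
    # Toy data for eight words, not taken from the study.
    print(summarize([0, 1, 2, 2, 0, 1, 0, 0]))
```

Each rater yields one value per variable, which could then serve as the dependent variable in a regression on screen size and audio device.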

The results reveal a significant model fit for variable (iii) defined above (proportion of strong ratings; F(5, 21) = 5.332; p = .0022**). The model suggests a significantly higher proportion of strongly prominent ratings with loudspeakers (34.0% of words rated as prominent, on average) than with headphones (18.3%; t = 2.944; p = .0073**), and with medium-sized screens (34.2%) than with small screens (24.4%; t = 2.433; p = .0232*); however, the proportion of strongly prominent ratings tended to be lowest with large screens (14.2% on average). Effects of screen size were also reflected in inter-rater reliability: kappa was highest for users with medium-sized screens (kappa = .566, when ratings are recoded to a binary decision), compared to large screens (kappa = .485) and small screens (mobile phones; kappa = .437). Inter-rater reliability, by contrast, was less affected by the listening condition (headphones vs. loudspeakers).
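For readers unfamiliar with the reliability measure, the sketch below implements the standard Fleiss' kappa formula and a binary recoding step. It is an illustration under assumptions, not the authors' analysis script: the 0/1/2 coding, the function names, and the choice to merge moderate and strong into a single "prominent" category are all assumed, since the abstract does not specify how the binary recoding was done.

```python
def fleiss_kappa(ratings_by_rater, categories):
    """Fleiss' kappa for a list of raters, each giving one rating per item.

    Assumes every rater rated every item and at least two categories occur.
    """
    n_raters = len(ratings_by_rater)
    n_items = len(ratings_by_rater[0])
    # counts[i][j]: number of raters who assigned category j to item i
    counts = [
        [sum(1 for rater in ratings_by_rater if rater[i] == c) for c in categories]
        for i in range(n_items)
    ]
    # Observed agreement per item, then averaged over items
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Expected agreement from the overall category proportions
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def to_binary(ratings):
    """Assumed recoding: moderate (1) and strong (2) both become prominent (1)."""
    return [1 if r > 0 else 0 for r in ratings]


if __name__ == "__main__":
    # Toy data: two raters, two words; they disagree only on moderate vs. strong.
    raters = [[1, 0], [2, 0]]
    print(fleiss_kappa(raters, [0, 1, 2]))
    print(fleiss_kappa([to_binary(r) for r in raters], [0, 1]))
```

In the toy data, the two raters differ only in the degree of prominence, so the binary recoding removes their disagreement and kappa rises accordingly.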

To conclude, the choice of hardware might have effects on multimodal prominence ratings, which has to be considered in crowdsourcing approaches. More detailed results will be presented at the conference.

Place, publisher, year, edition, pages
University of Leuven, 2019. p. 2-3
Keywords [en]
audio-visual perception, crowdsourcing, web-based, inter-rater reliability, headphones
National Category
General Language Studies and Linguistics
Research subject
Humanities, Linguistics
Identifiers
URN: urn:nbn:se:lnu:diva-92579
OAI: oai:DiVA.org:lnu-92579
DiVA, id: diva2:1411845
Conference
MMSYM 2019, 6th European and 9th Nordic Symposium on Multimodal Communication, Research group MIDI (Multimodality, Interaction & Discourse), University of Leuven, September 9-10, 2019
Projects
PROGEST - Production of prosodic prominence: integrating bodily and articulatory gestures (Swedish Research Council)
Funder
Swedish Research Council, 2017-02140
Swedish Research Council, 2013-2003
Available from: 2020-03-04 Created: 2020-03-04 Last updated: 2020-03-05 Bibliographically approved

Open Access in DiVA

abstract (40 kB)
File information
File name: FULLTEXT01.pdf
File size: 40 kB
Checksum (SHA-512): f410fccbc1e6170efd0ccb08ac2928b66ad9f7afea255eb0394ecad7a84a92ee8b8f73e0cfb0c98fb16302af44b3fd6e0e14950e202b874f20b4de09b02c1332
Type: summary
Mimetype: application/pdf

Authority records BETA

Ambrazaitis, Gilbert
