This paper describes ongoing work on multimodal prosody, carried out by means of simultaneous recordings of speech acoustics, articulation and head movements. People naturally move their heads when they speak, and head movements have been found both to correlate strongly with the pitch and amplitude of the speaker's voice and to convey linguistic information. Here, we report on a study that explores how head movement patterns vary and co-occur with lexical pitch accents (and their acoustic correlates, F0 and intensity) and vowel length. The study uses data from Swedish, which has both two phonologically distinct lexical pitch accents and a phonological contrast between short and long vowels.
We use electromagnetic articulography (EMA), which allows for high sample rates, accurate synchronisation of kinematic and acoustic recordings, and three-dimensional movement data. Kinematic data is obtained by gluing small sensors to the speakers' articulators (tongue, lips, jaw). Head movement data is obtained from similar sensors on the nose ridge and behind the ears, which allows us to capture the tilt angle of the head.
Articulatory data was collected from 18 South Swedish speakers (12 female) using a Carstens AG501. Each speaker read a leading question followed by a sentence containing a target word, presented eight times in random order on a prompter. This arrangement placed contrastive focus on the last element of the target sentence, leaving the target word in a low-prominence context and thereby controlling for possible effects of sentence intonation.
For this study we used eight target words in which pitch accent and vowel length were cross-matched, so that there were two instances of each combination of word accent category and vowel length category. All words shared the same word-initial consonant /m/, followed by a vowel that was either short /a/ or long /ɑː/. The target words were segmented and time-normalized from 0 to 1, and the head tilt angle (sagAng) was normalized by z-transforming the angles per speaker. Spatial movements were analysed using Generalized Additive Models (GAMs), which we used to test whether segmental position (C versus V in the first syllable), word accent (accent 1 or accent 2) and vowel length (short or long) had effects on sagAng. Models were fit using the maximum likelihood (ML) estimation method.
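As a minimal sketch of the two normalization steps (the actual analysis was presumably done in R; the function and variable names here are illustrative, not the study's code):

```python
import numpy as np

def z_transform(angles):
    """Z-transform one speaker's head tilt angles:
    subtract the speaker's mean and divide by the standard deviation."""
    angles = np.asarray(angles, dtype=float)
    return (angles - angles.mean()) / angles.std()

def time_normalize(t):
    """Map sample times within one target word onto the interval [0, 1]."""
    t = np.asarray(t, dtype=float)
    return (t - t.min()) / (t.max() - t.min())

# Hypothetical example: one speaker's sagAng samples over a 400 ms target word
sag_ang = [2.0, 3.5, 5.0, 3.5, 2.0]
times = [0.0, 0.1, 0.2, 0.3, 0.4]

z = z_transform(sag_ang)     # per-speaker z-scores, mean 0 and SD 1
tn = time_normalize(times)   # word-internal time rescaled to 0..1
```

Because each speaker is z-transformed separately, speakers with different baseline head postures or movement amplitudes become comparable on a common scale.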
A chi-square test on the ML scores indicates that a model including the word accent distinction is significantly better than a model without it (χ²(4) = 632.796, p < 2e-16). Similarly, a model including the vowel length distinction is significantly better than a model without it (χ²(4) = 820.997, p < 2e-16). Finally, a model including segmental position is significantly better than a model without it (χ²(8) = 173.316, p < 2e-16).
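The model comparison behind these statistics (as performed by, e.g., compareML in the R package itsadug) treats twice the drop in ML score between nested models as chi-square distributed, with degrees of freedom equal to the number of extra parameters. A stdlib-only sketch, where the two ML scores passed in are placeholders rather than the study's actual values:

```python
import math

def chi2_sf(x, df):
    """Chi-square survival function for even df, via the closed-form
    series exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def compare_ml(ml_simpler, ml_fuller, df_diff):
    """Compare two nested GAMs by their ML scores: the statistic is
    2 * (score of simpler model - score of fuller model), referred to
    a chi-square distribution with df_diff degrees of freedom."""
    stat = 2.0 * (ml_simpler - ml_fuller)
    return stat, chi2_sf(stat, df_diff)

# Placeholder ML scores chosen so the statistic matches the first test above
stat, p = compare_ml(1000.0, 683.602, 4)
```

With a statistic of 632.796 on 4 degrees of freedom, the resulting p-value is far below any conventional threshold, which is why it is reported simply as p < 2e-16.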
The results indicate that head nod patterns occurring in synchrony with the stressed syllable of spoken words differ with respect to word accent, vowel length and segmental position. This may point to an effect of F0 and intensity on head nod movements.
University of Leuven, 2019, p. 11.
MMSYM 2019 - 6th European and 9th Nordic Symposium on Multimodal Communication, Leuven, September 9-10, 2019