This study examines audio-visual perception of second-language (L2) speech, with the goal of investigating the extent to which the auditory and visual input modalities are integrated in processing unfamiliar L2 speech. Native (Canadian English) and nonnative (Mandarin) perceivers' responses were collected for a set of fricative-initial syllables presented in quiet and in a cafe-noise background, in four ways: congruent audio-visual (AVc), incongruent audio-visual (AVi), audio-only (A) and visual-only (V). Results show that for both native-language groups, performance was better in the AVc condition than in the A or V condition, and better in quiet than in the cafe-noise background. A comparison of native and nonnative performance revealed that Mandarin participants showed (1) poorer identification of the L2 interdental fricatives, (2) a greater degree of reliance on visual information, even when auditory information was available, and (3) a higher percentage of McGurk responses to the incongruent AV speech. These findings indicate that although nonnatives were able to use visual information, they failed to adopt the visual cues that are linguistically characteristic of the L2 sounds, suggesting a language-specific AV processing pattern. However, similarities between the two native-language groups also point to possible perceptual universals. Together, the findings point to an integrated network in speech processing across modalities.

INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP

Wang, Y. (Yue), Behne, D. (Dawn), Jiang, H. (Haisheng), & Danyluck, C. (2006). Native and nonnative audio-visual perception of English fricatives in quiet and café-noise backgrounds. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 881–884).