
3.2.3. Listener-judges and rating procedure

For the ratings of the productions, we followed Derwing, Munro and Wiebe's (1998),
Missaglia's (1999), and Birdsong's (2003) experiments, insofar as we resorted to subjective
evaluations by listeners. Three listener-judges kindly agreed to score all four hundred
items, despite the large amount of work and time that this implied (2) – using the scores of only
one listener would have been insufficient and possibly too biased. The chosen raters
were thoroughly familiar with the pronunciation of English. Two of them (Judge 1 and Judge 2)
were native speakers of English with no training in linguistics or phonetics: a middle-aged
British speaker from Lincolnshire (England), and an American speaker in her
twenties from Tennessee (United States of America). The former spoke no foreign
language, and the latter knew French and Spanish as foreign languages. Judge 3 was a non-native
expert in English phonology, who had been teaching it to French-speaking students of
English at university level for several years.

As in Derwing, Munro and Wiebe's (1998) experiment, which is very close to this one,
the evaluation of the productions was blind. The four hundred items (i.e. pre-training and
post-training productions alike) were randomized, renamed and numbered from 1 to 400 in
a folder simply labelled "sound files" before being given to the listener-judges (an
illustrative sketch of such a blinding step is given after this paragraph). The judges were
not made aware of the aim and procedure of the experiment. They were merely given
instructions (see Appendix D) asking them to score four hundred productions of English
by French speakers. Unlike in Ploquin's (2009) experiment (cf. 2.2.2.), which investigated
the production of English prosody by French learners, our three judges were not asked to
focus on any particular aspect of phonology – they had to rate the global English quality of
the speakers, that is, a whole mixture of overall intelligibility, foreign-accentedness, and
segmental and suprasegmental accuracy. Nor were they told the number of speakers. The
rating task took place in a quiet room at the University of Lille III.
The judges sat at a computer and used headphones to listen to the sound files in the folder.
They wrote the scores down on paper, in a two-column table – one column for the file
numbers and one for the scores. Their task was to listen to one sound file at a time and
give it a score on a 7-point scale, the lowest score being 1 (= terrible/strong foreign
accent/unintelligible) and the highest 7 (= native-like/no foreign accent), with various
shades in between. The scale did not include 0. The raters were advised to stick to their
very first impression whenever they hesitated. They were allowed to listen to a sound file
a second time if they had missed it the first time, for example if they had been surprised
by the sudden start of the file (sometimes very short in the case of single words). To go to
the next file, they simply had to double-click on it. Each judge did the rating task alone,
on a different day. The evaluations lasted between an hour and a half and two hours.
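
The text does not specify how the blinding was implemented, so the following Python sketch is only one plausible way to randomize, rename and number the four hundred recordings while keeping a private key for later de-anonymization. All paths, file names and the .wav extension are assumptions for illustration, not details from the experiment.

```python
import csv
import random
import shutil
from pathlib import Path

# Hypothetical paths: the original recordings (with speaker- and
# session-identifiable names) and the neutral folder handed to the judges.
SOURCE_DIR = Path("recordings")
TARGET_DIR = Path("sound files")
KEY_FILE = Path("randomization_key.csv")

def anonymize(source_dir: Path, target_dir: Path, key_file: Path) -> None:
    """Shuffle the items, copy them under neutral numbered names,
    and keep a private key mapping each number back to its original."""
    target_dir.mkdir(exist_ok=True)
    files = sorted(source_dir.glob("*.wav"))
    random.shuffle(files)  # pre- and post-training items end up interleaved
    with key_file.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["number", "original_file"])
        for number, original in enumerate(files, start=1):
            new_name = f"{number}.wav"  # judges see only "1.wav" ... "400.wav"
            shutil.copy2(original, target_dir / new_name)
            writer.writerow([number, original.name])

if __name__ == "__main__":
    anonymize(SOURCE_DIR, TARGET_DIR, KEY_FILE)
```

Keeping the key file out of the "sound files" folder is what preserves the blindness: the judges only ever see the neutral numbered names, while the experimenter can still map each score back to a speaker, item and recording session.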

2 Many thanks to all three judges who agreed to do this task.
