Reading Corpus
Transcription conventions
Reading Corpus 1 French L2
Pronunciation errors
Only those errors serious enough to cause a breakdown in communication, or which were followed by a teacher correction, were coded. These were transcribed in UNIBET on the %err tier.
Use of English
Where the whole utterance is in English, a separate speaker tier is used (*STE). English words inserted into French utterances are marked with the @s suffix (father@s). Students who also learn German sometimes use German words; these are marked with the @g suffix.
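As a rough illustration of how these suffixes might be used in analysis, the following Python sketch counts English- and German-marked words in a single utterance. The function name, the mapping and the example utterance are invented for illustration and are not part of the corpus or of CLAN.

```python
import re
from collections import Counter

# Suffix-to-language mapping, following the conventions described above.
SUFFIX_LANGUAGE = {"s": "English", "g": "German"}

# A word immediately followed by @s or @g.
MARKED_WORD = re.compile(r"(\w+)@([sg])\b")


def count_marked_words(utterance: str) -> Counter:
    """Count words carrying the @s or @g suffix in one transcribed utterance."""
    counts = Counter()
    for _word, suffix in MARKED_WORD.findall(utterance):
        counts[SUFFIX_LANGUAGE[suffix]] += 1
    return counts


# Invented example utterance, not taken from the corpus.
example = "mon father@s est euh Lehrer@g"
print(count_marked_words(example))
```

Whole-utterance English on the *STE tier is not handled here; it would have to be selected by speaker tier rather than by word suffix.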
Acknowledgement tokens
Acknowledgement tokens have been counted as back channels and are marked [+bch]. These can be excluded from MLU and MLT counts using the -s"[+bch]" switch.
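The effect of excluding back channels can be sketched outside CLAN as follows. This is a simplified stand-in, not CLAN's MLU computation: it counts whitespace-separated words rather than morphemes, and the function name and sample utterances are invented for illustration.

```python
BACK_CHANNEL = "[+bch]"


def mean_length_in_words(utterances, exclude_back_channels=True):
    """Mean number of words per utterance, optionally skipping [+bch] utterances."""
    lengths = []
    for utt in utterances:
        if exclude_back_channels and BACK_CHANNEL in utt:
            continue
        # Remove the postcode itself so it is never counted as a word.
        words = utt.replace(BACK_CHANNEL, "").split()
        lengths.append(len(words))
    return sum(lengths) / len(lengths) if lengths else 0.0


# Invented sample utterances, not taken from the corpus.
sample = ["il mange une pomme", "oui [+bch]", "elle lit"]
print(mean_length_in_words(sample))         # back channels excluded
print(mean_length_in_words(sample, False))  # back channels included
```

Because acknowledgement tokens are typically one word long, including them tends to pull the mean down, which is the motivation for excluding them.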
Pauses
Pauses are marked with the CHAT pause marker # followed by a number indicating the length of the pause, e.g. #2.
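A minimal sketch of extracting these pause markers from an utterance is given below. The function name and the example utterance are invented, and the unit of the number is left uninterpreted because it is not stated here.

```python
import re

# '#' followed by one or more digits, e.g. #2.
PAUSE = re.compile(r"#(\d+)")


def pause_lengths(utterance: str) -> list:
    """Return the numeric pause lengths found in one utterance, in order."""
    return [int(n) for n in PAUSE.findall(utterance)]


# Invented example utterance, not taken from the corpus.
print(pause_lengths("le chat #2 euh #3 dort"))
```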
Filled Pauses
The filled pauses transcribed are ‘aah’, ‘euh’, ‘mm’, and ‘um’.
Reading Corpus II Native Speakers
CHAT conventions were used.