Young Learners Corpus

Transcription Conventions

The data is transcribed following the CHILDES conventions, with the additional conventions used for all the corpora in this database (see additional conventions link).

In addition, we made the following coding decisions for this corpus:

Speaker Tiers

Each speaker tier is coded for language. The speaker code is TEAE when the teacher speaks English and TEAF when the teacher speaks French. The codes for the children’s lines are as follows:

*CHIE:	one child speaks English
*CHIF:	one child speaks French
*CHISE:	several children speak English
*CHISF:	several children speak French
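
As an illustration only, the speaker codes above could be decoded programmatically along the following lines. This is a minimal sketch, not part of the corpus tools: the names SPEAKER_CODES and describe_tier are hypothetical, and the sketch assumes that each main tier starts with an asterisk and that the code is separated from the utterance by a colon, as in the list above.

```python
# Hypothetical lookup table built from the speaker codes listed above.
SPEAKER_CODES = {
    "TEAE": ("teacher", "English"),
    "TEAF": ("teacher", "French"),
    "CHIE": ("one child", "English"),
    "CHIF": ("one child", "French"),
    "CHISE": ("several children", "English"),
    "CHISF": ("several children", "French"),
}


def describe_tier(line):
    """Split a main-tier line into (speaker, language, utterance).

    Assumes the line looks like '*CODE:<tab>utterance'.
    """
    if not line.startswith("*") or ":" not in line:
        raise ValueError("not a main speaker tier: %r" % line)
    code, utterance = line[1:].split(":", 1)
    if code not in SPEAKER_CODES:
        raise ValueError("unknown speaker code: %r" % code)
    speaker, language = SPEAKER_CODES[code]
    return speaker, language, utterance.strip()


# Example with a made-up utterance (not taken from the corpus).
print(describe_tier("*CHISF:\tbonjour ."))
```

A lookup table is used rather than splitting the code into parts, so that only the six codes documented here are accepted.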

Repetitions

Conscious repetitions with no change are not marked with the usual [/] symbol, because we wanted to be able to show that the teacher was emphasising a word in order to teach it.

Singing and shouting

If the whole utterance is sung, it is transcribed normally, with a %com line below it indicating ‘singing’. Singing that applies to just one word within a sentence is transcribed as bonjour [=! singing]; the same [=! singing] code is used when a few words are sung.

Sounds

When the teacher models a French sound, it is transcribed as &rrr, &s, etc.

Mixed language utterances

These are marked with the @s:eng or @s:fra code after each word. The part-of-speech category of the word is appended to the code, following the categorisation below:

Noun	@s:eng$n
Adjective	@s:eng$adj
Adverb	@s:eng$adv
Preposition	@s:eng$prep
Verb	@s:eng$v
Pronoun	@s:eng$pro
Determiner	@s:eng$det
Conjunction	@s:eng$conj
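
As an illustration only, a word carrying one of these markers could be split into its parts as sketched below. The names MARKER, POS_LABELS and parse_switched_word are hypothetical. The sketch assumes that the marker is attached directly to the word, that the same part-of-speech suffixes apply to @s:fra as to @s:eng (the list above shows only the English forms), and that the suffix may be absent.

```python
import re

# Part-of-speech suffixes from the list above, mapped to readable labels.
POS_LABELS = {
    "n": "noun",
    "adj": "adjective",
    "adv": "adverb",
    "prep": "preposition",
    "v": "verb",
    "pro": "pronoun",
    "det": "determiner",
    "conj": "conjunction",
}

# word, then @s:eng or @s:fra, then an optional $category suffix.
MARKER = re.compile(r"^(?P<word>[^@\s]+)@s:(?P<lang>eng|fra)(?:\$(?P<pos>[a-z]+))?$")


def parse_switched_word(token):
    """Return (word, language, category) for a marked token, or None if unmarked."""
    match = MARKER.match(token)
    if match is None:
        return None
    pos = match.group("pos")
    # Unknown or missing suffixes yield None rather than a guessed label.
    category = POS_LABELS.get(pos) if pos else None
    return match.group("word"), match.group("lang"), category


# Example with made-up tokens (not taken from the corpus).
for token in ["dog@s:eng$n", "petit@s:fra$adj", "bonjour"]:
    print(token, "->", parse_switched_word(token))
```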