Young Learners Corpus
Transcription Conventions
The data is transcribed following the CHILDES conventions, together with the additional conventions used for all the corpora in this database (see the additional conventions).
Additionally, we made the following decisions regarding the coding of this corpus:
Speaker Tiers
Each speaker tier is coded for language. The speaker code is TEAE if the utterance is produced by the teacher in English and TEAF if it is produced by the teacher in French. The codes for the children’s lines are the following:
| Code | Speaker and language |
| *CHIE: | one child speaks English |
| *CHIF: | one child speaks French |
| *CHISE: | several children speak English |
| *CHISF: | several children speak French |
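Because each speaker code combines who is speaking with the language used, a script can recover both from a main tier. The sketch below assumes main tiers follow the standard CHAT layout `*CODE:<tab>utterance`; the function `parse_speaker_tier`, the `Speaker` type and the sample utterance are hypothetical, for illustration only.

```python
from typing import NamedTuple


class Speaker(NamedTuple):
    role: str      # "teacher" or "child"
    group: str     # "one" or "several" speakers
    language: str  # language of the utterance


# Speaker codes as listed in this corpus's conventions.
SPEAKER_CODES = {
    "TEAE": Speaker("teacher", "one", "English"),
    "TEAF": Speaker("teacher", "one", "French"),
    "CHIE": Speaker("child", "one", "English"),
    "CHIF": Speaker("child", "one", "French"),
    "CHISE": Speaker("child", "several", "English"),
    "CHISF": Speaker("child", "several", "French"),
}


def parse_speaker_tier(line: str) -> tuple[Speaker, str]:
    """Split a main tier such as '*TEAF:\\tbonjour .' into speaker info and utterance."""
    if not line.startswith("*"):
        raise ValueError(f"not a speaker tier: {line!r}")
    # Split at the first colon only: '*CHISF:' -> 'CHISF'.
    code, sep, utterance = line[1:].partition(":")
    if not sep or code not in SPEAKER_CODES:
        raise ValueError(f"unknown speaker code in {line!r}")
    return SPEAKER_CODES[code], utterance.strip()


speaker, text = parse_speaker_tier("*CHISF:\tbonjour madame .")
print(speaker.language, speaker.group, text)  # French several bonjour madame .
```

Keeping the codes in an explicit table, rather than decoding the letters, means an unexpected code is rejected instead of being silently misread.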
Repetitions
Conscious repetitions with no change are not marked with the usual [/] symbol, because we wanted to be able to indicate that the teacher was emphasising a word in order to teach it.
Singing and shouting
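If the whole utterance is sung, it is transcribed normally, with a %com line below it indicating ‘singing’. When singing applies to just one word within a sentence, the word is transcribed as bonjour [=! singing]; when it applies to a few words, the standard CHAT scoping applies and the sung words are enclosed in angle brackets before the code: <word word> [=! singing].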
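The two singing conventions can be detected separately: the inline [=! singing] code on the main tier, and the %com line for fully sung utterances. The sketch below is hypothetical (`sung_words` and `is_fully_sung` are illustrative names); it assumes multi-word stretches use CHAT angle-bracket scoping and that the %com line contains the word ‘singing’.

```python
import re

# One sung word ("bonjour [=! singing]") or, assuming standard CHAT scoping,
# a group of sung words in angle brackets ("<word word> [=! singing]").
SUNG = re.compile(r"(?:<([^>]+)>|(\S+))\s*\[=!\s*singing\]")


def sung_words(utterance: str) -> list[str]:
    """Return the words marked with [=! singing] inside a main-tier utterance."""
    words: list[str] = []
    for group, single in SUNG.findall(utterance):
        # Exactly one of the two capture groups is non-empty per match.
        words.extend((group or single).split())
    return words


def is_fully_sung(dependent_tiers: list[str]) -> bool:
    """True if a %com tier under the utterance mentions 'singing'."""
    return any(t.startswith("%com:") and "singing" in t for t in dependent_tiers)


print(sung_words("je dis bonjour [=! singing] à tous ."))  # ['bonjour']
```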
Sounds
When the teacher models a French sound, it is transcribed as & followed by the sound, e.g. &rrr or &s.
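Modelled sounds can be pulled out of the teacher’s tiers by looking for &-prefixed tokens. This is a hypothetical sketch (`modelled_sounds` is an illustrative name): CHAT also uses & for other codes, such as &=laughs for simple events, so the filter below keeps only tokens whose second character is a letter and is therefore approximate.

```python
def modelled_sounds(speaker_code: str, utterance: str) -> list[str]:
    """Return the sounds a teacher models with the & prefix, e.g. '&rrr' -> 'rrr'."""
    if not speaker_code.startswith("TEA"):  # only teacher tiers (TEAE, TEAF)
        return []
    return [
        tok[1:]
        for tok in utterance.split()
        # Skip other CHAT &-codes such as &=laughs, whose second character is not a letter.
        if tok.startswith("&") and tok[1:2].isalpha()
    ]


print(modelled_sounds("TEAF", "&rrr rouge &s ."))  # ['rrr', 's']
```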
Mixed language utterances
These are marked by appending the @s:eng or @s:fra code to each word from the other language. Additional information about the word’s part-of-speech category is appended after a $ sign, following the categorisation below:
| Part of speech | Code |
| Noun | @s:eng$n |
| Adjective | @s:eng$adj |
| Adverb | @s:eng$adv |
| Preposition | @s:eng$prep |
| Verb | @s:eng$v |
| Pronoun | @s:eng$pro |
| Determiner | @s:eng$det |
| Conjunction | @s:eng$conj |
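Since each mixed-language token carries its language and, optionally, its part of speech, both can be read back with a single pattern. The sketch below is hypothetical (`code_switched_words` and `SwitchedWord` are illustrative names); it assumes @s:fra words take the same $ suffixes as the @s:eng examples in the table, and the sample utterance is invented.

```python
import re
from typing import NamedTuple, Optional

# A token such as "dog@s:eng$n": word, language code, optional $POS suffix.
CODE_SWITCH = re.compile(r"^(?P<word>[^@\s]+)@s:(?P<lang>eng|fra)(?:\$(?P<pos>[a-z]+))?$")

# Abbreviations from the part-of-speech table above.
POS_LABELS = {
    "n": "noun", "adj": "adjective", "adv": "adverb", "prep": "preposition",
    "v": "verb", "pro": "pronoun", "det": "determiner", "conj": "conjunction",
}


class SwitchedWord(NamedTuple):
    word: str
    language: str       # "eng" or "fra"
    pos: Optional[str]  # full label, the raw tag if unknown, or None if absent


def code_switched_words(utterance: str) -> list[SwitchedWord]:
    """Return every @s-marked word in an utterance with its language and POS."""
    found = []
    for tok in utterance.split():
        m = CODE_SWITCH.match(tok)
        if m:
            pos = m.group("pos")
            # .get(pos, pos) keeps unknown tags as-is and maps a missing tag to None.
            found.append(SwitchedWord(m.group("word"), m.group("lang"), POS_LABELS.get(pos, pos)))
    return found


print(code_switched_words("j'aime le dog@s:eng$n ."))
# [SwitchedWord(word='dog', language='eng', pos='noun')]
```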