Here we describe some of the general decisions we have taken in transcribing spoken L2 French and L2 Spanish using the CHAT system developed by the CHILDES project. We also describe below some of the adaptations we have made to the CHAT system for L2 data. Detailed guides to the transcription of L2 French and L2 Spanish using CHAT conventions have been produced, and the most recent versions are available on request from the research team.
All transcripts and sound files have been anonymised to eliminate personal details so that individual learners are not identifiable.
Filenames consist of a single capital-letter code for the task, a 3-digit code for the participant, a single lower-case letter code for the data collection occasion, and the initials of the researcher who administered the task. Thus, for example, the filename “O114bKmcM” refers to the Oral Interview task undertaken by Participant 114 during the second data collection round (Visit 1), and administered by researcher Kevin McManus.
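The naming scheme above can be checked mechanically. The following is a minimal sketch, assuming the four components always appear in the order described and that researcher initials contain only letters; the pattern, the function name `parse_filename` and the class `TranscriptName` are illustrative and not part of the project's own tooling.

```python
import re
from typing import NamedTuple

# Pattern inferred from the description above (an assumption, not an official spec):
# one capital letter (task), three digits (participant),
# one lower-case letter (occasion), then the researcher's initials.
FILENAME_RE = re.compile(r"^([A-Z])(\d{3})([a-z])([A-Za-z]+)$")


class TranscriptName(NamedTuple):
    task: str
    participant: str
    occasion: str
    researcher: str


def parse_filename(name: str) -> TranscriptName:
    """Split a transcript filename into its four components."""
    stem = name.rsplit(".", 1)[0]  # ignore a file extension if one is present
    match = FILENAME_RE.match(stem)
    if match is None:
        raise ValueError(f"filename does not follow the naming scheme: {name!r}")
    return TranscriptName(*match.groups())


print(parse_filename("O114bKmcM"))
# TranscriptName(task='O', participant='114', occasion='b', researcher='KmcM')
```

Mapping the letter codes to task names or collection rounds (for instance "O" to Oral Interview) would need the project's full code list, which is not reproduced here.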
Headers for all transcribed files follow CHAT conventions. Here is an example:
The data has been transcribed orthographically. This is necessary in order to run the morphosyntactic parsers provided by CHILDES/CLAN for French and Spanish on the completed transcripts. In the interests of automatic part-of-speech (POS) tagging, the transcription at times deviates somewhat from the actual phonological shape of the words produced by learners. However, other researchers interested in, for example, L2 phonology can refer to the sound files and add their own level of coding to the transcripts provided.
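We have not consistently used an error tier. It was not necessary for our research agenda, as the syntactic and morphological errors made by our L2 learners can be retrieved more systematically from the POS-tagged output. Where an error tier does appear, the error is flagged with [*] on the main line and described on a %err tier, as in the following example: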
*114:	et euh la plupart de mes élèves sont des garçons parce+que c' est un lycée technique euh.
*114:	et ils sont très difficiles .
*114:	et donc c' est difficile de faire les exercices oraux [*] parce+que ils veulent toujours parler en français.
%err:	oraux = orals
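For readers who want to pull out the utterances that do carry an error tier, the pairing of main lines and %err lines can be sketched as follows. This is a rough illustration only, assuming each %err line belongs to the main-tier line immediately before it; the function name `attach_error_tiers` and the sample lines are hypothetical, and CLAN's own commands remain the proper tools for real analyses.

```python
def attach_error_tiers(lines):
    """Pair each main-tier line (*SPK:) with the %err tier that follows it, if any."""
    utterances = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("*"):
            # Main tier: "*114:<tab>text" -> speaker "114", utterance text
            speaker, _, text = line.partition(":")
            utterances.append({"speaker": speaker[1:], "text": text.strip(), "err": None})
        elif line.startswith("%err:") and utterances:
            # Dependent tier: attach to the most recent main-tier line
            utterances[-1]["err"] = line[len("%err:"):].strip()
    return utterances


# Hypothetical sample, shortened from the example above.
sample = [
    "*114:\tet ils sont très difficiles .",
    "*114:\tet donc c' est difficile de faire les exercices oraux [*] .",
    "%err:\toraux = orals",
]

flagged = [u for u in attach_error_tiers(sample) if u["err"] is not None]
print(flagged[0]["err"])  # oraux = orals
```

Because the error tier was not used consistently, a list built this way will under-report learner errors; the POS-tagged output is the more systematic source.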
All pauses are indicated with (.) and have not been timed.
Overlapping of speech turns in the written transcripts is indicated using standard CHAT conventions.
The speech turns for the L2 learner(s) in every file have been separated into distinct utterances as per CHILDES conventions, so that mean length of utterance (MLU) calculations can be carried out. However, this has not been done for the researcher speech turns, so MLU calculations on the researchers' utterances will not be accurate.
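To illustrate why utterance segmentation matters for MLU, here is a rough word-based approximation restricted to one speaker. It is a sketch under several assumptions: it counts orthographic words rather than morphemes, treats fillers such as "euh" as words, and strips only bracketed codes and untimed pauses. The function name `mean_length_in_words` and the speaker code "INV" for the researcher are hypothetical. CLAN's own `mlu` command should be used for reportable figures.

```python
import re


def mean_length_in_words(lines, speaker):
    """Mean number of words per utterance for one speaker's main-tier lines."""
    lengths = []
    for line in lines:
        if not line.startswith(f"*{speaker}:"):
            continue  # skip other speakers and dependent tiers
        text = line.split(":", 1)[1]
        text = re.sub(r"\[[^\]]*\]", " ", text)  # drop bracketed codes such as [*]
        text = re.sub(r"\(\.+\)", " ", text)     # drop untimed pauses such as (.)
        # Keep only tokens containing a letter, so bare terminators are excluded.
        words = [tok for tok in text.split() if any(ch.isalpha() for ch in tok)]
        if words:
            lengths.append(len(words))
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical sample: two learner utterances and one researcher turn.
sample_lines = [
    "*114:\tet ils sont très difficiles .",
    "*114:\tet (.) c' est difficile [*] .",
    "*INV:\td' accord .",
]

print(mean_length_in_words(sample_lines, "114"))  # 4.5
```

Each learner line counts as one utterance here only because the transcripts were segmented that way; applying the same function to the unsegmented researcher turns would inflate the result.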
A number of codes have been added to the CHAT system for the specific purposes of second language research. These codes cover the following issues: