Search the SPLLOC Corpus
The SPLLOC datasets consist of digitally recorded sound files of learner Spanish, together with transcripts in CHAT format, transcripts in XML format and, in some cases, tagged files.
The tagged files are an additional set of transcripts which have been tagged using the automatic morphosyntactic parser (MOR). They therefore contain an additional level of coding (%MOR; see CHILDES for further details). At present these are available for the "Photos + Interview" task only.
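As an illustration of what the extra coding level looks like in practice, the sketch below groups each CHAT main tier (a line starting with `*` and a speaker code) with its morphosyntactic dependent tier. It assumes standard CHAT conventions: headers start with `@`, dependent tiers start with `%` (CHAT files conventionally write the tier as lowercase `%mor`, so the match is case-insensitive), and continuation lines start with a tab. The function name `parse_chat` and the sample transcript, including its speaker codes and tags, are hypothetical and are not taken from the SPLLOC files.

```python
from typing import Dict, List, Optional

# Hypothetical CHAT fragment; speaker codes and %mor tags are invented for illustration.
SAMPLE = (
    "@Begin\n"
    "@Participants:\tS15 Learner, INT Investigator\n"
    "*INT:\tqué haces ?\n"
    "%mor:\tpro:int|qué v|hacer&2S ?\n"
    "*S15:\tyo juego al\n"
    "\tfútbol .\n"
    "%mor:\tpro|yo v|jugar&1S prep|a~det|el\n"
    "\tn|fútbol .\n"
    "@End"
)


def parse_chat(text: str) -> List[Dict[str, Optional[str]]]:
    """Pair each main tier (*SPK:) with its %mor tier, if one follows it."""
    utterances: List[Dict[str, Optional[str]]] = []
    current_tier: Optional[str] = None  # tier that a tab-indented continuation extends
    for line in text.splitlines():
        if line.startswith("*"):
            speaker, _, content = line[1:].partition(":")
            utterances.append({"speaker": speaker, "text": content.strip(), "mor": None})
            current_tier = "text"
        elif line.lower().startswith("%mor:") and utterances:
            utterances[-1]["mor"] = line[len("%mor:"):].strip()
            current_tier = "mor"
        elif line.startswith("\t") and utterances and current_tier:
            # Continuation line: append it to the tier we were reading.
            previous = utterances[-1][current_tier] or ""
            utterances[-1][current_tier] = (previous + " " + line.strip()).strip()
        else:
            # Headers (@...) and other dependent tiers (%com, %gra, ...) are skipped.
            current_tier = None
    return utterances


for utt in parse_chat(SAMPLE):
    print(utt["speaker"], "|", utt["text"], "|", utt["mor"])
```

Keeping the tagged tier next to its utterance makes it straightforward to filter by part of speech rather than by surface form, which is the main reason to prefer the tagged transcripts where they exist.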
For each of the tasks included in the corpora, there are five folders: soundfiles in wav format, soundfiles in mp3 format, transcripts in CHAT format, transcripts with morphosyntactic tags, and transcripts in XML format. Each folder is in turn subdivided into learner groups and native speakers. To view or download the data, choose a task.
You can also extract subsets from the SPLLOC corpora using the search criteria given below:
The search string is matched against all utterances produced by the participant(s) selected in the remaining search criteria. It can be a phrase, a word or part of a word. To search for whole words or phrases, begin and end the search string with a space.
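The matching rule above can be sketched as plain substring search restricted to the selected participants. This is a hypothetical model of the behaviour, not the site's actual implementation: the function name `search_utterances`, the (speaker, utterance) input format and the example data are assumptions, and so is the padding of each utterance with one space on either side, which lets a space-delimited query also match the first or last word of an utterance.

```python
from typing import Iterable, List, Tuple


def search_utterances(
    utterances: Iterable[Tuple[str, str]],
    search_string: str,
    participants: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (speaker, utterance) pairs whose utterance contains search_string.

    "cas" matches inside "casa" and "casas" (part of a word), while
    " casa " matches only the whole word "casa".
    """
    wanted = set(participants)
    hits: List[Tuple[str, str]] = []
    for speaker, text in utterances:
        if speaker not in wanted:
            continue  # only utterances by the selected participant(s) are searched
        # Assumption: pad with spaces so " word " can match at utterance boundaries.
        padded = f" {text} "
        if search_string in padded:
            hits.append((speaker, text))
    return hits


# Invented example data in CHAT style (terminators separated by a space).
DATA = [
    ("S15", "yo tengo una casa ."),
    ("S15", "las casas son grandes ."),
    ("INT", "tienes casa ?"),
]

print(len(search_utterances(DATA, "cas", ["S15"])))           # → 2
print(len(search_utterances(DATA, " casa ", ["S15"])))        # → 1
print(len(search_utterances(DATA, " casa ", ["S15", "INT"]))) # → 2
```

Because CHAT transcripts separate the utterance terminator from the last word with a space, a space-delimited query such as " casa " can still match a word that ends the utterance.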