LAMBA

Annotated Longitudinal Latvian Children's Speech Corpus

The corpus contains speech recordings of four children aged 17 to 44 months, together with their orthographic transcripts (approx. 34 hours), made over a period of 18 months. The recordings were made at the children's homes during play or daily activities. The most common interlocutors are the children's parents. The orthographically transcribed data is automatically morphologically annotated.

I. Auzina, K. Levane-Petrova, G. Rabante-Busa, R. Dargis, A. Fabregas
Designing an annotated longitudinal Latvian children's speech corpus
IOS Press, 2016
I. Auziņa, R. Darģis, G. Rābante-Buša, K. Levāne-Petrova, B. Saulīte
Annotated Longitudinal Latvian Children's Speech Corpus (LAMBA)
CLARIN-LV digital library, 2017
Corpus size: 34 hours
Data period: 2015–2017
Development period: 2015–2017
Developers: Institute of Mathematics and Computer Science, University of Latvia
Funding: Norwegian Financial Mechanism, project “Latvian Language in Monolingual and Bilingual Acquisition: tools, theories and applications” (No. NFI/R/2014/053)
Other publications
I. Auzina, K. Levane-Petrova, B. Saulite
Local meaning in the two-year-old and three-year-old children's speech
Language: Meaning and Form, 8, 2017