LAMBA

Annotated Longitudinal Latvian Children's Speech Corpus

The corpus contains speech recordings of four children aged 17 to 44 months, together with their orthographic transcripts (approx. 34 hours), collected over a period of 18 months. The recordings were made at the children's homes during play or daily activities. The most common interlocutors are the children's parents. The orthographic transcripts are automatically annotated with morphological information.

Citation

Publication

I. Auzina, K. Levane-Petrova, G. Rabante-Busa, R. Dargis, A. Fabregas
Designing an annotated longitudinal Latvian children's speech corpus
Human Language Technologies - The Baltic Perspective, IOS Press, 2016

Data

I. Auziņa, R. Darģis, G. Rābante-Buša, K. Levāne-Petrova, B. Saulīte
Annotated Longitudinal Latvian Children's Speech Corpus (LAMBA)
CLARIN-LV digital library, 2017
http://hdl.handle.net/20.500.12574/7

Tags: speech, specialised, morphology

Corpus size	34 hours
Data period	2015–2017
Development period	2015–2017
Developers	Institute of Mathematics and Computer Science, University of Latvia
Funding	Norwegian Financial Mechanism, “Latvian Language in Monolingual and Bilingual Acquisition: tools, theories and applications” (No. NFI/R/2014/053)
Homepage	http://runa.lamba.lv/
CLARIN	http://hdl.handle.net/20.500.12574/7
Other publications	I. Auzina, K. Levane-Petrova, B. Saulite. Local meaning in the two-year-old and three-year-old children's speech. Language: Meaning and Form, 8, 2017