LATE-sarunas

LATE Conversational Speech Corpus

The corpus contains recordings of informal conversations, interviews, and public speeches, together with their orthographic transcripts. Each audio recording carries metadata: the speaker's gender and age group, and the form of speech (dialogue or monologue, spontaneous or prepared speech, etc.).

Citation

Publication

I. Auzina, N. Gruzitis, R. Dargis, G. Rabante-Busa, D. Gosko, J. Vempers, R. Kivkucans, A. Znotins
Recent Latvian Speech Corpora for Linguistic Research and Technology Development
Baltic Journal of Modern Computing, 12(4), 646-658, 2024

Data

I. Auziņa, R. Darģis, G. Rābante-Buša, I. Timinska-Ļaksa, E. Gailīte, A. Auziņa
LATE Conversational Speech Corpus (LATE-sarunas)
CLARIN-LV digital library, 2024
http://hdl.handle.net/20.500.12574/113

Tags: speech, specialised, morphology, manually annotated

Corpus size	44 hours (429k tokens)
Data period	2012–2024
Development period	2021–2024
Developers	Institute of Mathematics and Computer Science, University of Latvia; Institute of Literature, Folklore and Art, University of Latvia
Funding	State Research Programme "Letonika – Fostering a Latvian and European Society" (VPP-LETONIKA-2021/1-0006)
CLARIN	http://hdl.handle.net/20.500.12574/113
Other publications	I. Auzina and G. Rabante-Busa. Sarunvalodai tipiskie fonētiskie līdzekļi: runas korpusa datu analīze. Valoda: nozīme un forma, 15, 7-23, 2024