Corpus of Contemporary Latgalian Speech

The corpus consists of audio recordings and their transcripts. It documents natural, spontaneous speech drawn from field research recordings, interviews, and TV and radio broadcasts.

Citation

Publication

A. Juško-Štekele and A. Kļavinska
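Mūsdienu latgaliešu valodas runas korpusa izveide mazāk lietoto valodu dokumentēšanas kontekstā [Creating the Corpus of Contemporary Latgalian Speech in the context of documenting lesser-used languages]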
Letonica, 226-242, 2022

Data

S. Martena, N. Nau, A. Kļavinska, A. Juško-Štekele, A. Kociņš-Kūceņš, A. Sprukte, A. Briška, I. Gusāns, L. Mazure
Corpus of Contemporary Latgalian Speech (MuLaR)
CLARIN-LV digital library, 2024
http://hdl.handle.net/20.500.12574/118

Tags	speech, specialised, Latgalian, manually annotated

Corpus size	27 hours (200k tokens)
Data period	2009–2021
Development period	2021–2024
Developers	Rezekne Academy of Technologies
Funding	State Research Programme "Digital Resources for Humanities" (VPP-IZM-DH-2020/1-0001); State Research Programme "Diversity of Latvian in Time and Space" (VPP-LETONIKA-2021/4-0003)
Homepage	https://mularkorpuss.rta.lv
CLARIN	http://hdl.handle.net/20.500.12574/118