LATE-media

The corpus consists of audio recordings of media broadcasts together with their orthographic transcripts. The transcripts follow the orthography of Standard Latvian, including its punctuation rules.

Citation
Publication
I. Auzina, N. Gruzitis, R. Dargis, G. Rabante-Busa, D. Gosko, J. Vempers, R. Kivkucans, A. Znotins
Recent Latvian Speech Corpora for Linguistic Research and Technology Development
Baltic Journal of Modern Computing, 12(4), 646-658, 2024
Data
I. Auziņa, R. Darģis, K. Levāne-Petrova, A. Auziņa, B. Saulīte, I. Ļaksa-Timinska, E. Gailīte, G. Nešpore-Bērzkalne, G. Rābante-Buša, K. Pokratniece, A. Klints
LATE-media (LATE-mediji)
CLARIN-LV digital library, 2024
http://hdl.handle.net/20.500.12574/114
Corpus size: 78 hours (682k tokens)
Data period: 2015–2020
Development period: 2021–2024
Developers: Institute of Mathematics and Computer Science UL
Funding: State Research Programme "Letonika – Fostering a Latvian and European Society" (VPP-LETONIKA-2021/1-0006)
CLARIN: http://hdl.handle.net/20.500.12574/114