LATE-media
The corpus comprises audio recordings of media broadcasts together with their orthographic transcripts. The transcripts follow the orthography of Standard Latvian, including its punctuation conventions.
Citation
Publication
I. Auzina, N. Gruzitis, R. Dargis, G. Rabante-Busa, D. Gosko, J. Vempers, R. Kivkucans, A. Znotins
Recent Latvian Speech Corpora for Linguistic Research and Technology Development
Baltic Journal of Modern Computing, 12(4), 646-658, 2024
Data
I. Auziņa, R. Darģis, K. Levāne-Petrova, A. Auziņa, B. Saulīte, I. Ļaksa-Timinska, E. Gailīte, G. Nešpore-Bērzkalne, G. Rābante-Buša, K. Pokratniece, A. Klints
LATE-media (LATE-mediji)
CLARIN-LV digital library, 2024
http://hdl.handle.net/20.500.12574/114
Corpus size | 78 hours (682k tokens) |
Data period | 2015–2020 |
Development period | 2021–2024 |
Developers | Institute of Mathematics and Computer Science UL |
Funding | State Research Programme "Letonika – Fostering a Latvian and European Society" (VPP-LETONIKA-2021/1-0006) |
CLARIN | http://hdl.handle.net/20.500.12574/114 |