Balsutalka.lv Speech Corpus (Common Voice 14.0)
A Latvian speech corpus collected during the crowdsourcing campaign "Balsu talka", in which thousands of people of different ages and nationalities, both in Latvia and in the diaspora, recorded pre-selected sentences. The data was collected through the Mozilla Common Voice platform.
|Corpus size||136 hours (817k tokens)|
|Developers||Institute of Mathematics and Computer Science UL, Institute of Literature, Folklore and Art UL, Latvian Open Technologies Association|
|Funding||EU Recovery and Resilience Facility project "Language Technology Initiative" (2.3.1.1.i.0/1/22/I/CFLA/002); State Research Programme "Letonika – Fostering a Latvian and European Society" (VPP-LETONIKA-2021/1-0006)|