LVK2022
The Balanced Corpus of Modern Latvian
A general corpus of Latvian-language texts that aims to cover the variety of contemporary texts (written since 2000) in estimated proportions. This is the initial (alpha) version of the corpus, which is still to be supplemented with fiction data.
Corpus size | 86.5M words (110M tokens) |
Development period | 2019–2022 |
Developers | Institute of Mathematics and Computer Science UL |
Funding | Latvian Language Agency |