LVK2022

The Balanced Corpus of Modern Latvian

A general corpus of Latvian-language texts that aims to cover the variety of contemporary texts (written since 2000) in certain estimated proportions. This is the initial (alpha) version of the corpus, which will still be supplemented with fiction data.

Corpus size: 86.5M words (110M tokens)
Development period: 2019–2022
Developers: Institute of Mathematics and Computer Science UL
Funding: Latvian Language Agency