LVK2018
The Balanced Corpus of Modern Latvian
LVK2018 is a representative 10-million-word corpus of contemporary Latvian. It is an extended version of LVK2013 and is built on a slightly modified version of the design criteria used for the previous corpora in the LVK series. LVK2018 is designed as a general-language, representative and balanced corpus that aims to cover the variety of existing texts in estimated proportions. The corpus consists of five sections: journalism (60%), fiction (20%), scientific texts (10%), legal texts (8%) and parliamentary transcripts (2%).
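The section proportions above translate into nominal word budgets for each section. The sketch below shows that arithmetic. It assumes the 10M-word total applies uniformly and uses integer percentages so the split is exact. The names `SECTION_PERCENT`, `TOTAL_WORDS` and `section_budgets` are hypothetical and do not belong to any LVK2018 tooling. The section sizes in the released corpus may deviate somewhat from these targets.

```python
# Hypothetical helper: nominal word budgets per LVK2018 section,
# derived only from the proportions stated in the corpus description.
SECTION_PERCENT = {
    "journalism": 60,
    "fiction": 20,
    "scientific": 10,
    "legal": 8,
    "parliamentary transcripts": 2,
}

TOTAL_WORDS = 10_000_000  # nominal corpus size in words (assumed exact)


def section_budgets(total_words: int, percent: dict) -> dict:
    """Split a word budget across sections by integer percentages.

    Integer arithmetic avoids floating-point rounding; the shares
    must add up to exactly 100.
    """
    if sum(percent.values()) != 100:
        raise ValueError("section percentages must sum to 100")
    return {name: total_words * pct // 100 for name, pct in percent.items()}


if __name__ == "__main__":
    for name, words in section_budgets(TOTAL_WORDS, SECTION_PERCENT).items():
        print(f"{name:<26} {words:>10,}")
```

With these assumptions, journalism accounts for about 6 million words and parliamentary transcripts for about 200,000 words.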
Līdzsvarotais mūsdienu latviešu valodas tekstu korpuss, tā nozīme gramatikas pētījumos (The Balanced Corpus of Modern Latvian, its role in grammar studies). Language: Meaning and Form, 10, 131–146, 2019.
| Property | Value |
|---|---|
| Corpus size | 10M words (12M tokens) |
| Developers | Institute of Mathematics and Computer Science, University of Latvia |
| Funding | European Regional Development Fund (188.8.131.52/16/A/219); Latvian Language Agency |