LVK2018
The Balanced Corpus of Modern Latvian
LVK2018 is a representative 10-million-word corpus of contemporary Latvian. It is an extended version of LVK2013, based on a slightly modified version of the corpus design criteria applied to the previous corpora in the LVK series. LVK2018 is designed as a balanced, representative general-language corpus that aims to cover the variety of existing texts in estimated proportions. The corpus consists of five sections: journalism (60%), fiction (20%), scientific texts (10%), legal texts (8%), and parliamentary transcripts (2%).
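As a rough illustration of what these proportions mean in absolute terms, the following minimal Python sketch converts the stated section shares into approximate word counts. It assumes the percentages apply to the nominal 10 million word size, which the description does not state explicitly; the names `SECTION_SHARES`, `NOMINAL_WORDS`, and `approximate_section_sizes` are hypothetical and not part of any LVK2018 tooling.

```python
# Section shares as stated in the corpus description (percent of the corpus).
SECTION_SHARES = {
    "journalism": 60,
    "fiction": 20,
    "scientific": 10,
    "legal": 8,
    "parliamentary transcripts": 2,
}

# Assumption: the shares are applied to the nominal 10 million word size.
NOMINAL_WORDS = 10_000_000


def approximate_section_sizes(total_words=NOMINAL_WORDS, shares=SECTION_SHARES):
    """Return the approximate number of words per section."""
    if sum(shares.values()) != 100:
        raise ValueError("section shares must add up to 100%")
    # Integer arithmetic avoids floating-point rounding noise.
    return {name: total_words * pct // 100 for name, pct in shares.items()}


if __name__ == "__main__":
    for name, words in approximate_section_sizes().items():
        print(f"{name}: ~{words:,} words")
```

These figures are estimates derived from the design proportions only; the actual section sizes in the released corpus may differ.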
Citation
Publication
K. Levāne-Petrova
Līdzsvarotais mūsdienu latviešu valodas tekstu korpuss, tā nozīme gramatikas pētījumos (The Balanced Corpus of Modern Latvian, its role in grammar studies)
Language: Meaning and Form, 10, 131-146, 2019
Data
K. Levāne-Petrova, R. Darģis
The Balanced Corpus of Modern Latvian (LVK2018)
CLARIN-LV digital library, 2018
http://hdl.handle.net/20.500.12574/11
Corpus size | 10M words (12M tokens) |
Data period | 1991–2018 |
Development period | 2016–2018 |
Developers | Institute of Mathematics and Computer Science, University of Latvia |
Funding | The European Regional Development Fund (1.1.1.1/16/A/219); Latvian Language Agency |
CLARIN | http://hdl.handle.net/20.500.12574/11 |