Corpora tagged "general" (6)

LVK2018

The Balanced Corpus of Modern Latvian

2016–2018, 10M words (12M tokens)
Developers: IMCS UL

LVTB

Latvian Treebank

2010–2022, 16,951 sentences (285,425 tokens)
Developers: IMCS UL

FullStack-LV

Full Stack of Latvian Language Resources

2017–2019, 13,691 sentences
Developers: IMCS UL

Hugo.lv

Hugo.lv Parallel Corpora

2018, 10.5M tokens
Developers: KISC

LRK2013

Latvian Speech Recognition Corpus

2013, 100 hours (1.1M tokens)
Developers: IMCS UL, Tilde, LETA

UDLV-LVTB

Latvian UD Treebank

2015–2022, 16,951 sentences (285,425 tokens)
Developers: IMCS UL
B. Saulīte, R. Darģis, N. Grūzītis, I. Auziņa, K. Levāne-Petrova, L. Pretkalniņa, L. Rituma, P. Paikens, A. Znotiņš, L. Strankale, K. Pokratniece, I. Poikāns, G. Bārzdiņš, I. Skadiņa, A. Baklāne, V. Saulespurēns, J. Ziediņš.
Latvian National Corpora Collection – Korpuss.lv
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022, pp. 5123–5129