NKK | Korpuss.lv

Corpora tagged "representative" (9)

LVK2022

The Balanced Corpus of Modern Latvian

2000–2021, 101M words (123M tokens)

Developers: IMCS UL

LVTB

Latvian Treebank

1991–2024, 19580 sentences (330K tokens) (v2.18)

Developers: IMCS UL

FullStack-LV

Full Stack of Latvian Language Resources

1991–2018, 13691 sentences

Developers: IMCS UL

LiLa

Lithuanian-Latvian-Lithuanian Parallel Text Corpus

1982–2012, 8M words

Developers: IMCS UL, VMU

LRK2013

Latvian Speech Recognition Corpus

2005–2013, 100 hours (1.1M tokens)

Developers: IMCS UL, Tilde, LETA

LVK2018

The Balanced Corpus of Modern Latvian

1991–2018, 10M words (12M tokens)

Developers: IMCS UL

LVMED

Latvian Radiology Speech Corpus

2010–2022, 35 hours (157K tokens)

Developers: IMCS UL, REUH

MuLa2012

Corpus of Contemporary Latgalian Texts 2012

1988–2012, 1M words (1.3M tokens)

Developers: IMCS UL, RTU Rezekne

UDLV-LVTB

Latvian UD Treebank

1991–2024, 19580 sentences (330K tokens) (v2.18)

Developers: IMCS UL

R. Darģis, B. Saulīte
Korpuss.lv – a Versatile Platform for Digital Humanities
Baltic Journal of Modern Computing, 12(4), 2024, pp. 636–645

B. Saulīte, I. Auziņa, R. Darģis
Latvian National Corpora Collection Korpuss.lv | Nacionālā korpusu kolekcija Korpuss.lv
Linguistica Lettica, 31(1), 2023, pp. 202–223

B. Saulīte, R. Darģis, N. Grūzītis, I. Auziņa, K. Levāne-Petrova, L. Pretkalniņa, L. Rituma, P. Paikens, A. Znotiņš, L. Strankale, K. Pokratniece, I. Poikāns, G. Bārzdiņš, I. Skadiņa, A. Baklāne, V. Saulespurēns, J. Ziediņš
Latvian National Corpora Collection – Korpuss.lv
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022, pp. 5123–5129
