NKK | Korpuss.lv

Corpora with tag manually annotated (9)

LATE-sarunas

LATE Conversational Speech Corpus

2012–2024, 44 hours (429k tokens)

Developers: IMCS UL, ILFA UL

LVTB

Latvian Treebank

1991–2024, 19580 sentences (330k tokens) (v2.18)

Developers: IMCS UL

BolsuTolka

Bolsutolka.lv Speech Corpus (Common Voice 19.0)

2023–2024, 29 hours (160k tokens)

Developers: IMCS UL, RTU Rezekne, ILFA UL, LATA

fonLATE

LATE Phonetically Annotated Speech Corpus

2012–2024, 4 hours (48k tokens)

Developers: IMCS UL

FullStack-LV

Full Stack of Latvian Language Resources

1991–2018, 13691 sentences

Developers: IMCS UL

LATE-mediji

LATE Media Speech Corpus

2015–2020, 78 hours (682k tokens)

Developers: IMCS UL

LaVA

Latvian Language Learner Corpus

2018–2021, 192k words (241k tokens)

Developers: IMCS UL

MuLaR

Corpus of Contemporary Latgalian Speech

2009–2021, 27 hours (200k tokens)

Developers: RTU Rezekne

UDLV-LVTB

Latvian UD Treebank

1991–2024, 19580 sentences (330k tokens) (v2.18)

Developers: IMCS UL

R. Darģis, B. Saulīte
Korpuss.lv – a Versatile Platform for Digital Humanities
Baltic Journal of Modern Computing, 12(4), 2024, pp. 636–645

B. Saulīte, I. Auziņa, R. Darģis
Latvian National Corpora Collection Korpuss.lv | Nacionālā korpusu kolekcija Korpuss.lv
Linguistica Lettica, 31(1), 2023, pp. 202–223

B. Saulīte, R. Darģis, N. Grūzītis, I. Auziņa, K. Levāne-Petrova, L. Pretkalniņa, L. Rituma, P. Paikens, A. Znotiņš, L. Strankale, K. Pokratniece, I. Poikāns, G. Bārzdiņš, I. Skadiņa, A. Baklāne, V. Saulespurēns, J. Ziediņš
Latvian National Corpora Collection – Korpuss.lv
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022, pp. 5123–5129