text (35)
speech (10)
general (11)
specialised (34)
morphology (39)
syntax (3)
semantics (1)
error annotation (2)
manually annotated (9)
diachronic (7)
web (2)
learner (2)
literary (5)
parallel (1)
parliamentary (1)
historical (2)
newspapers (5)
representative (9)
latgalian (5)
blog (2)
folklore (3)
Corpora with tag text (35)
FullStack-LV
Full Stack of Latvian Language Resources
1991–2018, 13691 sentences
Developers: IMCS UL
Latvju-dainas
Latvian Dainas
1770–1903, 680k words (921k tokens)
Developers: ILFA UL, Lursoft, UL DHC
Latvju-dainas-ltg
Latvian dainas (in Latgalian)
1770–1903, 18k words (24k tokens)
Developers: ILFA UL, Lursoft, UL DHC
LPT-teikas
Folk Legend Corpus of LPT
1925–1937, 503k words (616k tokens)
Developers: UL DHC, ILFA UL, IMCS UL
Pārspriedumi
Corpus of Students' Essays
2018, 185k words (226k tokens)
Developers: IMCS UL, RTU Liepaja, RTU Rezekne
Satori-Punctum
"Satori" and "Punctum" Fiction Corpus
2003–2025, 3.5M words (4.3M tokens)
Developers: IMCS UL
Tīmeklis2020
CommonCrawl of Latvian 2020
2013–2022, 403.6M words (492.6M tokens)
Developers: IMCS UL
B. Saulīte, R. Darģis, N. Grūzītis, I. Auziņa, K. Levāne-Petrova, L. Pretkalniņa, L. Rituma, P. Paikens, A. Znotiņš, L. Strankale, K. Pokratniece, I. Poikāns, G. Bārzdiņš, I. Skadiņa, A. Baklāne, V. Saulespurēns, J. Ziediņš.
Latvian National Corpora Collection – Korpuss.lv
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022, pp. 5123–5129