text (35)
speech (10)
general (11)
specialised (34)
morphology (40)
syntax (3)
semantics (1)
error annotation (2)
manually annotated (9)
diachronic (7)
web (3)
learner (2)
literary (5)
parallel (1)
parliamentary (1)
historical (2)
newspapers (5)
representative (9)
latgalian (5)
blog (2)
folklore (3)
Corpora with tag speech (10)
LATE-sarunas
LATE Conversational Speech Corpus
2012–2024, 44 hours (429k tokens)
Developers: IMCS UL, ILFA UL
BalsuTalka
Balsutalka.lv Speech Corpus (Common Voice 17.0)
2023–2024, 277 hours (1.3M tokens)
Developers: IMCS UL, ILFA UL, LATA
BolsuTolka
Bolsutolka.lv Speech Corpus (Common Voice 19.0)
2023–2024, 29 hours (160k tokens)
Developers: IMCS UL, RTU Rezekne, ILFA UL, LATA
B. Saulīte, R. Darģis, N. Grūzītis, I. Auziņa, K. Levāne-Petrova, L. Pretkalniņa, L. Rituma, P. Paikens, A. Znotiņš, L. Strankale, K. Pokratniece, I. Poikāns, G. Bārzdiņš, I. Skadiņa, A. Baklāne, V. Saulespurēns, J. Ziediņš.
Latvian National Corpora Collection – Korpuss.lv
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022, pp. 5123–5129