Corpora tagged as manually annotated (9)

LATE-sarunas

LATE-conversational

2012–2024, 44 hours (429k tokens)
Developers: IMCS UL, ILFA UL

LVTB

Latvian Treebank

1991–2023, 19367 sentences (328k tokens) (v2.15)
Developers: IMCS UL

BolsuTolka

Bolsutolka.lv Speech Corpus (Common Voice 19.0)

2023–2024, 29 hours (160k tokens)
Developers: RATA, IMCS UL, ILFA UL, LATA

fonLATE

LATE Phonetically Annotated Speech Corpus

2012–2024, 4 hours (48k tokens)
Developers: IMCS UL

FullStack-LV

Full Stack of Latvian Language Resources

1991–2018, 13691 sentences
Developers: IMCS UL

LATE-mediji

LATE-media

2015–2020, 78 hours (682k tokens)
Developers: IMCS UL

LaVA

Latvian Language Learner Corpus

2018–2021, 192k words (241k tokens)
Developers: IMCS UL

MuLaR

Corpus of Contemporary Latgalian Speech

2009–2021, 27 hours (200k tokens)
Developers: RAT

UDLV-LVTB

Latvian UD Treebank

1991–2023, 19368 sentences (328k tokens) (v2.15)
Developers: IMCS UL
B. Saulīte, R. Darģis, N. Grūzītis, I. Auziņa, K. Levāne-Petrova, L. Pretkalniņa, L. Rituma, P. Paikens, A. Znotiņš, L. Strankale, K. Pokratniece, I. Poikāns, G. Bārzdiņš, I. Skadiņa, A. Baklāne, V. Saulespurēns, J. Ziediņš.
Latvian National Corpora Collection – Korpuss.lv
Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022, pp. 5123–5129