UDLV-LVTB

Latvian UD Treebank

The corpus is annotated using the Universal Dependencies (UD) dependency grammar. The data is converted from the manually annotated Latvian Treebank.
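Treebanks released through Universal Dependencies are distributed in the tab-separated CoNLL-U format: one syntactic word per line, ten columns per word, comment lines starting with `#`, and a blank line between sentences. The following is a minimal sketch of reading such data in plain Python, assuming the treebank is obtained in CoNLL-U form. It is not an official tool for this corpus. The function name `parse_conllu` and the short Latvian sample sentence are hypothetical illustrations, not taken from the treebank, and the FEATS/XPOS columns are left empty (`_`) rather than guessing at LVTB-specific tags.

```python
from typing import Dict, List

# The ten tab-separated CoNLL-U columns defined by Universal Dependencies.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of word dicts.

    Comment lines (#) are skipped, as are multiword-token ranges (IDs like
    "1-2") and empty nodes (IDs like "3.1"), so only basic-tree words remain.
    """
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level metadata such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node
        current.append(dict(zip(FIELDS, cols)))
    if current:
        # The last sentence may not be followed by a blank line.
        sentences.append(current)
    return sentences


# Hypothetical example sentence: "Es lasu grāmatu." ("I read a book.")
SAMPLE = "\n".join([
    "# text = Es lasu grāmatu.",
    "1\tEs\tes\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tlasu\tlasīt\tVERB\t_\t_\t0\troot\t_\t_",
    "3\tgrāmatu\tgrāmata\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "",
])

if __name__ == "__main__":
    sents = parse_conllu(SAMPLE)
    print(len(sents), sum(len(s) for s in sents))  # 1 4
    # The root of the dependency tree is the word whose HEAD is 0.
    print([w["form"] for w in sents[0] if w["head"] == "0"])  # ['lasu']
```

Skipping multiword-token and empty-node lines keeps only the words of the basic dependency tree. A reader interested in the enhanced graph (the DEPS column) would need to keep empty nodes instead.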

Citation

Publication

L. Pretkalnina, L. Rituma, B. Saulite
Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank
Text, Speech, and Dialogue, Springer, 2018


text, general, representative, morphology, syntax, manually annotated

Corpus size	19580 sentences (330K tokens) (v2.18)
Data period	1991–2024
Development period	2015–2026
Developers	Institute of Mathematics and Computer Science, University of Latvia
Funding	European Regional Development Fund, "Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian" (1.1.1.1/16/A/219); PostDoc grant No. 1.1.1.2/VIAA/1/16/118; State Research Programme "Digital Resources of the Humanities" (VPP-IZM-DH-2020/1-0001); State Research Programme "Research on Modern Latvian Language and Development of Language Technology" (LATE – VPP-LETONIKA-2021/1-0006; DigiLATE – VPP-IZM-LETONIKA-2025/1-0004)
Homepage	http://sintakse.korpuss.lv/
CLARIN	http://hdl.handle.net/11234/1-6149
Other publications

L. Pretkalnina
Formāls latviešu valodas gramatikas modelis un tā realizācija mašīnlasāmā sintakses korpusā [A formal model of Latvian grammar and its implementation in a machine-readable syntax corpus]
2023

N. Gruzitis, L. Pretkalnina, B. Saulite, L. Rituma, G. Nespore-Berzkalne, A. Znotins, P. Paikens
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC), 2018

L. Pretkalnina, L. Rituma, B. Saulite
Universal Dependency treebank for Latvian: A pilot
Human Language Technologies – The Baltic Perspective, IOS Press, 2016