Latvian Treebank

Latvian Treebank is a balanced text corpus with manually produced syntactic annotation. It uses a hybrid dependency-constituency model.

L. Rituma, B. Saulīte, G. Nešpore-Bērzkalne
Latviešu valodas sintaktiski marķētā korpusa gramatikas modelis
Valoda: nozīme un forma (The grammar model of Latvian Treebank), 10, 200-216, 2019
L. Rituma, L. Pretkalniņa, B. Saulīte, G. Nešpore-Bērzkalne, N. Grūzītis, A. Znotiņš
Latvian Treebank (LVTB)
CLARIN-LV digital library, 2024
Corpus size: 18850 sentences (300K tokens) (v2.14)
Data period: 1991–2022
Development period: 2010–2024
Developers: Institute of Mathematics and Computer Science, University of Latvia
Funding: European Regional Development Fund, "Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian"; National Research Programme "National identity"; State Research Programme "Digital Resources of the Humanities" (VPP-IZM-DH-2020/1-0001); State Research Programme "Research on Modern Latvian Language and Development of Language Technology" (VPP-LETONIKA-2021/1-0006)
Other publications
L. Lauze and I. Auziņa
Korpusu un individuālā vākuma salīdzinājums: ģenitīva un nominatīva konkurence saistījumā ar adverbu
Valoda: nozīme un forma (A comparison of corpora and individual collection: Genitive and nominative competition in connection with an adverb), 12, 111-125, 2023
L. Rituma, G. Nešpore-Bērzkalne, B. Saulīte, L. Pretkalniņa
Vārdkopas analogi „Latviešu valodas sintaktiski marķētajā korpusā”
Valoda: nozīme un forma (Analogue of subordinate phrase in Latvian Treebank), 156-173, 2023
L. Pretkalniņa, L. Rituma, B. Saulīte
Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank
Springer, 2018
N. Grūzītis, L. Pretkalniņa, B. Saulīte, L. Rituma, G. Nešpore-Bērzkalne, A. Znotiņš, P. Paikens
Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU
L. Pretkalniņa and L. Rituma
Constructions in Latvian Treebank: the impact of annotation decisions on the dependency parsing performance
IOS Press, 2014
L. Pretkalniņa and L. Rituma
Syntactic issues identified developing the Latvian treebank
IOS Press, 2012