UDLV-LVTB

Latvian UD Treebank

The corpus is annotated in the Universal Dependencies (UD) dependency grammar framework. The data was converted from the manually annotated Latvian Treebank.
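Universal Dependencies releases distribute treebanks in the 10-column CoNLL-U format. The minimal sketch below, which assumes well-formed CoNLL-U input, shows how the basic dependency annotation can be read and how sentence and word counts can be obtained. The names `read_conllu` and `Word` are hypothetical and do not belong to any library, and the sample sentence is invented for illustration rather than taken from LVTB. The exact counting convention behind the published corpus size figures is not stated here.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Word:
    """One syntactic word (a CoNLL-U line with an integer ID)."""
    id: int
    form: str
    upos: str
    head: int    # ID of the governing word; 0 means the sentence root
    deprel: str  # UD dependency relation label


def read_conllu(text: str) -> Iterator[List[Word]]:
    """Yield one list of Word objects per sentence in CoNLL-U text."""
    sentence: List[Word] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as sent_id and text
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("8.1"),
        # so only words of the basic tree are kept.
        if not cols[0].isdigit():
            continue
        sentence.append(Word(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    if sentence:  # the input may not end with a blank line
        yield sentence


# Invented example sentence ("I am reading a book."), not taken from LVTB.
SAMPLE = """# sent_id = example-1
# text = Es lasu grāmatu.
1\tEs\tes\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tlasu\tlasīt\tVERB\t_\t_\t0\troot\t_\t_
3\tgrāmatu\tgrāmata\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

if __name__ == "__main__":
    sentences = list(read_conllu(SAMPLE))
    print(len(sentences), sum(len(s) for s in sentences))  # 1 4
    for w in sentences[0]:
        # Word IDs are contiguous from 1, so ID n sits at list index n - 1.
        head = sentences[0][w.head - 1].form if w.head else "ROOT"
        print(f"{w.form} --{w.deprel}--> {head}")
```

This reader deliberately ignores the DEPS column and empty nodes, so it covers only the basic dependency tree; the enhanced dependency graph that CoNLL-U stores in the DEPS column would need separate handling.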

Publication to be cited:
L. Pretkalnina, L. Rituma, B. Saulite. Deriving Enhanced Universal Dependencies from a Hybrid Dependency-Constituency Treebank. Springer, 2018.
Corpus size: 16,951 sentences (285,425 tokens)
Development period: 2015–2022
Developers: Institute of Mathematics and Computer Science, University of Latvia (IMCS UL)
Funding: European Regional Development Fund, "Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian" (1.1.1.1/16/A/219); PostDoc grant No. 1.1.1.2/VIAA/1/16/118; State Research Programme "Digital Resources of the Humanities" (VPP-IZM-DH-2020/1-0001); State Research Programme "Research on Modern Latvian Language and Development of Language Technology" (VPP-LETONIKA-2021/1-0006)
Homepage: http://sintakse.korpuss.lv/
CLARIN: http://hdl.handle.net/11234/1-4611
Other publications
N. Gruzitis, L. Pretkalnina, B. Saulite, L. Rituma, G. Nespore-Berzkalne, A. Znotins, P. Paikens. Creation of a Balanced State-of-the-Art Multilayer Corpus for NLU. 2018.
L. Pretkalnina, L. Rituma, B. Saulite. Universal Dependency treebank for Latvian: A pilot. IOS Press, 2016.