Latvian National Corpora Collection
Latvian National Corpora Collection (LNCC) is a diverse collection of corpora representing both written and spoken language. LNCC covers various use cases and all major text types and genres. It is a continuous multi-institutional and multi-project effort, supported by the digital humanities and language technology communities in Latvia.
Currently, 30 corpora developed by 13 institutions are available in the LNCC. Most of the corpora are annotated with a uniform morphosyntactic annotation scheme and are included in the federated search, which combines corpora from two corpus indexer instances (endpoints) maintained by IMCS UL and NLL. In total, the federated search covers 24 corpora (2.1 billion tokens).
Funding
National Research Programme "Digital Resources of the Humanities" (VPP-IZM-DH-2020/1-0001; 2020–2022)
European Regional Development Fund (126.96.36.199/16/A/219; 2017–2019)
European Regional Development Fund (188.8.131.52/18/I/016; 2018–2020)
Funding for the development of the corpus conception, Balanced Corpus of Modern Latvian, etc. (2005–2022)