LaVA

Latvian Language Learner Corpus

The corpus contains more than 1,000 texts written by foreign learners of Latvian in their first or second semester of study at Latvian higher education institutions. The texts are morphologically annotated and the annotation has been checked manually; the learners' errors have also been annotated manually.

R. Darģis, I. Auziņa, K. Levāne-Petrova, I. Kaija
Quality Focused Approach to a Learner Corpus Development
I. Auziņa, I. Kaija, K. Levāne-Petrova, K. Pokratniece, R. Darģis
Latvian Language Learner Corpus (LaVA)
CLARIN-LV digital library, 2021
Corpus size: 192k words (241k tokens)
Development period: 2018–2021
Developers: Institute of Mathematics and Computer Science UL
Funding: Latvian Council of Science, "Development of Learner corpus of Latvian: methods, tools and applications" (lzp-2018/1-0527)
Other publications
I. Kaija, I. Auziņa
Data collection for learner corpus of Latvian: copyright and personal data protection
Selected papers from the CLARIN Annual Conference 2019, 41-47, 2020
I. Auziņa, I. Kaija, K. Levāne-Petrova
Mērķhipotēžu izvirzīšana latviešu valodas apguvēju korpusā [Formulating target hypotheses in the Latvian language learner corpus]
Valoda: nozīme un forma, 11, 7-26, 2020
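K. Levāne-Petrova, I. Auziņa, K. Pokratniece
Latviešu valodas apguvēju korpusa datu ieguves un apstrādes metodoloģijas izstrāde [Development of a methodology for data collection and processing for the Latvian language learner corpus]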
LiePA, 2020