Tvitēdiens

Latvian Twitter Eater Corpus

The Latvian Twitter Eater Corpus is a collection of posts from the social media platform Twitter in the narrow domain of food, drinks, eating, and drinking. The corpus has been actively collected since 2011 and includes over 3 million tweets written in Latvian.

Citation
Publication
Sproģis, Uga; Rikters, Matīss
What can we learn from almost a decade of food tweets
Human Language Technologies – The Baltic Perspective, IOS Press, 2020
Corpus size 42M words (56M tokens)
Data period 2007–2025
Development period 2020–2025
Developers The University of Tokyo, University of Latvia Faculty of Computing, National Institute of Advanced Industrial Science and Technology
Funding Project JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO); "Strengthening of the capacity of doctoral studies at the UL within the framework of the new doctoral model" (No. 8.2.2.0/20/I/006)
Homepage https://twitediens.lv/
Other publications
Rikters, Matīss; Vīksna, Rinalds; Marrese-Taylor, Edison
Annotations for Exploring Food Tweets from Multiple Aspects
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, 2024