Subtitri
Latvian Subtitles of Public Broadcasting
The corpus contains subtitles from a range of Latvian public media broadcasts (2015–2020), including shows, movies, and series. Each recording has a title, a publication date, and a URL where it can be watched. Each recording also specifies the language of its audio and whether it was originally recorded in that language or dubbed.
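The per-recording metadata described above can be pictured as a simple record type. The sketch below is illustrative only: the class name `SubtitleRecording`, its field names, the language codes, and the sample titles and URLs are all hypothetical placeholders, not the corpus's actual schema or contents. It shows how the original-versus-dubbed flag could be used to select only broadcasts originally recorded in a given language.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SubtitleRecording:
    """Hypothetical metadata record for one broadcast (field names are assumptions)."""
    title: str
    published: date        # publication date of the broadcast
    url: str               # where the broadcast can be watched
    audio_language: str    # language of the audio track, e.g. "lv" (assumed code format)
    is_original: bool      # True if originally recorded in audio_language, False if dubbed


def originals_in(recordings, language):
    """Return recordings originally recorded (not dubbed) in the given audio language."""
    return [r for r in recordings if r.audio_language == language and r.is_original]


# Placeholder records; titles and URLs are invented for illustration.
recordings = [
    SubtitleRecording("Example show", date(2016, 3, 1), "https://example.com/a", "lv", True),
    SubtitleRecording("Example film", date(2019, 11, 5), "https://example.com/b", "lv", False),
    SubtitleRecording("Example series", date(2018, 6, 20), "https://example.com/c", "en", True),
]

print([r.title for r in originals_in(recordings, "lv")])  # ['Example show']
```

Keeping the dubbing status as a boolean next to the audio language makes it easy to separate originally Latvian material from dubbed material.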
|Corpus size|1200 hours (10.8M tokens)|
|Developers|Institute of Mathematics and Computer Science, University of Latvia (UL)|