Subtitri

Latvian Subtitles of Public Broadcasting

The corpus contains subtitles from various Latvian public media broadcasts aired from 2015 to 2020, including shows, films, and series. Each broadcast has a title, a publication date, and a URL where it can be watched. Each recording is also annotated with its audio language and with whether it was originally recorded in that language or dubbed.
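The per-broadcast metadata described above could be modelled as a simple record. The sketch below is hypothetical: the class name `SubtitleRecord`, its field names, and the helper `count_by_dubbing` are illustrative assumptions, not the corpus's actual schema or export format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SubtitleRecord:
    """Hypothetical metadata record for one broadcast (field names are illustrative)."""
    title: str
    publication_date: date
    url: str              # where the broadcast can be watched
    audio_language: str   # audio language of the broadcast, e.g. "lv" (assumed code format)
    dubbed: bool          # True if dubbed, False if originally recorded in audio_language

    def is_original_language(self) -> bool:
        """Return True when the audio was originally recorded in audio_language."""
        return not self.dubbed


def count_by_dubbing(records: list[SubtitleRecord]) -> dict[str, int]:
    """Count records that are in their original language versus dubbed."""
    original = sum(1 for r in records if not r.dubbed)
    return {"original": original, "dubbed": len(records) - original}
```

For example, given one original-language record and one dubbed record (placeholder titles and `example.org` URLs), `count_by_dubbing` returns `{"original": 1, "dubbed": 1}`.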

Corpus size: 1,200 hours (10.8M tokens)
Development period: 2020–2022
Developers: Institute of Mathematics and Computer Science UL