LATE-conversational

The corpus contains recordings of informal conversations, interviews, and public speeches, together with their orthographic transcripts. Each audio recording is annotated with metadata: the speaker's gender and age group, and the form of speech (dialogue or monologue, spontaneous or prepared speech, etc.).

Corpus size: 35 hours (347k tokens)
Data period: 2012–2024
Development period: 2021–2024
Developers: Institute of Mathematics and Computer Science UL; Institute of Literature, Folklore and Art UL
Funding: State Research Programme "Letonika – Fostering a Latvian and European Society" (VPP-LETONIKA-2021/1-0006)