fonLATE

LATE Phonetically Annotated Speech Corpus

A small subset of phonetically annotated data has been derived from the LATE-sarunas and LATE-media corpora. The phonetic annotation is provided at two levels: (1) the dictionary (standard) pronunciation of a word or segment, regardless of how the particular speaker actually pronounced it, and (2) the actual pronunciation of the word or segment.

Citation

Data

I. Auziņa, G. Rābante-Buša, R. Darģis
LATE Phonetically Annotated Speech Corpus (fonLATE)
CLARIN-LV digital library, 2024
http://hdl.handle.net/20.500.12574/115

Tags: speech, specialised, morphology, manually annotated

Corpus size	4 hours (48k tokens)
Data period	2012–2024
Development period	2024
Developers	Institute of Mathematics and Computer Science, University of Latvia
Funding	State Research Programme "Letonika – Fostering a Latvian and European Society" (VPP-LETONIKA-2021/1-0006)
CLARIN	http://hdl.handle.net/20.500.12574/115