LVMED

Latvian Radiology Speech Corpus

An anonymised text corpus of digital imaging reports – manual transcriptions of examination dictations. The corpus covers the following modalities: computed tomography, magnetic resonance, mammography, computed radiography (x-ray) and ultrasound.

Publication to be cited:
R. Dargis, N. Gruzitis, I. Auzina, K. Stepanovs
Creation of Language Resources for the Development of a Medical Speech Recognition System for Latvian
IOS Press, 2020
Corpus size: 35 hours (157k tokens)
Development period: 2022
Developers: Institute of Mathematics and Computer Science UL, Riga East University Hospital
Funding: European Regional Development Fund (1.1.1.1/18/A/153)
Other publications:
I. Auzina, R. Dargis, B. Saulite, N. Gruzitis, M. Grasmanis, A. Spektors, K. Stepanovs
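Specializēta latviešu valodas runas korpusa un izrunas vārdnīcas izveide vizuālās diagnostikas izmeklējumu lingvistiskai analīzei un sistemātiskai transkribēšanai [Creation of a specialised Latvian speech corpus and pronunciation dictionary for the linguistic analysis and systematic transcription of diagnostic imaging examinations]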
Letonica, 244-262, 2022