TEDxSK and JumpSK Lecture Speech Corpus

TEDxSK and JumpSK is a new Slovak spoken language resource built from TEDx and Jump Slovensko lectures. The speech corpus consists of 220 lectures with a total duration of 58 hours. The annotated corpus was generated automatically, in an unsupervised manner, using acoustic speech segmentation based on principal component analysis and automatic speech transcription with two complementary speech recognition systems. To evaluate the quality of the automatic transcription, an evaluation set of 50 manually transcribed lectures, with a total duration of 12 hours, was created.
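The description does not say which metric is used to assess the automatic transcription against the manual one. Word error rate (WER) is the usual choice, so the following is a minimal sketch that assumes WER. It aligns the manual (reference) and automatic (hypothesis) transcripts with a word-level Levenshtein distance. The function names are hypothetical and are not part of the corpus distribution.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (substitutions + deletions + insertions, number of reference words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra hypothesis word inserted
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)], len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool errors over all (reference, hypothesis) pairs, e.g. one pair per lecture."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += words
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_words
```

Pooling errors across lectures, rather than averaging per-lecture WERs, weights each lecture by its length. This is a common convention for reporting WER on a multi-hour evaluation set.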

Data and Resources

No data files are attached to this catalog entry.

Additional Info

Source: https://dihtechnicom.tuke.sk/#contact
Author: DIH Technicom
Last Updated: October 24, 2023, 20:12 (UTC)
Created: October 24, 2023, 20:12 (UTC)
id_euhubs4data: b2d515186125ff70dedf3f93739ecf1c77691eef371b3e34023966b76030e204_UCSGN7FPQCVD6T6KNRQMX62CSTYFCIOASFCYQBECROSS32EDTPIXRE5K
idsExtraInfo: https://euhub4data-graphs.itainnova.es/dataset/dcat#Dataset_ab5a68e0-9412-4a71-bbbd-b58374ef20a0
is_repo: 0
privacy: Personal data (data made publicly available by data subjects)
team: DIH Technicom