TEDxSK and JumpSK Lecture Speech Corpus

TEDxSK and JumpSK is a new Slovak spoken-language resource built from TEDx and Jump Slovensko lectures. The corpus consists of 220 lectures with a total duration of 58 hours. The annotated corpus was generated automatically, in an unsupervised manner, using acoustic speech segmentation based on principal component analysis and automatic speech transcription with two complementary speech recognition systems. To evaluate the quality of the automatic transcription, an evaluation set of 50 lectures with manual transcriptions, totaling 12 hours, was created.
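The description does not say which metric is used to compare automatic and manual transcriptions. A common choice for this kind of evaluation is word error rate (WER), so the following is only a minimal sketch under that assumption. The function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: WER is used here as an illustrative metric; the corpus
    description does not name the metric its authors applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical manual (reference) and automatic (hypothesis) transcripts.
    manual = "dobrý deň vitajte na prednáške"
    automatic = "dobrý den vitajte prednáške"
    print(word_error_rate(manual, automatic))
```

In this hypothetical pair, one word is substituted and one is dropped against five reference words, which gives a WER of 0.4. A real evaluation would typically normalize case and punctuation before scoring and aggregate errors over all 50 evaluation lectures.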

Data and Resources

This dataset has no data

Additional Info

Field Value
Source https://dihtechnicom.tuke.sk/#contact
Author DIH Technicom
Last Updated May 25, 2023, 18:56 (UTC)
Created May 25, 2023, 18:56 (UTC)
id_euhubs4data b2d515186125ff70dedf3f93739ecf1c77691eef371b3e34023966b76030e204_UCSGN7FPQCVD6T6KNRQMX62CSTYFCIOASFCYQBECROSS32EDTPIXRE5K
idsExtraInfo https://euhub4data-graphs.itainnova.es/dataset/dcat#Dataset_d253878e-8324-4b65-9cba-7016f31f9226
privacy Personal data: Data made publicly available by data subjects
team DIH Technicom