Slovak Categorized News Corpus

This corpus is an attempt to create a representative sample of contemporary Slovak from various domains, prepared for easy searching and automated processing. It contains a selection of news articles processed by our NLP tools. The corpus consists of two parts.

The first part contains text files and the following annotations:

- Token boundary identification
- Sentence boundary identification
- Stop words
- Morphological analysis
- Named entity recognition
- Named entity transcription
- Lemmas

The second part contains an evaluation for information retrieval.

Data and Resources

This dataset has no data

Additional Info

Field            Value
Source           https://dihtechnicom.tuke.sk/#contact
Author           DIH Technicom
Last Updated     May 25, 2023, 18:56 (UTC)
Created          May 25, 2023, 18:56 (UTC)
id_euhubs4data   f8343723dd2f2d5d5c49ce38356c077be651f2c40f725e19f44b036a0d63bdd7_UCSGN7FPQCVD6T6KNRQMX62CSTYFCIOASFCYQBECROSS32EDTPIXRE5K
idsExtraInfo     https://euhub4data-graphs.itainnova.es/dataset/dcat#Dataset_6cb2a6a6-008f-4a9c-99e6-4ff28c612209
privacy          No personal data
team             DIH Technicom