Slovak Categorized News Corpus

This corpus is an attempt to create a representative sample of the contemporary Slovak language, drawn from various domains and designed for easy searching and automated processing. It contains a selection of news articles processed by our NLP tools. The corpus consists of two parts. The first part contains text files and annotations:

- Token boundary identification
- Sentence boundary identification
- Stop-words
- Morphological analysis
- Named entity recognition
- Named entity transcription
- Lemma

The second part contains an evaluation for information retrieval.
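To make the first three annotation layers of the first part more concrete, here is a minimal Python sketch of sentence boundary identification, token boundary identification, and stop-word flagging. It does not reproduce the corpus tools or file format, which this page does not describe. The `Token` record, the `split_sentences` and `tokenize` helpers, the splitting rules, and the tiny `STOP_WORDS` set are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical stop-word list for illustration only; not the corpus's actual list.
STOP_WORDS = {"a", "v", "na", "sa", "je"}


@dataclass
class Token:
    text: str           # surface form of the token
    start: int          # character offset where the token begins in its sentence
    end: int            # character offset one past the token's last character
    is_stop_word: bool  # True if the lowercased token is in STOP_WORDS


def split_sentences(text: str) -> List[str]:
    # Naive rule (an assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace. Real news text needs abbreviation handling.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> List[Token]:
    # Words are runs of Unicode word characters; each punctuation mark
    # becomes its own token.
    return [
        Token(m.group(), m.start(), m.end(), m.group().lower() in STOP_WORDS)
        for m in re.finditer(r"\w+|[^\w\s]", sentence)
    ]


if __name__ == "__main__":
    sample = "Košice sú mesto na východe. Je tam univerzita."
    for sentence in split_sentences(sample):
        print([(t.text, t.is_stop_word) for t in tokenize(sentence)])
```

The remaining layers (morphological analysis, named entity recognition and transcription, lemma) would need language-specific models or lexicons, so they are left out of this sketch.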

Data and Resources

This dataset has no data

Additional Info

Field Value
Source https://dihtechnicom.tuke.sk/#contact
Author DIH Technicom
Last Updated October 24, 2023, 20:12 (UTC)
Created October 24, 2023, 20:12 (UTC)
Issued
Modified
creator
id_euhubs4data f8343723dd2f2d5d5c49ce38356c077be651f2c40f725e19f44b036a0d63bdd7_UCSGN7FPQCVD6T6KNRQMX62CSTYFCIOASFCYQBECROSS32EDTPIXRE5K
idsExtraInfo https://euhub4data-graphs.itainnova.es/dataset/dcat#Dataset_a0baae65-c10c-4007-b9f4-410a639548bf
is_repo 0
landing_page
privacy No personal data
rdf_url
spatial
team DIH Technicom