Slovak Categorized News Corpus

This corpus is an attempt to create a representative sample of contemporary Slovak from various domains, in a form that is easy to search and to process automatically. It contains a selection of news articles processed by our NLP tools. The corpus consists of two parts.

The first part contains the text files and the following annotations:

- Token boundary identification
- Sentence boundary identification
- Stop words
- Morphological analysis
- Named entity recognition
- Named entity transcription
- Lemma

The second part contains an evaluation for information retrieval.

Data and Resources

This dataset has no data

Additional Info

Field Value
Author DIH Technicom
Last Updated May 25, 2023, 18:56 (UTC)
Created May 25, 2023, 18:56 (UTC)
id_euhubs4data f8343723dd2f2d5d5c49ce38356c077be651f2c40f725e19f44b036a0d63bdd7_UCSGN7FPQCVD6T6KNRQMX62CSTYFCIOASFCYQBECROSS32EDTPIXRE5K
privacy No personal data
team DIH Technicom