Slovak Web Discussion Corpus

The corpus contains the complete set of web discussions from a single site, covering a variety of topics. Each discussion is labeled with its topic and speaker and is assigned to a specific section. An index is included so that the corpus can be searched with regular expressions. The discussion text has been processed with our tools for word tokenization, sentence boundary detection, and morphological analysis. Each token's annotation also includes a corrected word form proposed by a statistical correction system.
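To illustrate how token annotations with corrected forms can support regular-expression search, here is a minimal Python sketch. The corpus file format and index are not documented on this page, so everything in it is an assumption made for illustration: the `Token` record, its field names (`form`, `corrected`, `lemma`, `tag`), the tag labels, and the sample sentence are all hypothetical and do not reflect the actual corpus schema or tooling.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    # All field names are hypothetical; the real corpus schema is not documented here.
    form: str       # word as written in the discussion
    corrected: str  # form proposed by a correction system
    lemma: str      # base form from morphological analysis
    tag: str        # illustrative part-of-speech label


def search(tokens, pattern, use_corrected=True):
    """Return tokens whose (corrected or raw) form fully matches the regex."""
    rx = re.compile(pattern)
    return [
        t for t in tokens
        if rx.fullmatch(t.corrected if use_corrected else t.form)
    ]


# Invented sample: a post typed without diacritics, with corrected forms attached.
SAMPLE = [
    Token("dakujem", "ďakujem", "ďakovať", "VERB"),
    Token("za", "za", "za", "ADP"),
    Token("odpoved", "odpoveď", "odpoveď", "NOUN"),
    Token(".", ".", ".", "PUNCT"),
]

hits = search(SAMPLE, r"ďak.*")
```

In this sketch, matching against the corrected form finds the token written as "dakujem", while the same pattern run with `use_corrected=False` finds nothing, which is one reason a correction layer can be useful for searching informal web text.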

Data and Resources

This dataset has no data

Additional Info

Field Value
Source https://dihtechnicom.tuke.sk/#contact
Author DIH Technicom
Last Updated October 24, 2023, 20:12 (UTC)
Created October 24, 2023, 20:12 (UTC)
Issued
Modified
creator
id_euhubs4data a8e63f06e07ea842ac78abaaa419c0a968d29e1a2879e4b72229d44c42490919_UCSGN7FPQCVD6T6KNRQMX62CSTYFCIOASFCYQBECROSS32EDTPIXRE5K
idsExtraInfo https://euhub4data-graphs.itainnova.es/dataset/dcat#Dataset_097595f3-3aaa-46ac-8846-8cd08e23fa81
is_repo 0
landing_page
privacy No personal data
rdf_url
spatial
team DIH Technicom