Slovak Web Discussion Corpus

The corpus contains the complete set of web discussions on various topics from a single site. Each discussion is labelled with its topic and speaker and is assigned to a specific section. The corpus includes an index that supports searching with regular expressions. The discussion text has been processed with our tools for word tokenization, sentence boundary detection, and morphological analysis. Token annotations include a corrected word form proposed by a statistical correction system.
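As a rough illustration of how the annotations described above could be used, the following Python sketch models a post as a list of tokenized sentences and runs a regular-expression search over either the original or the corrected word forms. The corpus file format and index interface are not documented here, so the `Token` and `Post` classes, the `search` function, the sample data, and the tag values are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    form: str       # word as written in the discussion
    corrected: str  # form proposed by the correction system
    tag: str        # morphological tag (placeholder, not the corpus tagset)


@dataclass
class Post:
    topic: str
    section: str
    speaker: str
    sentences: list  # list of sentences, each a list of Token


def search(posts, pattern, use_corrected=True):
    """Yield (post, sentence_index, token) for tokens fully matching pattern."""
    rx = re.compile(pattern)
    for post in posts:
        for i, sentence in enumerate(post.sentences):
            for tok in sentence:
                text = tok.corrected if use_corrected else tok.form
                if rx.fullmatch(text):
                    yield post, i, tok


# Made-up sample data: one post with a single two-token sentence.
posts = [
    Post(
        topic="doprava",
        section="Spravodajstvo",
        speaker="user1",
        sentences=[[Token("dakujem", "ďakujem", "V"),
                    Token("pekne", "pekne", "D")]],
    )
]

# Matching against corrected forms finds the token written without diacritics.
hits = [tok.form for _, _, tok in search(posts, r"ďak.*")]
```

Searching the corrected forms rather than the raw text is one way to make queries robust to missing diacritics and misspellings, which are common in web discussions.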

Data and Resources

This dataset has no data

Additional Info

Field           Value
Author          DIH Technicom
Last Updated    May 25, 2023, 18:56 (UTC)
Created         May 25, 2023, 18:56 (UTC)
id_euhubs4data  a8e63f06e07ea842ac78abaaa419c0a968d29e1a2879e4b72229d44c42490919_UCSGN7FPQCVD6T6KNRQMX62CSTYFCIOASFCYQBECROSS32EDTPIXRE5K
privacy         No personal data
team            DIH Technicom