diff --git a/pages/topics/resources/README.md b/pages/topics/resources/README.md
index 75f64687..4b86007f 100644
--- a/pages/topics/resources/README.md
+++ b/pages/topics/resources/README.md
@@ -43,6 +43,7 @@ Europarlament
 - [Aranea](http://ucts.uniba.sk/aranea_about/)
 - [SkTenTen](https://www.sketchengine.eu/sktenten-slovak-corpus/) automatically POS-annotated, accessed through a web interface
 - [CommonCrawl](https://commoncrawl.org/2020/03/february-2020-crawl-archive-now-available/) Does it also contain Slovak data?
+- [Oscar](https://traces1.inria.fr/oscar/) classification and deduplication of CommonCrawl data, also available for Slovak (4.5 GB and 665M words after deduplication)
 ### Wikipedia