From d5280e1a82ef92351bcc56273b2f805d8a1b907d Mon Sep 17 00:00:00 2001
From: dano
Date: Mon, 4 May 2020 15:20:36 +0000
Subject: [PATCH] Update 'pages/topics/resources/README.md'

---
 pages/topics/resources/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pages/topics/resources/README.md b/pages/topics/resources/README.md
index 75f64687..4b86007f 100644
--- a/pages/topics/resources/README.md
+++ b/pages/topics/resources/README.md
@@ -43,6 +43,7 @@ Europarlament
 - [Aranea](http://ucts.uniba.sk/aranea_about/)
 - [SkTenTen](https://www.sketchengine.eu/sktenten-slovak-corpus/) automatically POS-annotated, accessible through a web interface
 - [CommonCrawl](https://commoncrawl.org/2020/03/february-2020-crawl-archive-now-available/) Does it also contain Slovak data?
+- [Oscar](https://traces1.inria.fr/oscar/) classification and deduplication of CommonCrawl data, also available for Slovak (4.5 GB and 665M words after deduplication)
 
 ### Wikipedia