From 72c65b5d1506a53dbe7508d365ff203765d225c4 Mon Sep 17 00:00:00 2001
From: dano
Date: Thu, 16 Apr 2020 06:26:53 +0000
Subject: [PATCH] Update 'pages/topics/resources/README.md'

---
 pages/topics/resources/README.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/pages/topics/resources/README.md b/pages/topics/resources/README.md
index 193e1c78b..75f646875 100644
--- a/pages/topics/resources/README.md
+++ b/pages/topics/resources/README.md
@@ -50,7 +50,9 @@ Europarlament
 
 ### Word Embedding
 
-[FastText Word Embedding from Common Crawl](https://fasttext.cc/docs/en/crawl-vectors.html)
+- [FastText Word Embedding from Common Crawl](https://fasttext.cc/docs/en/crawl-vectors.html)
+- [FastText Word Embedding from Wikipedia](https://fasttext.cc/docs/en/pretrained-vectors.html)
+
 
 ### Databázy zdrojov
 
@@ -62,3 +64,11 @@ http://www.meta-share.org/
 
 https://korpus.sk/res.html
 
+Slovak Stemming https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Slovak_Stemmer_Analysis
+
+### Tools
+
+- [Spacy](https://spacy.io/), tokenizer, stopwords, custom model
+- [Slovak Lexer](https://github.com/hladek/slovak-lexer) / tokenizer
+- [Slovak Elasticsearch](https://github.com/essential-data/elasticsearch-sk) - stopwords, stemmer
+- [Slovak Hunspell](https://github.com/essential-data/hunspell-sk) - stemmer, spelling