diff --git a/pages/topics/resources/README.md b/pages/topics/resources/README.md
index 193e1c78..75f64687 100644
--- a/pages/topics/resources/README.md
+++ b/pages/topics/resources/README.md
@@ -50,7 +50,9 @@ Europarlament
 
 ### Word Embedding
 
-[FastText Word Embedding from Common Crawl](https://fasttext.cc/docs/en/crawl-vectors.html)
+- [FastText Word Embedding from Common Crawl](https://fasttext.cc/docs/en/crawl-vectors.html)
+- [FastText Word Embedding from Wikipedia](https://fasttext.cc/docs/en/pretrained-vectors.html)
+
 
 ### Databázy zdrojov
 
@@ -62,3 +64,11 @@ http://www.meta-share.org/
 
 https://korpus.sk/res.html
 
+Slovak Stemming https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Slovak_Stemmer_Analysis
+
+### Tools
+
+- [Spacy](https://spacy.io/), tokenizer, stopwords, custom model
+- [Slovak Lexer](https://github.com/hladek/slovak-lexer) / tokenizer
+- [Slovak Elasticsearch](https://github.com/essential-data/elasticsearch-sk) - stopwords, stemmer
+- [Slovak Hunspell](https://github.com/essential-data/hunspell-sk) - stemmer, spelling