# Slovak Language Resources
### POS
[Multext East](http://nl.ijs.si/ME/): annotated edition of George Orwell's novel *1984* in 15 European languages
### NER
- Learning multilingual named entity recognition from Wikipedia (WikiNER?)
- Cross-lingual Name Tagging and Linking for 282 Languages: NER annotation that also covers the Slovak Wikipedia, derived from the English one
  - https://drive.google.com/drive/folders/1bkK6ly_awxe9IgAKL16VVvCtjcYcDSw8
  - https://elisa-ie.github.io/wikiann/
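
NER datasets such as the WikiANN annotation above are commonly distributed as one token per line with BIO tags (`B-` begins an entity, `I-` continues it, `O` is outside). A minimal Python sketch for reading such data and turning the tags into entity spans, assuming a whitespace-separated `token tag` layout with blank lines between sentences; the actual layout of the downloads above was not checked, and `read_bio` and `bio_to_spans` are hypothetical helper names.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

def read_bio(lines: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield sentences as (token, tag) pairs; assumes 'token tag' lines, blank line between sentences."""
    sentence: List[Tuple[str, str]] = []
    for line in lines:
        parts = line.split()
        if not parts:  # a blank line closes the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append((parts[0], parts[-1]))  # first column = token, last column = tag (assumed)
    if sentence:
        yield sentence

def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Turn a BIO tag sequence into (label, start, end) spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the open entity continues
        if label is not None:  # any non-continuing tag closes the open span
            spans.append((label, start, i))
            label = None
        if tag.startswith(("B-", "I-")):  # a stray I- is leniently treated as B-
            label, start = tag[2:], i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans
```

For the sentence `Peter Novák navštívil Košice .` tagged `B-PER I-PER O B-LOC O`, `bio_to_spans` returns `[("PER", 0, 2), ("LOC", 3, 4)]`.
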
### Parsing-POS
- [Slovak Dependency Treebank](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1822)
- [UD_Slovak-SNK](https://github.com/UniversalDependencies/UD_Slovak-SNK)
- [Artificial Treebank with Ellipsis](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2616)
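
UD_Slovak-SNK, like other Universal Dependencies treebanks, is distributed in the CoNLL-U format: ten tab-separated columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` comment lines, and blank lines between sentences. A minimal Python sketch, not an official reader, that extracts (FORM, UPOS) pairs for POS-tagging experiments; `read_conllu` is a hypothetical helper, and it skips multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `1.1`).

```python
from typing import Iterable, Iterator, List, Tuple

def read_conllu(lines: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield sentences as lists of (FORM, UPOS) pairs from CoNLL-U lines."""
    sentence: List[Tuple[str, str]] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:  # a blank line ends the sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range or empty node, not a regular word
        sentence.append((cols[1], cols[3]))  # FORM is column 2, UPOS is column 4
    if sentence:
        yield sentence
```

The function accepts any iterable of lines, so an open file object (opened with `encoding="utf-8"`) can be passed directly.
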
### Wordnet
[Slovak Word Net](https://korpus.sk/WordNet.html)
### Parallel Corpora
- European Parliament (Europarl)
- [Czech-Slovak Parallel Corpus](https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0006-AADF-0)
- [English-Slovak Parallel Corpus](https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0006-AAE0-A)
- [Multext East](http://nl.ijs.si/ME/)
### Sentiment
[Twitter sentiment for 15 European languages](https://www.clarin.si/repository/xmlui/handle/11356/1054)
### Web
- [Aranea](http://ucts.uniba.sk/aranea_about/)
- [SkTenTen](https://www.sketchengine.eu/sktenten-slovak-corpus/): automatically POS-annotated, accessible through a web interface
### Wikipedia
[Wikipedia in Elasticsearch bulk JSON format](https://dumps.wikimedia.org/other/cirrussearch/current/)
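
The CirrusSearch dumps are gzip-compressed files in the Elasticsearch bulk format, where each document line is preceded by an action line such as `{"index": {...}}`. A minimal Python sketch that streams (title, text) pairs from such a file; it assumes the document objects carry `title` and `text` fields and that action lines are the only ones with an `index` key, so inspect a few lines of the actual dump first. `iter_pages` is a hypothetical helper.

```python
import gzip
import json
from typing import Iterator, Tuple

def iter_pages(path: str) -> Iterator[Tuple[str, str]]:
    """Stream (title, text) pairs from a gzip-compressed CirrusSearch bulk dump."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            if "index" in record:
                # Bulk action line that precedes each document; it carries no text.
                continue
            # Field names "title" and "text" are assumed from the dump's document objects.
            yield record.get("title", ""), record.get("text", "")
```

Because the generator reads one line at a time, the dump never has to be decompressed to disk or loaded into memory as a whole.
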
### Resource Databases
- https://www.clarin.eu/portal
- https://www.clarin.eu/resource-families/manually-annotated-corpora
- http://www.meta-share.org/
- https://korpus.sk/res.html