merge
This commit is contained in:
commit f5455a89b3
@@ -13,11 +13,28 @@ Repository with [source code](https://git.kemt.fei.tuke.sk/dl874wn/dp2021)

## Diploma Project 2 2020

Virtual meeting 6.11.2020

Status:

- Table with 5 experiments completed.
- Repository created.

For the next meeting:

- Upload the code to the repository.
- Record the dependencies (package names) in requirements.txt.
- Rework the experiment so that it accepts command-line arguments (sys.argv).
- Add a run script for each experiment. The script should contain the parameters the experiment was run with.
- Finish the report.
- Add an overview of punctuation restoration methods and a description of your own method to the theory section.
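The command-line-arguments task above names sys.argv directly; a minimal sketch of the requested rework (the argument names and usage string are illustrative, not from the project):

```python
import sys

def main(argv):
    # Expect: experiment.py <input_file> <epochs>
    if len(argv) != 3:
        print("usage: experiment.py <input_file> <epochs>")
        return 1
    input_file = argv[1]
    epochs = int(argv[2])
    print("running experiment on", input_file, "for", epochs, "epochs")
    return 0

if __name__ == "__main__":
    main(sys.argv)  # pass the real command-line arguments
```

The run script from the next task would then just invoke it, e.g. `python experiment.py data.txt 20`, so the exact parameters of each run are recorded.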
Virtual meeting 25.9.2020

Done:

- Script for evaluating the experiments.

Tasks for the next meeting:
@@ -21,8 +21,21 @@ Task backlog:

- Use the model to support annotation
- By the end of the winter semester, write a report in the form of an article.
- Create a system for determining the amount and type of annotated data. How many articles? How many entities of each type?
- Write down rules for validation. What annotation result counts as good? Do the annotated data need to be checked?

Virtual meeting 30.10.2020:

Status:

- Improved guide
- Tried exporting data and training a model from the database. Problem when training in spaCy - different results than when training via Prodigy
- Work on the text part of the thesis.

Tasks for the next meeting:
- Create a repository named dp2021 and add your scripts and notes there.
- Continue writing the thesis. Survey the literature on "named entity corpora" and take notes.
- Create a system for determining the amount and type of annotated data. How many articles? How many entities of each type? The resulting table will go into the thesis.
- Prepare for production annotation. Is the schema ready?

Virtual meeting 16.10.2020:
@@ -1 +1,40 @@

DP2021

## Diploma Project 2 2020
Status:
- Updated annotation schema (this is a test schema with custom data)
- Several annotations done; training in Prodigy - low accuracy = small amount of annotated data. Training in spaCy does not work yet.
- Statistics on the number of accepted and rejected annotations come from Prodigy: prodigy stats wikiart. So far 156 annotations (151 accept, 5 reject). To get an overview of the number of annotations per entity type, we need to write a script.
- Literature survey on Named Entity Corpus
- Building a corpus for NER – automatic creation of an already-annotated corpus from Wikipedia using DBpedia – it is an English corpus, but worth mentioning in a comparison of approaches
- Building a Massive Corpus for Named Entity Recognition using Free Open Data Sources - Daniel Specht Menezes, Pedro Savarese, Ruy L. Milidiú
- Comparison of approaches to corpus annotation (in terms of both accuracy and time) - manual, semi-manual
- Comparison of Annotating Methods for Named Entity Corpora - Kanako Komiya, Masaya Suzuki
- What a corpus is, the development cycle, corpus analysis (literature already used – the MATTER cycle)
- Natural Language Annotation for Machine Learning – James Pustejovsky, Amber Stubbs

Update 09.11.2020:
- Fixed the problem where training in spaCy did not work
- Test annotation of about 500 sentences done. Training results at 20 iterations: F-score 47% (same results when training in spaCy and in Prodigy)
- Statistics on the counts of individual entities: script count.py


## Diploma Project 1 2020

- building and starting the Docker container

```
./build-docker.sh
docker run -it -p 8080:8080 -v ${PWD}:/work prodigy bash
# (in my case:)
winpty docker run --name prodigy -it -p 8080:8080 -v C://Users/jakub/Desktop/annotation/work prodigy bash
```

### Running the annotation schema
- `dataminer.csv` articles downloaded from the wiki
- `cd ner`
- `./01_text_to_sent.sh` runs the *text_to_sent.py* script, which splits the articles into individual sentences
- `./02_ner_correct.sh` starts the NER annotation process with suggestions from the model
- `./03_ner_export.sh` exports the annotated data in the jsonl format needed for processing in spaCy
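The *text_to_sent.py* step above splits articles into sentences. Its implementation is not shown in this commit; the core of such a script can be sketched as a naive splitter (the Docker image installs NLTK, whose punkt tokenizer would handle abbreviations better than this regex - treat the function name and approach as assumptions):

```python
import re

def text_to_sentences(text):
    # Naive split on sentence-final punctuation followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

print(text_to_sentences("Prvá veta. Druhá veta!"))
```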
@@ -1,17 +1,16 @@
# > docker run -it -p 8080:8080 -v ${PWD}:/work prodigy bash
# > winpty docker run --name prodigy -it -p 8080:8080 -v C://Users/jakub/Desktop/annotation/work prodigy bash

FROM python:3.8
RUN mkdir /prodigy
WORKDIR /prodigy
COPY ./prodigy-1.9.6-cp36.cp37.cp38-cp36m.cp37m.cp38-linux_x86_64.whl /prodigy
RUN mkdir /work
COPY ./ner /work
RUN pip install prodigy-1.9.6-cp36.cp37.cp38-cp36m.cp37m.cp38-linux_x86_64.whl
RUN pip install https://files.kemt.fei.tuke.sk/models/spacy/sk_sk1-0.0.1.tar.gz
RUN pip install nltk
EXPOSE 8080
ENV PRODIGY_HOME /work
ENV PRODIGY_HOST 0.0.0.0
WORKDIR /work

# > docker run -it -p 8080:8080 -v ${PWD}:/work prodigy bash
# > winpty docker run --name prodigy -it -p 8080:8080 -v C://Users/jakub/Desktop/annotation-master/annotation/work prodigy bash

FROM python:3.8
RUN mkdir /prodigy
WORKDIR /prodigy
COPY ./prodigy-1.9.6-cp36.cp37.cp38-cp36m.cp37m.cp38-linux_x86_64.whl /prodigy
RUN mkdir /work
COPY ./ner /work/ner
RUN pip install uvicorn==0.11.5 prodigy-1.9.6-cp36.cp37.cp38-cp36m.cp37m.cp38-linux_x86_64.whl
RUN pip install https://files.kemt.fei.tuke.sk/models/spacy/sk_sk1-0.0.1.tar.gz
RUN pip install nltk
EXPOSE 8080
ENV PRODIGY_HOME /work
ENV PRODIGY_HOST 0.0.0.0
WORKDIR /work
@@ -1,13 +1,11 @@
## Diploma Project 1 2020
## Diploma Project 2 2020

- building and starting the Docker container

```
./build-docker.sh
docker run -it -p 8080:8080 -v ${PWD}:/work prodigy bash
# (in my case:)
winpty docker run --name prodigy -it -p 8080:8080 -v C://Users/jakub/Desktop/annotation/work prodigy bash
winpty docker run --name prodigy -it -p 8080:8080 -v C://Users/jakub/Desktop/annotation-master/annotation/work prodigy bash
```

@@ -17,5 +15,12 @@ winpty docker run --name prodigy -it -p 8080:8080 -v C://Users/jakub/Desktop/ann
- `dataminer.csv` articles downloaded from the wiki
- `cd ner`
- `./01_text_to_sent.sh` runs the *text_to_sent.py* script, which splits the articles into individual sentences
- `./02_ner_correct.sh` starts the NER annotation process with suggestions from the model
- `./03_ner_export.sh` exports the annotated data in the jsonl format needed for processing in spaCy
- `./02_ner_manual.sh` starts the manual NER annotation process
- `./03_export.sh` exports the annotated data in the json format needed for processing in spaCy. Option to split into training (70%) and test (30%) data (--eval-split 0.3).

### Statistics about the annotated data
- `prodigy stats wikiart` - information about the number of accepted and rejected articles
- `python3 count.py` - information about the counts of individual entities

### Model training
Based on: https://git.kemt.fei.tuke.sk/dano/spacy-skmodel
@@ -0,0 +1,14 @@
# load data
filename = 'ner/annotations.jsonl'
file = open(filename, 'rt', encoding='utf-8')
text = file.read()

# count entity labels (note: these are raw substring counts over the
# whole file, so any other occurrence of e.g. 'PER' is counted too)
countPER = text.count('PER')
countLOC = text.count('LOC')
countORG = text.count('ORG')
countMISC = text.count('MISC')
print('Number of annotated PER entities:', countPER, '\n',
      'Number of annotated LOC entities:', countLOC, '\n',
      'Number of annotated ORG entities:', countORG, '\n',
      'Number of annotated MISC entities:', countMISC, '\n')
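The substring counting above can over-count, since any occurrence of the letters 'PER' anywhere in the file matches. A sketch of a stricter count that parses each JSONL line and tallies only span labels of accepted records (the `answer`/`spans`/`label` field names follow Prodigy's export format; treat them as assumptions):

```python
import json
from collections import Counter

def count_entities(jsonl_lines):
    # Tally span labels from accepted annotation records only.
    counts = Counter()
    for line in jsonl_lines:
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue
        for span in record.get("spans", []):
            counts[span["label"]] += 1
    return counts

# Two inline sample records instead of reading ner/annotations.jsonl
sample = [
    '{"answer": "accept", "spans": [{"label": "PER"}, {"label": "LOC"}]}',
    '{"answer": "reject", "spans": [{"label": "ORG"}]}',
]
print(count_entities(sample))
```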
@@ -1,3 +0,0 @@

prodigy ner.correct wikiart sk_sk1 ./textfile.csv --label OSOBA,MIESTO,ORGANIZACIA,PRODUKT

@@ -0,0 +1,2 @@
prodigy ner.manual wikiart sk_sk1 ./textfile.csv --label PER,LOC,ORG,MISC

@@ -0,0 +1 @@
prodigy data-to-spacy ./train.json ./eval.json --lang sk --ner wikiart --eval-split 0.3
@@ -1 +0,0 @@
prodigy db-out wikiart > ./annotations.jsonl
@@ -0,0 +1,19 @@
mkdir -p build
mkdir -p build/input
# Prepare Treebank
mkdir -p build/input/slovak-treebank
spacy convert ./sources/slovak-treebank/stb.conll ./build/input/slovak-treebank
# UDAG used as evaluation
mkdir -p build/input/ud-artificial-gapping
spacy convert ./sources/ud-artificial-gapping/sk-ud-crawled-orphan.conllu ./build/input/ud-artificial-gapping
# Prepare skner
mkdir -p build/input/skner
# Convert to IOB
cat ./sources/skner/wikiann-sk.bio | python ./sources/bio-to-iob.py > build/input/skner/wikiann-sk.iob
# Split to train and test
cat ./build/input/skner/wikiann-sk.iob | python ./sources/iob-to-traintest.py ./build/input/skner/wikiann-sk
# Convert train and test
mkdir -p build/input/skner-train
spacy convert -n 15 --converter ner ./build/input/skner/wikiann-sk.train ./build/input/skner-train
mkdir -p build/input/skner-test
spacy convert -n 15 --converter ner ./build/input/skner/wikiann-sk.test ./build/input/skner-test
@@ -0,0 +1,19 @@
set -e
OUTDIR=build/train/output
TRAINDIR=build/train
mkdir -p $TRAINDIR
mkdir -p $OUTDIR
mkdir -p dist
# Delete old training results
rm -rf $OUTDIR/*
# Train dependency and POS
spacy train sk $OUTDIR ./build/input/slovak-treebank ./build/input/ud-artificial-gapping --n-iter 20 -p tagger,parser
rm -rf $TRAINDIR/posparser
mv $OUTDIR/model-best $TRAINDIR/posparser
# Train NER
# python ./train.py -t ./train.json -o $TRAINDIR/nerposparser -n 10 -m $TRAINDIR/posparser/
spacy train sk $TRAINDIR/nerposparser ./ner/train.json ./ner/eval.json --n-iter 20 -p ner
# Package model
spacy package $TRAINDIR/nerposparser dist --meta-path ./meta.json --force
cd dist/sk_sk1-0.2.0
python ./setup.py sdist --dist-dir ../
@@ -31,11 +31,39 @@ Task backlog:

- Make a public demo - deployment using Docker
- Improve the web UI
- Create a REST API for indexing a document.
- In the index, assign a score to each document using several methods, e.g. PageRank
- Use that scoring during search
- **Use the SCNC validation database to evaluate each method**
- **By the end of the winter semester, write a "mini diploma thesis, about 8 pages with experiments" in the form of an article**

Virtual meeting 6.11.2020:

Status:

- Working through problems with Cassandra and JavaScript. How does the function then work?

Tasks for the next meeting:

- Write a function for indexing. The input is a document (an object with text and meta-information). The function indexes the document into ES.
- Learn how the function then works and what a callback is.
- Learn how Promise is used.
- Learn how async - await works.
- https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous/
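The then/callback/Promise/async-await study items above can be condensed into one small sketch (plain Node.js; no project code is assumed):

```javascript
// A Promise represents a value that becomes available later.
function fetchTitle(id) {
  return new Promise((resolve, reject) => {
    if (id > 0) {
      resolve(`title-${id}`);      // success path
    } else {
      reject(new Error('bad id')); // failure path
    }
  });
}

// then/catch style: the callback passed to then runs once the value is ready.
fetchTitle(1)
  .then(title => console.log('then:', title))
  .catch(err => console.error('then failed:', err.message));

// async/await style: same Promise, but sequential-looking code.
async function main() {
  try {
    const title = await fetchTitle(2);
    console.log('await:', title);
  } catch (err) {
    console.error('await failed:', err.message);
  }
}
main();
```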
Virtual meeting 23.10.2020:

Status:
- Working through problems with Cassandra. How to select data by the primary key.

For the next meeting:

- Continue with the open tasks.
- Write a function for indexing a single document.

Virtual meeting 16.10.

Status:

pages/students/2016/jan_holp/dp2021/zdrojove_subory/cassandra.js (105 lines) Normal file
@@ -0,0 +1,105 @@
//Jan Holp, DP 2021

//client1 = cassandra
//client2 = elasticsearch
//-----------------------------------------------------------------

//require the Elasticsearch library
const elasticsearch = require('elasticsearch');
const client2 = new elasticsearch.Client({
    hosts: [ 'localhost:9200']
});
client2.ping({
    requestTimeout: 30000,
}, function(error) {
    // at this point, Elasticsearch is down, please check your Elasticsearch service
    if (error) {
        console.error('Elasticsearch cluster is down!');
    } else {
        console.log('Everything is ok');
    }
});

//create new index skweb2
client2.indices.create({
    index: 'skweb2'
}, function(error, response, status) {
    if (error) {
        console.log(error);
    } else {
        console.log("created a new index", response);
    }
});

const cassandra = require('cassandra-driver');
const client1 = new cassandra.Client({ contactPoints: ['localhost:9042'], localDataCenter: 'datacenter1', keyspace: 'websucker' });
const query = 'SELECT title FROM websucker.content WHERE body_size > 0 ALLOW FILTERING';
client1.execute(query)
    .then(result => console.log(result))
    .catch(error => {
        // the original `,function(error) {...}` after then() was never
        // invoked; errors belong in a catch handler
        console.error('Something is wrong!');
        console.log(error);
    });

/*
async function indexData() {

    var i = 0;
    const query = 'SELECT title FROM websucker.content WHERE body_size > 0 ALLOW FILTERING';
    client1.execute(query)
    .then((result) => {
        try {
            //for ( i=0; i<15;i++){
            console.log('%s', result.row[0].title)
            //}
        } catch (query) {
            if (query instanceof SyntaxError) {
                console.log( "Invalid query" );
            }
        }
    });
}
*/

/*
//indexing method
const bulkIndex = function bulkIndex(index, type, data) {
    let bulkBody = [];
    id = 1;
    const errorCount = 0;
    data.forEach(item => {
        bulkBody.push({
            index: {
                _index: index,
                _type: type,
                _id : id++,
            }
        });
        bulkBody.push(item);
    });
    console.log(bulkBody);
    client.bulk({body: bulkBody})
    .then(response => {
        response.items.forEach(item => {
            if (item.index && item.index.error) {
                console.log(++errorCount, item.index.error);
            }
        });
        console.log(
            `Successfully indexed ${data.length - errorCount}
            out of ${data.length} items`
        );
    })
    .catch(console.err);
};
*/
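The task list earlier asks for a function that indexes a single document into ES. A sketch against the same legacy `elasticsearch` client API used in the file above (the helper name and document field names are hypothetical; in the project, `client` would be the `client2` instance and `skweb2` the index it creates):

```javascript
// Sketch: index one document (an object with text and meta-information).
function indexDocument(client, doc, id) {
  return client.index({
    index: 'skweb2',
    type: '_doc',
    id: String(id),
    body: { title: doc.title, text: doc.text },
  });
}

// Demo with a stand-in client, so the sketch runs without a live cluster.
const fakeClient = {
  index: (params) => {
    console.log('indexing', params.id, 'into', params.index);
    return Promise.resolve(params);
  },
};
indexDocument(fakeClient, { title: 'Test', text: 'Body' }, 1);
```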
@@ -23,13 +23,26 @@ Task backlog:
- tesla
- xavier
- Training on two cards in one machine
- idoc
- idoc DONE
- titan
- maybe training on 4 cards in one machine
- quadra
- *Training on two cards on two machines using NCCL (idoc, tesla)*
- maybe training on 2 cards on two machines (quadra plus idoc).

Virtual meeting 27.10.2020

Status:

- Training on the CPU, on 1 GPU, and on 2 GPUs on idoc
- Preparing materials for training on two machines using PyTorch.
- Access to tesla and xavier created.

Tasks for the next meeting:
- Study the literature and take notes.
- Continue with the open tasks from the backlog
- Store the finished scripts in a GIT repository
- create a dp2021 repository

Meeting 2.10.2020
@@ -1 +1,4 @@
## All scripts, files, and configurations

https://github.com/pytorch/examples/tree/master/imagenet
- should work for DDP; the imagenet archive is not available from the official site
Binary file not shown.
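The notes above mention training on two machines via NCCL with PyTorch. A minimal sketch of the process-group setup that distributed training needs (this is an assumption about the approach, not the project's script; `gloo` is used here so the snippet runs on CPU, whereas the multi-GPU runs would use `nccl`):

```python
import os

import torch
import torch.distributed as dist

def init_worker(rank, world_size):
    # One process per GPU; MASTER_ADDR points at the first node.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # 'nccl' for multi-GPU/multi-node training; 'gloo' works on CPU.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

init_worker(0, 1)
t = torch.ones(3)
dist.all_reduce(t)  # sums the tensor across all workers
print(t.tolist())
dist.destroy_process_group()
```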
@@ -0,0 +1,76 @@
import argparse
import datetime
import os
import socket
import sys

import numpy as np
from torch.utils.tensorboard import SummaryWriter

import torch
import torch.nn as nn
import torch.optim

from torch.optim import SGD, Adam
from torch.utils.data import DataLoader

from util.util import enumerateWithEstimate
from p2ch13.dsets import Luna2dSegmentationDataset, TrainingLuna2dSegmentationDataset, getCt
from util.logconf import logging
from util.util import xyz2irc
from p2ch13.model_seg import UNetWrapper, SegmentationAugmentation
from p2ch13.train_seg import LunaTrainingApp

log = logging.getLogger(__name__)
# log.setLevel(logging.WARN)
# log.setLevel(logging.INFO)
log.setLevel(logging.DEBUG)

class BenchmarkLuna2dSegmentationDataset(TrainingLuna2dSegmentationDataset):
    def __len__(self):
        # return 500
        return 5000
        # return 1000  (was an unreachable second return; kept as an alternative size)

class LunaBenchmarkApp(LunaTrainingApp):
    def initTrainDl(self):
        train_ds = BenchmarkLuna2dSegmentationDataset(
            val_stride=10,
            isValSet_bool=False,
            contextSlices_count=3,
            # augmentation_dict=self.augmentation_dict,
        )

        batch_size = self.cli_args.batch_size
        if self.use_cuda:
            batch_size *= torch.cuda.device_count()

        train_dl = DataLoader(
            train_ds,
            batch_size=batch_size,
            num_workers=self.cli_args.num_workers,
            pin_memory=self.use_cuda,
        )

        return train_dl

    def main(self):
        log.info("Starting {}, {}".format(type(self).__name__, self.cli_args))

        train_dl = self.initTrainDl()

        for epoch_ndx in range(1, 2):
            log.info("Epoch {} of {}, {}/{} batches of size {}*{}".format(
                epoch_ndx,
                self.cli_args.epochs,
                len(train_dl),
                len([]),
                self.cli_args.batch_size,
                (torch.cuda.device_count() if self.use_cuda else 1),
            ))

            self.doTraining(epoch_ndx, train_dl)


if __name__ == '__main__':
    LunaBenchmarkApp().main()
@ -0,0 +1,401 @@
|
||||
import copy
|
||||
import csv
|
||||
import functools
|
||||
import glob
|
||||
import math
|
||||
import os
|
||||
import random
|
||||
|
||||
from collections import namedtuple
|
||||
|
||||
import SimpleITK as sitk
|
||||
import numpy as np
|
||||
import scipy.ndimage.morphology as morph
|
||||
|
||||
import torch
|
||||
import torch.cuda
|
||||
import torch.nn.functional as F
|
||||
from torch.utils.data import Dataset
|
||||
|
||||
from util.disk import getCache
|
||||
from util.util import XyzTuple, xyz2irc
|
||||
from util.logconf import logging
|
||||
|
||||
log = logging.getLogger(__name__)
|
||||
# log.setLevel(logging.WARN)
|
||||
# log.setLevel(logging.INFO)
|
||||
log.setLevel(logging.DEBUG)
|
||||
|
||||
raw_cache = getCache('part2ch13_raw')
|
||||
|
||||
MaskTuple = namedtuple('MaskTuple', 'raw_dense_mask, dense_mask, body_mask, air_mask, raw_candidate_mask, candidate_mask, lung_mask, neg_mask, pos_mask')
|
||||
|
||||
CandidateInfoTuple = namedtuple('CandidateInfoTuple', 'isNodule_bool, hasAnnotation_bool, isMal_bool, diameter_mm, series_uid, center_xyz')
|
||||
|
||||
@functools.lru_cache(1)
|
||||
def getCandidateInfoList(requireOnDisk_bool=True):
|
||||
# We construct a set with all series_uids that are present on disk.
|
||||
# This will let us use the data, even if we haven't downloaded all of
|
||||
# the subsets yet.
|
||||
mhd_list = glob.glob('data-unversioned/subset*/*.mhd')
|
||||
presentOnDisk_set = {os.path.split(p)[-1][:-4] for p in mhd_list}
|
||||
|
||||
candidateInfo_list = []
|
||||
with open('data/annotations_with_malignancy.csv', "r") as f:
|
||||
for row in list(csv.reader(f))[1:]:
|
||||
series_uid = row[0]
|
||||
annotationCenter_xyz = tuple([float(x) for x in row[1:4]])
|
||||
annotationDiameter_mm = float(row[4])
|
||||
isMal_bool = {'False': False, 'True': True}[row[5]]
|
||||
|
||||
candidateInfo_list.append(
|
||||
CandidateInfoTuple(
|
||||
True,
|
||||
True,
|
||||
isMal_bool,
|
||||
annotationDiameter_mm,
|
||||
series_uid,
|
||||
annotationCenter_xyz,
|
||||
)
|
||||
)
|
||||
|
||||
with open('data/candidates.csv', "r") as f:
|
||||
for row in list(csv.reader(f))[1:]:
|
||||
series_uid = row[0]
|
||||
|
||||
if series_uid not in presentOnDisk_set and requireOnDisk_bool:
|
||||
continue
|
||||
|
||||
isNodule_bool = bool(int(row[4]))
|
||||
candidateCenter_xyz = tuple([float(x) for x in row[1:4]])
|
||||
|
||||
if not isNodule_bool:
|
||||
candidateInfo_list.append(
|
||||
CandidateInfoTuple(
|
||||
False,
|
||||
False,
|
||||
False,
|
||||
0.0,
|
||||
series_uid,
|
||||
candidateCenter_xyz,
|
||||
)
|
||||
)
|
||||
|
||||
candidateInfo_list.sort(reverse=True)
|
||||
return candidateInfo_list
|
||||
|
||||
@functools.lru_cache(1)
|
||||
def getCandidateInfoDict(requireOnDisk_bool=True):
|
||||
candidateInfo_list = getCandidateInfoList(requireOnDisk_bool)
|
||||
candidateInfo_dict = {}
|
||||
|
||||
for candidateInfo_tup in candidateInfo_list:
|
||||
candidateInfo_dict.setdefault(candidateInfo_tup.series_uid,
|
||||
[]).append(candidateInfo_tup)
|
||||
|
||||
return candidateInfo_dict
|
||||
|
||||
class Ct:
|
||||
def __init__(self, series_uid):
|
||||
mhd_path = glob.glob(
|
||||
'data-unversioned/subset*/{}.mhd'.format(series_uid)
|
||||
)[0]
|
||||
|
||||
ct_mhd = sitk.ReadImage(mhd_path)
|
||||
self.hu_a = np.array(sitk.GetArrayFromImage(ct_mhd), dtype=np.float32)
|
||||
|
||||
# CTs are natively expressed in https://en.wikipedia.org/wiki/Hounsfield_scale
|
||||
# HU are scaled oddly, with 0 g/cc (air, approximately) being -1000 and 1 g/cc (water) being 0.
|
||||
|
||||
self.series_uid = series_uid
|
||||
|
||||
self.origin_xyz = XyzTuple(*ct_mhd.GetOrigin())
|
||||
self.vxSize_xyz = XyzTuple(*ct_mhd.GetSpacing())
|
||||
self.direction_a = np.array(ct_mhd.GetDirection()).reshape(3, 3)
|
||||
|
||||
candidateInfo_list = getCandidateInfoDict()[self.series_uid]
|
||||
|
||||
self.positiveInfo_list = [
|
||||
candidate_tup
|
||||
for candidate_tup in candidateInfo_list
|
||||
if candidate_tup.isNodule_bool
|
||||
]
|
||||
self.positive_mask = self.buildAnnotationMask(self.positiveInfo_list)
|
||||
self.positive_indexes = (self.positive_mask.sum(axis=(1,2))
|
||||
.nonzero()[0].tolist())
|
||||
|
||||
def buildAnnotationMask(self, positiveInfo_list, threshold_hu = -700):
|
||||
boundingBox_a = np.zeros_like(self.hu_a, dtype=np.bool)
|
||||
|
||||
for candidateInfo_tup in positiveInfo_list:
|
||||
center_irc = xyz2irc(
|
||||
candidateInfo_tup.center_xyz,
|
||||
self.origin_xyz,
|
||||
self.vxSize_xyz,
|
||||
self.direction_a,
|
||||
)
|
||||
ci = int(center_irc.index)
|
||||
cr = int(center_irc.row)
|
||||
cc = int(center_irc.col)
|
||||
|
||||
index_radius = 2
|
||||
try:
|
||||
while self.hu_a[ci + index_radius, cr, cc] > threshold_hu and \
|
||||
self.hu_a[ci - index_radius, cr, cc] > threshold_hu:
|
||||
index_radius += 1
|
||||
except IndexError:
|
||||
index_radius -= 1
|
||||
|
||||
row_radius = 2
|
||||
try:
|
||||
while self.hu_a[ci, cr + row_radius, cc] > threshold_hu and \
|
||||
self.hu_a[ci, cr - row_radius, cc] > threshold_hu:
|
||||
row_radius += 1
|
||||
except IndexError:
|
||||
row_radius -= 1
|
||||
|
||||
col_radius = 2
|
||||
try:
|
||||
while self.hu_a[ci, cr, cc + col_radius] > threshold_hu and \
|
||||
self.hu_a[ci, cr, cc - col_radius] > threshold_hu:
|
||||
col_radius += 1
|
||||
except IndexError:
|
||||
col_radius -= 1
|
||||
|
||||
# assert index_radius > 0, repr([candidateInfo_tup.center_xyz, center_irc, self.hu_a[ci, cr, cc]])
|
||||
# assert row_radius > 0
|
||||
# assert col_radius > 0
|
||||
|
||||
boundingBox_a[
|
||||
ci - index_radius: ci + index_radius + 1,
|
||||
cr - row_radius: cr + row_radius + 1,
|
||||
cc - col_radius: cc + col_radius + 1] = True
|
||||
|
||||
mask_a = boundingBox_a & (self.hu_a > threshold_hu)
|
||||
|
||||
return mask_a
|
||||
|
||||
def getRawCandidate(self, center_xyz, width_irc):
|
||||
center_irc = xyz2irc(center_xyz, self.origin_xyz, self.vxSize_xyz,
|
||||
self.direction_a)
|
||||
|
||||
slice_list = []
|
||||
for axis, center_val in enumerate(center_irc):
|
||||
start_ndx = int(round(center_val - width_irc[axis]/2))
|
||||
end_ndx = int(start_ndx + width_irc[axis])
|
||||
|
||||
assert center_val >= 0 and center_val < self.hu_a.shape[axis], repr([self.series_uid, center_xyz, self.origin_xyz, self.vxSize_xyz, center_irc, axis])
|
||||
|
||||
if start_ndx < 0:
|
||||
# log.warning("Crop outside of CT array: {} {}, center:{} shape:{} width:{}".format(
|
||||
# self.series_uid, center_xyz, center_irc, self.hu_a.shape, width_irc))
|
||||
start_ndx = 0
|
||||
end_ndx = int(width_irc[axis])
|
||||
|
||||
if end_ndx > self.hu_a.shape[axis]:
|
||||
# log.warning("Crop outside of CT array: {} {}, center:{} shape:{} width:{}".format(
|
||||
# self.series_uid, center_xyz, center_irc, self.hu_a.shape, width_irc))
|
||||
end_ndx = self.hu_a.shape[axis]
|
||||
start_ndx = int(self.hu_a.shape[axis] - width_irc[axis])
|
||||
|
||||
slice_list.append(slice(start_ndx, end_ndx))
|
||||
|
||||
ct_chunk = self.hu_a[tuple(slice_list)]
|
||||
pos_chunk = self.positive_mask[tuple(slice_list)]
|
||||
|
||||
return ct_chunk, pos_chunk, center_irc
|
||||
|
||||
@functools.lru_cache(1, typed=True)
|
||||
def getCt(series_uid):
|
||||
return Ct(series_uid)
|
||||
|
||||
@raw_cache.memoize(typed=True)
|
||||
def getCtRawCandidate(series_uid, center_xyz, width_irc):
|
||||
ct = getCt(series_uid)
|
||||
ct_chunk, pos_chunk, center_irc = ct.getRawCandidate(center_xyz,
|
||||
width_irc)
|
||||
ct_chunk.clip(-1000, 1000, ct_chunk)
|
||||
return ct_chunk, pos_chunk, center_irc
|
||||
|
||||
@raw_cache.memoize(typed=True)
|
||||
def getCtSampleSize(series_uid):
|
||||
ct = Ct(series_uid)
|
||||
return int(ct.hu_a.shape[0]), ct.positive_indexes
|
||||
|
||||
|
||||
class Luna2dSegmentationDataset(Dataset):
|
||||
def __init__(self,
|
||||
val_stride=0,
|
||||
isValSet_bool=None,
|
||||
series_uid=None,
|
||||
contextSlices_count=3,
|
||||
fullCt_bool=False,
|
||||
):
|
||||
self.contextSlices_count = contextSlices_count
|
||||
self.fullCt_bool = fullCt_bool
|
||||
|
||||
if series_uid:
|
||||
self.series_list = [series_uid]
|
||||
else:
|
||||
self.series_list = sorted(getCandidateInfoDict().keys())
|
||||
|
||||
if isValSet_bool:
|
||||
assert val_stride > 0, val_stride
|
||||
self.series_list = self.series_list[::val_stride]
|
||||
assert self.series_list
|
||||
elif val_stride > 0:
|
||||
del self.series_list[::val_stride]
|
||||
assert self.series_list
|
||||
|
||||
self.sample_list = []
|
||||
for series_uid in self.series_list:
|
||||
index_count, positive_indexes = getCtSampleSize(series_uid)
|
||||
|
||||
if self.fullCt_bool:
|
||||
self.sample_list += [(series_uid, slice_ndx)
|
||||
for slice_ndx in range(index_count)]
|
||||
else:
|
||||
self.sample_list += [(series_uid, slice_ndx)
|
||||
for slice_ndx in positive_indexes]
|
||||
|
||||
self.candidateInfo_list = getCandidateInfoList()
|
||||
|
||||
series_set = set(self.series_list)
|
||||
self.candidateInfo_list = [cit for cit in self.candidateInfo_list
|
||||
if cit.series_uid in series_set]
|
||||
|
||||
self.pos_list = [nt for nt in self.candidateInfo_list
|
||||
if nt.isNodule_bool]
|
||||
|
||||
log.info("{!r}: {} {} series, {} slices, {} nodules".format(
|
||||
self,
|
||||
len(self.series_list),
|
||||
{None: 'general', True: 'validation', False: 'training'}[isValSet_bool],
|
||||
len(self.sample_list),
|
||||
len(self.pos_list),
|
||||
))
|
||||
|
||||
def __len__(self):
|
||||
return len(self.sample_list)
|
||||
|
||||
def __getitem__(self, ndx):
|
||||
series_uid, slice_ndx = self.sample_list[ndx % len(self.sample_list)]
|
||||
return self.getitem_fullSlice(series_uid, slice_ndx)
|
||||
|
||||
def getitem_fullSlice(self, series_uid, slice_ndx):
|
||||
ct = getCt(series_uid)
|
||||
ct_t = torch.zeros((self.contextSlices_count * 2 + 1, 512, 512))
|
||||
|
||||
start_ndx = slice_ndx - self.contextSlices_count
|
||||
end_ndx = slice_ndx + self.contextSlices_count + 1
|
||||
for i, context_ndx in enumerate(range(start_ndx, end_ndx)):
|
||||
context_ndx = max(context_ndx, 0)
|
||||
context_ndx = min(context_ndx, ct.hu_a.shape[0] - 1)
|
||||
ct_t[i] = torch.from_numpy(ct.hu_a[context_ndx].astype(np.float32))
|
||||
|
||||
# CTs are natively expressed in https://en.wikipedia.org/wiki/Hounsfield_scale
|
||||
# HU are scaled oddly, with 0 g/cc (air, approximately) being -1000 and 1 g/cc (water) being 0.
|
||||
# The lower bound gets rid of negative density stuff used to indicate out-of-FOV
|
||||
# The upper bound nukes any weird hotspots and clamps bone down
|
||||
ct_t.clamp_(-1000, 1000)
|
||||
|
||||
pos_t = torch.from_numpy(ct.positive_mask[slice_ndx]).unsqueeze(0)
|
||||
|
||||
return ct_t, pos_t, ct.series_uid, slice_ndx
|
||||
|
||||
|
||||
class TrainingLuna2dSegmentationDataset(Luna2dSegmentationDataset):
|
||||
def __init__(self, *args, **kwargs):
|
||||
super().__init__(*args, **kwargs)
|
||||
|
||||
self.ratio_int = 2
|
||||
|
||||
def __len__(self):
|
||||
return 300000
|
||||
|
||||
def shuffleSamples(self):
|
||||
random.shuffle(self.candidateInfo_list)
|
||||
random.shuffle(self.pos_list)
|
||||
|
||||
def __getitem__(self, ndx):
|
||||
candidateInfo_tup = self.pos_list[ndx % len(self.pos_list)]
|
||||
        return self.getitem_trainingCrop(candidateInfo_tup)

    def getitem_trainingCrop(self, candidateInfo_tup):
        ct_a, pos_a, center_irc = getCtRawCandidate(
            candidateInfo_tup.series_uid,
            candidateInfo_tup.center_xyz,
            (7, 96, 96),
        )
        pos_a = pos_a[3:4]

        row_offset = random.randrange(0, 32)
        col_offset = random.randrange(0, 32)
        ct_t = torch.from_numpy(ct_a[:, row_offset:row_offset+64,
                                     col_offset:col_offset+64]).to(torch.float32)
        pos_t = torch.from_numpy(pos_a[:, row_offset:row_offset+64,
                                       col_offset:col_offset+64]).to(torch.long)

        slice_ndx = center_irc.index

        return ct_t, pos_t, candidateInfo_tup.series_uid, slice_ndx


class PrepcacheLunaDataset(Dataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.candidateInfo_list = getCandidateInfoList()
        self.pos_list = [nt for nt in self.candidateInfo_list if nt.isNodule_bool]

        self.seen_set = set()
        self.candidateInfo_list.sort(key=lambda x: x.series_uid)

    def __len__(self):
        return len(self.candidateInfo_list)

    def __getitem__(self, ndx):
        # candidate_t, pos_t, series_uid, center_t = super().__getitem__(ndx)

        candidateInfo_tup = self.candidateInfo_list[ndx]
        getCtRawCandidate(candidateInfo_tup.series_uid, candidateInfo_tup.center_xyz, (7, 96, 96))

        series_uid = candidateInfo_tup.series_uid
        if series_uid not in self.seen_set:
            self.seen_set.add(series_uid)

            getCtSampleSize(series_uid)
            # ct = getCt(series_uid)
            # for mask_ndx in ct.positive_indexes:
            #     build2dLungMask(series_uid, mask_ndx)

        return 0, 1  # candidate_t, pos_t, series_uid, center_t


class TvTrainingLuna2dSegmentationDataset(torch.utils.data.Dataset):
    def __init__(self, isValSet_bool=False, val_stride=10, contextSlices_count=3):
        assert contextSlices_count == 3
        data = torch.load('./imgs_and_masks.pt')
        suids = list(set(data['suids']))
        trn_mask_suids = torch.arange(len(suids)) % val_stride < (val_stride - 1)
        trn_suids = {s for i, s in zip(trn_mask_suids, suids) if i}
        trn_mask = torch.tensor([(s in trn_suids) for s in data["suids"]])
        if not isValSet_bool:
            self.imgs = data["imgs"][trn_mask]
            self.masks = data["masks"][trn_mask]
            self.suids = [s for s, i in zip(data["suids"], trn_mask) if i]
        else:
            self.imgs = data["imgs"][~trn_mask]
            self.masks = data["masks"][~trn_mask]
            self.suids = [s for s, i in zip(data["suids"], trn_mask) if not i]
        # discard spurious hotspots and clamp bone
        self.imgs.clamp_(-1000, 1000)
        self.imgs /= 1000

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, i):
        oh, ow = torch.randint(0, 32, (2,))
        sl = self.masks.size(1)//2
        return self.imgs[i, :, oh: oh + 64, ow: ow + 64], 1, self.masks[i, sl: sl+1, oh: oh + 64, ow: ow + 64].to(torch.float32), self.suids[i], 9999
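The random 64x64 training crop above relies on a small invariant: offsets are drawn in [0, 32), so a 64-pixel window always fits inside the 96-pixel candidate array (32 + 64 == 96). A minimal standalone sketch of that cropping logic (my own illustration with dummy arrays, not code from the repo):

```python
import numpy as np

def random_crop_64(ct_a, pos_a, rng=None):
    # Offsets in [0, 32) guarantee a 64-wide window stays inside
    # the 96x96 candidate arrays (32 + 64 == 96).
    rng = rng or np.random.default_rng(0)
    row_offset = int(rng.integers(0, 32))
    col_offset = int(rng.integers(0, 32))
    ct_crop = ct_a[:, row_offset:row_offset + 64, col_offset:col_offset + 64]
    pos_crop = pos_a[:, row_offset:row_offset + 64, col_offset:col_offset + 64]
    return ct_crop, pos_crop

ct_a = np.zeros((7, 96, 96), dtype=np.float32)   # dummy CT chunk
pos_a = np.zeros((1, 96, 96), dtype=np.int64)    # dummy mask slice
ct_crop, pos_crop = random_crop_64(ct_a, pos_a)
print(ct_crop.shape, pos_crop.shape)
```

Both crops come out 64x64 regardless of the sampled offsets, which is why `__getitem__` never needs bounds checks.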
@ -0,0 +1,224 @@
import math
import random
from collections import namedtuple

import torch
from torch import nn as nn
import torch.nn.functional as F

from util.logconf import logging
from util.unet import UNet

log = logging.getLogger(__name__)
# log.setLevel(logging.WARN)
# log.setLevel(logging.INFO)
log.setLevel(logging.DEBUG)


class UNetWrapper(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()

        self.input_batchnorm = nn.BatchNorm2d(kwargs['in_channels'])
        self.unet = UNet(**kwargs)
        self.final = nn.Sigmoid()

        self._init_weights()

    def _init_weights(self):
        init_set = {
            nn.Conv2d,
            nn.Conv3d,
            nn.ConvTranspose2d,
            nn.ConvTranspose3d,
            nn.Linear,
        }
        for m in self.modules():
            if type(m) in init_set:
                nn.init.kaiming_normal_(
                    m.weight.data, mode='fan_out', nonlinearity='relu', a=0
                )
                if m.bias is not None:
                    fan_in, fan_out = \
                        nn.init._calculate_fan_in_and_fan_out(m.weight.data)
                    bound = 1 / math.sqrt(fan_out)
                    nn.init.normal_(m.bias, -bound, bound)

        # nn.init.constant_(self.unet.last.bias, -4)
        # nn.init.constant_(self.unet.last.bias, 4)

    def forward(self, input_batch):
        bn_output = self.input_batchnorm(input_batch)
        un_output = self.unet(bn_output)
        fn_output = self.final(un_output)
        return fn_output


class SegmentationAugmentation(nn.Module):
    def __init__(
            self, flip=None, offset=None, scale=None, rotate=None, noise=None
    ):
        super().__init__()

        self.flip = flip
        self.offset = offset
        self.scale = scale
        self.rotate = rotate
        self.noise = noise

    def forward(self, input_g, label_g):
        transform_t = self._build2dTransformMatrix()
        transform_t = transform_t.expand(input_g.shape[0], -1, -1)
        transform_t = transform_t.to(input_g.device, torch.float32)
        affine_t = F.affine_grid(transform_t[:, :2],
                                 input_g.size(), align_corners=False)

        augmented_input_g = F.grid_sample(input_g,
                                          affine_t, padding_mode='border',
                                          align_corners=False)
        augmented_label_g = F.grid_sample(label_g.to(torch.float32),
                                          affine_t, padding_mode='border',
                                          align_corners=False)

        if self.noise:
            noise_t = torch.randn_like(augmented_input_g)
            noise_t *= self.noise

            augmented_input_g += noise_t

        return augmented_input_g, augmented_label_g > 0.5

    def _build2dTransformMatrix(self):
        transform_t = torch.eye(3)

        for i in range(2):
            if self.flip:
                if random.random() > 0.5:
                    transform_t[i, i] *= -1

            if self.offset:
                offset_float = self.offset
                random_float = (random.random() * 2 - 1)
                transform_t[2, i] = offset_float * random_float

            if self.scale:
                scale_float = self.scale
                random_float = (random.random() * 2 - 1)
                transform_t[i, i] *= 1.0 + scale_float * random_float

        if self.rotate:
            angle_rad = random.random() * math.pi * 2
            s = math.sin(angle_rad)
            c = math.cos(angle_rad)

            rotation_t = torch.tensor([
                [c, -s, 0],
                [s, c, 0],
                [0, 0, 1]])

            transform_t @= rotation_t

        return transform_t


# MaskTuple = namedtuple('MaskTuple', 'raw_dense_mask, dense_mask, body_mask, air_mask, raw_candidate_mask, candidate_mask, lung_mask, neg_mask, pos_mask')
#
# class SegmentationMask(nn.Module):
#     def __init__(self):
#         super().__init__()
#
#         self.conv_list = nn.ModuleList([
#             self._make_circle_conv(radius) for radius in range(1, 8)
#         ])
#
#     def _make_circle_conv(self, radius):
#         diameter = 1 + radius * 2
#
#         a = torch.linspace(-1, 1, steps=diameter)**2
#         b = (a[None] + a[:, None])**0.5
#
#         circle_weights = (b <= 1.0).to(torch.float32)
#
#         conv = nn.Conv2d(1, 1, kernel_size=diameter, padding=radius, bias=False)
#         conv.weight.data.fill_(1)
#         conv.weight.data *= circle_weights / circle_weights.sum()
#
#         return conv
#
#     def erode(self, input_mask, radius, threshold=1):
#         conv = self.conv_list[radius - 1]
#         input_float = input_mask.to(torch.float32)
#         result = conv(input_float)
#
#         # log.debug(['erode in ', radius, threshold, input_float.min().item(), input_float.mean().item(), input_float.max().item()])
#         # log.debug(['erode out', radius, threshold, result.min().item(), result.mean().item(), result.max().item()])
#
#         return result >= threshold
#
#     def deposit(self, input_mask, radius, threshold=0):
#         conv = self.conv_list[radius - 1]
#         input_float = input_mask.to(torch.float32)
#         result = conv(input_float)
#
#         # log.debug(['deposit in ', radius, threshold, input_float.min().item(), input_float.mean().item(), input_float.max().item()])
#         # log.debug(['deposit out', radius, threshold, result.min().item(), result.mean().item(), result.max().item()])
#
#         return result > threshold
#
#     def fill_cavity(self, input_mask):
#         cumsum = input_mask.cumsum(-1)
#         filled_mask = (cumsum > 0)
#         filled_mask &= (cumsum < cumsum[..., -1:])
#         cumsum = input_mask.cumsum(-2)
#         filled_mask &= (cumsum > 0)
#         filled_mask &= (cumsum < cumsum[..., -1:, :])
#
#         return filled_mask
#
#     def forward(self, input_g, raw_pos_g):
#         gcc_g = input_g + 1
#
#         with torch.no_grad():
#             # log.info(['gcc_g', gcc_g.min(), gcc_g.mean(), gcc_g.max()])
#
#             raw_dense_mask = gcc_g > 0.7
#             dense_mask = self.deposit(raw_dense_mask, 2)
#             dense_mask = self.erode(dense_mask, 6)
#             dense_mask = self.deposit(dense_mask, 4)
#
#             body_mask = self.fill_cavity(dense_mask)
#             air_mask = self.deposit(body_mask & ~dense_mask, 5)
#             air_mask = self.erode(air_mask, 6)
#
#             lung_mask = self.deposit(air_mask, 5)
#
#             raw_candidate_mask = gcc_g > 0.4
#             raw_candidate_mask &= air_mask
#             candidate_mask = self.erode(raw_candidate_mask, 1)
#             candidate_mask = self.deposit(candidate_mask, 1)
#
#             pos_mask = self.deposit((raw_pos_g > 0.5) & lung_mask, 2)
#
#             neg_mask = self.deposit(candidate_mask, 1)
#             neg_mask &= ~pos_mask
#             neg_mask &= lung_mask
#
#             # label_g = (neg_mask | pos_mask).to(torch.float32)
#             label_g = (pos_mask).to(torch.float32)
#             neg_g = neg_mask.to(torch.float32)
#             pos_g = pos_mask.to(torch.float32)
#
#             mask_dict = {
#                 'raw_dense_mask': raw_dense_mask,
#                 'dense_mask': dense_mask,
#                 'body_mask': body_mask,
#                 'air_mask': air_mask,
#                 'raw_candidate_mask': raw_candidate_mask,
#                 'candidate_mask': candidate_mask,
#                 'lung_mask': lung_mask,
#                 'neg_mask': neg_mask,
#                 'pos_mask': pos_mask,
#             }
#
#             return label_g, neg_g, pos_g, lung_mask, mask_dict
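`_build2dTransformMatrix` composes flips, offsets, and scales into a 3x3 homogeneous matrix and then right-multiplies by a rotation. As a dependency-free sanity-check sketch (my own helper, mirroring the matrix layout above), a pure rotation factor should have determinant 1, so it never distorts areas, only orientations:

```python
import math

def rotation2d(angle_rad):
    # Same homogeneous-coordinate layout as _build2dTransformMatrix.
    s, c = math.sin(angle_rad), math.cos(angle_rad)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def det3(m):
    # Cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

R = rotation2d(math.pi / 3)
print(round(det3(R), 6))  # a pure rotation has determinant 1
```

Flips multiply the determinant by -1 and scales stretch it, which is why only the rotation term needs the sin/cos structure.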
@ -0,0 +1,69 @@
import timing
import argparse
import sys

import numpy as np

import torch.nn as nn
from torch.autograd import Variable
from torch.optim import SGD
from torch.utils.data import DataLoader

from util.util import enumerateWithEstimate
from .dsets import PrepcacheLunaDataset, getCtSampleSize
from util.logconf import logging
# from .model import LunaModel

log = logging.getLogger(__name__)
# log.setLevel(logging.WARN)
log.setLevel(logging.INFO)
# log.setLevel(logging.DEBUG)


class LunaPrepCacheApp:
    def __init__(self, sys_argv=None):
        if sys_argv is None:
            sys_argv = sys.argv[1:]

        parser = argparse.ArgumentParser()
        parser.add_argument('--batch-size',
            help='Batch size to use for training',
            default=1024,
            type=int,
        )
        parser.add_argument('--num-workers',
            help='Number of worker processes for background data loading',
            default=8,
            type=int,
        )
        # parser.add_argument('--scaled',
        #     help="Scale the CT chunks to square voxels.",
        #     default=False,
        #     action='store_true',
        # )

        self.cli_args = parser.parse_args(sys_argv)

    def main(self):
        log.info("Starting {}, {}".format(type(self).__name__, self.cli_args))

        self.prep_dl = DataLoader(
            PrepcacheLunaDataset(
                # sortby_str='series_uid',
            ),
            batch_size=self.cli_args.batch_size,
            num_workers=self.cli_args.num_workers,
        )

        batch_iter = enumerateWithEstimate(
            self.prep_dl,
            "Stuffing cache",
            start_ndx=self.prep_dl.num_workers,
        )
        for batch_ndx, batch_tup in batch_iter:
            pass


if __name__ == '__main__':
    LunaPrepCacheApp().main()
Binary file not shown.
@ -0,0 +1,331 @@
import math
import random
import warnings

import numpy as np
import scipy.ndimage

import torch
from torch.autograd import Function
from torch.autograd.function import once_differentiable
import torch.backends.cudnn as cudnn

from util.logconf import logging
log = logging.getLogger(__name__)
# log.setLevel(logging.WARN)
# log.setLevel(logging.INFO)
log.setLevel(logging.DEBUG)


def cropToShape(image, new_shape, center_list=None, fill=0.0):
    # log.debug([image.shape, new_shape, center_list])
    # assert len(image.shape) == 3, repr(image.shape)

    if center_list is None:
        center_list = [int(image.shape[i] / 2) for i in range(3)]

    crop_list = []
    for i in range(0, 3):
        crop_int = center_list[i]
        if image.shape[i] > new_shape[i] and crop_int is not None:

            # We can't just do crop_int +/- shape/2 since shape might be odd
            # and ints round down.
            start_int = crop_int - int(new_shape[i]/2)
            end_int = start_int + new_shape[i]
            crop_list.append(slice(max(0, start_int), end_int))
        else:
            crop_list.append(slice(0, image.shape[i]))

    # log.debug([image.shape, crop_list])
    # Index with a tuple: indexing with a plain list of slices is no longer
    # supported by NumPy.
    image = image[tuple(crop_list)]

    crop_list = []
    for i in range(0, 3):
        if image.shape[i] < new_shape[i]:
            crop_int = int((new_shape[i] - image.shape[i]) / 2)
            crop_list.append(slice(crop_int, crop_int + image.shape[i]))
        else:
            crop_list.append(slice(0, image.shape[i]))

    # log.debug([image.shape, crop_list])
    new_image = np.zeros(new_shape, dtype=image.dtype)
    new_image[:] = fill
    new_image[tuple(crop_list)] = image

    return new_image


def zoomToShape(image, new_shape, square=True):
    # assert image.shape[-1] in {1, 3, 4}, repr(image.shape)

    if square and image.shape[0] != image.shape[1]:
        crop_int = min(image.shape[0], image.shape[1])
        new_shape = [crop_int, crop_int, image.shape[2]]
        image = cropToShape(image, new_shape)

    zoom_shape = [new_shape[i] / image.shape[i] for i in range(3)]

    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        # The legacy scipy.ndimage.interpolation namespace is deprecated;
        # the top-level scipy.ndimage functions are equivalent.
        image = scipy.ndimage.zoom(
            image, zoom_shape,
            output=None, order=0, mode='nearest', cval=0.0, prefilter=True)

    return image


def randomOffset(image_list, offset_rows=0.125, offset_cols=0.125):

    center_list = [int(image_list[0].shape[i] / 2) for i in range(3)]
    center_list[0] += int(offset_rows * (random.random() - 0.5) * 2)
    center_list[1] += int(offset_cols * (random.random() - 0.5) * 2)
    center_list[2] = None

    new_list = []
    for image in image_list:
        new_image = cropToShape(image, image.shape, center_list)
        new_list.append(new_image)

    return new_list


def randomZoom(image_list, scale=None, scale_min=0.8, scale_max=1.3):
    if scale is None:
        scale = scale_min + (scale_max - scale_min) * random.random()

    new_list = []
    for image in image_list:
        # assert image.shape[-1] in {1, 3, 4}, repr(image.shape)

        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            # log.info([image.shape])
            zimage = scipy.ndimage.zoom(
                image, [scale, scale, 1.0],
                output=None, order=0, mode='nearest', cval=0.0, prefilter=True)
            image = cropToShape(zimage, image.shape)

        new_list.append(image)

    return new_list


_randomFlip_transform_list = [
    # lambda a: np.rot90(a, axes=(0, 1)),
    # lambda a: np.flip(a, 0),
    lambda a: np.flip(a, 1),
]

def randomFlip(image_list, transform_bits=None):
    if transform_bits is None:
        transform_bits = random.randrange(0, 2 ** len(_randomFlip_transform_list))

    new_list = []
    for image in image_list:
        # assert image.shape[-1] in {1, 3, 4}, repr(image.shape)

        for n in range(len(_randomFlip_transform_list)):
            if transform_bits & 2**n:
                # prhist(image, 'before')
                image = _randomFlip_transform_list[n](image)
                # prhist(image, 'after ')

        new_list.append(image)

    return new_list


def randomSpin(image_list, angle=None, range_tup=None, axes=(0, 1)):
    if range_tup is None:
        range_tup = (0, 360)

    if angle is None:
        angle = range_tup[0] + (range_tup[1] - range_tup[0]) * random.random()

    new_list = []
    for image in image_list:
        # assert image.shape[-1] in {1, 3, 4}, repr(image.shape)

        image = scipy.ndimage.rotate(
            image, angle, axes=axes, reshape=False,
            output=None, order=0, mode='nearest', cval=0.0, prefilter=True)

        new_list.append(image)

    return new_list


def randomNoise(image_list, noise_min=-0.1, noise_max=0.1):
    noise = np.zeros_like(image_list[0])
    noise += (noise_max - noise_min) * np.random.random_sample(image_list[0].shape) + noise_min
    noise *= 5
    noise = scipy.ndimage.gaussian_filter(noise, 3)
    # noise += (noise_max - noise_min) * np.random.random_sample(image_hsv.shape) + noise_min

    new_list = []
    for image_hsv in image_list:
        image_hsv = image_hsv + noise

        new_list.append(image_hsv)

    return new_list


def randomHsvShift(image_list, h=None, s=None, v=None,
                   h_min=-0.1, h_max=0.1,
                   s_min=0.5, s_max=2.0,
                   v_min=0.5, v_max=2.0):
    if h is None:
        h = h_min + (h_max - h_min) * random.random()
    if s is None:
        s = s_min + (s_max - s_min) * random.random()
    if v is None:
        v = v_min + (v_max - v_min) * random.random()

    new_list = []
    for image_hsv in image_list:
        # assert image_hsv.shape[-1] == 3, repr(image_hsv.shape)

        image_hsv[:,:,0::3] += h
        image_hsv[:,:,1::3] = image_hsv[:,:,1::3] ** s
        image_hsv[:,:,2::3] = image_hsv[:,:,2::3] ** v

        new_list.append(image_hsv)

    return clampHsv(new_list)


def clampHsv(image_list):
    new_list = []
    for image_hsv in image_list:
        image_hsv = image_hsv.clone()

        # Hue wraps around
        image_hsv[:,:,0][image_hsv[:,:,0] > 1] -= 1
        image_hsv[:,:,0][image_hsv[:,:,0] < 0] += 1

        # Everything else clamps between 0 and 1
        image_hsv[image_hsv > 1] = 1
        image_hsv[image_hsv < 0] = 0

        new_list.append(image_hsv)

    return new_list


# def torch_augment(input):
#     theta = random.random() * math.pi * 2
#     s = math.sin(theta)
#     c = math.cos(theta)
#     c1 = 1 - c
#     axis_vector = torch.rand(3, device='cpu', dtype=torch.float64)
#     axis_vector -= 0.5
#     axis_vector /= axis_vector.abs().sum()
#     l, m, n = axis_vector
#
#     matrix = torch.tensor([
#         [l*l*c1 + c, m*l*c1 - n*s, n*l*c1 + m*s, 0],
#         [l*m*c1 + n*s, m*m*c1 + c, n*m*c1 - l*s, 0],
#         [l*n*c1 - m*s, m*n*c1 + l*s, n*n*c1 + c, 0],
#         [0, 0, 0, 1],
#     ], device=input.device, dtype=torch.float32)
#
#     return th_affine3d(input, matrix)


# following from https://github.com/ncullen93/torchsample/blob/master/torchsample/utils.py
# MIT licensed

# def th_affine3d(input, matrix):
#     """
#     3D Affine image transform on torch.Tensor
#     """
#     A = matrix[:3,:3]
#     b = matrix[:3,3]
#
#     # make a meshgrid of normal coordinates
#     coords = th_iterproduct(input.size(-3), input.size(-2), input.size(-1), dtype=torch.float32)
#
#     # shift the coordinates so center is the origin
#     coords[:,0] = coords[:,0] - (input.size(-3) / 2. - 0.5)
#     coords[:,1] = coords[:,1] - (input.size(-2) / 2. - 0.5)
#     coords[:,2] = coords[:,2] - (input.size(-1) / 2. - 0.5)
#
#     # apply the coordinate transformation
#     new_coords = coords.mm(A.t().contiguous()) + b.expand_as(coords)
#
#     # shift the coordinates back so origin is origin
#     new_coords[:,0] = new_coords[:,0] + (input.size(-3) / 2. - 0.5)
#     new_coords[:,1] = new_coords[:,1] + (input.size(-2) / 2. - 0.5)
#     new_coords[:,2] = new_coords[:,2] + (input.size(-1) / 2. - 0.5)
#
#     # map new coordinates using bilinear interpolation
#     input_transformed = th_trilinear_interp3d(input, new_coords)
#
#     return input_transformed
#
#
# def th_trilinear_interp3d(input, coords):
#     """
#     trilinear interpolation of 3D torch.Tensor image
#     """
#     # take clamp then floor/ceil of x coords
#     x = torch.clamp(coords[:,0], 0, input.size(-3)-2)
#     x0 = x.floor()
#     x1 = x0 + 1
#     # take clamp then floor/ceil of y coords
#     y = torch.clamp(coords[:,1], 0, input.size(-2)-2)
#     y0 = y.floor()
#     y1 = y0 + 1
#     # take clamp then floor/ceil of z coords
#     z = torch.clamp(coords[:,2], 0, input.size(-1)-2)
#     z0 = z.floor()
#     z1 = z0 + 1
#
#     stride = torch.tensor(input.stride()[-3:], dtype=torch.int64, device=input.device)
#     x0_ix = x0.mul(stride[0]).long()
#     x1_ix = x1.mul(stride[0]).long()
#     y0_ix = y0.mul(stride[1]).long()
#     y1_ix = y1.mul(stride[1]).long()
#     z0_ix = z0.mul(stride[2]).long()
#     z1_ix = z1.mul(stride[2]).long()
#
#     # input_flat = th_flatten(input)
#     input_flat = x.contiguous().view(x[0], x[1], -1)
#
#     vals_000 = input_flat[:, :, x0_ix+y0_ix+z0_ix]
#     vals_001 = input_flat[:, :, x0_ix+y0_ix+z1_ix]
#     vals_010 = input_flat[:, :, x0_ix+y1_ix+z0_ix]
#     vals_011 = input_flat[:, :, x0_ix+y1_ix+z1_ix]
#     vals_100 = input_flat[:, :, x1_ix+y0_ix+z0_ix]
#     vals_101 = input_flat[:, :, x1_ix+y0_ix+z1_ix]
#     vals_110 = input_flat[:, :, x1_ix+y1_ix+z0_ix]
#     vals_111 = input_flat[:, :, x1_ix+y1_ix+z1_ix]
#
#     xd = x - x0
#     yd = y - y0
#     zd = z - z0
#     xm1 = 1 - xd
#     ym1 = 1 - yd
#     zm1 = 1 - zd
#
#     x_mapped = (
#         vals_000.mul(xm1).mul(ym1).mul(zm1) +
#         vals_001.mul(xm1).mul(ym1).mul(zd) +
#         vals_010.mul(xm1).mul(yd).mul(zm1) +
#         vals_011.mul(xm1).mul(yd).mul(zd) +
#         vals_100.mul(xd).mul(ym1).mul(zm1) +
#         vals_101.mul(xd).mul(ym1).mul(zd) +
#         vals_110.mul(xd).mul(yd).mul(zm1) +
#         vals_111.mul(xd).mul(yd).mul(zd)
#     )
#
#     return x_mapped.view_as(input)
#
# def th_iterproduct(*args, dtype=None):
#     return torch.from_numpy(np.indices(args).reshape((len(args),-1)).T)
#
# def th_flatten(x):
#     """Flatten tensor"""
#     return x.contiguous().view(x[0], x[1], -1)
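The pad-and-crop behavior of `cropToShape` can be sketched per axis (a standalone reimplementation for illustration, not an import from the repo): an axis longer than the target gets a centered window, and a shorter one is centered inside a fill value.

```python
import numpy as np

def center_crop_or_pad_1d(arr, new_len, fill=0.0):
    if arr.shape[0] > new_len:
        # Centered crop: same start arithmetic as cropToShape
        # (center - new_len // 2 handles odd lengths via int rounding).
        start = arr.shape[0] // 2 - new_len // 2
        return arr[start:start + new_len]
    # Centered pad with `fill` when the axis is too short.
    out = np.full(new_len, fill, dtype=arr.dtype)
    start = (new_len - arr.shape[0]) // 2
    out[start:start + arr.shape[0]] = arr
    return out

a = np.arange(6, dtype=float)        # [0, 1, 2, 3, 4, 5]
print(center_crop_or_pad_1d(a, 4))   # centered 4-element window
print(center_crop_or_pad_1d(a, 8))   # same data centered in 8 slots
```

`cropToShape` applies exactly this pair of steps, one slice per spatial axis, which is why it builds two separate `crop_list`s.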
@ -0,0 +1,136 @@
import gzip

from diskcache import FanoutCache, Disk
from diskcache.core import BytesType, MODE_BINARY, BytesIO

from util.logconf import logging
log = logging.getLogger(__name__)
# log.setLevel(logging.WARN)
log.setLevel(logging.INFO)
# log.setLevel(logging.DEBUG)


class GzipDisk(Disk):
    def store(self, value, read, key=None):
        """
        Override from base class diskcache.Disk.

        Chunking is due to needing to work on pythons < 2.7.13:
        - Issue #27130: In the "zlib" module, fix handling of large buffers
          (typically 2 or 4 GiB).  Previously, inputs were limited to 2 GiB, and
          compression and decompression operations did not properly handle results of
          2 or 4 GiB.

        :param value: value to convert
        :param bool read: True when value is file-like object
        :return: (size, mode, filename, value) tuple for Cache table
        """
        # pylint: disable=unidiomatic-typecheck
        if type(value) is BytesType:
            if read:
                value = value.read()
                read = False

            str_io = BytesIO()
            gz_file = gzip.GzipFile(mode='wb', compresslevel=1, fileobj=str_io)

            for offset in range(0, len(value), 2**30):
                gz_file.write(value[offset:offset+2**30])
            gz_file.close()

            value = str_io.getvalue()

        return super(GzipDisk, self).store(value, read)

    def fetch(self, mode, filename, value, read):
        """
        Override from base class diskcache.Disk.

        Chunking is due to needing to work on pythons < 2.7.13:
        - Issue #27130: In the "zlib" module, fix handling of large buffers
          (typically 2 or 4 GiB).  Previously, inputs were limited to 2 GiB, and
          compression and decompression operations did not properly handle results of
          2 or 4 GiB.

        :param int mode: value mode raw, binary, text, or pickle
        :param str filename: filename of corresponding value
        :param value: database value
        :param bool read: when True, return an open file handle
        :return: corresponding Python value
        """
        value = super(GzipDisk, self).fetch(mode, filename, value, read)

        if mode == MODE_BINARY:
            str_io = BytesIO(value)
            gz_file = gzip.GzipFile(mode='rb', fileobj=str_io)
            read_csio = BytesIO()

            while True:
                uncompressed_data = gz_file.read(2**30)
                if uncompressed_data:
                    read_csio.write(uncompressed_data)
                else:
                    break

            value = read_csio.getvalue()

        return value

def getCache(scope_str):
    return FanoutCache('data-unversioned/cache/' + scope_str,
                       disk=GzipDisk,
                       shards=64,
                       timeout=1,
                       size_limit=3e11,
                       # disk_min_file_size=2**20,
                       )

# def disk_cache(base_path, memsize=2):
#     def disk_cache_decorator(f):
#         @functools.wraps(f)
#         def wrapper(*args, **kwargs):
#             args_str = repr(args) + repr(sorted(kwargs.items()))
#             file_str = hashlib.md5(args_str.encode('utf8')).hexdigest()
#
#             cache_path = os.path.join(base_path, f.__name__, file_str + '.pkl.gz')
#
#             if not os.path.exists(os.path.dirname(cache_path)):
#                 os.makedirs(os.path.dirname(cache_path), exist_ok=True)
#
#             if os.path.exists(cache_path):
#                 return pickle_loadgz(cache_path)
#             else:
#                 ret = f(*args, **kwargs)
#                 pickle_dumpgz(cache_path, ret)
#                 return ret
#
#         return wrapper
#
#     return disk_cache_decorator
#
#
# def pickle_dumpgz(file_path, obj):
#     log.debug("Writing {}".format(file_path))
#     with open(file_path, 'wb') as file_obj:
#         with gzip.GzipFile(mode='wb', compresslevel=1, fileobj=file_obj) as gz_file:
#             pickle.dump(obj, gz_file, pickle.HIGHEST_PROTOCOL)
#
#
# def pickle_loadgz(file_path):
#     log.debug("Reading {}".format(file_path))
#     with open(file_path, 'rb') as file_obj:
#         with gzip.GzipFile(mode='rb', fileobj=file_obj) as gz_file:
#             return pickle.load(gz_file)
#
#
# def dtpath(dt=None):
#     if dt is None:
#         dt = datetime.datetime.now()
#
#     return str(dt).rsplit('.', 1)[0].replace(' ', '--').replace(':', '.')
#
#
# def safepath(s):
#     s = s.replace(' ', '_')
#     return re.sub('[^A-Za-z0-9_.-]', '', s)
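`GzipDisk` writes and reads values in 2**30-byte chunks to sidestep the old zlib large-buffer limits described in its docstrings. The round trip can be sketched in isolation (my own standalone version, with the chunk size shrunk so the loops actually iterate on a small payload):

```python
import gzip
from io import BytesIO

CHUNK = 16  # stand-in for the 2**30 chunk size used by GzipDisk

def gzip_store(value: bytes) -> bytes:
    # Compress the value in fixed-size chunks, like GzipDisk.store.
    buf = BytesIO()
    with gzip.GzipFile(mode='wb', compresslevel=1, fileobj=buf) as gz_file:
        for offset in range(0, len(value), CHUNK):
            gz_file.write(value[offset:offset + CHUNK])
    return buf.getvalue()

def gzip_fetch(blob: bytes) -> bytes:
    # Decompress chunk by chunk, like GzipDisk.fetch.
    out = BytesIO()
    with gzip.GzipFile(mode='rb', fileobj=BytesIO(blob)) as gz_file:
        while True:
            chunk = gz_file.read(CHUNK)
            if not chunk:
                break
            out.write(chunk)
    return out.getvalue()

payload = bytes(range(256)) * 5
print(gzip_fetch(gzip_store(payload)) == payload)
```

Because gzip is a stream format, writing in chunks produces the same compressed stream as a single `write`, so the chunking is invisible to readers.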
@ -0,0 +1,19 @@
import logging
import logging.handlers

root_logger = logging.getLogger()
root_logger.setLevel(logging.INFO)

# Some libraries attempt to add their own root logger handlers. This is
# annoying and so we get rid of them.
for handler in list(root_logger.handlers):
    root_logger.removeHandler(handler)

logfmt_str = "%(asctime)s %(levelname)-8s pid:%(process)d %(name)s:%(lineno)03d:%(funcName)s %(message)s"
formatter = logging.Formatter(logfmt_str)

streamHandler = logging.StreamHandler()
streamHandler.setFormatter(formatter)
streamHandler.setLevel(logging.DEBUG)

root_logger.addHandler(streamHandler)
@ -0,0 +1,143 @@
|
||||
# From https://github.com/jvanvugt/pytorch-unet
|
||||
# https://raw.githubusercontent.com/jvanvugt/pytorch-unet/master/unet.py
|
||||
|
||||
# MIT License
|
||||
#
|
||||
# Copyright (c) 2018 Joris
|
||||
#
|
||||
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
# of this software and associated documentation files (the "Software"), to deal
|
||||
# in the Software without restriction, including without limitation the rights
|
||||
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
# copies of the Software, and to permit persons to whom the Software is
|
||||
# furnished to do so, subject to the following conditions:
|
||||
#
|
||||
# The above copyright notice and this permission notice shall be included in all
|
||||
# copies or substantial portions of the Software.
|
||||
#
|
||||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
# SOFTWARE.
|
||||
|
||||
# Adapted from https://discuss.pytorch.org/t/unet-implementation/426
|
||||
|
||||
import torch
|
||||
from torch import nn
|
||||
import torch.nn.functional as F
|
||||
|
||||
|
||||
class UNet(nn.Module):
|
||||
def __init__(self, in_channels=1, n_classes=2, depth=5, wf=6, padding=False,
|
||||
batch_norm=False, up_mode='upconv'):
|
||||
"""
|
||||
Implementation of
|
||||
U-Net: Convolutional Networks for Biomedical Image Segmentation
|
||||
(Ronneberger et al., 2015)
|
||||
https://arxiv.org/abs/1505.04597
|
||||
|
||||
Using the default arguments will yield the exact version used
|
||||
in the original paper
|
||||
|
||||
Args:
|
||||
in_channels (int): number of input channels
|
||||
n_classes (int): number of output channels
|
||||
depth (int): depth of the network
|
||||
wf (int): number of filters in the first layer is 2**wf
|
||||
padding (bool): if True, apply padding such that the input shape
|
||||
is the same as the output.
|
||||
This may introduce artifacts
|
||||
batch_norm (bool): Use BatchNorm after layers with an
|
||||
activation function
|
||||
up_mode (str): one of 'upconv' or 'upsample'.
|
||||
'upconv' will use transposed convolutions for
|
||||
                       learned upsampling.
           'upsample' will use bilinear upsampling.
        """
        super(UNet, self).__init__()
        assert up_mode in ('upconv', 'upsample')
        self.padding = padding
        self.depth = depth
        prev_channels = in_channels
        self.down_path = nn.ModuleList()
        for i in range(depth):
            self.down_path.append(UNetConvBlock(prev_channels, 2**(wf+i),
                                                padding, batch_norm))
            prev_channels = 2**(wf+i)

        self.up_path = nn.ModuleList()
        for i in reversed(range(depth - 1)):
            self.up_path.append(UNetUpBlock(prev_channels, 2**(wf+i), up_mode,
                                            padding, batch_norm))
            prev_channels = 2**(wf+i)

        self.last = nn.Conv2d(prev_channels, n_classes, kernel_size=1)

    def forward(self, x):
        blocks = []
        for i, down in enumerate(self.down_path):
            x = down(x)
            if i != len(self.down_path)-1:
                blocks.append(x)
                x = F.avg_pool2d(x, 2)

        for i, up in enumerate(self.up_path):
            x = up(x, blocks[-i-1])

        return self.last(x)


class UNetConvBlock(nn.Module):
    def __init__(self, in_size, out_size, padding, batch_norm):
        super(UNetConvBlock, self).__init__()
        block = []

        block.append(nn.Conv2d(in_size, out_size, kernel_size=3,
                               padding=int(padding)))
        block.append(nn.ReLU())
        # block.append(nn.LeakyReLU())
        if batch_norm:
            block.append(nn.BatchNorm2d(out_size))

        block.append(nn.Conv2d(out_size, out_size, kernel_size=3,
                               padding=int(padding)))
        block.append(nn.ReLU())
        # block.append(nn.LeakyReLU())
        if batch_norm:
            block.append(nn.BatchNorm2d(out_size))

        self.block = nn.Sequential(*block)

    def forward(self, x):
        out = self.block(x)
        return out


class UNetUpBlock(nn.Module):
    def __init__(self, in_size, out_size, up_mode, padding, batch_norm):
        super(UNetUpBlock, self).__init__()
        if up_mode == 'upconv':
            self.up = nn.ConvTranspose2d(in_size, out_size, kernel_size=2,
                                         stride=2)
        elif up_mode == 'upsample':
            self.up = nn.Sequential(nn.Upsample(mode='bilinear', scale_factor=2),
                                    nn.Conv2d(in_size, out_size, kernel_size=1))

        self.conv_block = UNetConvBlock(in_size, out_size, padding, batch_norm)

    def center_crop(self, layer, target_size):
        _, _, layer_height, layer_width = layer.size()
        diff_y = (layer_height - target_size[0]) // 2
        diff_x = (layer_width - target_size[1]) // 2
        return layer[:, :, diff_y:(diff_y + target_size[0]), diff_x:(diff_x + target_size[1])]

    def forward(self, x, bridge):
        up = self.up(x)
        crop1 = self.center_crop(bridge, up.shape[2:])
        out = torch.cat([up, crop1], 1)
        out = self.conv_block(out)

        return out
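The two loops in `UNet.__init__` above fix the layer widths purely from `wf` and `depth`: the encoder doubles the channel count at every level, `2**(wf+i)`, and the decoder walks the same widths back down. A minimal plain-Python sketch of that progression (`wf=6`, `depth=5` are illustrative values, not values taken from this repository):

```python
def unet_channel_widths(wf, depth):
    """Channel width at each encoder level, mirroring the loop in UNet.__init__."""
    return [2 ** (wf + i) for i in range(depth)]

down = unet_channel_widths(wf=6, depth=5)
print(down)            # encoder widths: [64, 128, 256, 512, 1024]
print(down[-2::-1])    # decoder widths (reversed, skipping the bottleneck): [512, 256, 128, 64]
```

With `wf=6` the first block already produces 64 feature maps, which matches the classical U-Net layout.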
105 pages/students/2016/lukas_pokryvka/dp2021/mnist/mnist-dist.py Normal file
@@ -0,0 +1,105 @@
import os
from datetime import datetime
import argparse
import torch.multiprocessing as mp
import torchvision
import torchvision.transforms as transforms
import torch
import torch.nn as nn
import torch.distributed as dist
from apex.parallel import DistributedDataParallel as DDP
from apex import amp


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-n', '--nodes', default=1, type=int, metavar='N',
                        help='number of nodes (default: 1)')
    parser.add_argument('-g', '--gpus', default=1, type=int,
                        help='number of gpus per node')
    parser.add_argument('-nr', '--nr', default=0, type=int,
                        help='ranking within the nodes')
    parser.add_argument('--epochs', default=2, type=int, metavar='N',
                        help='number of total epochs to run')
    args = parser.parse_args()
    args.world_size = args.gpus * args.nodes
    os.environ['MASTER_ADDR'] = '147.232.47.114'
    os.environ['MASTER_PORT'] = '8888'
    mp.spawn(train, nprocs=args.gpus, args=(args,))


class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7*7*32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out


def train(gpu, args):
    rank = args.nr * args.gpus + gpu
    dist.init_process_group(backend='nccl', init_method='env://', world_size=args.world_size, rank=rank)
    torch.manual_seed(0)
    model = ConvNet()
    torch.cuda.set_device(gpu)
    model.cuda(gpu)
    batch_size = 10
    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss().cuda(gpu)
    optimizer = torch.optim.SGD(model.parameters(), 1e-4)
    # Wrap the model
    model = nn.parallel.DistributedDataParallel(model, device_ids=[gpu])
    # Data loading code
    train_dataset = torchvision.datasets.MNIST(root='./data',
                                               train=True,
                                               transform=transforms.ToTensor(),
                                               download=True)
    train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset,
                                                                    num_replicas=args.world_size,
                                                                    rank=rank)
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                               batch_size=batch_size,
                                               shuffle=False,
                                               num_workers=0,
                                               pin_memory=True,
                                               sampler=train_sampler)

    start = datetime.now()
    total_step = len(train_loader)
    for epoch in range(args.epochs):
        for i, (images, labels) in enumerate(train_loader):
            images = images.cuda(non_blocking=True)
            labels = labels.cuda(non_blocking=True)
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if (i + 1) % 100 == 0 and gpu == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(epoch + 1, args.epochs, i + 1, total_step,
                                                                         loss.item()))
    if gpu == 0:
        print("Training complete in: " + str(datetime.now() - start))


if __name__ == '__main__':
    torch.multiprocessing.set_start_method('spawn')
    main()
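`train()` above derives a unique global rank for every spawned process as `args.nr * args.gpus + gpu`, so the ranks tile contiguously across nodes. A small plain-Python sketch of that arithmetic (the 2-node, 4-GPU layout below is an assumed example, not a configuration from this repository):

```python
def global_ranks(nodes, gpus_per_node):
    """Enumerate (node, local_gpu) -> global rank exactly as train() computes it."""
    return {(nr, gpu): nr * gpus_per_node + gpu
            for nr in range(nodes)
            for gpu in range(gpus_per_node)}

ranks = global_ranks(nodes=2, gpus_per_node=4)
print(ranks[(0, 0)], ranks[(1, 3)])  # first and last of the 8 processes: 0 7
```

Every process gets a distinct rank in `[0, world_size)`, which is what `dist.init_process_group` requires.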
92 pages/students/2016/lukas_pokryvka/dp2021/mnist/mnist.py Normal file
@@ -0,0 +1,92 @@
import os
from datetime import datetime
import argparse
import torch.multiprocessing as mp
import torchvision
import torchvision.transforms as transforms
import torch
import torch.nn as nn
import torch.distributed as dist
from apex.parallel import DistributedDataParallel as DDP
from apex import amp


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-n', '--nodes', default=1, type=int, metavar='N',
                        help='number of nodes (default: 1)')
    parser.add_argument('-g', '--gpus', default=1, type=int,
                        help='number of gpus per node')
    parser.add_argument('-nr', '--nr', default=0, type=int,
                        help='ranking within the nodes')
    parser.add_argument('--epochs', default=2, type=int, metavar='N',
                        help='number of total epochs to run')
    args = parser.parse_args()
    train(0, args)


class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7*7*32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out


def train(gpu, args):
    model = ConvNet()
    torch.cuda.set_device(gpu)
    model.cuda(gpu)
    batch_size = 50
    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss().cuda(gpu)
    optimizer = torch.optim.SGD(model.parameters(), 1e-4)
    # Data loading code
    train_dataset = torchvision.datasets.MNIST(root='./data',
                                               train=True,
                                               transform=transforms.ToTensor(),
                                               download=True)
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                               batch_size=batch_size,
                                               shuffle=True,
                                               num_workers=0,
                                               pin_memory=True)

    start = datetime.now()
    total_step = len(train_loader)
    for epoch in range(args.epochs):
        for i, (images, labels) in enumerate(train_loader):
            images = images.cuda(non_blocking=True)
            labels = labels.cuda(non_blocking=True)
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if (i + 1) % 100 == 0 and gpu == 0:
                print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(epoch + 1, args.epochs, i + 1, total_step,
                                                                         loss.item()))
    if gpu == 0:
        print("Training complete in: " + str(datetime.now() - start))


if __name__ == '__main__':
    main()
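The `7*7*32` input size of `self.fc` in `ConvNet` follows from the two `MaxPool2d(kernel_size=2, stride=2)` layers, each halving the 28x28 MNIST side length, and the 32 output channels of `layer2`. A quick plain-Python check of that arithmetic:

```python
side = 28                    # MNIST images are 28x28
for _ in range(2):           # two conv blocks, each ending in MaxPool2d(2, 2)
    side //= 2
fc_in = side * side * 32     # 32 channels after layer2
print(fc_in)                 # 1568 == 7*7*32
```

If the input resolution or pooling schedule changed, this product would have to change with it, otherwise the `nn.Linear` shape check fails at runtime.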
748 pages/students/2016/lukas_pokryvka/dp2021/yelp/script.py Normal file
@@ -0,0 +1,748 @@
from argparse import Namespace
from collections import Counter
import json
import os
import re
import string

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm.notebook import tqdm


class Vocabulary(object):
    """Class to process text and extract vocabulary for mapping"""

    def __init__(self, token_to_idx=None, add_unk=True, unk_token="<UNK>"):
        """
        Args:
            token_to_idx (dict): a pre-existing map of tokens to indices
            add_unk (bool): a flag that indicates whether to add the UNK token
            unk_token (str): the UNK token to add into the Vocabulary
        """

        if token_to_idx is None:
            token_to_idx = {}
        self._token_to_idx = token_to_idx

        self._idx_to_token = {idx: token
                              for token, idx in self._token_to_idx.items()}

        self._add_unk = add_unk
        self._unk_token = unk_token

        self.unk_index = -1
        if add_unk:
            self.unk_index = self.add_token(unk_token)

    def to_serializable(self):
        """ returns a dictionary that can be serialized """
        return {'token_to_idx': self._token_to_idx,
                'add_unk': self._add_unk,
                'unk_token': self._unk_token}

    @classmethod
    def from_serializable(cls, contents):
        """ instantiates the Vocabulary from a serialized dictionary """
        return cls(**contents)

    def add_token(self, token):
        """Update mapping dicts based on the token.

        Args:
            token (str): the item to add into the Vocabulary
        Returns:
            index (int): the integer corresponding to the token
        """
        if token in self._token_to_idx:
            index = self._token_to_idx[token]
        else:
            index = len(self._token_to_idx)
            self._token_to_idx[token] = index
            self._idx_to_token[index] = token
        return index

    def add_many(self, tokens):
        """Add a list of tokens into the Vocabulary

        Args:
            tokens (list): a list of string tokens
        Returns:
            indices (list): a list of indices corresponding to the tokens
        """
        return [self.add_token(token) for token in tokens]

    def lookup_token(self, token):
        """Retrieve the index associated with the token
        or the UNK index if token isn't present.

        Args:
            token (str): the token to look up
        Returns:
            index (int): the index corresponding to the token
        Notes:
            `unk_index` needs to be >=0 (having been added into the Vocabulary)
            for the UNK functionality
        """
        if self.unk_index >= 0:
            return self._token_to_idx.get(token, self.unk_index)
        else:
            return self._token_to_idx[token]

    def lookup_index(self, index):
        """Return the token associated with the index

        Args:
            index (int): the index to look up
        Returns:
            token (str): the token corresponding to the index
        Raises:
            KeyError: if the index is not in the Vocabulary
        """
        if index not in self._idx_to_token:
            raise KeyError("the index (%d) is not in the Vocabulary" % index)
        return self._idx_to_token[index]

    def __str__(self):
        return "<Vocabulary(size=%d)>" % len(self)

    def __len__(self):
        return len(self._token_to_idx)
class ReviewVectorizer(object):
    """ The Vectorizer which coordinates the Vocabularies and puts them to use"""
    def __init__(self, review_vocab, rating_vocab):
        """
        Args:
            review_vocab (Vocabulary): maps words to integers
            rating_vocab (Vocabulary): maps class labels to integers
        """
        self.review_vocab = review_vocab
        self.rating_vocab = rating_vocab

    def vectorize(self, review):
        """Create a collapsed one-hot vector for the review

        Args:
            review (str): the review
        Returns:
            one_hot (np.ndarray): the collapsed one-hot encoding
        """
        one_hot = np.zeros(len(self.review_vocab), dtype=np.float32)

        for token in review.split(" "):
            if token not in string.punctuation:
                one_hot[self.review_vocab.lookup_token(token)] = 1

        return one_hot

    @classmethod
    def from_dataframe(cls, review_df, cutoff=25):
        """Instantiate the vectorizer from the dataset dataframe

        Args:
            review_df (pandas.DataFrame): the review dataset
            cutoff (int): the parameter for frequency-based filtering
        Returns:
            an instance of the ReviewVectorizer
        """
        review_vocab = Vocabulary(add_unk=True)
        rating_vocab = Vocabulary(add_unk=False)

        # Add ratings
        for rating in sorted(set(review_df.rating)):
            rating_vocab.add_token(rating)

        # Add top words if count > provided count
        word_counts = Counter()
        for review in review_df.review:
            for word in review.split(" "):
                if word not in string.punctuation:
                    word_counts[word] += 1

        for word, count in word_counts.items():
            if count > cutoff:
                review_vocab.add_token(word)

        return cls(review_vocab, rating_vocab)

    @classmethod
    def from_serializable(cls, contents):
        """Instantiate a ReviewVectorizer from a serializable dictionary

        Args:
            contents (dict): the serializable dictionary
        Returns:
            an instance of the ReviewVectorizer class
        """
        review_vocab = Vocabulary.from_serializable(contents['review_vocab'])
        rating_vocab = Vocabulary.from_serializable(contents['rating_vocab'])

        return cls(review_vocab=review_vocab, rating_vocab=rating_vocab)

    def to_serializable(self):
        """Create the serializable dictionary for caching

        Returns:
            contents (dict): the serializable dictionary
        """
        return {'review_vocab': self.review_vocab.to_serializable(),
                'rating_vocab': self.rating_vocab.to_serializable()}
class ReviewDataset(Dataset):
    def __init__(self, review_df, vectorizer):
        """
        Args:
            review_df (pandas.DataFrame): the dataset
            vectorizer (ReviewVectorizer): vectorizer instantiated from dataset
        """
        self.review_df = review_df
        self._vectorizer = vectorizer

        self.train_df = self.review_df[self.review_df.split=='train']
        self.train_size = len(self.train_df)

        self.val_df = self.review_df[self.review_df.split=='val']
        self.validation_size = len(self.val_df)

        self.test_df = self.review_df[self.review_df.split=='test']
        self.test_size = len(self.test_df)

        self._lookup_dict = {'train': (self.train_df, self.train_size),
                             'val': (self.val_df, self.validation_size),
                             'test': (self.test_df, self.test_size)}

        self.set_split('train')

    @classmethod
    def load_dataset_and_make_vectorizer(cls, review_csv):
        """Load dataset and make a new vectorizer from scratch

        Args:
            review_csv (str): location of the dataset
        Returns:
            an instance of ReviewDataset
        """
        review_df = pd.read_csv(review_csv)
        train_review_df = review_df[review_df.split=='train']
        return cls(review_df, ReviewVectorizer.from_dataframe(train_review_df))

    @classmethod
    def load_dataset_and_load_vectorizer(cls, review_csv, vectorizer_filepath):
        """Load dataset and the corresponding vectorizer.
        Used in the case the vectorizer has been cached for re-use

        Args:
            review_csv (str): location of the dataset
            vectorizer_filepath (str): location of the saved vectorizer
        Returns:
            an instance of ReviewDataset
        """
        review_df = pd.read_csv(review_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(review_df, vectorizer)

    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        """a static method for loading the vectorizer from file

        Args:
            vectorizer_filepath (str): the location of the serialized vectorizer
        Returns:
            an instance of ReviewVectorizer
        """
        with open(vectorizer_filepath) as fp:
            return ReviewVectorizer.from_serializable(json.load(fp))

    def save_vectorizer(self, vectorizer_filepath):
        """saves the vectorizer to disk using json

        Args:
            vectorizer_filepath (str): the location to save the vectorizer
        """
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)

    def get_vectorizer(self):
        """ returns the vectorizer """
        return self._vectorizer

    def set_split(self, split="train"):
        """ selects the splits in the dataset using a column in the dataframe

        Args:
            split (str): one of "train", "val", or "test"
        """
        self._target_split = split
        self._target_df, self._target_size = self._lookup_dict[split]

    def __len__(self):
        return self._target_size

    def __getitem__(self, index):
        """the primary entry point method for PyTorch datasets

        Args:
            index (int): the index to the data point
        Returns:
            a dictionary holding the data point's features (x_data) and label (y_target)
        """
        row = self._target_df.iloc[index]

        review_vector = \
            self._vectorizer.vectorize(row.review)

        rating_index = \
            self._vectorizer.rating_vocab.lookup_token(row.rating)

        return {'x_data': review_vector,
                'y_target': rating_index}

    def get_num_batches(self, batch_size):
        """Given a batch size, return the number of batches in the dataset

        Args:
            batch_size (int)
        Returns:
            number of batches in the dataset
        """
        return len(self) // batch_size


def generate_batches(dataset, batch_size, shuffle=True,
                     drop_last=True, device="cpu"):
    """
    A generator function which wraps the PyTorch DataLoader. It will
    ensure each tensor is on the right device.
    """
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)

    for data_dict in dataloader:
        out_data_dict = {}
        for name, tensor in data_dict.items():
            out_data_dict[name] = data_dict[name].to(device)
        yield out_data_dict
class ReviewClassifier(nn.Module):
    """ a simple perceptron based classifier """
    def __init__(self, num_features):
        """
        Args:
            num_features (int): the size of the input feature vector
        """
        super(ReviewClassifier, self).__init__()
        self.fc1 = nn.Linear(in_features=num_features,
                             out_features=1)

    def forward(self, x_in, apply_sigmoid=False):
        """The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, num_features)
            apply_sigmoid (bool): a flag for the sigmoid activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch,)
        """
        y_out = self.fc1(x_in).squeeze()
        if apply_sigmoid:
            y_out = torch.sigmoid(y_out)
        return y_out


def make_train_state(args):
    return {'stop_early': False,
            'early_stopping_step': 0,
            'early_stopping_best_val': 1e8,
            'learning_rate': args.learning_rate,
            'epoch_index': 0,
            'train_loss': [],
            'train_acc': [],
            'val_loss': [],
            'val_acc': [],
            'test_loss': -1,
            'test_acc': -1,
            'model_filename': args.model_state_file}


def update_train_state(args, model, train_state):
    """Handle the training state updates.

    Components:
     - Early Stopping: Prevent overfitting.
     - Model Checkpoint: Model is saved if the model is better

    :param args: main arguments
    :param model: model to train
    :param train_state: a dictionary representing the training state values
    :returns:
        a new train_state
    """

    # Save one model at least
    if train_state['epoch_index'] == 0:
        torch.save(model.state_dict(), train_state['model_filename'])
        train_state['stop_early'] = False

    # Save model if performance improved
    elif train_state['epoch_index'] >= 1:
        loss_tm1, loss_t = train_state['val_loss'][-2:]

        # If loss worsened
        if loss_t >= train_state['early_stopping_best_val']:
            # Update step
            train_state['early_stopping_step'] += 1
        # Loss decreased
        else:
            # Save the best model
            if loss_t < train_state['early_stopping_best_val']:
                torch.save(model.state_dict(), train_state['model_filename'])
            # Track the best validation loss seen so far
            # (without this update, early stopping can never trigger)
            train_state['early_stopping_best_val'] = loss_t

            # Reset early stopping step
            train_state['early_stopping_step'] = 0

        # Stop early ?
        train_state['stop_early'] = \
            train_state['early_stopping_step'] >= args.early_stopping_criteria

    return train_state


def compute_accuracy(y_pred, y_target):
    y_target = y_target.cpu()
    y_pred_indices = (torch.sigmoid(y_pred)>0.5).cpu().long()#.max(dim=1)[1]
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return n_correct / len(y_pred_indices) * 100


def set_seed_everywhere(seed, cuda):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)


def handle_dirs(dirpath):
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)
args = Namespace(
    # Data and Path information
    frequency_cutoff=25,
    model_state_file='model.pth',
    review_csv='data/yelp/reviews_with_splits_lite.csv',
    # review_csv='data/yelp/reviews_with_splits_full.csv',
    save_dir='model_storage/ch3/yelp/',
    vectorizer_file='vectorizer.json',
    # No Model hyper parameters
    # Training hyper parameters
    batch_size=128,
    early_stopping_criteria=5,
    learning_rate=0.001,
    num_epochs=100,
    seed=1337,
    # Runtime options
    catch_keyboard_interrupt=True,
    cuda=True,
    expand_filepaths_to_save_dir=True,
    reload_from_files=False,
)

if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)

    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)

    print("Expanded filepaths: ")
    print("\t{}".format(args.vectorizer_file))
    print("\t{}".format(args.model_state_file))

# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False
if torch.cuda.device_count() > 1:
    print("Using", torch.cuda.device_count(), "GPUs!")

args.device = torch.device("cuda" if args.cuda else "cpu")

# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)

# handle dirs
handle_dirs(args.save_dir)


if args.reload_from_files:
    # training from a checkpoint
    print("Loading dataset and vectorizer")
    dataset = ReviewDataset.load_dataset_and_load_vectorizer(args.review_csv,
                                                             args.vectorizer_file)
else:
    print("Loading dataset and creating vectorizer")
    # create dataset and vectorizer
    dataset = ReviewDataset.load_dataset_and_make_vectorizer(args.review_csv)
    dataset.save_vectorizer(args.vectorizer_file)
vectorizer = dataset.get_vectorizer()

classifier = ReviewClassifier(num_features=len(vectorizer.review_vocab))

classifier = nn.DataParallel(classifier)
classifier = classifier.to(args.device)

loss_func = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                                 mode='min', factor=0.5,
                                                 patience=1)

train_state = make_train_state(args)

epoch_bar = tqdm(desc='training routine',
                 total=args.num_epochs,
                 position=0)

dataset.set_split('train')
train_bar = tqdm(desc='split=train',
                 total=dataset.get_num_batches(args.batch_size),
                 position=1,
                 leave=True)
dataset.set_split('val')
val_bar = tqdm(desc='split=val',
               total=dataset.get_num_batches(args.batch_size),
               position=1,
               leave=True)
try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # Iterate over training dataset

        # setup: batch generator, set loss and acc to 0, set train mode on
        dataset.set_split('train')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:

            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()

            # step 2. compute the output
            y_pred = classifier(x_in=batch_dict['x_data'].float())

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_target'].float())
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # step 4. use loss to produce gradients
            loss.backward()

            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_target'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            # update bar
            train_bar.set_postfix(loss=running_loss,
                                  acc=running_acc,
                                  epoch=epoch_index)
            train_bar.update()

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        # Iterate over val dataset

        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):

            # compute the output
            y_pred = classifier(x_in=batch_dict['x_data'].float())

            # compute the loss
            loss = loss_func(y_pred, batch_dict['y_target'].float())
            loss_t = loss.item()
            running_loss += (loss_t - running_loss) / (batch_index + 1)

            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_target'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            val_bar.set_postfix(loss=running_loss,
                                acc=running_acc,
                                epoch=epoch_index)
            val_bar.update()

        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)

        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)

        scheduler.step(train_state['val_loss'][-1])

        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()

        if train_state['stop_early']:
            break
except KeyboardInterrupt:
    print("Exiting loop")


classifier.load_state_dict(torch.load(train_state['model_filename']))
classifier = classifier.to(args.device)

dataset.set_split('test')
batch_generator = generate_batches(dataset,
                                   batch_size=args.batch_size,
                                   device=args.device)
running_loss = 0.
running_acc = 0.
classifier.eval()

for batch_index, batch_dict in enumerate(batch_generator):
    # compute the output
    y_pred = classifier(x_in=batch_dict['x_data'].float())

    # compute the loss
    loss = loss_func(y_pred, batch_dict['y_target'].float())
    loss_t = loss.item()
    running_loss += (loss_t - running_loss) / (batch_index + 1)

    # compute the accuracy
    acc_t = compute_accuracy(y_pred, batch_dict['y_target'])
    running_acc += (acc_t - running_acc) / (batch_index + 1)

train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc


print("Test loss: {:.3f}".format(train_state['test_loss']))
print("Test Accuracy: {:.2f}".format(train_state['test_acc']))


def preprocess_text(text):
    text = text.lower()
    text = re.sub(r"([.,!?])", r" \1 ", text)
    text = re.sub(r"[^a-zA-Z.,!?]+", r" ", text)
    return text


def predict_rating(review, classifier, vectorizer, decision_threshold=0.5):
    """Predict the rating of a review

    Args:
        review (str): the text of the review
        classifier (ReviewClassifier): the trained model
        vectorizer (ReviewVectorizer): the corresponding vectorizer
        decision_threshold (float): The numerical boundary which separates the rating classes
    """
    review = preprocess_text(review)

    vectorized_review = torch.tensor(vectorizer.vectorize(review))
    result = classifier(vectorized_review.view(1, -1))

    probability_value = torch.sigmoid(result).item()
    index = 1
    if probability_value < decision_threshold:
        index = 0

    return vectorizer.rating_vocab.lookup_index(index)


test_review = "this is a pretty awesome book"

classifier = classifier.cpu()
prediction = predict_rating(test_review, classifier, vectorizer, decision_threshold=0.5)
print("{} -> {}".format(test_review, prediction))


# Sort weights (unwrap nn.DataParallel before reading fc1)
fc1 = classifier.module.fc1 if isinstance(classifier, nn.DataParallel) else classifier.fc1
fc1_weights = fc1.weight.detach()[0]
_, indices = torch.sort(fc1_weights, dim=0, descending=True)
indices = indices.numpy().tolist()

# Top 20 words
print("Influential words in Positive Reviews:")
print("--------------------------------------")
for i in range(20):
    print(vectorizer.review_vocab.lookup_index(indices[i]))

print("====\n\n\n")

# Top 20 negative words
print("Influential words in Negative Reviews:")
print("--------------------------------------")
indices.reverse()
for i in range(20):
    print(vectorizer.review_vocab.lookup_index(indices[i]))
@ -12,16 +12,42 @@ taxonomy:

Task backlog:

- Try to present at a local conference (Data, Znalosti and WIKT) or in the faculty proceedings (a short version of the diploma thesis).
- Use the Multext East corpus in training. Create a mapping of Multext tags to SNK tags.


Virtual meeting 6.11.2020

Status:

- Read 2 articles in detail and made notes. The notes are on the Git repository.
- Finished additional experiments.

Tasks for the next meeting:

- Continue with the open tasks.


Virtual meeting 30.10.2020

Status:

- The files are on the Git repository.
- Experiments performed; the results are in a table.
- Instructions for running the experiments written.
- Technical problems resolved. A Conda environment is available.

Tasks for the next meeting:

- Study the literature on the topics "pretrain" and "word embedding":
- [Healthcare NER Models Using Language Model Pretraining](http://ceur-ws.org/Vol-2551/paper-04.pdf)
- [Design and implementation of an open source Greek POS Tagger and Entity Recognizer using spaCy](https://ieeexplore.ieee.org/abstract/document/8909591)
- https://arxiv.org/abs/1909.00505
- https://arxiv.org/abs/1607.04606
- LSTM, recurrent neural networks.
- Make notes from several articles; write down the source and what you learned.
- Perform several experiments with pretraining (different models, different sizes of adaptation data) and compile the results into a table.
- Describe pretraining and summarize its effect on training in a short article of about 10 pages.
- Try to present at a local conference (Data, Znalosti and WIKT) or in the faculty proceedings (a short version of the diploma thesis).
- Use the Multext East corpus in training. Create a mapping of Multext tags to SNK tags.


Virtual meeting 8.10.2020

@ -21,6 +21,46 @@ Cieľom práce je príprava nástrojov a budovanie tzv. "Question Answering data

## Diploma Project 2

Task backlog:

- Is it possible to find out how much time an annotator spent creating a question? If it can be determined from the DB schema, it would be good to display it in the web application.


Virtual meeting 27.10.2020

Status:

- The web application was finished according to the instructions from the previous meeting; the code is on the Git repository.

Tasks for the next meeting:

- Create a configuration system: load the configuration from a file (python-configuration?). The name of the configuration file should be changeable through an environment variable (getenv).
- Add authentication for annotators when displaying results, so that an annotator sees only their own results. Is it necessary? For now, implement it using e-mail only.
- Add a password for the web application.
- Add a display of bad and good annotations for each annotator.
- Study the research literature on the topic "Crowdsourcing language resources". Select several publications (Scholar, Scopus), write down the bibliographic reference and what you learned from the publications about building language resources. What other corpora were created with this method?
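
A minimal sketch of such a configuration loader, assuming a JSON configuration file; the file name `config.json` and the variable name `APP_CONFIG` are illustrative assumptions, not part of the assignment:

```python
import json
import os


def load_config(default_path="config.json"):
    """Load configuration from a JSON file.

    The file name can be overridden through the hypothetical APP_CONFIG
    environment variable, as the task suggests with getenv.
    """
    path = os.getenv("APP_CONFIG", default_path)
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The python-configuration package mentioned above offers a richer interface (layered sources, dotted keys); the sketch only shows the environment-variable indirection.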


Virtual meeting 20.10.2020

Status:

- Improved data preparation script, with a slight change of the interface (duplicate work caused by a gap in communication).

Tasks for the next meeting:

- Finish the web application for reporting the amount of annotated data.
- Fix the bugs related to the new annotation scheme.
- Display the amount of annotated data.
- Display the amount of valid annotated data.
- Display the amount of validated data.
- Questions must not repeat within one paragraph. Every question must have an answer. Every question must be longer than 10 characters or longer than 2 words. The answer must contain at least one word. The question must contain Slovak words.
- Push the results to the project repository, into the database_app directory, as soon as possible.
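
The validation rules above can be sketched as a small check; the function name is an illustrative assumption, and the Slovak-vocabulary test is left out because it needs a word list:

```python
def is_valid_annotation(question, answer, seen_questions):
    """Apply the validation rules to one question-answer pair."""
    # Questions must not repeat within one paragraph.
    if question in seen_questions:
        return False
    # Every question must be longer than 10 characters or longer than 2 words.
    if len(question) <= 10 and len(question.split()) <= 2:
        return False
    # The answer must contain at least one word.
    if not answer.split():
        return False
    return True
```
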


Meeting 25.9.2020

Done:
@ -6,10 +6,8 @@ taxonomy:
tag: [demo,nlp]
author: Daniel Hladek
---

# Martin Jancura

*Year of starting studies*: 2017

## Bachelor Project 2020
@ -31,9 +29,36 @@ Možné backendy:

Task backlog:

- Prepare the backend.
- Prepare the frontend in JavaScript - in progress.
- Store a translation made by a human into the database.


Virtual meeting 6.11.2020:

Status:

Work on the written part.

Tasks for the next meeting:

- Find a library that allows using our own translation model. Try to install OpenNMT.
- Work through the tutorial https://github.com/OpenNMT/OpenNMT-py#quickstart or a similar one.
- Propose how to connect the frontend and the backend.


Virtual meeting 23.10.2020:

Status:

- Created a frontend that communicates with the Microsoft Translation API; it uses Axios and vanilla JavaScript.

Tasks for the next meeting:

- Find a library that allows using our own translation model. Try to install OpenNMT.
- Find out what the CORS policy means.
- Continue writing the thesis and add a section on machine translation. Read the articles at https://opennmt.net/OpenNMT/references/ and make notes. In each note include the bibliographic reference and what you learned from the article.
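
CORS appears exactly in this frontend-backend setup: a browser only lets a page read a response from another origin if the server opts in with a header. A minimal sketch with the Python standard library; the handler name and response text are illustrative assumptions, not the project's backend:

```python
from http.server import BaseHTTPRequestHandler


class TranslateHandler(BaseHTTPRequestHandler):
    """Stub backend response carrying the CORS header a browser frontend needs."""

    def do_GET(self):
        self.send_response(200)
        # Without this header, the browser blocks a cross-origin frontend
        # (for example Axios served from another port) from reading the body.
        # "*" allows any origin; a production setup would name the frontend origin.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write("translation placeholder".encode("utf-8"))

    def log_message(self, fmt, *args):
        # Keep the sketch quiet; the default handler logs every request.
        pass
```

The server can then be started with `HTTPServer(("localhost", 8080), TranslateHandler).serve_forever()`.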


Virtual meeting 16.10:

Status:

@ -31,7 +31,42 @@ Návrh na zadanie:

1. Propose possible improvements of the application you created.

Task backlog:

- Create a repository on the Git server named bp2020. Put the code and the documentation you create into it.
- Prepare a Docker image of your application following https://pythonspeed.com/docker/
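
A minimal sketch of such an image, following the usual advice from the linked guide (slim base image, dependencies installed before the code is copied so the layer is cached); the file names `requirements.txt` and `app.py` are illustrative assumptions:

```dockerfile
FROM python:3.8-slim

WORKDIR /app

# Copy and install dependencies first so this layer is reused
# between code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "app.py"]
```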


Virtual meeting 30.10.:

Status:

- Modified the existing "spacy-streamlit" application; the source code is on the Git repository, following the instructions from the previous meeting.
- It contains a form, but does not contain a REST API.

Tasks for the next meeting:

- Continue writing. Read research articles on the topic "dependency parsing" and make notes on what you learned. Write down the source.
- Continue working on the demonstration web application.


Virtual meeting 19.10.:

Status:

- Notes for the bachelor thesis written and submitted; they contain excerpts from the literature.
- Repository created: https://git.kemt.fei.tuke.sk/mw223on/bp2020
- The Slovak spaCy model installed and running.
- The spaCy REST API installed: https://github.com/explosion/spacy-services
- The displaCy demo tried with the Slovak model.

Tasks for the next meeting:

- Prepare a web application that presents dependency and named entity recognition in the Slovak language. It should consist of a frontend and a backend.
- Write the required Python packages into the "requirements.txt" file.
- Create a script for installing the application with pip.
- Create a script for starting both the backend and the frontend. Put the results into the repository.
- Create a frontend design (HTML + CSS).
- Look at the spaCy source code and find out what exactly the displacy.serve command does.
- Put the results into the repository.

Virtual meeting 9.10.

@ -20,8 +20,21 @@ Návrh na zadanie:

2. Create a language model using BERT or a similar method.
3. Evaluate the created language model and propose improvements.

Task backlog:

Virtual meeting 30.10.2020

Status:

- Notes on seq2seq written.
- PyTorch and fairseq installed.
- Problems with the tutorial. A solution could be to use the 0.9.0 release: pip install fairseq==0.9.0

Tasks for the next meeting:

- Resolve the technical problems.
- Work through the tutorial https://fairseq.readthedocs.io/en/latest/getting_started.html#training-a-new-model
- Work through the tutorial https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.md or a similar one.
- Study articles on the topic of BERT; make notes on what you learned together with the source.


Virtual meeting 16.10.2020

@ -23,13 +23,50 @@ Pokusný klaster Raspberry Pi pre výuku klaudových technológií

The goal of the project is to build a cheap home cluster for teaching cloud technologies.


Task backlog:

- Activate the WSL2 technology and Docker Desktop if you use Windows.

Virtual meeting 30.10.

Status:

- Written overview prepared according to the instructions.
- Raspberry Pi OS installed in VirtualBox.
- Preliminary hardware design prepared.
- Docker Toolbox and Ubuntu with Docker installed.
- Got familiar with Docker.
- Supervisor: hardware purchase carried out. Boards: 5x RPi4 model B 8GB; SD cards: 11x 128GB; The Pi Hut Cluster Case for Raspberry Pi: 4x; power: 1x 60W supply and an 18W Quick Charger Epico; a 220V cable and a socket with a switch.

For the next meeting:

- Is it possible to buy the official 5-port switch?
- Complete the purchase and agree on the handover. Sign the handover protocol.
- Use https://kind.sigs.k8s.io to simulate a cluster.
- Install https://microk8s.io/ and read the tutorials at https://ubuntu.com/tutorials/
- Work through https://kubernetes.io/docs/tutorials/hello-minikube/ or similar tutorials.


Virtual meeting 16.10.

Status:

- Articles read.
- The Docker tutorial from ZCT started.
- The supervisor created access to a Jetson Xavier AGX2 with an ARM processor.
- The purchase of the Raspberry Pi and accessories started.

Tasks for the next meeting:

- Prepare an overview of at least 4 existing Raspberry Pi cluster solutions (to be submitted). What hardware and software did they use?
- Power supply, cooling, networking.
- Get familiar with https://www.raspberrypi.org/downloads/raspberry-pi-os/
- Install https://roboticsbackend.com/install-raspbian-desktop-on-a-virtual-machine-virtualbox/
- Write a detailed hardware proposal for building the Raspberry Pi cluster.

Meeting 29.9.

We agreed on the thesis assignment.

Suggestions for improvement (for the supervisor):

- Find out the conditions of financing (estimate 350 EUR).

@ -39,23 +39,35 @@ Učenie prebieha tak, že v texte ukážete ktoré slová patria názvom osôb,

Your task will be to mark proper nouns in the text.
In Slovak, a proper noun usually begins with a capital letter, but it may also contain other words written in lower case.

- PER: names of persons
- LOC: geographic names
- ORG: names of organizations
- MISC: other names, e.g. names of products.

If a proper noun contains another name within it, e.g. Nové Mesto nad Váhom, annotate it as one unit.
In the text you will also come across words that name a geographic area but are not proper nouns (e.g. britská kolónia "British colony", londýnsky šerif "London sheriff"...). We do not consider such words named entities, so please do not mark them.

If there are no annotations in the text, the article is still valid, so choose Accept.

If the text consists of only one word, or of a few words that carry no meaning on their own, the article is invalid, so choose Reject.

## Annotation batches

Write your e-mail into the form so that it is possible to identify who performed the annotation.

During annotation you can use keyboard shortcuts to speed up the work:
- 1,2,3,4 - switching between entity types
- key "a" - Accept
- key "x" - Reject
- key "space" - Ignore
- key "backspace" or "del" - Undo

After annotating, do not forget to save your work (the icon in the top left corner, or "Ctrl + s").

### Trial annotation batch

The batch is aimed at collecting feedback from annotators to improve the interface and the annotation process.

{% include "forms/form.html.twig" with { form: forms('ner1') } %}

@ -36,11 +36,12 @@ Učenie prebieha tak, že vytvoríte príklad s otázkou a odpoveďou. Účasť

## Guide for annotators

First you will see a short article. Your task is to read a part of the article, think of a question about it, and mark the answer in the text. The question must be unambiguous and the answer to the question must be present in the text of the article. You have about 50 seconds to mark one question.

1. Read the article. If the article is not suitable, tap the red cross "Reject" (Tab and then 'x').
2. Write a question. If you cannot think of a question, tap "Ignore" (Tab and then 'i').
3. Mark the answer with the mouse and tap the green check mark "Accept" (key 'a'), then continue with another question for the same article or a new article.
4. The same article will be shown to you 5 times; think of 5 different questions for it.

If the displayed text is not suitable, reject it. Unsuitable text:

@ -61,6 +62,12 @@ Ak je zobrazený text nevhodný, tak ho zamietnite. Nevhodný text:
4. <span style="color:pink">Na čo slúži lyzozóm?</span>
5. <span style="color:orange">Čo je to autofágia?</span>

An example of an incorrect question:
1. Čo je to Golgiho aparát? (What is the Golgi apparatus?) - the answer is not present in the article.
2. Čo sa deje v mŕtvych bunkách? (What happens in dead cells?) - the question is not unambiguous; the exact answer is not present in the article.
3. Čo je normálny fyziologický proces? (What is a normal physiological process?) - the answer is not present in the article.


Write your e-mail into the form so that it is possible to identify who performed the annotation.

## Annotation batches