Update 'pages/topics/bert/README.md'

2021-10-20 06:45:31 +00:00
commit a2f8b2b110
parent 87828f160b
1 changed file with 4 additions and 0 deletions
--- a/pages/topics/bert/README.md
+++ b/pages/topics/bert/README.md
@@ -22,9 +22,13 @@ author: Daniel Hládek
 [https://medium.com/nvidia-ai/how-to-scale-the-bert-training-with-nvidia-gpus-c1575e8eaf71](zz):
    When the mini-batch size n is multiplied by k, we should multiply the starting learning rate η by the square root of k as some theories may suggest. However, with experiments from multiple researchers, linear scaling shows better results, i.e. multiply the starting learning rate by k instead.
 | BERT Large | 330M |
 | BERT Base | 110M |
 Larger input vector size => smaller batch size => smaller learning rate => slower training
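
The two scaling rules quoted above can be compared with a minimal sketch. The function name `scale_learning_rate` and the numbers in the usage lines are hypothetical and chosen only for illustration; they are not taken from the linked article or from any BERT training script.

```python
import math


def scale_learning_rate(base_lr, base_batch_size, new_batch_size, rule="linear"):
    """Rescale a starting learning rate when the mini-batch size changes.

    Hypothetical helper: k is the ratio of the new batch size to the
    reference batch size. "linear" multiplies the rate by k, "sqrt"
    multiplies it by the square root of k.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    k = new_batch_size / base_batch_size
    if rule == "linear":
        return base_lr * k
    if rule == "sqrt":
        return base_lr * math.sqrt(k)
    raise ValueError(f"unknown rule: {rule!r}")


# Illustrative values only: a longer input forces the batch to shrink
# from 256 to 64 sequences, so k = 0.25 and the learning rate drops.
linear_lr = scale_learning_rate(1e-4, 256, 64, rule="linear")
sqrt_lr = scale_learning_rate(1e-4, 256, 64, rule="sqrt")
print(f"linear: {linear_lr:.2e}, sqrt: {sqrt_lr:.2e}")
```

With k below 1 both rules lower the learning rate, and the linear rule lowers it more, which matches the chain above: a smaller batch leads to a smaller learning rate and slower training.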
 ## Completed tasks