From a2f8b2b1100ec3084c7732d65664cfbccbd865de Mon Sep 17 00:00:00 2001
From: dano
Date: Wed, 20 Oct 2021 06:45:31 +0000
Subject: [PATCH] Update 'pages/topics/bert/README.md'

---
 pages/topics/bert/README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/pages/topics/bert/README.md b/pages/topics/bert/README.md
index ac264ba3..d12a2472 100644
--- a/pages/topics/bert/README.md
+++ b/pages/topics/bert/README.md
@@ -22,9 +22,13 @@ author: Daniel Hládek
 
 [https://medium.com/nvidia-ai/how-to-scale-the-bert-training-with-nvidia-gpus-c1575e8eaf71](zz):
 
+
 When the mini-batch size n is multiplied by k, we should multiply the starting learning rate η by the square root of k as some theories may suggest. However, with experiments from multiple researchers, linear scaling shows better results, i.e. multiply the starting learning rate by k instead.
 
+
 | BERT Large | 330M |
 | BERT Base | 110M |
 
+Larger input vector size => smaller batch size => smaller learning rate => slower training
+
 ## Completed tasks
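
The learning-rate rule quoted in this patch can be illustrated with a small sketch. It is not taken from the README or from the linked article: the function name `scale_learning_rate`, the `rule` argument, and the base learning rate of `1e-4` are all hypothetical and chosen only for illustration. The sketch compares square-root scaling with linear scaling when the mini-batch size is multiplied by `k`, including the case `k < 1`, where a smaller batch yields a smaller starting learning rate.

```python
import math


def scale_learning_rate(base_lr, k, rule="linear"):
    """Rescale a starting learning rate when the batch size is multiplied by k.

    rule="linear" multiplies the learning rate by k,
    rule="sqrt" multiplies it by the square root of k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if rule == "linear":
        return base_lr * k
    if rule == "sqrt":
        return base_lr * math.sqrt(k)
    raise ValueError(f"unknown rule: {rule!r}")


if __name__ == "__main__":
    base_lr = 1e-4  # assumed starting learning rate, for illustration only
    for k in (0.25, 1, 4):
        sqrt_lr = scale_learning_rate(base_lr, k, rule="sqrt")
        linear_lr = scale_learning_rate(base_lr, k, rule="linear")
        print(f"k={k}: sqrt rule -> {sqrt_lr:.2e}, linear rule -> {linear_lr:.2e}")
```

Under linear scaling, shrinking the batch to a quarter of its size (`k = 0.25`) also cuts the starting learning rate to a quarter, which is the chain of consequences the added README line describes.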