Author: Zizhun Guo
$\text{Loss} = \text{Error}(y, y')$ (1)
$\text{Loss} > \text{previousLoss} - \text{tol}$ (2)
$(y' - y) > (y'' - y) - \text{tol}$ (3)
(1) defines the loss as the error between the true value $y$ and the prediction $y'$, and (2) shows how it is used as the stopping criterion. Substituting (1) into (2), with $y'$ the current prediction, $y''$ the previous one, and the error taken simply as $y' - y$, gives (3): once the decrease between two consecutive losses is smaller than $\text{tol}$, i.e., the loss is no longer improving at an acceptable rate, training stops and the model is considered to have converged.
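As a minimal sketch of this tolerance-based stopping criterion (the `train` function, the mean-squared-error loss, and the learning rate below are my own assumptions for illustration, not taken from any particular library):

```python
import numpy as np

def train(X, y, tol=1e-3, lr=0.01, max_iter=1000):
    """Gradient descent that stops once the loss stops improving by at least tol."""
    w = np.zeros(X.shape[1])
    previous_loss = np.inf
    for i in range(max_iter):
        y_pred = X @ w                         # current prediction y'
        loss = np.mean((y_pred - y) ** 2)      # Loss = Error(y, y'), eq. (1)
        if loss > previous_loss - tol:         # stopping criterion, eq. (2)
            break                              # improvement < tol: converged
        previous_loss = loss
        w -= lr * 2 * X.T @ (y_pred - y) / len(y)  # gradient step, cf. eq. (6)
    return w
```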
$L_1 = L + \alpha \lVert W \rVert_1$ (4)
$L_2 = L + \alpha \lVert W \rVert_2^2$ (5)
$W' = W - \eta \nabla L$ (6)
(4) and (5) describe the loss function augmented with L1 and L2 regularization, respectively. (6) is the gradient descent update, where $\eta$ is the learning rate and $\nabla L$ the gradient of the (regularized) loss: because the regularization term contributes to the gradient, each iteration shrinks the weights by an extra amount. Therefore, when the model converges, the weights are smaller than they would be if regularization were not included.
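To see the shrinking effect of (4)-(6) concretely, here is a small sketch (my own illustration with assumed synthetic data and an assumed `alpha`, not the post's code) comparing the weight norms with and without an L2 penalty:

```python
import numpy as np

def fit(X, y, alpha=0.0, lr=0.01, n_iter=2000):
    """Gradient descent on squared error, optionally with an L2 penalty.

    With alpha > 0 the gradient gains the extra term 2 * alpha * w,
    so every update shrinks the weights a bit more, cf. eq. (5)-(6).
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * alpha * w
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.5, -1.0]) + rng.normal(scale=0.1, size=100)

w_plain = fit(X, y, alpha=0.0)
w_l2 = fit(X, y, alpha=1.0)
# The regularized weight vector ends up with a smaller norm:
print(np.linalg.norm(w_plain), np.linalg.norm(w_l2))
```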
Please check out the textual models in README.md.