Let the gradient passed down from the cell above be:
E_delta = dE/dht
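If we use MSE (mean squared error) for the error, written for one sample as E = 1/2 * (y - h(x))^2, then
E_delta = h(x) - y
Here y is the original (target) value and h(x) is the predicted value. Note the sign: (y - h(x)) is the negative of this gradient.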
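As a quick sanity check on the sign, here is a minimal sketch. It assumes the per-sample error E = 1/2 * (y - h(x))^2 (the 1/2 factor is a convenience assumption, and the function names are illustrative). Differentiating with respect to the prediction gives h(x) - y, which is the negative of (y - h(x)); the finite difference below should agree with that.

```python
def half_squared_error(y, h):
    # Assumed per-sample error: E = 1/2 * (y - h)^2
    return 0.5 * (y - h) ** 2

def d_error_d_h(y, h):
    # Analytic derivative of E with respect to the prediction h
    return h - y

y, h = 1.0, 0.3   # illustrative target and prediction
eps = 1e-6

# Central finite difference of E with respect to h
numeric = (half_squared_error(y, h + eps) - half_squared_error(y, h - eps)) / (2 * eps)
analytic = d_error_d_h(y, h)

print("analytic:", analytic)
print("numeric: ", numeric)
```

Both values are negative here because the prediction is below the target, so a gradient-descent step would push h(x) upward.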
Gradient with respect to output gate
dE/do = (dE/dht ) * (dht /do) = E_delta * ( dht / do)
dE/do = E_delta * tanh(ct)
Gradient with respect to ct
dE/dct = (dE / dht )*(dht /dct)= E_delta *(dht /dct)
dE/dct = E_delta * o * (1-tanh^2(ct))
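The two gradients through ht can be checked numerically. This is a minimal scalar sketch, assuming the cell output is ht = o * tanh(ct); the numeric values and the helper name h_from are illustrative only.

```python
import math

def h_from(o, c):
    # Assumed cell output: ht = o * tanh(ct)
    return o * math.tanh(c)

o, c = 0.6, 0.8     # illustrative output-gate activation and cell state
E_delta = 0.25      # illustrative upstream gradient dE/dht

# Analytic gradients, following the formulas for dE/do and dE/dct
dE_do = E_delta * math.tanh(c)
dE_dc = E_delta * o * (1.0 - math.tanh(c) ** 2)

# Central finite differences of ht, scaled by the upstream gradient
eps = 1e-6
num_do = E_delta * (h_from(o + eps, c) - h_from(o - eps, c)) / (2 * eps)
num_dc = E_delta * (h_from(o, c + eps) - h_from(o, c - eps)) / (2 * eps)

print("dE/do :", dE_do, num_do)
print("dE/dct:", dE_dc, num_dc)
```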
Gradient with respect to input gate dE/di, dE/dg
dE/di = (dE/dct) * (dct/di)
dE/di = E_delta * o * (1-tanh^2(ct)) * g
Similarly,
dE/dg = E_delta * o * (1-tanh^2(ct)) * i
Gradient with respect to forget gate
dE/df = (dE/dct) * (dct/df)
dE/df = E_delta * o * (1-tanh^2(ct)) * ct-1
Gradient with respect to ct-1
dE/dct-1 = (dE/dct) * (dct/dct-1)
dE/dct-1 = E_delta * o * (1-tanh^2(ct)) * f
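All four gradients that flow through ct (with respect to i, g, f and ct-1) share the factor dE/dct. The sketch below checks them against finite differences. It assumes the usual cell update ct = f * ct-1 + i * g, the output ht = o * tanh(ct), and the error E = 1/2 * (y - ht)^2, so that E_delta = ht - y; all numeric values are illustrative.

```python
import math

y, o = 1.0, 0.6   # illustrative target and output-gate activation

def loss(i, g, f, c_prev):
    c = f * c_prev + i * g        # assumed: ct = f * ct-1 + i * g
    h = o * math.tanh(c)          # assumed: ht = o * tanh(ct)
    return 0.5 * (y - h) ** 2     # assumed: E = 1/2 * (y - ht)^2

args = {"i": 0.7, "g": -0.4, "f": 0.9, "c_prev": 0.5}

# Forward quantities needed by the analytic formulas
c = args["f"] * args["c_prev"] + args["i"] * args["g"]
h = o * math.tanh(c)
E_delta = h - y
dE_dc = E_delta * o * (1.0 - math.tanh(c) ** 2)

# Each gradient is dE/dct times the partner term in the cell update
analytic = {
    "i": dE_dc * args["g"],
    "g": dE_dc * args["i"],
    "f": dE_dc * args["c_prev"],
    "c_prev": dE_dc * args["f"],
}

# Central finite differences, one argument at a time
eps = 1e-6
numeric = {}
for name in args:
    hi = dict(args)
    lo = dict(args)
    hi[name] += eps
    lo[name] -= eps
    numeric[name] = (loss(**hi) - loss(**lo)) / (2 * eps)

for name in args:
    print(name, analytic[name], numeric[name])
```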
Gradient with respect to output gate weights:
dE/dwxo = dE/do * (do/dwxo) = E_delta * tanh(ct) * sigmoid(zo) * (1-sigmoid(zo)) * xt
dE/dwho = dE/do * (do/dwho) = E_delta * tanh(ct) * sigmoid(zo) * (1-sigmoid(zo)) * ht-1
dE/dbo = dE/do * (do/dbo) = E_delta * tanh(ct) * sigmoid(zo) * (1-sigmoid(zo))
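The three output-gate gradients differ only in their last factor (xt, ht-1, or 1). A minimal scalar sketch, assuming zo = wxo * xt + who * ht-1 + bo and o = sigmoid(zo), with ct held fixed for this single step; the numeric values are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x_t, h_prev, c_t = 0.5, -0.2, 0.8      # illustrative input, previous output, cell state
w_xo, w_ho, b_o = 0.3, -0.1, 0.05      # illustrative output-gate parameters
E_delta = 0.25                         # illustrative upstream gradient dE/dht

def scaled_output(w_xo, w_ho, b_o):
    # E_delta * ht, with ht = sigmoid(zo) * tanh(ct); its parameter gradients
    # equal E_delta * dht/dw, which is what the formulas compute
    z_o = w_xo * x_t + w_ho * h_prev + b_o
    return E_delta * sigmoid(z_o) * math.tanh(c_t)

# Shared factor: dE/dzo = E_delta * tanh(ct) * sigmoid(zo) * (1 - sigmoid(zo))
o = sigmoid(w_xo * x_t + w_ho * h_prev + b_o)
dE_dzo = E_delta * math.tanh(c_t) * o * (1.0 - o)

dE_dwxo = dE_dzo * x_t
dE_dwho = dE_dzo * h_prev
dE_dbo = dE_dzo

# Central finite differences for comparison
eps = 1e-6
num_wxo = (scaled_output(w_xo + eps, w_ho, b_o) - scaled_output(w_xo - eps, w_ho, b_o)) / (2 * eps)
num_who = (scaled_output(w_xo, w_ho + eps, b_o) - scaled_output(w_xo, w_ho - eps, b_o)) / (2 * eps)
num_bo = (scaled_output(w_xo, w_ho, b_o + eps) - scaled_output(w_xo, w_ho, b_o - eps)) / (2 * eps)

print(dE_dwxo, num_wxo)
print(dE_dwho, num_who)
print(dE_dbo, num_bo)
```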
Gradient with respect to forget gate weights:
dE/dwxf = dE/df * (df/dwxf) = E_delta * o * (1-tanh^2(ct)) * ct-1 * sigmoid(zf) * (1-sigmoid(zf)) * xt
dE/dwhf = dE/df * (df/dwhf) = E_delta * o * (1-tanh^2(ct)) * ct-1 * sigmoid(zf) * (1-sigmoid(zf)) * ht-1
dE/dbf = dE/df * (df/dbf) = E_delta * o * (1-tanh^2(ct)) * ct-1 * sigmoid(zf) * (1-sigmoid(zf))
Gradient with respect to input gate weights:
dE/dwxi = dE/di * (di/dwxi) = E_delta * o * (1-tanh^2(ct)) * g * sigmoid(zi) * (1-sigmoid(zi)) * xt
dE/dwhi = dE/di * (di/dwhi) = E_delta * o * (1-tanh^2(ct)) * g * sigmoid(zi) * (1-sigmoid(zi)) * ht-1
dE/dbi = dE/di * (di/dbi) = E_delta * o * (1-tanh^2(ct)) * g * sigmoid(zi) * (1-sigmoid(zi))
dE/dwxg = dE/dg * (dg/dwxg) = E_delta * o * (1-tanh^2(ct)) * i * (1-tanh^2(zg)) * xt
dE/dwhg = dE/dg * (dg/dwhg) = E_delta * o * (1-tanh^2(ct)) * i * (1-tanh^2(zg)) * ht-1
dE/dbg = dE/dg * (dg/dbg) = E_delta * o * (1-tanh^2(ct)) * i * (1-tanh^2(zg))
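Putting the weight gradients together, the sketch below computes all twelve for a single scalar LSTM step and compares them with finite differences. It is an illustration under stated assumptions, not a reference implementation. It assumes i, f, o = sigmoid(z), g = tanh(zg), each z = wx * xt + wh * ht-1 + b, ct = f * ct-1 + i * g, ht = o * tanh(ct), and E = 1/2 * (y - ht)^2. The dictionary keys and all numeric values are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(p, x_t, h_prev, c_prev):
    # Pre-activations for the four gates: z = wx * xt + wh * ht-1 + b
    z = {k: p["wx" + k] * x_t + p["wh" + k] * h_prev + p["b" + k] for k in "ifog"}
    i, f, o = sigmoid(z["i"]), sigmoid(z["f"]), sigmoid(z["o"])
    g = math.tanh(z["g"])
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return i, f, o, g, c, h

def loss(p, x_t, h_prev, c_prev, y):
    h = forward(p, x_t, h_prev, c_prev)[-1]
    return 0.5 * (y - h) ** 2

def backward(p, x_t, h_prev, c_prev, y):
    i, f, o, g, c, h = forward(p, x_t, h_prev, c_prev)
    E_delta = h - y                                   # dE/dht
    dE_dc = E_delta * o * (1.0 - math.tanh(c) ** 2)   # dE/dct
    # Gradient with respect to each gate's pre-activation z
    dz = {
        "o": E_delta * math.tanh(c) * o * (1.0 - o),
        "f": dE_dc * c_prev * f * (1.0 - f),
        "i": dE_dc * g * i * (1.0 - i),
        "g": dE_dc * i * (1.0 - g ** 2),              # g = tanh(zg)
    }
    grads = {}
    for k, d in dz.items():
        grads["wx" + k] = d * x_t
        grads["wh" + k] = d * h_prev
        grads["b" + k] = d
    return grads

params = {
    "wxi": 0.5, "whi": -0.3, "bi": 0.1,
    "wxf": 0.2, "whf": 0.4, "bf": 0.0,
    "wxo": -0.6, "who": 0.1, "bo": 0.2,
    "wxg": 0.3, "whg": -0.2, "bg": -0.1,
}
inputs = (0.5, -0.2, 0.4, 1.0)   # xt, ht-1, ct-1, y (illustrative)

grads = backward(params, *inputs)

def numeric_grad(name, eps=1e-6):
    # Central finite difference of the loss with respect to one parameter
    hi = dict(params)
    lo = dict(params)
    hi[name] += eps
    lo[name] -= eps
    return (loss(hi, *inputs) - loss(lo, *inputs)) / (2 * eps)

max_err = max(abs(grads[n] - numeric_grad(n)) for n in params)
print("largest analytic vs numeric gap:", max_err)
```

Computing the gradient with respect to each pre-activation once and then multiplying by xt, ht-1, or 1 mirrors the structure of the formulas, where the three gradients for each gate differ only in the final factor.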