Batch Normalization

Batch Normalization Layer

It is usually applied before the activation function. Its most important benefits are faster convergence and more stable training, since normalizing activations helps avoid exploding and vanishing gradients.
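
As a sketch of that placement, here is a hypothetical PyTorch block (the layer sizes are made up for illustration) where BatchNorm1d sits between the linear layer and its activation:

```python
import torch.nn as nn

# BN is inserted between the linear layer and its activation,
# as described above. The sizes 64 and 128 are illustrative.
block = nn.Sequential(
    nn.Linear(64, 128),
    nn.BatchNorm1d(128),  # normalizes the 128 pre-activation features
    nn.ReLU(),
)
```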

The BN algorithm computes the mean and variance of each mini-batch and normalizes the batch's values to zero mean and unit variance; a learnable scale (gamma) and shift (beta) are then applied so the layer keeps its representational power.
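
Below is a minimal NumPy sketch of this forward pass in training mode. The function name, the eps stability constant, and the (batch, features) input layout are illustrative assumptions, and the running-statistics bookkeeping used at inference time is omitted:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalize x of shape (batch, features) during training.

    gamma/beta are the standard learnable scale and shift; eps guards
    against division by zero. Illustrative sketch, not a full layer.
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # restore scale and shift

# Example: a mini-batch of 32 samples with 4 features
x = np.random.randn(32, 4)
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```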

