Question 27/30 v3 lecture 10

What is a major limitation of BatchNorm?


It cannot be used for online training (batch size of 1): with a single example, the batch statistics are just that example, so there is nothing meaningful to normalize against. More generally, any small batch size gives noisy estimates of the batch mean and variance, which either prevents training or makes it unstable. BatchNorm is also problematic for an RNN: how do you normalize a batch where each sequence can contain a variable number of words, and where the weights are tied across time steps?
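The batch-size-1 failure can be seen directly: with one example, each feature's batch mean equals the example itself and its batch variance is zero, so the normalized output collapses to all zeros. A minimal sketch (plain NumPy, not the lecture's code; `batchnorm` here is a hypothetical helper with no learnable scale/shift):

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # Normalize each feature using statistics computed across the batch axis,
    # as BatchNorm does in training mode (learnable gamma/beta omitted).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

big_batch = np.random.randn(64, 8)
print(batchnorm(big_batch).std(axis=0))   # roughly 1 per feature

single = np.random.randn(1, 8)
print(batchnorm(single))                  # all zeros: the activations collapse
```

With a batch of one, `mean` equals the input and `var` is zero, so every activation becomes 0 regardless of its value; the layer destroys the signal. This is why frameworks such as PyTorch raise an error for `BatchNorm1d` in training mode with a single example per channel.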

Relevant part of lecture

Supplementary material

Summary of BatchNorm, LayerNorm, InstanceNorm and GroupNorm