Question 1/13 v3 lecture 11

What is LSUV (layerwise sequential unit variance)?


The usual approach to initialization is to work out a mathematical formula for the initial weights (specific to your architecture) and use it to set them.
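As an illustration of the formula-based approach, here is a minimal sketch of Kaiming (He) normal initialization, a well-known formula for ReLU networks that draws weights with standard deviation sqrt(2 / fan_in). The function name `kaiming_normal` and the layer sizes are made up for this example, and plain Python lists stand in for tensors.

```python
import math
import random
import statistics

def kaiming_normal(n_in, n_out, seed=0):
    """Draw an n_out x n_in weight matrix with std = sqrt(2 / n_in)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / n_in)  # the closed-form formula does all the work
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

# Hypothetical layer sizes, chosen only for illustration.
weights = kaiming_normal(n_in=512, n_out=256)

target_std = math.sqrt(2.0 / 512)
empirical_std = statistics.pstdev(w for row in weights for w in row)
print(f"target std={target_std:.4f} empirical std={empirical_std:.4f}")
```

The formula depends on the activation function and layer shape, which is why a different architecture generally needs a different derivation.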

LSUV proposes the following:
  1. Initialize the weights however you would like.
  2. Grab a single minibatch and find all of the modules that are conv layers.
  3. Pass the minibatch through each module and calculate the mean and standard deviation of the activations.
  4. Instead of coming up with a perfect init formula, run a loop for each module in turn. The loop calls the module on the minibatch, measures the activations, and modifies the weights (for example, dividing them by the measured standard deviation), repeating until the mean is close enough to zero and the standard deviation is close to 1.
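The steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the lecture's or the paper's code: small dense layers stand in for conv layers, each layer is a dict with hypothetical keys `weights` and `sub`, and the names `forward`, `stats`, `lsuv_layer`, and `lsuv_init` are invented here. The constant shift `sub` applied after the ReLU is also an assumption of this sketch; it is included because a plain ReLU output is never negative, so its mean cannot reach zero without some shift.

```python
import random
import statistics

def forward(layer, batch):
    """Dense layer, then ReLU, then a constant shift (layer['sub'])."""
    return [
        [max(sum(w * x for w, x in zip(unit, row)), 0.0) - layer["sub"]
         for unit in layer["weights"]]
        for row in batch
    ]

def stats(acts):
    """Mean and standard deviation over every activation in the batch."""
    flat = [a for row in acts for a in row]
    return statistics.fmean(flat), statistics.pstdev(flat)

def lsuv_layer(layer, batch, tol=1e-3, max_iters=20):
    """Adjust one layer until its activations have mean ~ 0 and std ~ 1."""
    for _ in range(max_iters):
        mean, std = stats(forward(layer, batch))
        if abs(mean) < tol and abs(std - 1.0) < tol:
            break
        # Dividing the weights by the std rescales the ReLU output by 1/std.
        layer["weights"] = [[w / std for w in unit] for unit in layer["weights"]]
        # Moving the shift by the measured mean recentres the activations.
        layer["sub"] += mean
    return stats(forward(layer, batch))

def lsuv_init(layers, batch):
    """Fix layers one at a time, in order; each sees the adjusted output before it."""
    results = []
    for layer in layers:
        results.append(lsuv_layer(layer, batch))
        batch = forward(layer, batch)
    return results

random.seed(0)
# Step 2: a single (synthetic) minibatch of 64 samples with 8 features.
batch = [[random.gauss(0, 1) for _ in range(8)] for _ in range(64)]
# Step 1: initialize the weights however you like (unit Gaussian here).
sizes = [8, 16, 16]
layers = [
    {"weights": [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_out)],
     "sub": 0.0}
    for n_in, n_out in zip(sizes, sizes[1:])
]
# Steps 3 and 4: measure and adjust each layer in sequence.
final_stats = lsuv_init(layers, batch)
for mean, std in final_stats:
    print(f"mean={mean:+.4f} std={std:.4f}")
```

Processing the layers in order matters: each layer is tuned on the output of the already-adjusted layers before it, which is the "sequential" part of the name.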

Relevant part of lecture

Supplementary material

All you need is a good init - a paper by Dmytro Mishkin and Jiri Matas introducing LSUV.