LSUV (Layer-Sequential Unit-Variance initialization, Mishkin & Matas) proposes the following:

- Initialize the weights however you would like.
- Grab a single minibatch and find all of the modules that are conv layers.
- Pass the minibatch through each module and calculate the mean and standard deviation of its activations.
- Instead of deriving a perfect init formula, run a loop: call the module on the minibatch, then iteratively adjust its parameters until the activations' mean is close enough to 0 and their standard deviation is close enough to 1.
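The steps above can be sketched in plain NumPy. This is a minimal illustration, not the original implementation: the `Layer` class is a hypothetical stand-in for a conv module (a fully connected layer with a leaky ReLU), and the names `lsuv_init`, `tol`, and `max_iters` are my own. The loop follows the recipe literally: subtract the activation mean from the bias and divide the weights by the activation std, repeating until both are close enough.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(z, leak=0.1):
    # Hypothetical choice of nonlinearity; any activation works here.
    return np.where(z > 0, z, leak * z)

class Layer:
    """Stand-in for a conv module: fully connected weights + leaky ReLU."""
    def __init__(self, n_in, n_out):
        self.w = rng.normal(size=(n_in, n_out))  # "initialize however you would like"
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return leaky_relu(x @ self.w + self.b)

def lsuv_init(layers, xb, tol=0.01, max_iters=200):
    """Fix each layer in turn using the statistics of one minibatch."""
    for layer in layers:
        for _ in range(max_iters):
            out = layer(xb)
            mean, std = out.mean(), out.std()
            if abs(mean) < tol and abs(std - 1) < tol:
                break
            layer.b -= mean   # nudge the activation mean toward 0
            layer.w /= std    # nudge the activation std toward 1
        xb = layer(xb)        # corrected activations feed the next layer

xb = rng.normal(size=(64, 32))           # the single minibatch
layers = [Layer(32, 32), Layer(32, 32)]
lsuv_init(layers, xb)
```

Note that each layer is fixed in sequence: once a layer's activations are standardized, its output becomes the input used to fix the next layer, which is what makes the per-layer loop sufficient.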