Question 1/13 v3 lecture 9

Calling backward() on the result of a computation backpropagates gradients through the graph. What will happen if we call backward() multiple times in succession?


The gradients will be accumulated (gradients will sum across calls)
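A minimal sketch of this behavior: the second backward() call adds to the gradient left by the first instead of overwriting it (retain_graph=True is needed here only because we backpropagate through the same graph twice).

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x  # dy/dx = 2x = 4

# First backward pass populates x.grad.
y.backward(retain_graph=True)
print(x.grad)  # tensor(4.)

# Second backward pass on the same graph adds to the existing gradient.
y.backward()
print(x.grad)  # tensor(8.)
```

To start fresh between calls, the gradient must be cleared explicitly, e.g. with x.grad.zero_() or optimizer.zero_grad().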


Relevant part of lecture

Supplementary material

Having control over when gradients are zeroed is necessary for implementing certain kinds of neural networks (RNNs, GANs, and many others). You can also leverage this behavior to train with a larger effective batch size than your hardware would otherwise allow.
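The batch-size trick above is usually called gradient accumulation. A minimal sketch, with a hypothetical tiny model and synthetic data standing in for a real pipeline: run backward() on several small micro-batches, and only step the optimizer once the gradients from all of them have summed up.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a tiny linear model and random data for illustration.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4  # effective batch = accum_steps * micro-batch size
optimizer.zero_grad()
for step in range(8):
    inputs = torch.randn(2, 10)   # micro-batch small enough to fit in memory
    targets = torch.randn(2, 1)
    # Scale the loss so the accumulated sum matches the mean over the big batch.
    loss = loss_fn(model(inputs), targets) / accum_steps
    loss.backward()               # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()          # update once per effective batch
        optimizer.zero_grad()     # reset the accumulated gradients
```

Dividing the loss by accum_steps keeps the update magnitude comparable to what a single large batch would produce.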

The flip side is that if you forget to zero the gradients, they will keep growing across iterations: the model may appear to train at first, but the updates eventually become too large and training diverges.
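A toy illustration of the failure mode: each loop iteration builds a fresh graph and calls backward(), but because the gradient is never zeroed, three identical passes leave triple the true gradient.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
for _ in range(3):
    loss = (w * 3.0) ** 2 / 2  # d(loss)/dw = 9w = 9 at w = 1
    loss.backward()            # accumulates; no zeroing between iterations

# The true gradient is 9, but three un-zeroed passes leave 27.
print(w.grad)  # tensor(27.)
```

In a real training loop this inflation compounds every step, which is why optimizer.zero_grad() (or setting .grad to None) belongs in each iteration.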