Calling backward() on the result of a calculation backpropagates the gradients. What will happen if we call backward() multiple times in succession?
Answer
The gradients are accumulated: each call to backward() adds the newly computed gradients to the values already stored in .grad rather than replacing them. The flip side is that if you forget to zero the gradients, they will keep growing indefinitely (initially your model might train, but at some point the training will diverge).
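A minimal sketch (not from the lecture) illustrating this accumulation behaviour; the tensor and values are made up for the example:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

for _ in range(3):
    y = x ** 2       # rebuild the computation each iteration
    y.backward()     # dy/dx = 2*x = 4.0 is *added* to x.grad, not assigned
    print(x.grad)    # prints 4.0, then 8.0, then 12.0 -- the gradient keeps growing

x.grad.zero_()       # zeroing (e.g. optimizer.zero_grad() in a training loop) resets it
```

This is why a typical training loop zeroes the gradients once per batch before calling backward().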