Question 4/13 v3 lecture 9

A common operation when implementing neural networks is raising e to the power of x (e.g. in softmax or sigmoid). This can produce very large floating-point numbers.
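A small sketch of how quickly exp grows: double-precision floats top out around 1.8e308, so `math.exp` overflows for inputs a little above 709.

```python
import math

# exp(700) is still representable as a 64-bit float (about 1.01e304),
# but exp(710) exceeds the double-precision maximum (~1.8e308)
# and Python raises OverflowError instead of returning a number.
print(math.exp(700))

try:
    math.exp(710)
except OverflowError:
    print("exp(710) overflows double precision")
```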

What can happen when calculating gradients, or running computations in general, with large floating-point numbers?


This can introduce inaccuracies: the further a value is from 0, the less fine-grained the representable floating-point numbers are, so large values are rounded more coarsely. Sufficiently large results can even overflow entirely.
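The coarsening can be seen directly with `math.ulp`, which returns the gap between a float and the next representable float above it; a brief sketch:

```python
import math

# The gap between adjacent representable doubles grows with magnitude.
print(math.ulp(1.0))   # ~2.2e-16 near 1
print(math.ulp(1e8))   # a much larger gap near 1e8
print(math.ulp(1e16))  # gap of 2.0: adding 1 is below the rounding step

# At 1e16 the spacing exceeds 1, so adding 1 changes nothing:
print(1e16 + 1.0 == 1e16)  # True
```

This is why small gradient updates can be silently lost when added to large parameter or activation values.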


Relevant part of lecture