Question 6/7 v4 lecture 8

What is temporal activation regularization (TAR)?


It is a type of regularization used when training RNNs (e.g. in AWD-LSTM language models). It adds a penalty proportional to the squared difference between hidden activations at adjacent time steps, so the hidden state is encouraged to change smoothly over time. The intuition is that reading one new word should not change the representation too dramatically. Because it is a soft penalty added to the loss, the model can still make a large change when the resulting decrease in prediction loss makes it worth it!
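A minimal sketch of the TAR penalty (the function name, `beta` coefficient, and the use of a mean over time steps are illustrative assumptions; implementations like AWD-LSTM apply it to the final layer's hidden states):

```python
import numpy as np

def tar_penalty(hidden_states, beta=2.0):
    """Temporal activation regularization (TAR).

    Penalizes the squared L2 difference between hidden activations
    at adjacent time steps, so the RNN state changes smoothly as
    each new word is read.

    hidden_states: array of shape (seq_len, hidden_dim)
    beta: regularization strength (hyperparameter, assumed name)
    """
    diffs = hidden_states[1:] - hidden_states[:-1]  # h_t - h_{t-1}
    return beta * np.mean(np.sum(diffs ** 2, axis=1))

# Toy example: 4 time steps, 3-dim hidden state
h = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],   # no change -> contributes nothing
              [0.0, 1.0, 0.0],   # abrupt jump -> large contribution
              [0.0, 1.0, 0.0]])
penalty = tar_penalty(h, beta=1.0)
```

During training, this term is simply added to the usual prediction loss, which is what lets the optimizer trade smoothness against fit.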

Relevant part of lecture