In the section Policy Gradients diagnostics, it is mentioned:
If KL is .01 then very small.
If 10 then too much.
I couldn't find these values in the slides of the original talk. Where are they from?
Also, in the previous part (about entropy), you mentioned how to fix the problem of entropy dropping too fast, which is really helpful.
Does anyone know how to fix the problem of the KL divergence being too low?
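For context, here is roughly how I understand the KL in question: the mean KL divergence between the policy before and after an update, averaged over the states in a batch. This is only a minimal sketch assuming discrete actions; `mean_kl` and the example probabilities are made up for illustration and are not from the article or the talk.

```python
import numpy as np

def mean_kl(old_probs, new_probs, eps=1e-8):
    """Mean KL(old || new) over a batch of categorical action distributions.

    Both inputs have shape (batch, n_actions) and each row sums to 1.
    """
    # Clip to avoid log(0) when an action has zero probability.
    old = np.clip(old_probs, eps, 1.0)
    new = np.clip(new_probs, eps, 1.0)
    per_state_kl = np.sum(old * (np.log(old) - np.log(new)), axis=1)
    return float(np.mean(per_state_kl))

# Hypothetical action probabilities for two states, before and after an update.
old = np.array([[0.5, 0.5], [0.9, 0.1]])
new = np.array([[0.6, 0.4], [0.8, 0.2]])

same = mean_kl(old, old)     # identical policies, so KL is zero
shifted = mean_kl(old, new)  # small policy change, so KL is small and positive
print(same, shifted)
```

If this is the right quantity, a value that stays near zero would mean the policy barely changes between updates.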