mlpack blog

Proximal Policy Optimization - Week 10

Unknown, 4 August 2019

This week I wrote my own normal distribution class, so that the network can predict the distribution's parameters (mean and variance) and the agent can sample actions from the resulting distribution. After reading the existing diagonal Gaussian distribution more carefully, I think it may offer the same functionality, so I might switch to using it instead.
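To make the idea concrete, here is a minimal sketch of sampling an action from a diagonal Gaussian whose mean and variance come from the policy network. It uses only the C++ standard library; `DiagonalGaussian`, `Sample`, and `LogProbability` are hypothetical names chosen for illustration, not mlpack's API or the code from this project.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not mlpack's API): a diagonal Gaussian over actions.
struct DiagonalGaussian
{
  std::vector<double> mean;      // per-dimension mean predicted by the policy
  std::vector<double> variance;  // per-dimension variance, assumed positive

  // Draw one action: a_i = mean_i + sqrt(variance_i) * eps_i, eps_i ~ N(0, 1).
  template<typename Rng>
  std::vector<double> Sample(Rng& rng) const
  {
    assert(mean.size() == variance.size());
    std::normal_distribution<double> standardNormal(0.0, 1.0);
    std::vector<double> action(mean.size());
    for (std::size_t i = 0; i < mean.size(); ++i)
      action[i] = mean[i] + std::sqrt(variance[i]) * standardNormal(rng);
    return action;
  }

  // Log-density of an action. The dimensions are independent, so the
  // per-dimension log-densities simply add up.
  double LogProbability(const std::vector<double>& action) const
  {
    assert(action.size() == mean.size());
    const double twoPi = 2.0 * std::acos(-1.0);
    double logProb = 0.0;
    for (std::size_t i = 0; i < mean.size(); ++i)
    {
      const double diff = action[i] - mean[i];
      logProb -= 0.5 * (std::log(twoPi * variance[i])
          + diff * diff / variance[i]);
    }
    return logProb;
  }
};
```

With a seeded `std::mt19937 rng(42);`, calling `dist.Sample(rng)` returns one action vector, and `dist.LogProbability(action)` gives the log-probability that PPO needs for its probability ratio.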

I also got stuck on how to propagate the gradient backward through the network. With help from members of the mlpack community, I now have a clearer picture of the problem, and I hope to solve it next week.
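One piece of that backward pass can be written down analytically. For a single dimension with action a, mean μ, and variance σ², the log-density has gradient (a − μ)/σ² with respect to the mean and ((a − μ)² − σ²)/(2σ⁴) with respect to the variance. The sketch below computes these two quantities, which would then be passed back into the layers that produced the mean and variance. It assumes the network outputs the variance directly; `GaussianLogProbGradient` and `LogProbGradient` are hypothetical names, not mlpack functions, and this is not necessarily how the final implementation will look.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the gradients handed back to the network.
struct GaussianLogProbGradient
{
  std::vector<double> wrtMean;      // d log p / d mean_i
  std::vector<double> wrtVariance;  // d log p / d variance_i
};

// Gradient of the diagonal Gaussian log-density with respect to its
// parameters, evaluated at a given action.
GaussianLogProbGradient LogProbGradient(const std::vector<double>& action,
                                        const std::vector<double>& mean,
                                        const std::vector<double>& variance)
{
  assert(action.size() == mean.size() && mean.size() == variance.size());
  GaussianLogProbGradient grad;
  grad.wrtMean.resize(mean.size());
  grad.wrtVariance.resize(mean.size());
  for (std::size_t i = 0; i < mean.size(); ++i)
  {
    const double diff = action[i] - mean[i];
    grad.wrtMean[i] = diff / variance[i];
    grad.wrtVariance[i] =
        (diff * diff - variance[i]) / (2.0 * variance[i] * variance[i]);
  }
  return grad;
}
```

If the network predicted the log-variance or the standard deviation instead, the variance gradient would need one more chain-rule factor before being passed back.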

Thanks for reading :).