Replication of results for standard PPO #1

@MatthewCWeston

I've been looking at using evidential PPO (EPPO) for a project, and, to that end, I've tried to replicate the comparison results published in the paper. The results I get with evidential PPO are in line with those in the paper, but the results I get with standard, non-evidential PPO are noticeably higher than the published ones. In particular, looking at the HalfCheetah environment with the front-one paralysis strategy:

Image

The results I obtained were, for EPPO:

AULC: 3833.0048828125
Final Return: 4082.907470703125

And, for standard PPO:

AULC: 3481.70068359375
Final Return: 3648.4755859375
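For anyone comparing numbers, it may help to pin down how the two metrics are computed, since a different definition alone could shift the values. The snippet below is a minimal sketch, not the repository's code: it assumes AULC is the trapezoidal area under the evaluation-return curve divided by the step range, and that the final return is the mean of the last `last_k` evaluations. The function names `aulc` and `final_return` and the `last_k` parameter are hypothetical, and the input values are toy numbers, not results from this issue.

```python
import numpy as np


def aulc(returns, steps=None):
    """Normalized area under the learning curve (ASSUMED definition).

    Trapezoidal area under (steps, returns), divided by the step range,
    so a constant curve of value c gives an AULC of c.
    """
    returns = np.asarray(returns, dtype=np.float64)
    if returns.size < 2:
        raise ValueError("need at least two evaluation points")
    if steps is None:
        # Assume evenly spaced evaluations when no step axis is given.
        steps = np.arange(returns.size, dtype=np.float64)
    else:
        steps = np.asarray(steps, dtype=np.float64)
    widths = np.diff(steps)                       # spacing between evaluations
    heights = 0.5 * (returns[1:] + returns[:-1])  # trapezoid mid-heights
    return float(np.sum(widths * heights) / (steps[-1] - steps[0]))


def final_return(returns, last_k=1):
    """Mean of the last `last_k` evaluation returns (ASSUMED definition)."""
    returns = np.asarray(returns, dtype=np.float64)
    return float(returns[-last_k:].mean())


# Toy learning curve for illustration only.
toy_curve = [0.0, 2.0, 4.0]
print(aulc(toy_curve), final_return(toy_curve))
```

If the paper or repository normalizes AULC differently (for example, a plain mean over evaluations, or no normalization), the absolute values would change, though the same definition applied to both methods should preserve their ordering.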

Loss curves:

Image

EPPO consistently outperforms PPO, but the margin is substantially smaller than in the paper's results. The hyperparameters are identical to the defaults provided in this repository. My vanilla PPO implementation is in the notebook below, and I'm reasonably confident that no trace of the evidential critic remains:

evidential_ppo_paper.ipynb

Is there a difference between my implementation and yours that could explain the gap?
