Adds amber scenario. by CdavM · Pull Request #19 · Froot-NetSys/NetArena

CdavM · 2026-04-01T11:00:40Z

No description provided.

Kolleida · 2026-04-02T00:48:39Z

amber/amber-manifest-green.json5

+    entrypoint: [
+      "uv",
+      "run",
+      "malt_agent.py",


The entrypoint should run ./start_route_agent.sh instead of uv run malt_agent.py, and the container should use the route_agent image (I'm assuming this is for the route app?). This container also needs to run with --privileged and with /lib/modules mounted.
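To make the requested runtime requirements concrete, here is a rough plain-Docker sketch of the same setup. It only assembles a docker run argument list. The image reference "route_agent" is a placeholder assumption, since the real registry and tag are not given in this thread. Passing ./start_route_agent.sh as the container command also assumes the image does not define its own ENTRYPOINT.

```python
import shlex


def route_agent_docker_args(image: str = "route_agent") -> list[str]:
    """Build a docker run argv matching the review comment (sketch only).

    The default image name is a placeholder; substitute the real route_agent image.
    """
    return [
        "docker", "run",
        "--privileged",                     # reviewer: container must run privileged
        "-v", "/lib/modules:/lib/modules",  # reviewer: mount the host's /lib/modules
        image,
        "./start_route_agent.sh",           # replaces "uv run malt_agent.py"
    ]


if __name__ == "__main__":
    # Print the command as a single shell-safe string.
    print(shlex.join(route_agent_docker_args()))
```

The --privileged flag is the part that matters for the discussion that follows, since it has no obvious counterpart in an amber manifest.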

Thanks @Kolleida! I believe it's not possible to run an image with --privileged in amber, which is why we went with the MALT agent instead. Is there another way we could run your benchmark via amber? Please let me know which image, entrypoint, and config parameters to use. Thank you!

@CdavM If you are running MALT, the role should be "malt_operator", not "route_operator" (the role is mainly used by the leaderboard query to filter MALT-specific results). Also, the config should look something like this:

assessment_config: {
  prompt_type: "zeroshot_base",
  num_queries: 3,
  complexity_level: ["level1", "level2", "level3"],
  output_dir: "dump",
  output_file: "query_output.jsonl",
  benchmark_path: "assessment_queries.jsonl",
  regenerate_benchmark: true
}

This config generates 30 queries in total, spread across the 3 levels. Each increment of num_queries adds 10 more queries, so pick whatever value you think gives a good enough signal.
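The suggested config and the resulting query count can be sketched in Python. The dict simply mirrors the JSON5 fields above and is not an API of NetArena or amber. The formula assumes the total grows linearly at 10 queries per unit of num_queries, which is inferred from the two figures in the comment (3 gives 30, and each increment adds 10).

```python
# Python mirror of the assessment_config suggested above (illustrative only).
ASSESSMENT_CONFIG = {
    "prompt_type": "zeroshot_base",
    "num_queries": 3,
    "complexity_level": ["level1", "level2", "level3"],
    "output_dir": "dump",
    "output_file": "query_output.jsonl",
    "benchmark_path": "assessment_queries.jsonl",
    "regenerate_benchmark": True,
}

# Assumption: each unit of num_queries yields 10 queries in total.
QUERIES_PER_UNIT = 10


def total_queries(num_queries: int) -> int:
    """Expected number of generated queries, assuming linear scaling."""
    if num_queries < 0:
        raise ValueError("num_queries must be non-negative")
    return QUERIES_PER_UNIT * num_queries


if __name__ == "__main__":
    n = ASSESSMENT_CONFIG["num_queries"]
    print(f"num_queries={n} -> {total_queries(n)} queries")
```

Under this assumption, raising num_queries to 5 would produce 50 queries, which is one way to budget for the signal you want.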

Later I saw the agentbeats version of NetArena on the website. Its description references the K8s benchmark, but you guys are doing MALT instead. Is this also because the required setup (e.g. bootstrapping a KIND cluster) is impossible or hard to express in amber?

Kolleida

What agent is this supposed to run? There seems to be a mismatch between the assessment config/operator name at the bottom of the manifest and the green agent container being run. If there is anything I can do to help or clarify things, please lmk!
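One way to catch this kind of mismatch early is a small consistency check between the entrypoint and the operator role. The mapping below is an assumption drawn from this thread (malt_agent.py goes with "malt_operator", ./start_route_agent.sh with "route_operator"). The function check_role and its inputs are hypothetical and are not part of amber or NetArena.

```python
# Hypothetical mapping from green-agent entrypoint script to the expected role,
# inferred from this review thread (not an official NetArena table).
EXPECTED_ROLE = {
    "malt_agent.py": "malt_operator",
    "./start_route_agent.sh": "route_operator",
}


def check_role(entrypoint: list[str], role: str) -> list[str]:
    """Return a list of problems; an empty list means entrypoint and role agree."""
    problems = []
    scripts = [arg for arg in entrypoint if arg in EXPECTED_ROLE]
    if not scripts:
        problems.append(f"no known agent script in entrypoint {entrypoint!r}")
    for script in scripts:
        expected = EXPECTED_ROLE[script]
        if role != expected:
            problems.append(f"{script} expects role {expected!r}, got {role!r}")
    return problems


if __name__ == "__main__":
    # Entrypoint prefix from the diff, paired with the role the review flagged.
    print(check_role(["uv", "run", "malt_agent.py"], "route_operator"))
```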

Adds amber scenario.

80bf83f

CdavM force-pushed the cdavm/amber branch from c54abe7 to 80bf83f on April 1, 2026 11:14

Kolleida reviewed Apr 2, 2026


Kolleida requested changes Apr 2, 2026


Update config and agent name.

5e54ccb


CdavM wants to merge 2 commits into Froot-NetSys:a2a-agentx from RDI-Foundation:cdavm/amber

CdavM commented Apr 1, 2026


Kolleida Apr 2, 2026 (edited)


CdavM Apr 2, 2026


Kolleida Apr 2, 2026


Kolleida left a comment (edited)
