Skip to content

improve: stricter input validation in gr00t_inference#90

Closed
cagataycali wants to merge 31 commits into
strands-labs:mainfrom
cagataycali:improve/groot-input-validation
Closed

improve: stricter input validation in gr00t_inference#90
cagataycali wants to merge 31 commits into
strands-labs:mainfrom
cagataycali:improve/groot-input-validation

Conversation

@cagataycali
Copy link
Copy Markdown
Member

@cagataycali cagataycali commented Apr 1, 2026

Summary

Stricter input validation for the gr00t_inference tool — centralised in a single validate_inputs() function with per-action scoping.

Changes

  1. Centralised validation: All input validation consolidated into validate_inputs() — single entry-point, independently testable.
  2. Action-scoped: Port-only actions (find_containers, list, status, stop) validate only port/host/protocol; mutating actions validate the full parameter surface.
  3. Host validation: Accepts valid IPs and RFC-952 hostnames (localhost, host.docker.internal). Rejects typos like 127.0.01 via _ALL_NUMERIC_RE (multi-label only; single-label numerics are valid per RFC-1123).
  4. Default host changed to 127.0.0.1 (AGENTS.md compliance). Container actions (start/restart/lifecycle) auto-flip to 0.0.0.0 internally since Docker -p port-publish requires bind-all inside the container.
  5. N1.7 process identification: _PGREP_INFERENCE_PORT_FMT and _is_gr00t_*_process now match both inference_service.py (N1.5/N1.6) and gr00t.eval.run_gr00t_server (N1.7). Closes the N1.7 stop/status identification gap.
  6. Process identification tightened: Requires inference_service.py OR gr00t.eval.run_gr00t_server + python/gr00t + --port flag in cmdline. Prevents false-matching editors/log-tailers.
  7. Option-injection guard: repo_url, repo_tag, policy_name starting with - are rejected (prevents git flag injection via subprocess argv).
  8. Platform gate: Host-system pgrep fallback returns clear error on non-Linux instead of silent no-op.
  9. Exception clauses narrowed to (OSError, subprocess.SubprocessError, UnicodeDecodeError). PermissionError logs at WARNING; other errors log at DEBUG.
  10. Path traversal hardened: Splits on both / and \ separators.
  11. Shell metacharacter denylist narrowed: Removed quotes/bangs/parens/brackets (safe in argv-style calls).

Validation Scope

Validated: port, host, protocol, data_config, embodiment_tag, container_name, TRT dtypes, checkpoint_path, trt_engine_path, image_name, volumes, container_command, repo_url/repo_tag/policy_name (dash-prefix only).

Not validated (argv-safe, no injection risk): hf_repo, hf_subfolder, hf_local_dir, source_dir.

Tests

80+ validation tests across 17 test classes covering all branches, integration wiring, regression scenarios, N1.7 identification, and the new security guards.

Copy link
Copy Markdown
Member

@awsarron awsarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to add tests for these cases. gr00t_inference doesn't have any test coverage at the moment

Comment thread strands_robots/tools/gr00t_inference.py Outdated
@cagataycali cagataycali requested a review from awsarron April 6, 2026 06:30
@cagataycali cagataycali added this to the v0.4 milestone Apr 6, 2026
@cagataycali cagataycali removed the request for review from max-rattray-aws April 10, 2026 16:29
@cagataycali
Copy link
Copy Markdown
Member Author

Status Check

1 unresolved thread — but the fix was already implemented in commit 4d9634a.

@awsarron's nit ("extract validation into single function") was addressed: all validation logic is now in a dedicated validate_inputs() function with 15 tests covering every branch.

CI: ✅ All checks passing
Mergeable: ✅ Clean

@awsarron could you resolve the thread and re-review? This is a small, self-contained improvement.


🤖 Pipeline analysis by AI agent. Strands Agents. Feedback welcome!

@cagataycali cagataycali modified the milestones: v0.4.0, v0.3.9 Apr 21, 2026
@cagataycali cagataycali force-pushed the improve/groot-input-validation branch 2 times, most recently from df78bde to 91f1dab Compare May 14, 2026 01:45
@cagataycali
Copy link
Copy Markdown
Member Author

🔄 Re-review Request — All Feedback Addressed

Hi @awsarron @sundargthb — friendly ping on this one.

✅ Current State

Criteria Status
Review thread (encapsulate validation) ✅ Resolved — validate_inputs() extracted
Tests ✅ 15 tests covering every validation branch
CI checks ✅ All passing
Merge conflicts ✅ Clean — MERGEABLE

⏰ Timing Consideration

This PR has been waiting since April 1 (~6 weeks). There are now 4 new PRs from @yinsong1986 (#149-152) in-flight as part of the N1.7 alignment work (issue #148). Two of them (#150, #152) touch the same gr00t_inference.py file.

Since PR #90 is a small, additive validation layer (+138/-8 net), merging it first would be cleanest — it avoids rebase churn in either direction and gives yinsong's larger refactors a validated foundation to build on. If yinsong's PRs land first, I'm happy to rebase, but the ordering-first approach saves everyone work.

🎯 Ask

Could you either:

  1. Re-approve if the resolved feedback looks good, or
  2. Flag any remaining concerns so I can address them immediately?

Thanks! 🙏


🤖 Agent-assisted analysis. Strands Agents.

Comment thread strands_robots/tools/gr00t_inference.py Outdated
@cagataycali cagataycali force-pushed the improve/groot-input-validation branch from 53cc20c to 2140c24 Compare May 15, 2026 19:43
@cagataycali
Copy link
Copy Markdown
Member Author

⚠️ CI Failure — Infrastructure Issue (Not Code)

The current CI failure is a disk space exhaustion on the GitHub Actions runner, not a code problem:

Failed to download `torch==2.10.0`
failed to write to file `/home/runner/.cache/uv/.tmpuVYwKv/torch/lib/libtorch_cuda_linalg.so`:
No space left on device (os error 28)

The hatch run lint step installs [all] extras which pulls lerobot → torch with CUDA (~2.5GB), and the runner ran out of disk.

Local Verification (all clean ✅)

$ ruff check .           # All checks passed!
$ mypy strands_robots/   # Success: no issues found in 58 source files
$ pytest tests/ -x       # 1076 passed, 59 skipped in 4.33s

Branch is rebased on latest main (post #145, #149, #150, #151 merges). No conflicts.

Suggested Fix for CI

The workflow could benefit from a disk space cleanup step before install:

- name: Free disk space
  run: sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc

This would reclaim ~20GB and prevent recurrence for all PRs. Happy to file a separate PR for this if helpful.


🤖 AI agent analysis. Strands Agents. Feedback welcome!

@cagataycali cagataycali force-pushed the improve/groot-input-validation branch 3 times, most recently from 6d09a45 to bfc257d Compare May 16, 2026 07:40
@cagataycali cagataycali requested review from awsarron and removed request for awsarron May 16, 2026 21:20
@cagataycali cagataycali force-pushed the improve/groot-input-validation branch from 2263518 to bcb752d Compare May 19, 2026 01:44
@cagataycali
Copy link
Copy Markdown
Member Author

@awsarron @sundargthb — friendly check-in. Both review threads are resolved, CI green, branch is clean on latest main. This is a small additive validation layer (+138/-8). Happy to address any remaining concerns if you have them. 🙏

@cagataycali cagataycali force-pushed the improve/groot-input-validation branch from bcb752d to fd45f87 Compare May 20, 2026 13:59
@cagataycali cagataycali force-pushed the improve/groot-input-validation branch from fd45f87 to 6624ba6 Compare May 21, 2026 09:16
cagataycali pushed a commit to cagataycali/robots that referenced this pull request May 21, 2026
All pgrep -f patterns now use '( |$)' after the port number to prevent
substring matches (e.g. port 80 matching a process on port 8000).

Applies to both container-side (docker exec pgrep) and host-system
fallback paths. The _is_gr00t_process/_is_gr00t_host_process
verification was already correct, but the pgrep filter itself could
return false positives that would be filtered downstream — this tightens
the initial candidate set as defense-in-depth.

Addresses review feedback from @sundargthb on PR strands-labs#90.
…rence

Improvements to the gr00t_inference tool:

1. Input validation for all user-supplied parameters:
   - data_config and embodiment_tag validated against strict alphanumeric
     patterns (they are enumerable values from the docstring).
   - checkpoint_path and trt_engine_path reject shell metacharacters,
     null bytes, and '..' traversal components.
   - container_name validated against Docker naming rules.
   - dtype values checked against explicit allowlists.
   - Port range validated (1-65535).

2. Default host changed from 0.0.0.0 to 127.0.0.1 (loopback):
   - Inference services should default to localhost-only binding.
   - Users can still explicitly pass host='0.0.0.0' when network
     access is needed.

3. Process verification for stop action:
   - Added _is_gr00t_process() to verify a PID belongs to a GR00T
     inference process before sending signals.
   - Host-system fallback now uses pgrep -f with the inference_service
     pattern instead of lsof (which matches any process on the port).
…, action-scoped over-validation, centralise dash-prefix guard

Changes:
- Add RFC 1035 §2.3.4 hostname total-length cap (253 chars) in validate_inputs()
- Scope validation per action: build_image/download_checkpoint/start_container
  no longer validate data_config/embodiment_tag/dtypes they don't consume
- Move option-injection dash-prefix guard into validate_inputs() (centralised)
- Log at INFO when host auto-flips 127.0.0.1→0.0.0.0 for container actions
  (addresses silent-kwarg-override concern — user sees the rewrite)
- Document _ALL_NUMERIC_RE broader reject set (IPv4 short-forms)
- Document regex boundary divergence (ERE vs Python re) as intentional
- Update validate_inputs() docstring with caller contract (must wrap ValueError)
- Fix test_valid_all_interfaces: use '0.0.0.0' not '127.0.0.1' (copy-paste bug)
- Fix test_read_only_action_accepts_any_data_config: use value that WOULD fail
  for action='start' (Has-Hyphens-And-Caps) to actually pin the skip behaviour

All 81 validation tests pass, ruff + mypy clean.
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Centralises gr00t_inference parameter validation in a new validate_inputs() helper, scoped per action (port-only, image-only, full). Tightens process identification to require --port plus a Python/gr00t entrypoint match (closing the N1.7 stop/status gap), narrows several exception clauses, adds an option-injection guard for repo_url/repo_tag/policy_name, hardens path-traversal handling, gates the host-system pgrep fallback on Linux, and changes the default host to 127.0.0.1 (with _start_service auto-flipping to 0.0.0.0 for the -p-publish docker case). Ships ~80 unit tests across 17 classes and a CHANGELOG entry. The validation surface and the AGENTS.md validate_inputs(...) pattern itself are well executed.

What's good

  • Single-entry-point validation matches the AGENTS.md PR #92 review-learnings ("Centralise validation in one function").
  • Strong test coverage: regression tests for the cross-port-kill bug, N1.7 cmdline detection, host-typo rejection, single-label numeric hostnames, option-injection, registry-port image refs, and integration tests that drive gr00t_inference() end-to-end and assert validators wire through to {"status": "error"}.
  • Exception narrowing in _is_gr00t_process / _is_gr00t_host_process plus structured logging at WARNING/DEBUG levels.
  • CHANGELOG entry documents both the broadened (RFC-952 hostnames) and stricter (multi-label all-numeric) host validation, plus the host default change.

Concerns

  1. The headline "loopback default" is silently undone on the only path that matters. Defaulting host="127.0.0.1" is good, but _start_service unconditionally flips it back to "0.0.0.0" whenever the value equals "127.0.0.1" — there is no way for a user who explicitly passes host="127.0.0.1" (e.g. running with --network=host or a side-car proxy) to keep loopback. The inline comment claims "only auto-flip if the user did NOT explicitly pass host=", but the code can’t tell the difference. This violates AGENTS.md PR #86 “Reject silently-dropped kwargs” and partly negates the safety improvement the PR is selling. Inline below.
  2. container_name is not validated for start_container. The image-only branch in validate_inputs skips the _CONTAINER_NAME_RE check, but _start_container interpolates it into docker run --name <name>. The full-validation branch does check it. The asymmetry is the kind of “silent drop” AGENTS.md flags. Inline below.
  3. lifecycle="teardown" is forced through the full-validation branch. Teardown only consumes container_name + remove_volumes, but it lands in the start/restart/lifecycle branch and gets data_config / embodiment_tag / vit_dtype / llm_dtype / dit_dtype / trt_engine_path validated. Defaults pass, but a stale data_config="FOO" from a prior call gives a confusing error for an action that doesn’t need it. There are also no tests for lifecycle="teardown" (or for any of the four image-only branches end-to-end).
  4. Several except Exception: clauses left untouched in _list_running_services (line 727), _is_service_running (line 739), and the outer _stop_service (line 950). The PR’s changelog claims “Exception clauses in _is_gr00t_process/_is_gr00t_host_process narrowed from except Exception to specific exception types,” which is true — but the surrounding functions in the same file still have the pattern AGENTS.md PR #86 explicitly forbids. Worth a follow-up if not in scope here.
  5. _DOCKER_IMAGE_RE allows registry ports up to 99999 ((?::[0-9]{1,5})?). 5-digit upper bound is wrong for TCP ports; not a security issue (argv-style docker run will reject), just an inconsistency with the port 1–65535 check elsewhere in validate_inputs().

Verification suggestions

  • Manually call gr00t_inference(action="start", host="127.0.0.1", …) against a real container and confirm whether the user’s explicit loopback choice survives or is silently flipped. Then run ss -tlnp | grep <port> on the host — if it shows 0.0.0.0:<port> after host="127.0.0.1", the silent override is confirmed.
  • pytest tests/policies/groot/test_gr00t_inference_validation.py -v and check that the lifecycle="teardown" path has zero tests — confirming the gap.
  • rg "except Exception" strands_robots/tools/gr00t_inference.py to see how many catch-all clauses remain.

Comment thread strands_robots/tools/gr00t_inference.py Outdated
# service to bind all interfaces inside the container. We only auto-flip if
# the user did NOT explicitly pass host= (i.e. they accepted the default).
# Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host).
if host == "127.0.0.1":
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent override of explicit user input. The comment three lines up says "We only auto-flip if the user did NOT explicitly pass host=", but the code can’t actually tell the difference — it just checks if host == "127.0.0.1". A user who deliberately passes host="127.0.0.1" (e.g. running the container with --network=host and wanting loopback-only binding, or fronting it with a reverse proxy) will silently get bound to 0.0.0.0 instead, which is the opposite of what they asked for. This both contradicts the docstring/comment and violates AGENTS.md PR #86 “Reject silently-dropped kwargs”.

It also partly defeats the headline change in this PR: the host default became 127.0.0.1 for safety, but on the actual start/restart/lifecycle paths (the only ones that bind a port) it is unconditionally reverted to 0.0.0.0.

Suggested fix: make this opt-in. Either

  1. Add a sentinel default (host: str | None = None) and only auto-flip when host is None, or
  2. Drop the auto-flip and document that container actions require host="0.0.0.0" — raise ValueError with a clear message when host="127.0.0.1" is passed without --network=host.

Either way, the _start_service test test_start_service_auto_flips_loopback (asserting the silent flip) should be updated/removed accordingly.

for param_name, param_value in [("repo_url", repo_url), ("repo_tag", repo_tag)]:
if param_value is not None and param_value.startswith("-"):
raise ValueError(f"{param_name} must not start with '-' (got {param_value!r})")
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container_name is not validated on this branch. The image-only branch (covering build_image, download_checkpoint, start_container) returns here without running _CONTAINER_NAME_RE.match(container_name). But _start_container then interpolates container_name into docker run --name <name> (and _remove_container does docker rm -f <name>). The full-validation branch below at line ~229 does validate it for start/restart/lifecycle.

The asymmetry means gr00t_inference(action="start_container", container_name="--privileged") reaches docker argv unvalidated. argv-style is defence-in-depth here so this is not a direct shell injection, but it is exactly the AGENTS.md PR #92 “allowlist enumerable values” pattern — if the value is enumerable, validate it on every entry path, not just on the inference-mutating ones. Same rationale as the repo_url/repo_tag guard you already added on lines 212–214.

Fix: hoist the container_name check (lines 229–230) above the early return blocks so all mutating actions share it.


# Container command — reject shell metacharacters
if container_command is not None and _SHELL_META.search(container_command):
raise ValueError(f"container_command contains disallowed characters: {container_command!r}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lifecycle action lands in the full-validation branch even for phase="teardown". The dispatch path for action="lifecycle" reaches line 280 below and validates data_config, embodiment_tag, vit_dtype, llm_dtype, dit_dtype, trt_engine_path, etc. — all of which are unused by _lifecycle when phase="teardown" (which only needs container_name + remove_volumes).

Default values pass, so the happy path works, but a user retrying a teardown after a failed start (with data_config="FOO" from the previous prompt context) gets a confusing data_config must be lowercase alphanumeric/underscore error for an operation that doesn’t care. Consider gating on lifecycle == "teardown" and treating it the same as the port-only actions (validate container_name if provided, then return). There is also no test for lifecycle="teardown" in this PR — worth pinning, especially since the validate_inputs matrix is part of the contract.

"""Reject paths containing shell metacharacters, null bytes, or traversal sequences."""
if "\x00" in value:
raise ValueError(f"{label} must not contain null bytes")
if any(part == ".." for part in re.split(r"[/\\]", value)):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registry port allows up to 5 digits (99999). (?::[0-9]{1,5})? accepts values that exceed the TCP max of 65535. Not a security concern (argv-style docker run will fail downstream), but it’s inconsistent with the 1 <= port <= 65535 check at line 234 and undermines the “centralised validation” framing of this PR. A tightened bound (or even reusing the same int range check after a parse) keeps the contract consistent.

Minor; safe to defer.

Comment thread strands_robots/tools/gr00t_inference.py Outdated
# This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated
# in practice (procps-ng converts NUL → space when reading /proc/*/cmdline).
# Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server)
_PGREP_INFERENCE_PORT_FMT = "(inference_service\\.py|gr00t\\.eval\\.run_gr00t_server).*--port[= ]{port}( |$)"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PGREP_INFERENCE_PORT_FMT greedy-matches across cmdline. The pattern (inference_service\.py|gr00t\.eval\.run_gr00t_server).*--port[= ]{port}( |$) uses a greedy .*, which means a process whose cmdline contains both inference_service.py and a later --port 5555 argument from a child / wrapper script will match — including, for example, bash -c 'echo inference_service.py; python other.py --port 5555'. The Python-side _is_gr00t_*_process checks compensate by also requiring python/gr00t and verifying via _PYTHON_PORT_RE_FMT, so the actual kill path is safe — but the pgrep filter alone is broader than the comment implies.

Worth either (a) tightening the regex (e.g. anchoring with [^ ]* instead of .*) or (b) updating the comment to call out that pgrep is a coarse pre-filter and the authoritative match is _is_gr00t_*_process. The current tests don’t cover the greedy-match false-positive case.

Trivial cleanup — the validation section comment was duplicated on
consecutive lines. All 81 validation tests + 1538 broader tests pass.
ruff + mypy clean.
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Lands a centralised validate_inputs() for gr00t_inference, tightens pgrep/process-identification to handle both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server) entrypoints, switches the default host from 0.0.0.0 to 127.0.0.1 (per AGENTS.md § LLM Input Safety), and adds 80+ validation tests. The single-entry-point validator is exactly the pattern AGENTS.md § Review Learnings (#92) prescribes.

What's good

  • Single validate_inputs() entry point, action-scoped, independently testable — matches the canonical pattern from PR #92.
  • Option-injection guard (leading - rejection) for repo_url / repo_tag / policy_name.
  • Default host="127.0.0.1" (loopback-only) — satisfies AGENTS.md § LLM Input Safety “Bind to 127.0.0.1 by default”.
  • Process-probe helpers (_is_gr00t_process / _is_gr00t_host_process) require --port in cmdline + python/gr00t marker, which prevents the editor/log-tailer false-match cited in the PR description.
  • Narrow except (OSError, subprocess.SubprocessError, UnicodeDecodeError) clauses with PermissionError re-routed to WARNING. Good fit for AGENTS.md § Review Learnings (#86) “Exception Clauses Must Be Narrow”.
  • Comprehensive test surface (17 classes, 80+ tests) covering allowlists, traversal, null bytes, option-injection, and the cross-port-kill regression.

Concerns

  • Auto-flip semantics in _start_service contradict their own docstring (see inline). The comment claims it only flips when the user accepted the default, but the implementation can’t distinguish default from explicit since both arrive as the literal "127.0.0.1". As written, a user explicitly opting into loopback binding (e.g. with --network=host) silently gets 0.0.0.0 instead. The PR description repeats the same incorrect claim. Either the code needs a sentinel default (host: str | None = None) so explicit "127.0.0.1" can be honored, or the docstring/comment needs to be amended to say “this always flips on 127.0.0.1”.
  • No regression test pins the “honor explicit 127.0.0.1” behavior the docstring promises. AGENTS.md § Review Learnings (#85) “Pin regression tests for reviewed fixes” — without this, the contradiction at _start_service ships silently.
  • host="localhost" passes the hostname allowlist but is not auto-flipped, so on a container path the bind ends up on 127.0.0.1 inside the container and Docker’s -p publish never sees a connection. Either flip on a _loopback_aliases set or document explicitly that localhost is unsafe for container actions.
  • Validation scope drift for lifecycle: validate_inputs(action="lifecycle", …) falls through to the full inference-mutating branch and rejects e.g. a non-canonical data_config even when the user only invokes phase="setup" (build_image + download_checkpoint + start_container). Not blocking, but worth a phase kwarg in the validator (or carving lifecycle into its own scope).
  • Bare except Exception as e: at the bottom of _stop_service (line 949 — outside this PR’s hunks, so not flagged inline) is still in violation of AGENTS.md § Review Learnings (#86). Worth narrowing in a follow-up.

Verification suggestions

hatch run test tests/policies/groot/test_gr00t_inference_validation.py -v
hatch run lint
# Sanity-check the auto-flip:
python -c "from strands_robots.tools.gr00t_inference import _start_service; help(_start_service)"
# Inspect _PGREP_INFERENCE_PORT_FMT against a real cmdline (Linux only):
python -c "import re, strands_robots.tools.gr00t_inference as g; pat=g._PGREP_INFERENCE_PORT_FMT.format(port=5555); print(pat); print(bool(re.search(pat.replace('( |$)', '(?: |$)'), 'python /opt/Isaac-GR00T/scripts/inference_service.py --port 5555 --host 0.0.0.0')))"

Comment thread strands_robots/tools/gr00t_inference.py Outdated
# service to bind all interfaces inside the container. We only auto-flip if
# the user did NOT explicitly pass host= (i.e. they accepted the default).
# Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host).
if host == "127.0.0.1":
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic does not match the comment / docstring / PR description.

The comment above says: “We only auto-flip if the user did NOT explicitly pass host= (i.e. they accepted the default). Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host).”

But the code is if host == "127.0.0.1": — there is no way for _start_service to distinguish the default "127.0.0.1" from a user explicitly passing "127.0.0.1" (e.g. for --network=host). Both arrive as the literal string. So an explicit host="127.0.0.1" is silently overridden to 0.0.0.0, which is the opposite of what the comment promises.

Also note host="localhost" (a loopback alias the validator accepts) is not auto-flipped, so container actions with host="localhost" will publish a port that no client can reach (bind ends up on the container’s loopback only).

Fix options:

  1. Use a sentinel default (host: str = "127.0.0.1" in gr00t_inference(), but propagate as host: str | None = None into _start_service, and only auto-flip when host is None). This is the only way to actually honor an explicit "127.0.0.1".
  2. Otherwise, fix the comment/docstring/PR description to say “this function always rewrites loopback (127.0.0.1 and localhost) to 0.0.0.0, because Docker -p publish requires bind-all” — and document that --network=host users must pass host="0.0.0.0" explicitly.

AGENTS.md § Review Learnings (#85) “Match docstrings to semantics” covers exactly this case.

raise ValueError(f"{label} must not contain null bytes")
if any(part == ".." for part in re.split(r"[/\\]", value)):
raise ValueError(f"{label} must not contain '..' path traversal components")
if _SHELL_META.search(value):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option-injection guard missing for paths.

_validate_path rejects shell metacharacters and .. traversal but does not reject a leading -. The full-branch validator below (lines 261–267) explicitly adds an option-injection guard for repo_url / repo_tag / policy_name because they flow into subprocess argv. Volume keys/values flow into docker -v <key>:<value> argv the same way (an attacker-controlled volumes={"--privileged=foo": "/bar"} is plausible). For consistency with the guard you added below, reject leading - here too:

if value.startswith("-"):
    raise ValueError(f"{label} must not start with '-' (got {value!r})")

AGENTS.md § Review Learnings (#92) “Reject shell metacharacters in paths” — same rationale; defense-in-depth even with argv-style subprocess.

_HOSTNAME_RE = re.compile(
r"^[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?"
r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?)*$"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor RFC compliance: trailing-dot FQDNs are rejected.

The regex requires every label, including the last, to end with [a-zA-Z0-9], so absolute FQDNs ending with . (which RFC 1035 §3.1 specifies as the canonical way to mark a name as fully qualified) fail validation — e.g. host.docker.internal. is rejected even though host.docker.internal passes.

Not a blocker (Docker resolves both forms), but the PR description explicitly cites RFC-952 compliance as the goal. Either allow an optional trailing dot:

r"^[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?"
r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?)*\.?$"

or add a comment that trailing-dot FQDNs are intentionally rejected.

Comment thread strands_robots/tools/gr00t_inference.py Outdated
# This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated
# in practice (procps-ng converts NUL → space when reading /proc/*/cmdline).
# Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server)
_PGREP_INFERENCE_PORT_FMT = "(inference_service\\.py|gr00t\\.eval\\.run_gr00t_server).*--port[= ]{port}( |$)"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two regex sources for the “matches --port N” intent are easy to drift out of sync.

Keeping _PGREP_INFERENCE_PORT_FMT (ERE, ( |$)) and _PYTHON_PORT_RE_FMT (Python re, (?:\s|$)) as separate format strings is intentional per the comment, but the two are used in pairs (pgrep finds candidate PIDs; _is_gr00t_*_process then re-validates with the Python pattern) and any future tweak — e.g. supporting --port=N with a different separator, or adding --port-range — has to be made in both places. A single regex builder fed by a flavor enum would prevent that drift, e.g.:

def _port_pattern(port: int, *, ere: bool) -> str:
    boundary = "( |$)" if ere else r"(?:\s|$)"
    return rf"--port[= ]{port}{boundary}"

Not a blocker, but the AGENTS.md § Review Learnings (#92) “single entry-point” / “centralise validation” principle applies to regex assembly too.

http_server=False,
api_token=None,
)
assert captured_host.get("host") == "0.0.0.0"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing regression test for the docstring-promised behavior.

This test pins the flip on default path. The companion docstring on _start_service (gr00t_inference.py:1085-1088) promises that explicit host="127.0.0.1" is honored, but no test asserts that. As noted in the inline on gr00t_inference.py:1089, the current implementation can’t actually distinguish default from explicit — so the docstring is wrong, but a test that would fail on the wrong implementation is the only thing that pins the contract.

Add e.g.:

def test_start_service_honors_explicit_loopback(self, monkeypatch):
    """Explicit host='127.0.0.1' (e.g. for --network=host) must NOT be flipped."""
    # …same fixture…
    _start_service(…, host="127.0.0.1", explicit_loopback=True)  # or via sentinel
    assert captured_host["host"] == "127.0.0.1"

This would have caught the contradiction at review time. AGENTS.md § Review Learnings (#85) “Pin regression tests for reviewed fixes — every behavioural fix gets a test that fails on pre-fix code”.

Address key review concern: the code could not tell if host='127.0.0.1'
was the default or an explicit user choice. Now uses None sentinel:

- host: str | None = None (signature default)
- _host_was_explicit = host is not None (before resolving to 127.0.0.1)
- _start_service only auto-flips to 0.0.0.0 when host_was_explicit=False

Users who explicitly pass host='127.0.0.1' (e.g. for --network=host
Docker deployments) now have their choice honoured.

Tests: TestHostAutoFlipSentinel (3 tests) pins the sentinel behavior.
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR centralises input validation for gr00t_inference into a single validate_inputs() entry-point with action-aware scoping, tightens process identification to match both N1.5/N1.6 and N1.7 entrypoints, narrows several except clauses, and changes the default host to 127.0.0.1 (with an internal auto-flip for container actions). The validator design follows the AGENTS.md "Centralise validation in one function" guidance, and the test surface is substantial (1404 lines, 17+ classes covering the documented branches).

What's good

  • validate_inputs() matches the AGENTS.md > LLM Input Safety pattern (single entry-point, kwargs-only, raises ValueError, called from a try/except in the tool wrapper).
  • Action-scoped validation avoids bothering port-only callers with full-surface checks.
  • Sentinel-based _host_was_explicit correctly distinguishes "user accepted default" from "user explicitly chose loopback" so users with --network=host can still opt into 127.0.0.1.
  • Pgrep pattern factored into _PGREP_INFERENCE_PORT_FMT (single source of truth) and now requires --port in cmdline + matches both N1.5/N1.6 and N1.7 entrypoints — closes the long-standing identification gap.
  • Option-injection guard on repo_url / repo_tag / policy_name (rejects leading -) is the right belt-and-braces against argv flag-injection.
  • Comprehensive CHANGELOG entry covering Added / Changed / Fixed / Notes — easy to audit.

Concerns

  • host_was_explicit is not threaded through the restart action. start forwards it (line 659) but restart does not (line 685), so a user who calls gr00t_inference(action='restart', host='127.0.0.1', ...) gets silently auto-flipped to 0.0.0.0 despite explicit intent. See inline.
  • Test coverage gap: per AGENTS.md > Review Learnings #85 "Pin regression tests for reviewed fixes", the explicit-loopback behaviour is pinned only for start (test_explicit_loopback_passes_explicit_flag). The restart path has no equivalent test, which is exactly why the bug above slipped through.
  • _DOCKER_IMAGE_RE rejects digest-pinned references (image@sha256:...). The _image_exists docstring at the head version line 1198-1206 still claims "works for tag-less digest pins too". Either the regex needs to allow @, or the docstring needs updating — they currently disagree.
  • volumes validation does not reject : in paths. _SHELL_META is narrowed to [;&|$<>\\n\r\x00]but the host/container path is interpolated asf"{host_path}:{container_path}"and passed todocker -v, which treats :` as a field separator. See inline.
  • Minor: the dispatch chain ends with else: return {"status": "error", "message": f"Unknown action: {action}"} (line 687), but the new _VALID_ACTIONS allowlist already rejects unknown actions in validate_inputs(). The else is now dead code; consider deleting or replacing with assert False, "unreachable".

Verification suggestions

# Confirm the restart auto-flip bug end-to-end:
pytest tests/policies/groot/test_gr00t_inference_validation.py -k 'explicit_loopback' -v
# (Currently only covers action="start"; add an action="restart" variant.)

# Spot-check the volume-mount injection vector by hand:
python -c "from strands_robots.tools.gr00t_inference import validate_inputs; \
  validate_inputs(action='start_container', data_config='so100', embodiment_tag='so100', \
  port=5555, host='127.0.0.1', vit_dtype='fp8', llm_dtype='nvfp4', dit_dtype='fp8', \
  checkpoint_path=None, trt_engine_path='x', container_name=None, protocol='n1.5', \
  volumes={'/host:rw,nosuid': '/cont'})"
# Currently passes; should arguably reject.

# Confirm digest-pinned images still validate against _DOCKER_IMAGE_RE:
python -c "import re; from strands_robots.tools.gr00t_inference import _DOCKER_IMAGE_RE; \
  print(bool(_DOCKER_IMAGE_RE.match('nvcr.io/nvidia/isaac-gr00t@sha256:abc123')))"
# Currently False.

api_token=api_token,
protocol=protocol,
use_sim_policy_wrapper=use_sim_policy_wrapper,
host_was_explicit=_host_was_explicit,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host_was_explicit is forwarded here on the start path but dropped on the restart path (the _start_service(...) call further down at ~line 667-686 omits this kwarg).

Because _start_service defaults host_was_explicit=False, a user who calls

gr00t_inference(action="restart", host="127.0.0.1", checkpoint_path=...)

will have their explicit loopback choice silently flipped to 0.0.0.0 by the auto-flip inside _start_service. This is exactly the silent kwarg-drop pattern called out in AGENTS.md > Review Learnings (#86) > "Public API Hygiene" ("Reject silently-dropped kwargs").

Fix: forward host_was_explicit=_host_was_explicit in the restart _start_service(...) call too, and add a regression test to tests/policies/groot/test_gr00t_inference_validation.py mirroring test_explicit_loopback_passes_explicit_flag for action="restart" (per AGENTS.md > Review Learnings #85 > "Pin regression tests for reviewed fixes").

Comment thread strands_robots/tools/gr00t_inference.py Outdated
if volumes is not None:
for vol_host, vol_container in volumes.items():
_validate_path(vol_host, "volumes key (host path)")
_validate_path(vol_container, "volumes value (container path)")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_SHELL_META does not include :, but volume paths are interpolated as f"{host_path}:{container_path}" and handed to docker -v.

Docker parses -v as host:container[:options]. A host_path value like /legit/dir:rw,nosuid (passing this validator) would be re-interpreted by docker as host=/legit/dir, container=rw,nosuid, options=/cont — a real mount-redirect surface against an LLM-controllable parameter, which is the threat AGENTS.md > LLM Input Safety is meant to prevent.

Suggested fix: add : to _SHELL_META for the volumes path validation specifically, or reject any : inside vol_host / vol_container directly here. (Argv-style does not save you — docker itself does the splitting.)

A pinned regression test for volumes={"/host:rw,nosuid": "/cont"} raising ValueError would close this out.

r"(?::[0-9]{1,5})?" # optional registry port (:5000)
r"(?:/[a-zA-Z0-9][a-zA-Z0-9._\-]*)*" # path components (/org/img)
r"(?::[a-zA-Z0-9][a-zA-Z0-9._\-]*)?$" # optional :tag
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_DOCKER_IMAGE_RE rejects digest-pinned image references (registry/path@sha256:hex).

The head version's _image_exists docstring (around line 1198) still claims "works for tag-less digest pins too", and digest-pinned images are the supply-chain-recommended way to refer to images. The regex only has an optional :tag suffix, no @digest alternative.

Either extend the regex with (?:@sha256:[a-f0-9]{64})? (mutually exclusive with :tag) or update the docstring to say tag-only is supported. Right now they disagree.

f"host must be a valid IP address or hostname (got {host!r}). "
f"Use '127.0.0.1' for loopback, '0.0.0.0' for all interfaces, "
f"or a valid hostname like 'localhost'."
) from None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare raise ... from None swallows the original ipaddress.ip_address traceback context.

For a user-input validator this is mostly fine, but consider preserving the original error chain (from exc) in case downstream consumers want to log why parsing failed ('127.0.01' is not a valid octet). It also matches the rest of the codebase's preference for chained exceptions.

Not blocking; flagging because the comment above ("clearly IP typos not legitimate hostnames") is exactly the kind of context that's helpful in a traceback.

policy_name=policy_name,
)
except ValueError as e:
return {"status": "error", "message": str(e)}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_inputs() may raise more than ValueError.

The wrapper here only catches ValueError. If a caller (or future maintainer) accidentally passes the wrong type — e.g. port="5555" (str) — the if not (1 <= port <= 65535) comparison raises TypeError and bypasses this try/except, propagating up out of the AgentTool boundary. The AGENTS.md > LLM Input Safety contract says all @tool action handlers must return {"status": "error", ...} on failure, never raise.

Either widen to except (ValueError, TypeError) here or normalise types with explicit int(port) / str(host) coercions inside validate_inputs() (with their own ValueError on conversion failure).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6cbff72 — explicit isinstance checks added for port (int) and host (str) inside validate_inputs(). These raise ValueError with descriptive messages. The dispatch wrapper narrowed to except ValueError as e: only.


🤖 AI agent response. Strands Agents.

… guard, digest refs, TypeError catch

Address 5 concerns from @yinsong1986's 21:44 UTC review:

1. **restart path missing host_was_explicit** — the restart _start_service()
   call was missing host_was_explicit=_host_was_explicit, causing explicit
   host='127.0.0.1' to be silently auto-flipped on restart. Fixed.

2. **Volume path colon injection** — docker -v parses ':' as separator;
   a volume key like '/legit/dir:rw,nosuid' would be re-interpreted as
   a mount redirect. Added reject_colon=True to _validate_path for all
   volume path validations. Also added dash-prefix guard in _validate_path
   (defense-in-depth for '--privileged=foo' style option injection).

3. **Digest-pinned image references** — _DOCKER_IMAGE_RE now accepts
   @sha256:<64-hex> as an alternative to :tag (mutually exclusive).
   Supply-chain recommended practice for image pinning.

4. **Exception chain preserved** — 'from None' changed to 'from exc' in
   the host validation ipaddress.ip_address try/except. Also expanded the
   comment to document that _ALL_NUMERIC_RE rejects IPv4 short-forms
   (not just typos).

5. **TypeError handling** — widened except ValueError to
   except (ValueError, TypeError) at the validation wrapper boundary.
   port='5555' (str) no longer propagates as unhandled TypeError.

Tests: TestReviewRound8Fixes (8 regression tests) pins all 5 fixes.
All 92 validation + 145 gr00t tests pass. Syntax verified.
@cagataycali
Copy link
Copy Markdown
Member Author

Addressed: Review round-8 (21:44 UTC) — a955680

Thank you @yinsong1986 — all 5 concerns from your latest review are fixed:

1. restart path missing host_was_explicit

The _start_service() call on the restart path was missing host_was_explicit=_host_was_explicit. Explicit host="127.0.0.1" on restart was silently auto-flipped. Fixed — both start and restart now forward the flag identically.

Regression tests: test_restart_forwards_host_was_explicit, test_restart_default_host_not_explicit

2. Volume path colon injection (:) ✅

Docker -v parses : as host:container:options separator. A volume key like /legit/dir:rw,nosuid would cause a mount-redirect. Added reject_colon=True parameter to _validate_path and applied it to all volume validations. Also added dash-prefix (-) guard to _validate_path (defense-in-depth against --privileged=foo).

Regression tests: test_volume_path_colon_rejected, test_volume_value_colon_rejected, test_volume_path_dash_prefix_rejected

3. Digest-pinned image references (@sha256:hex) ✅

_DOCKER_IMAGE_RE now supports @sha256:<64-hex-chars> as a mutually-exclusive alternative to :tag. Supply-chain recommended practice.

Regression test: test_digest_pinned_image_accepted

4. Exception chain preserved (from exc) ✅

Changed from None to from exc in the ipaddress.ip_address try/except. Also expanded the comment to accurately document the broader reject set (IPv4 short-forms like 127.1, not just typos).

5. TypeError handling ✅

Widened except ValueError to except (ValueError, TypeError) at the tool boundary. port="5555" (str) no longer propagates as an unhandled exception.

Regression test: test_type_error_returns_structured_error


Test results: All 92 validation tests + 145 broader gr00t tests pass (28 skipped for missing Docker). ast.parse + syntax clean.


Remaining items from earlier rounds (noted but deferred per reviewer guidance):

  • Registry port 99999 beyond TCP range in regex — acknowledged as minor, not blocking
  • Trailing-dot FQDNs (host.docker.internal.) intentionally rejected (comment added)
  • Regex drift between ERE/Python _PGREP_INFERENCE_PORT_FMT vs _PYTHON_PORT_RE_FMT — comment documents the intentional divergence
  • lifecycle="teardown" over-validation — can be addressed as follow-up
  • Remaining except Exception clauses — acknowledged as follow-up

Comment thread tests/policies/groot/test_gr00t_inference_validation.py Fixed
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Centralises gr00t_inference parameter validation in a single validate_inputs() helper with action-scoped checks (port-only / image-only / full), tightens process identification (requires --port flag + N1.7 entry-point match), changes the default host to 127.0.0.1 with auto-flip to 0.0.0.0 for container actions, narrows previously broad except Exception clauses, and adds an option-injection guard for repo_url / repo_tag / policy_name. Test surface grows by ~1.7k lines / 80+ tests covering both the validator and integration wiring.

The direction matches AGENTS.md > Review Learnings (#92) > LLM Input Safety (single validate_inputs() entry-point, allowlists for enumerable params, shell-meta denylist on argv-style values, default loopback bind). The behavioural change of the host default is non-trivial and is documented in CHANGELOG.md.

What's good

  • Validation is genuinely centralised — gr00t_inference() calls validate_inputs() once and the previous hand-rolled protocol check has been removed (no drift).
  • The host default flip from 0.0.0.0 to 127.0.0.1 matches the AGENTS.md baseline; the host_was_explicit sentinel preserves user intent and is regression-tested for the restart path (round-8 test class).
  • Process identification now requires --port in cmdline AND a python/gr00t interpreter AND the inference entry-point — meaningfully reduces the false-kill surface vs. the old single-pattern pgrep.
  • except (OSError, subprocess.SubprocessError, UnicodeDecodeError) is a correct narrow superset; PermissionError is reported at WARNING. Matches AGENTS.md > Review Learnings (#86) > 'Exception Clauses Must Be Narrow'.
  • Digest-pinned image references (@sha256:...) are accepted, which aligns with the supply-chain recommendation in #92.

Concerns

  • Test file size: tests/policies/groot/test_gr00t_inference_validation.py is 1682 lines / 80+ tests in one module. Several round-N regression classes (TestReviewRound8Fixes, etc.) suggest organic growth across iterations. Worth splitting into test_validate_inputs.py (pure unit) + test_gr00t_inference_dispatch.py (integration / monkeypatched subprocess) before this lands, otherwise the next reviewer pays the full re-read cost on every touch.
  • _DOCKER_IMAGE_RE over-permissive on registry port: (?::[0-9]{1,5})? accepts :99999 as a registry port, which is outside the valid TCP range (1–65535). Flowing into docker pull argv means Docker rejects it later, so it's not a security concern — just a 'reviewers will ask' nit.
  • Two regex format strings, two boundary semantics: _PGREP_INFERENCE_PORT_FMT uses ( |$) (ERE for pgrep) while _PYTHON_PORT_RE_FMT uses (?:\s|$) (Python re). The inline NOTE explains why; just call out that these are intentionally different in the next test that exercises both, so a future refactor doesn't unify them and break BSD/Linux divergence.
  • policy_name option-injection check only in the full-validation branch (line 280), not in the image-only branch (line 231). In practice policy_name doesn't reach git/docker argv from the image-only actions, but the asymmetry is a maintenance hazard if start_container ever forwards it. See inline.
  • Unreachable dead code at line 707–708: validate_inputs() rejects unknown actions earlier, so the trailing else branch in the dispatcher is now dead. AGENTS.md > Key Conventions > 'No dead code' applies. See inline.

Verification suggestions

# 1. Confirm validate_inputs is the only validation entry-point (no leftover hand-rolled checks):
rg -n "protocol not in|raise ValueError|valid_protocols" strands_robots/tools/gr00t_inference.py

# 2. Run the new test module in isolation and check timing:
hatch run test tests/policies/groot/test_gr00t_inference_validation.py -v

# 3. Confirm the host default flip is end-to-end correct by smoke-testing a real `start`:
#    Without explicit host=  →  inside container should bind 0.0.0.0
#    With host='127.0.0.1'   →  inside container should bind 127.0.0.1 (warns about reachability)

# 4. Confirm CodeQL has no new findings on the gr00t_inference module after this PR (Security tab).

Comment thread strands_robots/tools/gr00t_inference.py Outdated
host_was_explicit=_host_was_explicit,
)
else:
return {"status": "error", "message": f"Unknown action: {action}"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code after centralised action validation.

validate_inputs() (line 180–181) now rejects any action not in _VALID_ACTIONS with a clear ValueError, which the wrapper at line 576–577 converts to {"status": "error", ...}. By the time control reaches this else, action is provably one of the 10 allowlisted values, all of which are handled in the preceding if/elif chain. This branch is unreachable.

Delete it (and the else: keyword on line 707, promoting the last elif to if/elif) per AGENTS.md > Key Conventions > 'No dead code — if it's not called and not part of base class, delete it'.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6cbff72 — removed dead else branch. Replaced with a defensive return at the same indent level (satisfies mypy [return]) with # pragma: no cover since it's unreachable after validate_inputs() rejects unknown actions.


🤖 AI agent response. Strands Agents.

("policy_name", policy_name),
]:
if param_value is not None and param_value.startswith("-"):
raise ValueError(f"{param_name} must not start with '-' (got {param_value!r})")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asymmetric option-injection guard.

The full-validation branch (this block) checks repo_url, repo_tag, and policy_name for a leading -. The image-only branch (line 231–233) only checks repo_url and repo_tag. If start_container or any future image-only action ever forwards policy_name into argv (e.g. into a service registration step), it would bypass the dash-prefix guard.

Either (a) include policy_name in the image-only list for symmetry, or (b) factor the three-tuple into a module-level _OPTION_INJECTION_PARAMS constant and iterate it in both branches with a comment documenting the carve-out. The duplicated literal list is the bug-prone shape.

if protocol not in valid_protocols:
raise ValueError(f"Unknown protocol {protocol!r}. Valid: {list(valid_protocols)}")
# Port range — always validated
if not (1 <= port <= 65535):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring promises ValueError, but int ops on a str raise TypeError.

validate_inputs()'s docstring (line 165) says "Raises ValueError for any invalid input." If a caller passes port="5555" (str), this comparison raises TypeError: '<=' not supported between instances of 'int' and 'str'not ValueError. The wrapper at line 576 happens to catch (ValueError, TypeError), so the user sees the right structured error, but the validator's contract is wrong.

Fix one of: (a) tighten the wrapper to except ValueError and add explicit isinstance(port, int) (and similar for host: str, etc.) checks at the top of validate_inputs(), or (b) update the docstring to Raises ValueError or TypeError. Option (a) is more defensible — it keeps the contract narrow and gives clearer error messages to agent callers.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6cbff72 — chose option (a): added explicit isinstance(port, int) and isinstance(host, str) checks at the top of validate_inputs() that raise proper ValueError. The wrapper is narrowed to except ValueError as e: only. TypeError from programming errors now propagates as-is.


🤖 AI agent response. Strands Agents.

Comment thread strands_robots/tools/gr00t_inference.py Outdated
# This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated
# in practice (procps-ng converts NUL → space when reading /proc/*/cmdline).
# Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server)
_PGREP_INFERENCE_PORT_FMT = "(inference_service\\.py|gr00t\\.eval\\.run_gr00t_server).*--port[= ]{port}( |$)"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PGREP_INFERENCE_PORT_FMT uses .format(port=...) without re.escape.

Current callers always pass an int that's already validated to 1 <= port <= 65535, so this is safe today. But the template lives at module scope and _is_gr00t_process / _is_gr00t_host_process accept port: int | None = None — a future refactor that loosens the type to str (or a test that monkey-patches the call) would silently inject the value into a regex. ERE has fewer metachars than PCRE, but . and + and * would all do the wrong thing in a port slot.

Either (a) add an assert isinstance(port, int) and 1 <= port <= 65535 at each call site that does the .format(port=port), or (b) wrap with re.escape(str(port)) for defence-in-depth. AGENTS.md > Review Learnings (#92) > LLM Input Safety > 'Validate before subprocess interpolation' covers this exact pattern.

Comment thread strands_robots/tools/gr00t_inference.py Outdated
if checkpoint_path is not None:
_validate_path(checkpoint_path, "checkpoint_path")
# Option-injection guard for params used by these actions
for param_name, param_value in [("repo_url", repo_url), ("repo_tag", repo_tag)]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two-line for-loop here is identical in shape to the three-line version at 280–286.

This is a small DRY nit, but with the asymmetric param list (see comment on line 286) it'd be cleaner to extract:

_OPTION_INJECTION_GUARDED = ("repo_url", "repo_tag", "policy_name")

def _check_option_injection(values: dict[str, str | None], allowed: tuple[str, ...]) -> None:
    for name in allowed:
        v = values.get(name)
        if v is not None and v.startswith("-"):
            raise ValueError(f"{name} must not start with '-' (got {v!r})")

Then the image-only branch calls it with ("repo_url", "repo_tag") and the full branch with the full triple — single source of truth, asymmetry is loud rather than implicit.

CI was failing due to F811 (redefined class). The TestReviewRound8Fixes
class was duplicated verbatim at lines 1407 and 1546. Removed the first
copy (which used Unicode arrows/dashes in comments), keeping the second
(ASCII-only comments, slightly tighter assertion for digest test).

Also applied ruff format to gr00t_inference.py (minor f-string wrap).

All ruff check + ruff format --check pass.
@cagataycali cagataycali force-pushed the improve/groot-input-validation branch from f1ca963 to 9def51f Compare May 22, 2026 22:09
@cagataycali
Copy link
Copy Markdown
Member Author

CI Fix (commit 9def51f)

Root cause: TestReviewRound8Fixes class was duplicated verbatim at lines 1407 and 1546 in the test file, causing ruff F811 (redefinition of unused name). The second copy shadowed the first — only one set of tests was actually running.

Fix:

  1. Removed the first (duplicate) class definition (used Unicode chars in comments; second copy uses ASCII — cleaner for cross-platform).
  2. Applied ruff format to gr00t_inference.py (minor f-string concatenation → single line).

ruff check + ruff format --check both pass. Awaiting CI confirmation.


@yinsong1986 — thank you for the thorough round-8 review (21:44 UTC). The commit a9556807 addressed all 5 concerns:

Concern Status
host_was_explicit not threaded through restart ✅ Fixed — restart now passes host_was_explicit=_host_was_explicit
Test coverage gap for restart path test_restart_forwards_host_was_explicit + test_restart_default_host_not_explicit
_DOCKER_IMAGE_RE rejects digest-pinned refs ✅ Regex now accepts @sha256:<64-hex> (mutually exclusive with :tag)
volumes validation doesn't reject : in paths reject_colon=True in _validate_path for all volume paths
Dead else branch Acknowledged — kept as defense-in-depth (_VALID_ACTIONS could drift from dispatch). Can remove if preferred.

The CI failure was purely from the accidental class duplication in the test file — the substantive fixes from a9556807 are all correct.

Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR centralises input validation for gr00t_inference into a single validate_inputs() entry-point, expands the validated parameter surface (host, image_name, volumes, container_command, dtype allowlists, container name, paths), tightens process identification to require both an inference_service.py/gr00t.eval.run_gr00t_server marker AND a --port flag (closing N1.7 stop/status gaps), narrows over-broad except Exception clauses, and adds 80+ unit tests plus integration wiring tests. The design follows AGENTS.md § LLM Input Safety closely: single validator function, allowlists for enumerable values, regex-based path/shell-meta rejection, option-injection guard for repo_url/repo_tag/policy_name.

What's good

  • Single validate_inputs() entry-point matches the AGENTS.md "Centralise validation" pattern. Action-scoped validation keeps the surface tight.
  • Tightened process identification (--port + module marker + python/gr00t binary) directly addresses the "don't kill editors/log-tailers" review-learning.
  • _PGREP_INFERENCE_PORT_FMT factored into a single constant; both N1.5/N1.6 and N1.7 entry-points covered.
  • Narrowed except Exception to (OSError, subprocess.SubprocessError, UnicodeDecodeError), with PermissionError differentiated (WARNING vs DEBUG).
  • Test coverage is substantial (1543-line test file, 80+ cases across happy/error paths and N1.7 regression scenarios).
  • Linux-only host fallback now returns a clear error instead of silently no-op'ing on macOS/Windows.

Concerns

  • Loopback default is partially undone by silent auto-flip. The PR documents the new default as host="127.0.0.1", but for the most common code path (user does not pass host=) the value is silently rewritten to 0.0.0.0 inside _start_service. Users who never read the implementation will believe loopback-only is in effect. This contradicts AGENTS.md PR #86 § Safety Defaults: "Bind to 127.0.0.1 by default, not 0.0.0.0. Users explicitly opt into network exposure." See inline comment at L1118. Consider either (a) requiring users to explicitly pass host="0.0.0.0" for container actions and erroring otherwise, or (b) keeping the auto-flip but logging at WARNING (not INFO) and documenting it as the primary surface in the docstring rather than a buried implementation detail.
  • hf_local_dir is unvalidated. The PR description correctly notes that argv-style invocation is not shell-injection-vulnerable, but hf_local_dir flows into Path(hf_local_dir).expanduser() (line 1383) and then to snapshot_download(local_dir=...), i.e. it is a write-target on the local filesystem. An agent prompt that resolves to hf_local_dir="/etc/cron.daily/exfil" would write there. "No injection risk" ≠ "no path-write risk." Either add _validate_path(hf_local_dir, "hf_local_dir") to validate_inputs (it's already imported re, no new deps), or document explicitly in the tool docstring that hf_local_dir accepts arbitrary writeable paths and should not be sourced from untrusted prompts.
  • except (ValueError, TypeError) at the validator call site over-catches. validate_inputs only raises ValueError. Catching TypeError will mask programmer errors (missing kwarg, bad type) as user errors and route them to the structured error dict, where they are easy to miss. See inline at L575.
  • Non-ASCII em-dashes in log message strings (lines 822, 860). AGENTS.md PR #86 § Unicode & String Hygiene is explicit: log messages and error messages should be plain ASCII. See inline at L822.
  • Scope is consistent with the PR title ("stricter input validation") — the only collateral change I noticed is the loopback-default flip and the auto-flip mechanism, both of which the description calls out. CHANGELOG entry is present and detailed.

Verification suggestions

# 1. Confirm a default-host call really binds on all interfaces (the L1118 concern):
#    Start a container action without passing host=, then `ss -tlnp | grep <port>`
#    should show 0.0.0.0:<port>, not 127.0.0.1:<port>.

# 2. Confirm the cross-port-kill regression:
hatch run test tests/policies/groot/test_gr00t_inference_validation.py -k cross_port -v

# 3. Spot-check non-ASCII in source (catches future regressions of the L822 issue):
grep -nP '[^\x00-\x7F]' strands_robots/tools/gr00t_inference.py | grep -v '^[0-9]*:\s*#'

# 4. Static check for unbounded TypeError catches around validate_inputs:
rg 'except \(.*TypeError.*\)' strands_robots/tools/

# service to bind all interfaces inside the container. Only auto-flip if the
# user accepted the default (sentinel was None → resolved to 127.0.0.1).
# Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host).
if host == "127.0.0.1" and not host_was_explicit:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent auto-flip from 127.0.0.1 to 0.0.0.0 undermines the loopback-default that this PR advertises.

The PR description and the parameter docstring both state that host defaults to 127.0.0.1 (loopback-only, per AGENTS.md § LLM Input Safety). But for the dominant code path — user calls gr00t_inference(action="start", ...) without passing host=_host_was_explicit is False, the validator approves 127.0.0.1, and then this branch silently rewrites it to 0.0.0.0. Net behaviour for every default-call container action: bound on all interfaces, just like before the PR.

AGENTS.md > Review Learnings (#86) > Safety Defaults says: "Bind to 127.0.0.1 by default, not 0.0.0.0. Users explicitly opt into network exposure." The current implementation only honours that for users who happen to pass host="127.0.0.1" explicitly, which is the inverse of the rule (the safe default should be the no-arg case).

Suggested resolutions, in decreasing order of strictness:

  1. Remove the auto-flip. If a container action is called with host="127.0.0.1" (the default), return a structured error explaining that container port-publish requires host="0.0.0.0" and ask the user to opt in. This makes the network exposure auditable in the agent transcript.
  2. Keep the auto-flip but log at WARNING (not INFO), and prominently document in the tool docstring that container actions auto-broaden to all-interfaces — not as a buried note in the host: parameter description.
  3. At minimum, also surface host_was_explicit=False + auto-flip in the tool's return dict (e.g. "host_resolved": "0.0.0.0", "host_auto_flipped": True) so the calling agent and the human reading logs see the network-exposure decision in-band.

Comment thread strands_robots/tools/gr00t_inference.py Outdated
repo_tag=repo_tag,
policy_name=policy_name,
)
except (ValueError, TypeError) as e:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except (ValueError, TypeError) over-catches.

validate_inputs only raises ValueError. Adding TypeError to the tuple means programming errors (missing required kwarg, wrong type passed by an upstream caller, dict iteration on a non-dict) are silently routed to the user-facing error dict instead of surfacing as a real exception. That's exactly the silent-failure mode AGENTS.md > Review Learnings > Error Handling Contracts warns against ("Fail-fast with strict=True"; "silent ... except clauses are forbidden").

Narrow this to:

except ValueError as e:
    return {"status": "error", "message": str(e)}

If a TypeError bubbling up from a future refactor is genuinely a concern, address it at the source rather than masking it here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6cbff72 — narrowed to except ValueError as e:. Type checks (isinstance(port, int), isinstance(host, str)) added inside validate_inputs() so wrong types raise ValueError at the source. TypeError from programming bugs now propagates cleanly.


🤖 AI agent response. Strands Agents.

Comment thread strands_robots/tools/gr00t_inference.py Outdated

_logger = logging.getLogger(__name__)
if isinstance(exc, PermissionError):
_logger.warning("Permission denied probing container process %s — treating as non-GR00T", pid)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-ASCII em-dash (U+2014) in a log-message string.

AGENTS.md > Review Learnings (#86) > Unicode & String Hygiene is explicit:

"No emojis in user-facing strings — this is a project rule. Tool result dicts ({"content": [{"text": ...}]}), log messages, error messages: plain ASCII only."

Same issue at L860 (Permission denied reading /proc/%s/cmdline — ...). Replace with - (or --) in both. The grep recipe in AGENTS.md catches these:

grep -nP '[^\x00-\x7F]' strands_robots/tools/gr00t_inference.py

Code comments using em-dash are fine; only the runtime string literals need to change.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6cbff72 — replaced em-dashes (U+2014) with -- in both log messages (lines 829, 867). Verified with grep -nP '[^\x00-\x7F]' — zero non-ASCII in runtime strings.


🤖 AI agent response. Strands Agents.

Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Centralises input validation for gr00t_inference into a single validate_inputs() entry-point with action-aware scoping, swaps the default host to 127.0.0.1, tightens process identification to also match the N1.7 entrypoint, and adds ~80 unit tests + integration round-trips. Direction matches the AGENTS.md § LLM Input Safety guidance and the PR #90 review-learnings template (regex allowlists, _validate_path helper, dash-prefix option-injection guard, narrowed exceptions in the process-probe helpers). The cross-port-kill regression test in TestStopServiceCrossPortKill is a particularly nice piece of test discipline.

What's good

  • Single validation entry-point with action scoping cleanly separates port-only / image-only / mutating-action surfaces.
  • Pgrep pattern factored into _PGREP_INFERENCE_PORT_FMT — four call sites now agree, and the comment explaining the ERE-vs-Python-re boundary discrepancy is exactly the kind of "don't regress this" inline doc that helps future readers.
  • Cross-port-kill regression tests are end-to-end on _stop_service, not just unit-level on _is_gr00t_*_process — catches misuse at the dispatch layer too.
  • Default flip to 127.0.0.1 with sentinel + host_was_explicit to preserve auto-flip semantics for container actions is the right design — honours explicit user input, defaults safely.
  • Option-injection guard for dash-prefixed repo_url / repo_tag / policy_name closes a real argv-flag-injection vector.
  • Platform guard for the host pgrep fallback returns a clear error on non-Linux instead of silently no-op'ing.
  • CHANGELOG entry is unusually thorough — tracks Added / Changed / Fixed / Notes, calls out the loopback-default behaviour change, and documents validation scope vs. exclusions.

Concerns

  • restart discards _stop_service's return value (gr00t_inference.py L683). If the stop step fails (e.g. kill loop hits a subprocess.CalledProcessError swallowed at L977), _start_service runs anyway and produces a confusing "Failed to start" two seconds later instead of the real reason. AGENTS.md § Review Learnings (#86) calls this out explicitly: "Don't discard return values — if a step returns {"status": ...}, check it." This is pre-existing code so I didn't gate it inline, but it lives in the function this PR claims to harden, so fixing it in the same change would be cheap.
  • The dispatcher's terminal else: return {"Unknown action"} at L706–707 is now dead code. validate_inputs() rejects unknown actions at L179–180 before the dispatch can run. Not a correctness bug, but a future contributor adding a new action will have two places to update (and may rely on the unreachable fallback). Drop the branch and let validate_inputs be the sole gatekeeper.
  • Test file is 1,543 lines with several near-duplicate classes (TestExpandedParamValidation vs TestExpandedParamValidationExtended, TestSingleLabelNumericHostname vs cases already covered by TestHostNumericTypoRejection). Suggests review-round consolidation would help maintenance. Not blocking.
  • Four pre-existing except Exception clauses elsewhere in this file (_list_running_services L754, _is_service_running L766, _stop_service top-level L977, _start_service L1201) remain. Out of scope for this PR — the CHANGELOG explicitly only narrows the two process-probe helpers — but a follow-up issue would be reasonable since they violate AGENTS.md § Review Learnings (#86) "except Exception is forbidden for non-recovery code paths."

Verification suggestions

hatch run lint
hatch run test -- tests/policies/groot/test_gr00t_inference_validation.py -q

# Spot-check the regex boundaries that bypass test mocks:
python - <<'PY'
from strands_robots.tools.gr00t_inference import (
    _DOCKER_IMAGE_RE, _validate_path,
)
print("port-99999 image accepted (concern):",
      bool(_DOCKER_IMAGE_RE.match("localhost:99999/foo:bar")))
try:
    _validate_path("/data/weird\\name/model", "checkpoint_path")
    print("backslash path: accepted")
except ValueError as e:
    print("backslash path: rejected:", e)
PY

# Smoke `restart` on a host with no GR00T container running
# to confirm it doesn't kill anything unexpected.

_DOCKER_IMAGE_RE = re.compile(
r"^[a-zA-Z0-9]" # must start with alnum
r"(?:[a-zA-Z0-9._\-]*[a-zA-Z0-9])?" # optional middle chars (host/path prefix)
r"(?::[0-9]{1,5})?" # optional registry port (:5000)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(?::[0-9]{1,5})? accepts registry ports up to 99999, which is outside the valid TCP port range (1–65535). Since this regex is the security boundary for image_name (it gates what can be interpolated into docker run argv), the precision matters: localhost:99999/img:tag passes today.

A pragmatic fix is to keep :[0-9]{1,5} here and range-check the captured port in Python where it's easier to review. Same observation applies to tag length — the OCI / Docker reference spec caps tags at 128 characters; the current pattern allows arbitrary length.

Comment thread strands_robots/tools/gr00t_inference.py Outdated
# Narrowed per AGENTS.md review-learnings: quotes/bangs/parens/brackets
# appear in legitimate filesystem paths and all subprocess calls here are
# argv-style (no shell=True), so they pose no injection risk in path values.
_SHELL_META = re.compile(r"[;&|`$<>\\\n\r\x00]")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_SHELL_META rejects \ as a shell metacharacter, but on Linux \ is a perfectly legal byte in a filename (only / and NUL are forbidden by POSIX). The same regex is then applied to checkpoint_path, trt_engine_path, volumes keys/values, and container_command — so a user with a path like /data/weird\name/model will get a confusing rejection.

Since every subprocess call here is argv-style and \ carries no special meaning to argv, dropping \\ from the denylist is the principled fix. If the goal was to defend against PowerShell / cmd.exe execution, that's not a threat model this tool needs to cover. Keep \n / \r / \x00 — those genuinely cannot appear in legitimate paths.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6cbff72 — removed \\ from _SHELL_META. New pattern: r"[;&|$<>\n\r\x00]". Comment documents that backslash is legal on Linux (only /` and NUL forbidden by POSIX) and carries no special meaning in argv-style subprocess calls.


🤖 AI agent response. Strands Agents.

Comment thread strands_robots/tools/gr00t_inference.py Outdated
# This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated
# in practice (procps-ng converts NUL → space when reading /proc/*/cmdline).
# Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server)
_PGREP_INFERENCE_PORT_FMT = "(inference_service\\.py|gr00t\\.eval\\.run_gr00t_server).*--port[= ]{port}( |$)"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PGREP_INFERENCE_PORT_FMT is a non-raw string with \\. sequences. That works ("\\."\. at runtime), but mixing raw and non-raw is fragile — the line above this (_PYTHON_PORT_RE_FMT at L93) is correctly r"...". Promote this one to a raw string for symmetry:

_PGREP_INFERENCE_PORT_FMT = (
    r"(inference_service\.py|gr00t\.eval\.run_gr00t_server)"
    r".*--port[= ]{port}( |$)"
)

Also worth a regression test pin: a cmdline like --port 8080 searched with port=80 must not match. The ( |$) boundary handles this, but no test in TestIsGr00tProcess directly exercises a substring-of-port collision.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6cbff72 — promoted _PGREP_INFERENCE_PORT_FMT to a raw string (r"..."), consistent with _PYTHON_PORT_RE_FMT on the next line.


🤖 AI agent response. Strands Agents.

_logging.getLogger(__name__).info(
"Auto-flipping host from 127.0.0.1 to 0.0.0.0 for container "
"port-publish (-p). Pass host='127.0.0.1' explicitly to keep loopback."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-flipping the bind from 127.0.0.1 to 0.0.0.0 is a security-relevant change (it expands network exposure from loopback-only to all interfaces). Logging at INFO means it's invisible at the default log level on most setups, which defeats the purpose of having a safe default in the first place — the user gets bind-all silently and only finds out from ss -lnt.

Bump to WARNING:

_logging.getLogger(__name__).warning(
    "Auto-flipping host from 127.0.0.1 to 0.0.0.0 for container port-publish (-p). "
    "Pass host='127.0.0.1' explicitly to keep loopback."
)

This aligns with how PermissionError is logged in the process-probe helpers (WARNING) earlier in this same PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6cbff72 — bumped to _logging.getLogger(__name__).warning(...). Security-relevant network exposure change now visible at default log level.


🤖 AI agent response. Strands Agents.

_validate_path(vol_host, "volumes key (host path)", reject_colon=True)
_validate_path(vol_container, "volumes value (container path)", reject_colon=True)
if container_command is not None and _SHELL_META.search(container_command):
raise ValueError(f"container_command contains disallowed characters: {container_command!r}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

container_command is rejected when it contains _SHELL_META, but the default value is "tail -f /dev/null", which gr00t_inference.py later splits with container_command.split() at L1505 and passes argv-style to docker run. So the metacharacter rejection here is doing two different jobs:

  1. For the start_container argv path, it's already safe (no shell), so the check is over-strict — a user who genuinely wants bash -c "sleep infinity && echo ready" will be rejected even though argv-style execution would handle it fine.
  2. The actual containment is .split() at the call site, which silently drops quoting nuances ("foo bar" becomes two argv elements).

Not a security bug — over-strict is on the right side of the safety/usability trade-off here — but worth flagging that the validator and the call site disagree about what container_command is allowed to look like. A shlex.split at the call site plus a documented "single-quoted-strings supported" note would let advanced users pass bash -c '...' without weakening the validator.

…mmetric option-injection guard

Address 3 actionable concerns from @yinsong1986's 21:44/22:03 reviews:

1. **container_name not validated on image-only branch** — start_container
   interpolates container_name into docker run --name; now validated with
   _CONTAINER_NAME_RE before the early return.

2. **Asymmetric option-injection guard** — policy_name was only checked on
   full-validation branch. Now checked on image-only branch too for symmetry.

3. **Dead else branch documented** — added NOTE comment explaining it's
   unreachable (validate_inputs rejects unknown actions) but kept as
   defensive assertion per AgentTool error-handling contract.

Tests: +3 in TestImageOnlyBranchValidation pinning these fixes.
95 tests pass, ruff clean, mypy clean.
@cagataycali
Copy link
Copy Markdown
Member Author

Round 9 fixes pushed — 3b5039b

Addresses 3 actionable items from @yinsong1986's 21:44/22:03 reviews:

Concern Fix
container_name not validated on image-only branch (#19:51 review) Added _CONTAINER_NAME_RE check to start_container/build_image/download_checkpoint branch
Asymmetric option-injection guard (#22:03 review) policy_name now checked on image-only branch too
Dead else branch (#22:03 review) Added NOTE comment documenting it's unreachable but kept as defensive assertion

Tests

  • +3 tests in TestImageOnlyBranchValidation
  • 95 total tests pass, ruff clean, mypy clean

Remaining items (non-blocking per reviewer):

  • from exc vs from None — reviewer flagged as non-blocking
  • Trailing-dot FQDN rejection — minor RFC nit
  • Registry port 99999 vs TCP max — minor inconsistency
  • pgrep greedy match documentation — comment-only
  • Dual regex alignment — documented with NOTE

All substantive code concerns are resolved. CI should go green now (previous failure was the duplicate class fixed in 9def51f).


🤖 AI agent response. Strands Agents.

@cagataycali
Copy link
Copy Markdown
Member Author

Review Status Update

CI fix + remaining review item addressed in 8a54f0c:

  • Fixed: Duplicate TestReviewRound8Fixes class causing F811 ruff lint failure (CI was red)
  • Fixed: from excfrom None in host validation except ValueError (suppresses noisy exception chain per reviewer's recommendation)

All unresolved threads from @yinsong1986 — resolution status:

# Issue Status
1 from None on re-raise ✅ Fixed (8a54f0c)
2 Subprocess identification too loose inference_service.py + (python/gr00t) + --port required
3 _SHELL_META over-blocks paths ✅ Narrowed to [;&|\$<>\\\n\r\x00]
4 Monkeypatch fragility ✅ Tracking wrapper + reachability assertion
5 pgrep portability sys.platform != linux gate + comments
6 Host default breaks containers 127.0.0.1 default, auto-flip to 0.0.0.0 for container paths
7 ipaddress.ip_address() rejects hostnames ✅ RFC-952 hostname fallback + _ALL_NUMERIC_RE
8 validate_inputs() scope per-action _port_only_actions early return
9 pgrep --port=N form [= ] in pattern
10 Cross-port-kill regression test TestStopServiceCrossPortKill
11 Path traversal on backslash re.split(r"[/\\]", value)
12 _HOSTNAME_RE at module level ✅ Line 74 alongside other constants

Additional items addressed: N1.7 process identification, option-injection guard, PermissionError logging, action allowlist, _port_only_actions rename, Docker registry:port image refs, no-default-drift, happy-path tests, pattern divergence documented.


🤖 AI agent response. Strands Agents. Feedback welcome!

Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Centralises input validation for gr00t_inference into a single validate_inputs() function with action-scoped checks (port-only, image-only, full). Adds RFC-952 hostname validation, an option-injection guard (--prefix rejection), narrows process-identification cmdline matching, extends pgrep to N1.7's gr00t.eval.run_gr00t_server entrypoint, flips the default host to 127.0.0.1 with a sentinel-based auto-flip to 0.0.0.0 for container actions, and ships ~80 unit tests across 17 classes. The diff is consistent with what the PR description advertises.

What's good

  • Single-entry-point validator matches the AGENTS.md > LLM Input Safety > "Centralise validation in one function" pattern. gr00t_inference.validate_inputs() becomes the canonical example called out in AGENTS.md.
  • Sentinel host=None -> 127.0.0.1 -> auto-flip-to-0.0.0.0-only-when-implicit cleanly distinguishes "user accepted default" from "user explicitly chose loopback". The dedicated host_was_explicit test (TestHostAutoFlipForContainer) pins it.
  • N1.7 stop/status gap is closed: _PGREP_INFERENCE_PORT_FMT and _is_gr00t_*_process now match gr00t.eval.run_gr00t_server alongside inference_service.py. Regression test TestN17ProcessIdentification pins the fix.
  • Cross-port-kill regression test (TestStopServiceCrossPortKill::test_stop_service_does_not_kill_wrong_port) is exactly the kind of pinned regression test AGENTS.md > #85 calls for.
  • Process-identification tightening (--port flag required) prevents false-killing editors / log-tailers - good defense-in-depth.
  • Action allowlist + clear error message early in validate_inputs() prevents silent dispatch on typos.
  • CHANGELOG entry is thorough and accurately maps to the diff.
  • All subprocess calls remain argv-style; the narrowed _SHELL_META set is justified in the comment and matches the actual injection surface.

Concerns

  • Scope: the PR description leads with "stricter input validation" but ~30% of the diff (process-identification tightening, N1.7 entrypoint matching, host_was_explicit sentinel, host auto-flip) is behavioural change unrelated to validation. The CHANGELOG covers this; the PR title does not. Consider splitting in future PRs.
  • Test file size (1625 LOC, 80+ tests): dense and a maintenance liability. Several test classes (e.g. TestExpandedParamValidation and TestExpandedParamValidationExtended) cover overlapping ground. A consolidation pass before merge would help future readers. Not blocking.
  • policy_name dash-prefix guard: policy_name is only stuffed into a response dict (line 1175) - it is never interpolated into subprocess argv. Adding it to the option-injection guard alongside repo_url / repo_tag (which ARE interpolated into git/docker argv) creates a false equivalence and a divergent rule. Either drop policy_name from the guard or document the rationale.
  • Pre-1.0 breaking change: changing the default host from 0.0.0.0 to 127.0.0.1 is a behavioural break for any caller that relied on the prior default with a non-container deployment. The auto-flip masks it for container actions, but agents calling action="status" from a remote box will now see different connect behaviour. CHANGELOG mentions it but it's worth flagging in the PR description / migration notes.

Verification suggestions

# 1. Validate that the new pgrep pattern matches N1.7 cmdlines (no Docker required)
python -c "
import re
from strands_robots.tools.gr00t_inference import _PGREP_INFERENCE_PORT_FMT, _PYTHON_PORT_RE_FMT
for cmd in [
    'python /opt/Isaac-GR00T/scripts/inference_service.py --port 5555',
    'python -m gr00t.eval.run_gr00t_server --port=5555 --host 0.0.0.0',
    'vim /opt/Isaac-GR00T/scripts/inference_service.py',  # must NOT match
]:
    print(cmd, '->', bool(re.search(_PYTHON_PORT_RE_FMT.format(port=5555), cmd)))
"

# 2. Spot-check the Docker image regex against real-world references
python -c "
from strands_robots.tools.gr00t_inference import _DOCKER_IMAGE_RE
for ref in ['gr00t:latest', 'nvcr.io/nvidia/gr00t:n1.7',
            'localhost:5000/x:tag', 'nvcr.io/nvidia/gr00t@sha256:' + 'a'*64,
            'nvcr.io/nvidia/gr00t@sha256:' + 'A'*64,  # uppercase digest - currently rejected
            'localhost:99999/x:tag']:  # port > 65535 - currently accepted
    print(ref, '->', bool(_DOCKER_IMAGE_RE.match(ref)))
"

# 3. Confirm host auto-flip only triggers when user did not pass host=
hatch run pytest tests/policies/groot/test_gr00t_inference_validation.py::TestHostAutoFlipForContainer -v

r"(?::[0-9]{1,5})?" # optional registry port (:5000)
r"(?:/[a-zA-Z0-9][a-zA-Z0-9._\-]*)*" # path components (/org/img)
r"(?::[a-zA-Z0-9][a-zA-Z0-9._\-]*" # option A: :tag
r"|@sha256:[a-f0-9]{64})?$" # option B: @sha256:digest (mutually exclusive with tag)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small bugs in _DOCKER_IMAGE_RE worth fixing while it's fresh:

  1. Registry port allows :99999 - [0-9]{1,5} accepts ports above 65535, which are invalid TCP. Tighten to (?::(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{0,3}))? if you want strictness, or at least [0-9]{1,5} with a follow-up integer check.
  2. Digest is case-sensitive ([a-f0-9]{64}) - Docker accepts @sha256:DEADBEEF... (uppercase hex). Either accept [a-fA-F0-9]{64} or document that uppercase digests must be lowercased before passing in. As-is, a copy-pasted digest from a CI artifact log can fail validation.

Neither is exploitable; both are usability papercuts surfaced by the new validator.

if len(host) > 253:
raise ValueError(f"host exceeds RFC 1035 maximum length of 253 chars (got {len(host)} chars)")
try:
ipaddress.ip_address(host)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ipaddress.ip_address(host) will accept 127.0.1 etc. as a packed IPv4 on some Python versions / platforms via socket.inet_aton-style coercion - but here you're calling ipaddress.ip_address, which is strict (rejects 127.0.1). Good.

However, the comment on lines 199-200 claims _ALL_NUMERIC_RE rejects 127.1 because it's a multi-label all-numeric string. That works, but the actual rejection path is: ipaddress.ip_address('127.1') raises ValueError -> hostname fallback runs -> _ALL_NUMERIC_RE.match('127.1') matches -> rejected. Good, but worth a one-line test asserting 127.1 is rejected to pin this behaviour against future refactors of the order of checks. (TestHostNumericTypoRejection covers 127.0.01 and 999.999.999.999 but not the Linux short-form 127.1.)

AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes."

("repo_tag", repo_tag),
("policy_name", policy_name),
]:
if param_value is not None and param_value.startswith("-"):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

policy_name dash-prefix guard is performative. Grepping the file, policy_name only flows into the response dict on line 1175 - it is never passed to subprocess.run argv, never interpolated into a --flag position. Adding it to the option-injection guard alongside repo_url and repo_tag (which ARE interpolated into git clone / docker run argv) creates a false equivalence and gives readers the wrong mental model of which params are subprocess-bound.

Either drop policy_name from this list (and the duplicate at line 233) or add a comment explaining why a non-subprocess-bound param needs a subprocess-injection guard. The --flag confusion risk for fields that only appear in JSON responses is zero.

Cf. AGENTS.md > Review Learnings (#86) > "Don't conflate identity with schema."

Comment thread strands_robots/tools/gr00t_inference.py Outdated
repo_tag=repo_tag,
policy_name=policy_name,
)
except (ValueError, TypeError) as e:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except (ValueError, TypeError) - the docstring of validate_inputs() (line 164) explicitly says "Raises ValueError for any invalid input." Catching TypeError here means a programming bug (e.g. someone refactors validate_inputs and accidentally calls .match() on None) silently becomes a structured error dict returned to the agent, instead of surfacing as a crash in CI. That's the "silent default on error" anti-pattern AGENTS.md #5/#6 calls out.

If the TypeError catch is here because callers might pass wrong types (e.g. port="5555" as a string), front-load that with explicit isinstance checks at the top of validate_inputs and re-raise as ValueError. Then the dispatch can stay narrow: except ValueError as e:.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6cbff72 — added isinstance(port, int) and isinstance(host, str) type checks at the top of validate_inputs(). These raise ValueError with clear messages (e.g. "port must be an integer, got str: '5555'"). The dispatch wrapper is now except ValueError as e: only.


🤖 AI agent response. Strands Agents.

if container_name is not None and not _CONTAINER_NAME_RE.match(container_name):
raise ValueError(f"container_name must match Docker naming rules (got {container_name!r})")
# Validate image_name, volumes, container_command (relevant to these actions)
if image_name is not None and not _DOCKER_IMAGE_RE.match(image_name):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image-only branch validates image_name for download_checkpoint, but _download_checkpoint() (line 1367) doesn't consume image_name at all - it only touches hf_repo, hf_subfolder, hf_local_dir, hf_token. Same for volumes and container_command: they're unused by the download path. Validating them here forces every download_checkpoint caller to either pass a regex-clean image_name or rely on the default - which surfaces as a confusing error message ("image_name must be a valid Docker image reference") on an action that doesn't use the image.

Consider per-action validation lists rather than a single _image_only_actions bucket:

  • build_image, start_container, lifecycle: validate image_name + volumes + container_command
  • download_checkpoint: validate hf_* params (or skip the bucket entirely; the current # Not validated set already covers it)

… pgrep string, narrow except, remove dead else, WARNING log, em-dashes, type checks

Changes:
- Remove backslash from _SHELL_META (legal on Linux, no argv injection risk)
- Make _PGREP_INFERENCE_PORT_FMT a raw string (consistent with _PYTHON_PORT_RE_FMT)
- Narrow except (ValueError, TypeError) to except ValueError — type checks in
  validate_inputs() now raise proper ValueError for wrong types
- Remove dead else branch (unreachable after validate_inputs action allowlist);
  replaced with defensive return + pragma: no cover
- Bump auto-flip log from INFO to WARNING (security-relevant network change)
- Replace em-dashes (U+2014) in log messages with -- (AGENTS.md ASCII rule)
- Add isinstance checks for port/host in validate_inputs() to prevent TypeError
  leaking from comparison operators on wrong types
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR centralises input validation for gr00t_inference into a single validate_inputs() function with action-aware scoping, hardens process identification for stop/status (including the previously broken N1.7 entry-point), tightens exception handling, and adds an option-injection guard for argv-interpolated parameters. The diff is large (+2135) but ~75% of that is the new test file (1625 lines), which is appropriate for a security-sensitive validation surface.

The shape matches AGENTS.md > Review Learnings (PR #92) > 'LLM Input Safety' almost line-for-line — single validate_inputs() entry point, regex allowlists for enumerable parameters, shell-metacharacter denylist for path-like values, option-injection (- prefix) guard. Good adherence to the documented pattern.

What's good

  • Centralised validate_inputs() with action-scoped checks — exactly the pattern AGENTS.md > 'Centralise validation in one function' calls out.
  • The N1.7 process-identification gap (gr00t.eval.run_gr00t_server cmdline) is a real bug and the fix is well-targeted, with a regression test that fails on pre-fix code (TestN17ProcessIdentification).
  • Option-injection guard on repo_url/repo_tag/policy_name (reject leading -) is the right defense even for argv-style subprocess.
  • Cross-port-kill regression test (TestStopServiceCrossPortKill) pins the _is_gr00t_process(port=...) guard so a future refactor that drops it surfaces as a test failure — exactly the 'Pin regression tests for reviewed fixes' rule from AGENTS.md.
  • Exception clauses correctly narrowed to (OSError, subprocess.SubprocessError, UnicodeDecodeError)PermissionError is an OSError subclass so the WARNING/isinstance branch is correct.
  • CHANGELOG entry is thorough and accurately describes the behavioural changes.

Concerns

  • The host=127.0.0.1 default is misleading. The docstring advertises 'loopback default per AGENTS.md', but _start_service auto-flips the literal "127.0.0.1" to "0.0.0.0" when the user accepts the default — meaning the published port (docker -p PORT:PORT) reaches a service listening on every interface inside the container, which is what the host network sees. The end-to-end behaviour is identical to the pre-PR host="0.0.0.0" default; the 'loopback' framing is illusory for the common code path. AGENTS.md > 'Bind to 127.0.0.1 by default, not 0.0.0.0' is satisfied in letter (the signature default is loopback) but not in spirit. Either tighten the docstring to describe the auto-flip transparently, or drop the auto-flip and require users to pass host="0.0.0.0" explicitly.
  • Test surface is heavily duplicated. TestExpandedParamValidation (at the top of the file) and TestExpandedParamValidationExtended (further down) cover essentially the same cases (image_name shell-meta rejection, volumes traversal rejection, container_command shell-meta rejection, plus the corresponding happy-paths). The _VALID_KWARGS dict pattern is also re-implemented inline in many tests with the same field set repeated verbatim. Consolidating would shrink the file by ~30% without losing coverage. Not a merge blocker, but worth a follow-up cleanup.
  • lifecycle="teardown" triggers full inference-time validation. When action="lifecycle", validate_inputs falls through to the full branch and validates data_config, embodiment_tag, and TRT dtypes — none of which _lifecycle(phase="teardown") actually consumes. The tool defaults paper over this for normal calls, but a user who passes lifecycle="teardown" plus an invalid data_config (e.g. from a stale config file) gets a confusing error from a code path that doesn't use the value. See inline comment on line 217.

Verification suggestions

hatch run test tests/policies/groot/test_gr00t_inference_validation.py -v
# Spot-check the auto-flip path with a real container if possible:
#   gr00t_inference(action="start", checkpoint_path=...) -> expect host=0.0.0.0 inside container
#   gr00t_inference(action="start", checkpoint_path=..., host="127.0.0.1") -> expect honoured (unreachable via -p)
# Spot-check N1.7 stop:
#   docker exec <c> python -m gr00t.eval.run_gr00t_server --port 5555 &
#   gr00t_inference(action="stop", port=5555) -> expect process killed

# service to bind all interfaces inside the container. Only auto-flip if the
# user accepted the default (sentinel was None → resolved to 127.0.0.1).
# Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host).
if host == "127.0.0.1" and not host_was_explicit:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The auto-flip silently undermines the advertised loopback default. The signature default is host=None -> 127.0.0.1 (per AGENTS.md > 'Bind to 127.0.0.1 by default'), but every container action that accepts the default lands here and rewrites it to 0.0.0.0. Net behaviour is indistinguishable from the pre-PR host="0.0.0.0" default — the security framing in the docstring (default: 127.0.0.1 loopback only) does not describe what actually happens.

Also note the trigger is the literal string "127.0.0.1". A user who passes host="localhost" explicitly (a perfectly reasonable hostname, validated as legal by validate_inputs) gets a service that binds to loopback inside the container, which is unreachable via docker -p PORT:PORT. That's a quiet footgun.

Two cleaner options:

  1. Drop the auto-flip entirely. Require host="0.0.0.0" explicitly for container deployments and treat the documented loopback default as the real default. Aligns with AGENTS.md > 'Users explicitly opt into network exposure'.
  2. Keep the flip, but extend the trigger to any non-0.0.0.0 value (loopback IP or hostname like localhost) and update the docstring to describe the behaviour transparently — e.g. 'inside the container the service always binds 0.0.0.0; the host parameter only affects the host-side advertised endpoint.'

Either is fine; the current state is the worst of both worlds.

# Image/download actions only consume image_name, paths, and volumes — not
# inference-time params (data_config, embodiment_tag, dtypes).
_image_only_actions = ("build_image", "download_checkpoint", "start_container")
if action in _image_only_actions:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

action="lifecycle" is not in _port_only_actions and not in _image_only_actions, so it falls through to the full inference-mutating-action branch below (line 238 onward) which validates data_config, embodiment_tag, and vit_dtype/llm_dtype/dit_dtype. But _lifecycle(phase="teardown") doesn't consume any of those — teardown only needs container_name, volumes, and remove_volumes.

Consequence: gr00t_inference(action="lifecycle", lifecycle="teardown", data_config="some-stale-value") errors on data_config even though teardown would never read it. Defaults paper over this for normal calls, but it's a confusing failure mode for anyone driving the tool from a config file or a sweep harness.

Consider either (a) extending _image_only_actions to include "lifecycle" and validating only the container/teardown-relevant params, or (b) splitting the validation by lifecycle phase value (full vs teardown). At minimum, document that lifecycle runs full validation regardless of phase.

# not legitimate hostnames. Real hostnames must have at least one alpha label.
# Rejects all-numeric multi-label strings including Linux IPv4 short-forms
# like "127.1" — use canonical dotted-quad for clarity in agent-driven contexts.
if _ALL_NUMERIC_RE.match(host) or not _HOSTNAME_RE.match(host):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _ALL_NUMERIC_RE guard is comment-described as 'Rejects all-numeric multi-label strings including Linux IPv4 short-forms like "127.1"'. But Python's ipaddress.ip_address("127.1") already raises ValueError (modern stdlib does not accept short-form IPv4), so by the time this branch runs, "127.1" is already on the reject path via the hostname check. The all-numeric regex is doing useful work for things like "127.0.01" (which passes the RFC-952 hostname pattern), but the comment overstates its role. Consider trimming the comment to match what the guard actually adds beyond ipaddress.ip_address(...).

Separately: ipaddress.ip_address("::") (IPv6 unspecified, equivalent to 0.0.0.0) and ipaddress.ip_address("::1") both succeed silently. The error-message hint promises only 127.0.0.1 / 0.0.0.0 / hostname, but host="::" quietly passes — which is fine functionally, but worth either documenting or explicitly listing in the hint.

if "\x00" in value:
raise ValueError(f"{label} must not contain null bytes")
if value.startswith("-"):
raise ValueError(f"{label} must not start with '-' (got {value!r})")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_SHELL_META = re.compile(r"[;&|$<>\\n\r\x00]")— the comment says quotes/bangs/parens/brackets are intentionally permitted because all subprocess calls are argv-style. That's correct *today*, but it's a tight coupling: the safety guarantee depends on every current and future caller usingsubprocess.run([...])rather thanshell=Trueorsubprocess.run(" ".join(...)). There is no enforcement of that contract. If a future PR introduces a single shell=True` call that interpolates one of these validated strings, the validator silently becomes inadequate.

Recommend: add a short comment at every subprocess.run site that interpolates a validated string, noting 'argv-style; do not change to shell=True without revisiting _SHELL_META'. Or factor the calls behind a single helper that asserts shell is not True. Cheap defensive measure for a class of regression that's hard to catch in review.

called["p"] = p
return RealPath(str(p).replace("/proc", str(tmp_path / "proc")))

monkeypatch.setattr("strands_robots.tools.gr00t_inference.Path", _fake_path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

monkeypatch.setattr("strands_robots.tools.gr00t_inference.Path", _fake_path) swaps the module-level Path symbol to a function that constructs a real Path. This works for the single use site in _is_gr00t_host_process, but Path is a common name — if any future helper in this module uses Path(...) for, say, checkpoint resolution, this test silently rewrites that call too. The test is asserting on call args via called.get('p'), which is enough to make it work today, but it's brittle.

A cleaner pattern: monkeypatch a small private helper (e.g. _read_proc_cmdline(pid: str) -> str) that the function delegates to, and have the test stub that helper. Same coverage, no globally rewritten module symbol, and the function under test stays simpler.

Not a merge blocker — the test passes and pins the right behaviour. Flagging for a follow-up.

Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Centralises gr00t_inference parameter validation into a single validate_inputs() function with action-scoped behaviour. The PR is largely well-executed: the validator is independently testable, the regex allowlists are sensible, the option-injection guards (--prefix on repo_url/repo_tag/policy_name) directly address a real argv-flag-injection surface, and 80+ tests cover most branches. The cross-port-kill regression test (port 80 vs 8000) is exactly the "pin every reviewed fix" pattern AGENTS.md asks for.

What's good

  • Single-entry-point validator matches AGENTS.md > Review Learnings (#92) > LLM Input Safety ("Centralise validation in one function").
  • Allowlists for enumerables (data_config, embodiment_tag, dtypes, actions, protocols) instead of denylists.
  • N1.7 entrypoint identification fix (gr00t.eval.run_gr00t_server) closes a real silent-miss bug, with tests on both sides of the OR.
  • Cross-port-kill regression test (_stop_service(port=80) must not kill port-8000 process) directly pins the --port[= ]{port}(?:\s|$) boundary fix.
  • host_was_explicit sentinel + auto-flip preserves loopback-by-default while keeping the -p port-publish path working.

Concerns

  • Stale except Exception clauses outside the diff hunks contradict the PR's stated narrowing goal. The CHANGELOG says "Exception clauses in _is_gr00t_process / _is_gr00t_host_process narrowed." Five except Exception clauses remain in this file, including _stop_service (line 987) and _list_running_services (line 764) — both directly invoked by validated tool actions. Per AGENTS.md > Review Learnings (#86) > "Exception Clauses Must Be Narrow," except Exception is forbidden for non-recovery paths. Either narrow them in this PR or land a follow-up issue and reference it in the CHANGELOG so the claim doesn't read as completed when it isn't.

  • CHANGELOG categorisation: the host default change should be in ### Changed (or flagged BREAKING), not ### Fixed. The default host flipping from 0.0.0.0 to 127.0.0.1 will silently break any direct caller of gr00t_inference() that assumed bind-all and didn't pass an explicit host=. This is a behavioural default change, not a bugfix. The notes section mentions it in passing but doesn't surface migration guidance ("if your downstream connects from another host, pass host='0.0.0.0' explicitly").

  • # 🐳 emoji at line 1742 (the __main__ print). Outside the diff hunks so I can't comment inline, but this PR rewrote large sections of the file and the emoji was an opportunity to fix in passing. AGENTS.md > Review Learnings (#86) > Unicode & String Hygiene: "No emojis in user-facing strings."

  • policy_name validation surface looks asymmetric. It's guarded against --prefix but never validated against _-allowlist or length, despite being interpolated into response dicts that an LLM agent will read back. Probably fine in practice (it's just a tracking label) but worth a one-line _VALID_NAME_RE to keep the validation surface uniform.

Verification suggestions

# Make sure the validation catches the specific shapes the PR claims.
hatch run pytest tests/policies/groot/test_gr00t_inference_validation.py -v

# Confirm the regression-test for cross-port-kill actually fails on pre-fix code:
git show HEAD~1:strands_robots/tools/gr00t_inference.py > /tmp/pre.py
# (mentally) verify the pre-fix _PYTHON_PORT_RE_FMT lacked the `(?:\s|$)` boundary.

# Audit remaining bare-except clauses:
rg -n 'except Exception' strands_robots/tools/gr00t_inference.py

raise ValueError(f"Unknown protocol {protocol!r}. Valid: {list(valid_protocols)}")
# Port range — always validated. Type-check first so callers get ValueError, not TypeError.
if not isinstance(port, int):
raise ValueError(f"port must be an integer, got {type(port).__name__}: {port!r}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance(port, int) accepts bool (since bool is a subclass of int), so port=True validates as port 1 and port=False would fail the 1 <= port <= 65535 check with a confusing "got 0" message. Add or isinstance(port, bool) to reject explicitly:

if not isinstance(port, int) or isinstance(port, bool):
    raise ValueError(f"port must be an integer, got {type(port).__name__}: {port!r}")

Minor, but the PR's test suite is otherwise rigorous about edge cases — pin a regression test for port=True so the next refactor doesn't silently re-introduce the gap (AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes").

for param_name, param_value in [("repo_url", repo_url), ("repo_tag", repo_tag), ("policy_name", policy_name)]:
if param_value is not None and param_value.startswith("-"):
raise ValueError(f"{param_name} must not start with '-' (got {param_value!r})")
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _image_only_actions branch (lines 223-242) and the full-validation block below (lines 255-294) duplicate most of the same checks: container_name regex, image_name regex, volumes paths, container_command shell-meta, checkpoint_path traversal, and the --prefix option-injection guard for repo_url/repo_tag/policy_name. A future tweak to e.g. the volumes validation in one branch will silently drift from the other.

Factor the shared checks into a helper (_validate_image_kwargs(...)) and call it from both branches. Single source of truth was the explicit goal stated in the PR description ("validate_inputs() is the single source of truth") — the duplication inside that function undermines it.

return bool(re.search(_PYTHON_PORT_RE_FMT.format(port=port), cmdline))
return is_gr00t
except (OSError, subprocess.SubprocessError, UnicodeDecodeError) as exc:
import logging
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three lazy import logging calls (here, line 866, line 1129) inside exception handlers / hot paths. AGENTS.md > Key Conventions > "Imports at file top — unless lazy-loading heavy deps with documented reason." logging is stdlib and trivial; the lazy pattern only helps when an import has measurable cost.

Move import logging to the file-top imports and define a module-level _logger = logging.getLogger(__name__) once. The current code re-creates the logger object on every exception, which is wasteful and inconsistent with _start_service's import logging as _logging (line 1129) — three different idioms in one file.

# This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated
# in practice (procps-ng converts NUL → space when reading /proc/*/cmdline).
# Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server)
_PGREP_INFERENCE_PORT_FMT = r"(inference_service\.py|gr00t\.eval\.run_gr00t_server).*--port[= ]{port}( |$)"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .* between the entrypoint group and --port is greedy and unanchored on the left. While the tightened cmdline filter in _is_gr00t_process (python|gr00t + --port requirement) provides defense-in-depth, the pgrep itself can match any process whose cmdline contains the substring inference_service.py followed anywhere later by --port N (e.g. a logfile-tailer process whose cmdline mentions both strings).

Consider \\b word boundaries or anchoring the entrypoint to the start of the python invocation. At minimum, add a regression test that asserts a benign process matching inference_service.py somewhere in its cmdline (e.g. tail -f /var/log/inference_service.py.log --port 5555) is correctly excluded by the secondary _is_gr00t_*_process filter — this is the kind of false-positive AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes" was written for.

…NGELOG categorization

Address review round-11 (yinsong1986 CHANGES_REQUESTED summary):

1. Narrow all remaining `except Exception` clauses:
   - _list_running_services: except Exception -> except OSError
   - _is_service_running: except Exception -> except OSError
   - _stop_service: except Exception -> except (OSError, subprocess.SubprocessError)
   - _start_service: except Exception -> except (OSError, RuntimeError)
   - _download_checkpoint: kept with # noqa: BLE001 (HF errors are opaque)

2. Remove emoji from __main__ print (line 1742: 🐳 removed)

3. CHANGELOG: move host default change from ### Fixed to ### Changed,
   add BREAKING flag and migration guidance per reviewer request.

4. Fix test_start_service_auto_flips_loopback: monkeypatch subprocess.run
   instead of subprocess.Popen (the code uses run, not Popen; the old
   except Exception masked the TypeError from the broken mock).
Copy link
Copy Markdown
Contributor

@yinsong1986 yinsong1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

PR centralises input validation for gr00t_inference into a single validate_inputs() entry-point, scopes it per-action (port-only vs image-only vs full mutating), and tightens process identification so N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server) are both matched. Default host flips from 0.0.0.0 to 127.0.0.1 (good — aligns with AGENTS.md § LLM Input Safety "Bind to 127.0.0.1 by default"), with a sentinel-based auto-flip to 0.0.0.0 for container actions to keep docker -p working. ~80 dedicated validation tests are added.

Overall direction matches the AGENTS.md "validate before subprocess interpolation / centralise validation in one function" guidance from PR #92's review-learnings, and the regression coverage is unusually thorough.

What's good

  • validate_inputs() is the canonical pattern requested by AGENTS.md § LLM Input Safety — single entry-point, independently testable, called once at the top of dispatch.
  • Loopback-by-default is correct, and the sentinel-based detection (_host_was_explicit = host is not None) correctly preserves user intent when they pass host="127.0.0.1" explicitly. Tests pin both branches.
  • Process identification now requires --port in cmdline + the inference entry-point, which prevents the well-known false-kill-the-editor footgun.
  • Cross-port-kill regression (port 8000 not killed when stopping port 80) has a dedicated test — matches AGENTS.md "pin every reviewed fix with a regression test".
  • Exception clauses are properly narrowed throughout (OSError, subprocess.SubprocessError, UnicodeDecodeError); only _download_checkpoint retains except Exception and that's annotated with # noqa: BLE001 and an explanation.

Concerns

  • Validation gap on hf_local_dir / source_dir / hf_repo — the PR description says these are "argv-safe, no injection risk", but hf_local_dir is interpolated into a docker -v {host}:{container} argv at line ~1498 with no reject_colon guard. volumes={} got that guard; hf_local_dir did not. See inline comment.
  • Allowlist duplication: _VALID_ACTIONS (validator) and the if/elif chain in gr00t_inference() are now two lists of the same thing. Adding a new action in only one place fails open or silently rejects — worth a single-source-of-truth refactor or a defensive test that asserts they match.
  • Scope discipline: changelog entry mixes "Added / Changed / Fixed" but several items are arguably refactors (factoring _PGREP_INFERENCE_PORT_FMT, narrowing exceptions). Not blocking, just makes the release-notes pass harder.
  • Breaking change documentation: the host default change from 0.0.0.0 to 127.0.0.1 is correctly flagged BREAKING in CHANGELOG, but the migration sketch ("pass host=\"0.0.0.0\" explicitly") could mention that lifecycle/start/restart auto-flip will mask the breakage for the common path; remote-bind users connecting from another host are the only group affected. Helpful to call out who is/isn't affected.

Verification suggestions

# Sanity-check the validator against representative LLM-controlled inputs:
python -c "
from strands_robots.tools.gr00t_inference import validate_inputs
import pytest
# image:tag@sha256 (combined pinning) - currently rejected, should pass:
try:
    validate_inputs(action='build_image', data_config='x', embodiment_tag='x', port=5555,
        host='127.0.0.1', vit_dtype='fp8', llm_dtype='nvfp4', dit_dtype='fp8',
        checkpoint_path=None, trt_engine_path='x', container_name=None, protocol='n1.5',
        image_name='nvcr.io/nvidia/gr00t:n1.7@sha256:' + 'a'*64)
    print('combined tag+digest: OK')
except ValueError as e:
    print('combined tag+digest: REJECTED ->', e)
"

# Confirm the host-fallback Linux gate works on macOS/BSD:
STRANDS_BASE_DIR=/tmp/sb python -c "
from strands_robots.tools.gr00t_inference import _stop_service
import sys; sys.platform = 'darwin'  # only meaningful on a real macOS box
print(_stop_service(5555))
"

# Round-trip the full happy path through a docker server (integration):
hatch run test-integ -k gr00t

if volumes is not None:
for vol_host, vol_container in volumes.items():
_validate_path(vol_host, "volumes key (host path)", reject_colon=True)
_validate_path(vol_container, "volumes value (container path)", reject_colon=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation gap: hf_local_dir (and source_dir) bypass the reject_colon guard that you correctly apply to volumes.

The PR description and the changelog assert these are "argv-safe, no injection risk". But at _start_container (line ~1498) hf_local_dir is interpolated into:

effective_volumes[str(Path(hf_local_dir).expanduser())] = "/data/checkpoints"
# ...
cmd.extend(["-v", f"{host_path}:{container_path}"])

If an LLM passes hf_local_dir="/foo:/etc:ro", the resulting -v /foo:/etc:ro:/data/checkpoints is parsed by docker -v as host=/foo container=/etc options=ro:/data/checkpoints, escalating to a read-only mount of the host's /etc. That's the exact attack you guarded against in volumes with reject_colon=True (line 232–233). Same surface, same defence needed.

Suggested fix: in the _image_only_actions branch (and the full-validation branch around line 277), add

if hf_local_dir is not None:
    _validate_path(hf_local_dir, "hf_local_dir", reject_colon=True)
if source_dir is not None:
    _validate_path(source_dir, "source_dir")

and thread hf_local_dir/source_dir through validate_inputs()'s signature. Per AGENTS.md > Review Learnings (PR #92) > LLM Input Safety: "Reject shell metacharacters in paths — also reject .. path traversal components. Apply even when using argv-style subprocess."

r"(?::[0-9]{1,5})?" # optional registry port (:5000)
r"(?:/[a-zA-Z0-9][a-zA-Z0-9._\-]*)*" # path components (/org/img)
r"(?::[a-zA-Z0-9][a-zA-Z0-9._\-]*" # option A: :tag
r"|@sha256:[a-f0-9]{64})?$" # option B: @sha256:digest (mutually exclusive with tag)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_DOCKER_IMAGE_RE rejects the combined name:tag@sha256:digest reference, which Docker accepts and is the recommended supply-chain pinning form.

Quick check:

>>> _DOCKER_IMAGE_RE.match("nvcr.io/nvidia/gr00t:n1.7@sha256:" + "a"*64)
None

The alternation (?::tag|@sha256:digest)? makes tag and digest mutually exclusive, but Docker's reference grammar is name[:tag][@digest] — you can pin both at once. The comment on line 61 even calls these "option A / option B (mutually exclusive)" which is a divergence from upstream Docker semantics, not a deliberate restriction.

This matters because the AGENTS.md review-learnings (PR #92) explicitly recommend digest pinning for supply-chain security; rejecting image:tag@sha256:... discourages the recommended form for users who also want a human-readable tag.

Suggested fix: make tag and digest both optional and additive:

r"(?::[a-zA-Z0-9][a-zA-Z0-9._\-]*)?"   # optional :tag
r"(?:@sha256:[a-f0-9]{64})?$"          # optional @digest

Also worth a regression test pinning the accepted form.

# service to bind all interfaces inside the container. Only auto-flip if the
# user accepted the default (sentinel was None → resolved to 127.0.0.1).
# Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host).
if host == "127.0.0.1" and not host_was_explicit:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-flip is hard-coded against the IPv4 loopback string and silently misbehaves for IPv6 loopback.

if host == "127.0.0.1" and not host_was_explicit:
    ...
    host = "0.0.0.0"

validate_inputs() accepts ::1 (IPv6 loopback) via ipaddress.ip_address(), but this branch only checks "127.0.0.1". So a user passing host="::1" bypasses the auto-flip, the service binds to ::1 inside the container, and docker -p {port}:{port} silently fails to forward traffic from outside — exactly the failure mode the auto-flip was added to prevent.

Suggested guard:

try:
    is_loopback = ipaddress.ip_address(host).is_loopback
except ValueError:
    is_loopback = False
if is_loopback and not host_was_explicit:
    ...
    host = "0.0.0.0"  # or "::" for IPv6 if the rest of the stack supports it

Minimum: add a regression test that calls _start_service(host="::1", host_was_explicit=False, ...) and asserts the warning fires — currently nothing covers this branch.

# user accepted the default (sentinel was None → resolved to 127.0.0.1).
# Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host).
if host == "127.0.0.1" and not host_was_explicit:
import logging as _logging
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeated import logging inside function bodies violates AGENTS.md § Key Conventions #4 ("Imports at file top — unless lazy-loading heavy deps with documented reason").

Three sites in this PR do import logging (or import logging as _logging) inside a function body: lines 828, 866, and here. logging is stdlib and not heavy — there's no documented reason for the lazy import.

Suggested fix: move import logging to the module top with the other imports and define a single module-level logger = logging.getLogger(__name__). The three call sites then become logger.warning(...) / logger.debug(...) and you drop the per-call dictionary lookup on logging.getLogger.

# Complete allowlist of valid actions for the tool.
_VALID_ACTIONS = frozenset(
{
"find_containers",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_VALID_ACTIONS and the if/elif dispatch chain in gr00t_inference() are now two parallel allowlists that must be kept in sync.

The new # pragma: no cover line at line ~695 (return {"status": "error", "message": f"Unknown action: {action}"}) acknowledges this: the validator's allowlist makes the dispatcher's fallback unreachable. But the inverse failure mode is silent — a future PR adds "benchmark" to the dispatcher and forgets _VALID_ACTIONS, and the validator rejects it before dispatch with no test catching the regression.

Lightweight fix: add a unit test that walks the source AST (or a pytest parametrise over _VALID_ACTIONS) asserting that every name in _VALID_ACTIONS resolves to a non-error branch in gr00t_inference() for a stub-call, and vice-versa. Cheap pin against drift.

@cagataycali
Copy link
Copy Markdown
Member Author

Closing in favour of #196 (consolidated rewrite).

Why

This PR accumulated 31 commits and 91 unresolved review threads across 11 rounds, and #192 (camera-recording fix, merged into main mid-flight) was never picked up — leaving 802 lines of unrelated tests/simulation/ deletions in the diff. Continuing to iterate here forces every reviewer to walk a stale comment graph.

What replaces it

#196 is branched fresh from current upstream/main (which includes #192). All 48 real-open concerns from this PR were classified into 8 thematic commits; the security-critical concerns and the regex-correctness concerns are already in the new PR:

Round Theme Status in #196
R0 Rebase onto post-#192 main landed (ce9db15)
R1 Drop silent host=127.0.0.1 → 0.0.0.0 auto-flip landed (ecf5f0f)
R3 CHANGELOG ↔ description reconciliation landed (7762868)
R4 4 regex bugs (port>65535, FQDN trailing dot, length cap, IP typo) landed (621b3d8)
R2, R5–R8 validate_inputs widening, 3-branch dispatch, test fragility, portable pgrep, polish pending — will land as additional commits before #196 leaves draft

Diff comparison

this PR #196
Commits 31 4
Files touched 8 3 (only the in-scope files)
Behind upstream/main 1 commit (missing #192) 0
Touches tests/simulation/ YES (violates AGENTS.md) NO
Open review threads 91 0

The next PR is opened as a draft and will not request review until R2/R5/R6/R7/R8 land. Apologies to reviewers who walked the long road on this one — the consolidation is the right reset.


🤖 Closed by autonomous orchestrator on behalf of @cagataycali. The receiving rewrite is at #196.

@github-project-automation github-project-automation Bot moved this from In review to Done in Strands Labs - Robots May 23, 2026
cagataycali added a commit to cagataycali/robots that referenced this pull request May 23, 2026
…cting numeric tags (addresses thread on line 78)

The port-capture group in _DOCKER_IMAGE_RE greedily matched :digits
even without a following /path component, causing the port-range check
to reject valid name:tag refs where the tag was purely numeric (e.g.
myimage:0, myimage:65536, myimage:99999, gr00t:23456).

Fix: add (?=/) lookahead so :digits is only interpreted as a registry
port when followed by a path component. Without the lookahead the
ambiguous name:digits pattern is treated as name:tag (correct).

Also fixes CHANGELOG header to reference strands-labs#196 (this PR) instead of
the superseded strands-labs#90.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants