improve: stricter input validation in gr00t_inference#90
Conversation
awsarron
left a comment
There was a problem hiding this comment.
would be good to add tests for these cases. gr00t_inference doesn't have any test coverage at the moment
Status Check1 unresolved thread — but the fix was already implemented in commit @awsarron's nit ("extract validation into single function") was addressed: all validation logic is now in a dedicated CI: ✅ All checks passing @awsarron could you resolve the thread and re-review? This is a small, self-contained improvement. 🤖 Pipeline analysis by AI agent. Strands Agents. Feedback welcome! |
df78bde to
91f1dab
Compare
🔄 Re-review Request — All Feedback AddressedHi @awsarron @sundargthb — friendly ping on this one. ✅ Current State
⏰ Timing ConsiderationThis PR has been waiting since April 1 (~6 weeks). There are now 4 new PRs from @yinsong1986 (#149-152) in-flight as part of the N1.7 alignment work (issue #148). Two of them (#150, #152) touch the same Since PR #90 is a small, additive validation layer (+138/-8 net), merging it first would be cleanest — it avoids rebase churn in either direction and gives yinsong's larger refactors a validated foundation to build on. If yinsong's PRs land first, I'm happy to rebase, but the ordering-first approach saves everyone work. 🎯 AskCould you either:
Thanks! 🙏 🤖 Agent-assisted analysis. Strands Agents. |
53cc20c to
2140c24
Compare
|
6d09a45 to
bfc257d
Compare
2263518 to
bcb752d
Compare
|
@awsarron @sundargthb — friendly check-in. Both review threads are resolved, CI green, branch is clean on latest main. This is a small additive validation layer (+138/-8). Happy to address any remaining concerns if you have them. 🙏 |
bcb752d to
fd45f87
Compare
fd45f87 to
6624ba6
Compare
All pgrep -f patterns now use '( |$)' after the port number to prevent substring matches (e.g. port 80 matching a process on port 8000). Applies to both container-side (docker exec pgrep) and host-system fallback paths. The _is_gr00t_process/_is_gr00t_host_process verification was already correct, but the pgrep filter itself could return false positives that would be filtered downstream — this tightens the initial candidate set as defense-in-depth. Addresses review feedback from @sundargthb on PR strands-labs#90.
…rence
Improvements to the gr00t_inference tool:
1. Input validation for all user-supplied parameters:
- data_config and embodiment_tag validated against strict alphanumeric
patterns (they are enumerable values from the docstring).
- checkpoint_path and trt_engine_path reject shell metacharacters,
null bytes, and '..' traversal components.
- container_name validated against Docker naming rules.
- dtype values checked against explicit allowlists.
- Port range validated (1-65535).
2. Default host changed from 0.0.0.0 to 127.0.0.1 (loopback):
- Inference services should default to localhost-only binding.
- Users can still explicitly pass host='0.0.0.0' when network
access is needed.
3. Process verification for stop action:
- Added _is_gr00t_process() to verify a PID belongs to a GR00T
inference process before sending signals.
- Host-system fallback now uses pgrep -f with the inference_service
pattern instead of lsof (which matches any process on the port).
…, action-scoped over-validation, centralise dash-prefix guard Changes: - Add RFC 1035 §2.3.4 hostname total-length cap (253 chars) in validate_inputs() - Scope validation per action: build_image/download_checkpoint/start_container no longer validate data_config/embodiment_tag/dtypes they don't consume - Move option-injection dash-prefix guard into validate_inputs() (centralised) - Log at INFO when host auto-flips 127.0.0.1→0.0.0.0 for container actions (addresses silent-kwarg-override concern — user sees the rewrite) - Document _ALL_NUMERIC_RE broader reject set (IPv4 short-forms) - Document regex boundary divergence (ERE vs Python re) as intentional - Update validate_inputs() docstring with caller contract (must wrap ValueError) - Fix test_valid_all_interfaces: use '0.0.0.0' not '127.0.0.1' (copy-paste bug) - Fix test_read_only_action_accepts_any_data_config: use value that WOULD fail for action='start' (Has-Hyphens-And-Caps) to actually pin the skip behaviour All 81 validation tests pass, ruff + mypy clean.
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Centralises gr00t_inference parameter validation in a new validate_inputs() helper, scoped per action (port-only, image-only, full). Tightens process identification to require --port plus a Python/gr00t entrypoint match (closing the N1.7 stop/status gap), narrows several exception clauses, adds an option-injection guard for repo_url/repo_tag/policy_name, hardens path-traversal handling, gates the host-system pgrep fallback on Linux, and changes the default host to 127.0.0.1 (with _start_service auto-flipping to 0.0.0.0 for the -p-publish docker case). Ships ~80 unit tests across 17 classes and a CHANGELOG entry. The validation surface and the AGENTS.md validate_inputs(...) pattern itself are well executed.
What's good
- Single-entry-point validation matches the AGENTS.md PR #92 review-learnings ("Centralise validation in one function").
- Strong test coverage: regression tests for the cross-port-kill bug, N1.7 cmdline detection, host-typo rejection, single-label numeric hostnames, option-injection, registry-port image refs, and integration tests that drive
gr00t_inference()end-to-end and assert validators wire through to{"status": "error"}. - Exception narrowing in
_is_gr00t_process/_is_gr00t_host_processplus structured logging at WARNING/DEBUG levels. - CHANGELOG entry documents both the broadened (RFC-952 hostnames) and stricter (multi-label all-numeric) host validation, plus the
hostdefault change.
Concerns
- The headline "loopback default" is silently undone on the only path that matters. Defaulting
host="127.0.0.1"is good, but_start_serviceunconditionally flips it back to"0.0.0.0"whenever the value equals"127.0.0.1"— there is no way for a user who explicitly passeshost="127.0.0.1"(e.g. running with--network=hostor a side-car proxy) to keep loopback. The inline comment claims "only auto-flip if the user did NOT explicitly pass host=", but the code can’t tell the difference. This violates AGENTS.md PR #86 “Reject silently-dropped kwargs” and partly negates the safety improvement the PR is selling. Inline below. container_nameis not validated forstart_container. The image-only branch invalidate_inputsskips the_CONTAINER_NAME_REcheck, but_start_containerinterpolates it intodocker run --name <name>. The full-validation branch does check it. The asymmetry is the kind of “silent drop” AGENTS.md flags. Inline below.lifecycle="teardown"is forced through the full-validation branch. Teardown only consumescontainer_name+remove_volumes, but it lands in thestart/restart/lifecyclebranch and getsdata_config/embodiment_tag/vit_dtype/llm_dtype/dit_dtype/trt_engine_pathvalidated. Defaults pass, but a staledata_config="FOO"from a prior call gives a confusing error for an action that doesn’t need it. There are also no tests forlifecycle="teardown"(or for any of the four image-only branches end-to-end).- Several
except Exception:clauses left untouched in_list_running_services(line 727),_is_service_running(line 739), and the outer_stop_service(line 950). The PR’s changelog claims “Exception clauses in_is_gr00t_process/_is_gr00t_host_processnarrowed fromexcept Exceptionto specific exception types,” which is true — but the surrounding functions in the same file still have the pattern AGENTS.md PR #86 explicitly forbids. Worth a follow-up if not in scope here. _DOCKER_IMAGE_REallows registry ports up to99999((?::[0-9]{1,5})?). 5-digit upper bound is wrong for TCP ports; not a security issue (argv-styledocker runwill reject), just an inconsistency with theport1–65535 check elsewhere invalidate_inputs().
Verification suggestions
- Manually call
gr00t_inference(action="start", host="127.0.0.1", …)against a real container and confirm whether the user’s explicit loopback choice survives or is silently flipped. Then runss -tlnp | grep <port>on the host — if it shows0.0.0.0:<port>afterhost="127.0.0.1", the silent override is confirmed. pytest tests/policies/groot/test_gr00t_inference_validation.py -vand check that the lifecycle="teardown" path has zero tests — confirming the gap.rg "except Exception" strands_robots/tools/gr00t_inference.pyto see how many catch-all clauses remain.
| # service to bind all interfaces inside the container. We only auto-flip if | ||
| # the user did NOT explicitly pass host= (i.e. they accepted the default). | ||
| # Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host). | ||
| if host == "127.0.0.1": |
There was a problem hiding this comment.
Silent override of explicit user input. The comment three lines up says "We only auto-flip if the user did NOT explicitly pass host=", but the code can’t actually tell the difference — it just checks if host == "127.0.0.1". A user who deliberately passes host="127.0.0.1" (e.g. running the container with --network=host and wanting loopback-only binding, or fronting it with a reverse proxy) will silently get bound to 0.0.0.0 instead, which is the opposite of what they asked for. This both contradicts the docstring/comment and violates AGENTS.md PR #86 “Reject silently-dropped kwargs”.
It also partly defeats the headline change in this PR: the host default became 127.0.0.1 for safety, but on the actual start/restart/lifecycle paths (the only ones that bind a port) it is unconditionally reverted to 0.0.0.0.
Suggested fix: make this opt-in. Either
- Add a sentinel default (
host: str | None = None) and only auto-flip whenhost is None, or - Drop the auto-flip and document that container actions require
host="0.0.0.0"— raiseValueErrorwith a clear message whenhost="127.0.0.1"is passed without--network=host.
Either way, the _start_service test test_start_service_auto_flips_loopback (asserting the silent flip) should be updated/removed accordingly.
| for param_name, param_value in [("repo_url", repo_url), ("repo_tag", repo_tag)]: | ||
| if param_value is not None and param_value.startswith("-"): | ||
| raise ValueError(f"{param_name} must not start with '-' (got {param_value!r})") | ||
| return |
There was a problem hiding this comment.
container_name is not validated on this branch. The image-only branch (covering build_image, download_checkpoint, start_container) returns here without running _CONTAINER_NAME_RE.match(container_name). But _start_container then interpolates container_name into docker run --name <name> (and _remove_container does docker rm -f <name>). The full-validation branch below at line ~229 does validate it for start/restart/lifecycle.
The asymmetry means gr00t_inference(action="start_container", container_name="--privileged") reaches docker argv unvalidated. argv-style is defence-in-depth here so this is not a direct shell injection, but it is exactly the AGENTS.md PR #92 “allowlist enumerable values” pattern — if the value is enumerable, validate it on every entry path, not just on the inference-mutating ones. Same rationale as the repo_url/repo_tag guard you already added on lines 212–214.
Fix: hoist the container_name check (lines 229–230) above the early return blocks so all mutating actions share it.
|
|
||
| # Container command — reject shell metacharacters | ||
| if container_command is not None and _SHELL_META.search(container_command): | ||
| raise ValueError(f"container_command contains disallowed characters: {container_command!r}") |
There was a problem hiding this comment.
lifecycle action lands in the full-validation branch even for phase="teardown". The dispatch path for action="lifecycle" reaches line 280 below and validates data_config, embodiment_tag, vit_dtype, llm_dtype, dit_dtype, trt_engine_path, etc. — all of which are unused by _lifecycle when phase="teardown" (which only needs container_name + remove_volumes).
Default values pass, so the happy path works, but a user retrying a teardown after a failed start (with data_config="FOO" from the previous prompt context) gets a confusing data_config must be lowercase alphanumeric/underscore error for an operation that doesn’t care. Consider gating on lifecycle == "teardown" and treating it the same as the port-only actions (validate container_name if provided, then return). There is also no test for lifecycle="teardown" in this PR — worth pinning, especially since the validate_inputs matrix is part of the contract.
| """Reject paths containing shell metacharacters, null bytes, or traversal sequences.""" | ||
| if "\x00" in value: | ||
| raise ValueError(f"{label} must not contain null bytes") | ||
| if any(part == ".." for part in re.split(r"[/\\]", value)): |
There was a problem hiding this comment.
Registry port allows up to 5 digits (99999). (?::[0-9]{1,5})? accepts values that exceed the TCP max of 65535. Not a security concern (argv-style docker run will fail downstream), but it’s inconsistent with the 1 <= port <= 65535 check at line 234 and undermines the “centralised validation” framing of this PR. A tightened bound (or even reusing the same int range check after a parse) keeps the contract consistent.
Minor; safe to defer.
| # This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated | ||
| # in practice (procps-ng converts NUL → space when reading /proc/*/cmdline). | ||
| # Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server) | ||
| _PGREP_INFERENCE_PORT_FMT = "(inference_service\\.py|gr00t\\.eval\\.run_gr00t_server).*--port[= ]{port}( |$)" |
There was a problem hiding this comment.
_PGREP_INFERENCE_PORT_FMT greedy-matches across cmdline. The pattern (inference_service\.py|gr00t\.eval\.run_gr00t_server).*--port[= ]{port}( |$) uses a greedy .*, which means a process whose cmdline contains both inference_service.py and a later --port 5555 argument from a child / wrapper script will match — including, for example, bash -c 'echo inference_service.py; python other.py --port 5555'. The Python-side _is_gr00t_*_process checks compensate by also requiring python/gr00t and verifying via _PYTHON_PORT_RE_FMT, so the actual kill path is safe — but the pgrep filter alone is broader than the comment implies.
Worth either (a) tightening the regex (e.g. anchoring with [^ ]* instead of .*) or (b) updating the comment to call out that pgrep is a coarse pre-filter and the authoritative match is _is_gr00t_*_process. The current tests don’t cover the greedy-match false-positive case.
Trivial cleanup — the validation section comment was duplicated on consecutive lines. All 81 validation tests + 1538 broader tests pass. ruff + mypy clean.
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Lands a centralised validate_inputs() for gr00t_inference, tightens pgrep/process-identification to handle both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server) entrypoints, switches the default host from 0.0.0.0 to 127.0.0.1 (per AGENTS.md § LLM Input Safety), and adds 80+ validation tests. The single-entry-point validator is exactly the pattern AGENTS.md § Review Learnings (#92) prescribes.
What's good
- Single
validate_inputs()entry point, action-scoped, independently testable — matches the canonical pattern from PR #92. - Option-injection guard (leading
-rejection) forrepo_url/repo_tag/policy_name. - Default
host="127.0.0.1"(loopback-only) — satisfies AGENTS.md § LLM Input Safety “Bind to 127.0.0.1 by default”. - Process-probe helpers (
_is_gr00t_process/_is_gr00t_host_process) require--portin cmdline + python/gr00t marker, which prevents the editor/log-tailer false-match cited in the PR description. - Narrow
except (OSError, subprocess.SubprocessError, UnicodeDecodeError)clauses withPermissionErrorre-routed to WARNING. Good fit for AGENTS.md § Review Learnings (#86) “Exception Clauses Must Be Narrow”. - Comprehensive test surface (17 classes, 80+ tests) covering allowlists, traversal, null bytes, option-injection, and the cross-port-kill regression.
Concerns
- Auto-flip semantics in
_start_servicecontradict their own docstring (see inline). The comment claims it only flips when the user accepted the default, but the implementation can’t distinguish default from explicit since both arrive as the literal"127.0.0.1". As written, a user explicitly opting into loopback binding (e.g. with--network=host) silently gets0.0.0.0instead. The PR description repeats the same incorrect claim. Either the code needs a sentinel default (host: str | None = None) so explicit"127.0.0.1"can be honored, or the docstring/comment needs to be amended to say “this always flips on127.0.0.1”. - No regression test pins the “honor explicit
127.0.0.1” behavior the docstring promises. AGENTS.md § Review Learnings (#85) “Pin regression tests for reviewed fixes” — without this, the contradiction at_start_serviceships silently. host="localhost"passes the hostname allowlist but is not auto-flipped, so on a container path the bind ends up on127.0.0.1inside the container and Docker’s-ppublish never sees a connection. Either flip on a_loopback_aliasesset or document explicitly thatlocalhostis unsafe for container actions.- Validation scope drift for
lifecycle:validate_inputs(action="lifecycle", …)falls through to the full inference-mutating branch and rejects e.g. a non-canonicaldata_configeven when the user only invokes phase="setup" (build_image + download_checkpoint + start_container). Not blocking, but worth aphasekwarg in the validator (or carving lifecycle into its own scope). - Bare
except Exception as e:at the bottom of_stop_service(line 949 — outside this PR’s hunks, so not flagged inline) is still in violation of AGENTS.md § Review Learnings (#86). Worth narrowing in a follow-up.
Verification suggestions
hatch run test tests/policies/groot/test_gr00t_inference_validation.py -v
hatch run lint
# Sanity-check the auto-flip:
python -c "from strands_robots.tools.gr00t_inference import _start_service; help(_start_service)"
# Inspect _PGREP_INFERENCE_PORT_FMT against a real cmdline (Linux only):
python -c "import re, strands_robots.tools.gr00t_inference as g; pat=g._PGREP_INFERENCE_PORT_FMT.format(port=5555); print(pat); print(bool(re.search(pat.replace('( |$)', '(?: |$)'), 'python /opt/Isaac-GR00T/scripts/inference_service.py --port 5555 --host 0.0.0.0')))"| # service to bind all interfaces inside the container. We only auto-flip if | ||
| # the user did NOT explicitly pass host= (i.e. they accepted the default). | ||
| # Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host). | ||
| if host == "127.0.0.1": |
There was a problem hiding this comment.
Logic does not match the comment / docstring / PR description.
The comment above says: “We only auto-flip if the user did NOT explicitly pass host= (i.e. they accepted the default). Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host).”
But the code is if host == "127.0.0.1": — there is no way for _start_service to distinguish the default "127.0.0.1" from a user explicitly passing "127.0.0.1" (e.g. for --network=host). Both arrive as the literal string. So an explicit host="127.0.0.1" is silently overridden to 0.0.0.0, which is the opposite of what the comment promises.
Also note host="localhost" (a loopback alias the validator accepts) is not auto-flipped, so container actions with host="localhost" will publish a port that no client can reach (bind ends up on the container’s loopback only).
Fix options:
- Use a sentinel default (
host: str = "127.0.0.1"ingr00t_inference(), but propagate ashost: str | None = Noneinto_start_service, and only auto-flip whenhost is None). This is the only way to actually honor an explicit"127.0.0.1". - Otherwise, fix the comment/docstring/PR description to say “this function always rewrites loopback (
127.0.0.1andlocalhost) to0.0.0.0, because Docker-ppublish requires bind-all” — and document that--network=hostusers must passhost="0.0.0.0"explicitly.
AGENTS.md § Review Learnings (#85) “Match docstrings to semantics” covers exactly this case.
| raise ValueError(f"{label} must not contain null bytes") | ||
| if any(part == ".." for part in re.split(r"[/\\]", value)): | ||
| raise ValueError(f"{label} must not contain '..' path traversal components") | ||
| if _SHELL_META.search(value): |
There was a problem hiding this comment.
Option-injection guard missing for paths.
_validate_path rejects shell metacharacters and .. traversal but does not reject a leading -. The full-branch validator below (lines 261–267) explicitly adds an option-injection guard for repo_url / repo_tag / policy_name because they flow into subprocess argv. Volume keys/values flow into docker -v <key>:<value> argv the same way (an attacker-controlled volumes={"--privileged=foo": "/bar"} is plausible). For consistency with the guard you added below, reject leading - here too:
if value.startswith("-"):
raise ValueError(f"{label} must not start with '-' (got {value!r})")AGENTS.md § Review Learnings (#92) “Reject shell metacharacters in paths” — same rationale; defense-in-depth even with argv-style subprocess.
| _HOSTNAME_RE = re.compile( | ||
| r"^[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?" | ||
| r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?)*$" | ||
| ) |
There was a problem hiding this comment.
Minor RFC compliance: trailing-dot FQDNs are rejected.
The regex requires every label, including the last, to end with [a-zA-Z0-9], so absolute FQDNs ending with . (which RFC 1035 §3.1 specifies as the canonical way to mark a name as fully qualified) fail validation — e.g. host.docker.internal. is rejected even though host.docker.internal passes.
Not a blocker (Docker resolves both forms), but the PR description explicitly cites RFC-952 compliance as the goal. Either allow an optional trailing dot:
r"^[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?"
r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?)*\.?$"or add a comment that trailing-dot FQDNs are intentionally rejected.
| # This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated | ||
| # in practice (procps-ng converts NUL → space when reading /proc/*/cmdline). | ||
| # Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server) | ||
| _PGREP_INFERENCE_PORT_FMT = "(inference_service\\.py|gr00t\\.eval\\.run_gr00t_server).*--port[= ]{port}( |$)" |
There was a problem hiding this comment.
Two regex sources for the “matches --port N” intent are easy to drift out of sync.
Keeping _PGREP_INFERENCE_PORT_FMT (ERE, ( |$)) and _PYTHON_PORT_RE_FMT (Python re, (?:\s|$)) as separate format strings is intentional per the comment, but the two are used in pairs (pgrep finds candidate PIDs; _is_gr00t_*_process then re-validates with the Python pattern) and any future tweak — e.g. supporting --port=N with a different separator, or adding --port-range — has to be made in both places. A single regex builder fed by a flavor enum would prevent that drift, e.g.:
def _port_pattern(port: int, *, ere: bool) -> str:
boundary = "( |$)" if ere else r"(?:\s|$)"
return rf"--port[= ]{port}{boundary}"Not a blocker, but the AGENTS.md § Review Learnings (#92) “single entry-point” / “centralise validation” principle applies to regex assembly too.
| http_server=False, | ||
| api_token=None, | ||
| ) | ||
| assert captured_host.get("host") == "0.0.0.0" |
There was a problem hiding this comment.
Missing regression test for the docstring-promised behavior.
This test pins the flip on default path. The companion docstring on _start_service (gr00t_inference.py:1085-1088) promises that explicit host="127.0.0.1" is honored, but no test asserts that. As noted in the inline on gr00t_inference.py:1089, the current implementation can’t actually distinguish default from explicit — so the docstring is wrong, but a test that would fail on the wrong implementation is the only thing that pins the contract.
Add e.g.:
def test_start_service_honors_explicit_loopback(self, monkeypatch):
"""Explicit host='127.0.0.1' (e.g. for --network=host) must NOT be flipped."""
# …same fixture…
_start_service(…, host="127.0.0.1", explicit_loopback=True) # or via sentinel
assert captured_host["host"] == "127.0.0.1"This would have caught the contradiction at review time. AGENTS.md § Review Learnings (#85) “Pin regression tests for reviewed fixes — every behavioural fix gets a test that fails on pre-fix code”.
Address key review concern: the code could not tell if host='127.0.0.1' was the default or an explicit user choice. Now uses None sentinel: - host: str | None = None (signature default) - _host_was_explicit = host is not None (before resolving to 127.0.0.1) - _start_service only auto-flips to 0.0.0.0 when host_was_explicit=False Users who explicitly pass host='127.0.0.1' (e.g. for --network=host Docker deployments) now have their choice honoured. Tests: TestHostAutoFlipSentinel (3 tests) pins the sentinel behavior.
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
This PR centralises input validation for gr00t_inference into a single validate_inputs() entry-point with action-aware scoping, tightens process identification to match both N1.5/N1.6 and N1.7 entrypoints, narrows several except clauses, and changes the default host to 127.0.0.1 (with an internal auto-flip for container actions). The validator design follows the AGENTS.md "Centralise validation in one function" guidance, and the test surface is substantial (1404 lines, 17+ classes covering the documented branches).
What's good
validate_inputs()matches the AGENTS.md > LLM Input Safety pattern (single entry-point, kwargs-only, raisesValueError, called from atry/exceptin the tool wrapper).- Action-scoped validation avoids bothering port-only callers with full-surface checks.
- Sentinel-based
_host_was_explicitcorrectly distinguishes "user accepted default" from "user explicitly chose loopback" so users with--network=hostcan still opt into 127.0.0.1. - Pgrep pattern factored into
_PGREP_INFERENCE_PORT_FMT(single source of truth) and now requires--portin cmdline + matches both N1.5/N1.6 and N1.7 entrypoints — closes the long-standing identification gap. - Option-injection guard on
repo_url/repo_tag/policy_name(rejects leading-) is the right belt-and-braces against argv flag-injection. - Comprehensive CHANGELOG entry covering Added / Changed / Fixed / Notes — easy to audit.
Concerns
host_was_explicitis not threaded through therestartaction.startforwards it (line 659) butrestartdoes not (line 685), so a user who callsgr00t_inference(action='restart', host='127.0.0.1', ...)gets silently auto-flipped to0.0.0.0despite explicit intent. See inline.- Test coverage gap: per AGENTS.md > Review Learnings #85 "Pin regression tests for reviewed fixes", the explicit-loopback behaviour is pinned only for
start(test_explicit_loopback_passes_explicit_flag). Therestartpath has no equivalent test, which is exactly why the bug above slipped through. _DOCKER_IMAGE_RErejects digest-pinned references (image@sha256:...). The_image_existsdocstring at the head version line 1198-1206 still claims "works for tag-less digest pins too". Either the regex needs to allow@, or the docstring needs updating — they currently disagree.volumesvalidation does not reject:in paths._SHELL_METAis narrowed to[;&|$<>\\n\r\x00]but the host/container path is interpolated asf"{host_path}:{container_path}"and passed todocker -v, which treats:` as a field separator. See inline.- Minor: the dispatch chain ends with
else: return {"status": "error", "message": f"Unknown action: {action}"}(line 687), but the new_VALID_ACTIONSallowlist already rejects unknown actions invalidate_inputs(). Theelseis now dead code; consider deleting or replacing withassert False, "unreachable".
Verification suggestions
# Confirm the restart auto-flip bug end-to-end:
pytest tests/policies/groot/test_gr00t_inference_validation.py -k 'explicit_loopback' -v
# (Currently only covers action="start"; add an action="restart" variant.)
# Spot-check the volume-mount injection vector by hand:
python -c "from strands_robots.tools.gr00t_inference import validate_inputs; \
validate_inputs(action='start_container', data_config='so100', embodiment_tag='so100', \
port=5555, host='127.0.0.1', vit_dtype='fp8', llm_dtype='nvfp4', dit_dtype='fp8', \
checkpoint_path=None, trt_engine_path='x', container_name=None, protocol='n1.5', \
volumes={'/host:rw,nosuid': '/cont'})"
# Currently passes; should arguably reject.
# Confirm digest-pinned images still validate against _DOCKER_IMAGE_RE:
python -c "import re; from strands_robots.tools.gr00t_inference import _DOCKER_IMAGE_RE; \
print(bool(_DOCKER_IMAGE_RE.match('nvcr.io/nvidia/isaac-gr00t@sha256:abc123')))"
# Currently False.| api_token=api_token, | ||
| protocol=protocol, | ||
| use_sim_policy_wrapper=use_sim_policy_wrapper, | ||
| host_was_explicit=_host_was_explicit, |
There was a problem hiding this comment.
host_was_explicit is forwarded here on the start path but dropped on the restart path (the _start_service(...) call further down at ~line 667-686 omits this kwarg).
Because _start_service defaults host_was_explicit=False, a user who calls
gr00t_inference(action="restart", host="127.0.0.1", checkpoint_path=...)will have their explicit loopback choice silently flipped to 0.0.0.0 by the auto-flip inside _start_service. This is exactly the silent kwarg-drop pattern called out in AGENTS.md > Review Learnings (#86) > "Public API Hygiene" ("Reject silently-dropped kwargs").
Fix: forward host_was_explicit=_host_was_explicit in the restart _start_service(...) call too, and add a regression test to tests/policies/groot/test_gr00t_inference_validation.py mirroring test_explicit_loopback_passes_explicit_flag for action="restart" (per AGENTS.md > Review Learnings #85 > "Pin regression tests for reviewed fixes").
| if volumes is not None: | ||
| for vol_host, vol_container in volumes.items(): | ||
| _validate_path(vol_host, "volumes key (host path)") | ||
| _validate_path(vol_container, "volumes value (container path)") |
There was a problem hiding this comment.
_SHELL_META does not include :, but volume paths are interpolated as f"{host_path}:{container_path}" and handed to docker -v.
Docker parses -v as host:container[:options]. A host_path value like /legit/dir:rw,nosuid (passing this validator) would be re-interpreted by docker as host=/legit/dir, container=rw,nosuid, options=/cont — a real mount-redirect surface against an LLM-controllable parameter, which is the threat AGENTS.md > LLM Input Safety is meant to prevent.
Suggested fix: add : to _SHELL_META for the volumes path validation specifically, or reject any : inside vol_host / vol_container directly here. (Argv-style does not save you — docker itself does the splitting.)
A pinned regression test for volumes={"/host:rw,nosuid": "/cont"} raising ValueError would close this out.
| r"(?::[0-9]{1,5})?" # optional registry port (:5000) | ||
| r"(?:/[a-zA-Z0-9][a-zA-Z0-9._\-]*)*" # path components (/org/img) | ||
| r"(?::[a-zA-Z0-9][a-zA-Z0-9._\-]*)?$" # optional :tag | ||
| ) |
There was a problem hiding this comment.
_DOCKER_IMAGE_RE rejects digest-pinned image references (registry/path@sha256:hex).
The head version's _image_exists docstring (around line 1198) still claims "works for tag-less digest pins too", and digest-pinned images are the supply-chain-recommended way to refer to images. The regex only has an optional :tag suffix, no @digest alternative.
Either extend the regex with (?:@sha256:[a-f0-9]{64})? (mutually exclusive with :tag) or update the docstring to say tag-only is supported. Right now they disagree.
| f"host must be a valid IP address or hostname (got {host!r}). " | ||
| f"Use '127.0.0.1' for loopback, '0.0.0.0' for all interfaces, " | ||
| f"or a valid hostname like 'localhost'." | ||
| ) from None |
There was a problem hiding this comment.
Bare raise ... from None swallows the original ipaddress.ip_address traceback context.
For a user-input validator this is mostly fine, but consider preserving the original error chain (from exc) in case downstream consumers want to log why parsing failed ('127.0.01' is not a valid octet). It also matches the rest of the codebase's preference for chained exceptions.
Not blocking; flagging because the comment above ("clearly IP typos not legitimate hostnames") is exactly the kind of context that's helpful in a traceback.
| policy_name=policy_name, | ||
| ) | ||
| except ValueError as e: | ||
| return {"status": "error", "message": str(e)} |
There was a problem hiding this comment.
validate_inputs() may raise more than ValueError.
The wrapper here only catches ValueError. If a caller (or future maintainer) accidentally passes the wrong type — e.g. port="5555" (str) — the if not (1 <= port <= 65535) comparison raises TypeError and bypasses this try/except, propagating up out of the AgentTool boundary. The AGENTS.md > LLM Input Safety contract says all @tool action handlers must return {"status": "error", ...} on failure, never raise.
Either widen to except (ValueError, TypeError) here or normalise types with explicit int(port) / str(host) coercions inside validate_inputs() (with their own ValueError on conversion failure).
There was a problem hiding this comment.
Fixed in 6cbff72 — explicit isinstance checks added for port (int) and host (str) inside validate_inputs(). These raise ValueError with descriptive messages. The dispatch wrapper narrowed to except ValueError as e: only.
🤖 AI agent response. Strands Agents.
… guard, digest refs, TypeError catch Address 5 concerns from @yinsong1986's 21:44 UTC review: 1. **restart path missing host_was_explicit** — the restart _start_service() call was missing host_was_explicit=_host_was_explicit, causing explicit host='127.0.0.1' to be silently auto-flipped on restart. Fixed. 2. **Volume path colon injection** — docker -v parses ':' as separator; a volume key like '/legit/dir:rw,nosuid' would be re-interpreted as a mount redirect. Added reject_colon=True to _validate_path for all volume path validations. Also added dash-prefix guard in _validate_path (defense-in-depth for '--privileged=foo' style option injection). 3. **Digest-pinned image references** — _DOCKER_IMAGE_RE now accepts @sha256:<64-hex> as an alternative to :tag (mutually exclusive). Supply-chain recommended practice for image pinning. 4. **Exception chain preserved** — 'from None' changed to 'from exc' in the host validation ipaddress.ip_address try/except. Also expanded the comment to document that _ALL_NUMERIC_RE rejects IPv4 short-forms (not just typos). 5. **TypeError handling** — widened except ValueError to except (ValueError, TypeError) at the validation wrapper boundary. port='5555' (str) no longer propagates as unhandled TypeError. Tests: TestReviewRound8Fixes (8 regression tests) pins all 5 fixes. All 92 validation + 145 gr00t tests pass. Syntax verified.
Addressed: Review round-8 (21:44 UTC) —
|
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Centralises gr00t_inference parameter validation in a single validate_inputs() helper with action-scoped checks (port-only / image-only / full), tightens process identification (requires --port flag + N1.7 entry-point match), changes the default host to 127.0.0.1 with auto-flip to 0.0.0.0 for container actions, narrows previously broad except Exception clauses, and adds an option-injection guard for repo_url / repo_tag / policy_name. Test surface grows by ~1.7k lines / 80+ tests covering both the validator and integration wiring.
The direction matches AGENTS.md > Review Learnings (#92) > LLM Input Safety (single validate_inputs() entry-point, allowlists for enumerable params, shell-meta denylist on argv-style values, default loopback bind). The behavioural change of the host default is non-trivial and is documented in CHANGELOG.md.
What's good
- Validation is genuinely centralised —
gr00t_inference()callsvalidate_inputs()once and the previous hand-rolled protocol check has been removed (no drift). - The host default flip from
0.0.0.0to127.0.0.1matches the AGENTS.md baseline; thehost_was_explicitsentinel preserves user intent and is regression-tested for therestartpath (round-8 test class). - Process identification now requires
--portin cmdline AND a python/gr00t interpreter AND the inference entry-point — meaningfully reduces the false-kill surface vs. the old single-patternpgrep. except (OSError, subprocess.SubprocessError, UnicodeDecodeError)is a correct narrow superset;PermissionErroris reported at WARNING. Matches AGENTS.md > Review Learnings (#86) > 'Exception Clauses Must Be Narrow'.- Digest-pinned image references (
@sha256:...) are accepted, which aligns with the supply-chain recommendation in #92.
Concerns
- Test file size:
tests/policies/groot/test_gr00t_inference_validation.pyis 1682 lines / 80+ tests in one module. Several round-N regression classes (TestReviewRound8Fixes, etc.) suggest organic growth across iterations. Worth splitting intotest_validate_inputs.py(pure unit) +test_gr00t_inference_dispatch.py(integration / monkeypatched subprocess) before this lands, otherwise the next reviewer pays the full re-read cost on every touch. _DOCKER_IMAGE_REover-permissive on registry port:(?::[0-9]{1,5})?accepts:99999as a registry port, which is outside the valid TCP range (1–65535). Flowing intodocker pullargv means Docker rejects it later, so it's not a security concern — just a 'reviewers will ask' nit.- Two regex format strings, two boundary semantics:
_PGREP_INFERENCE_PORT_FMTuses( |$)(ERE for pgrep) while_PYTHON_PORT_RE_FMTuses(?:\s|$)(Python re). The inline NOTE explains why; just call out that these are intentionally different in the next test that exercises both, so a future refactor doesn't unify them and break BSD/Linux divergence. policy_nameoption-injection check only in the full-validation branch (line 280), not in the image-only branch (line 231). In practicepolicy_namedoesn't reach git/docker argv from the image-only actions, but the asymmetry is a maintenance hazard ifstart_containerever forwards it. See inline.- Unreachable dead code at line 707–708:
validate_inputs()rejects unknown actions earlier, so the trailingelsebranch in the dispatcher is now dead. AGENTS.md > Key Conventions > 'No dead code' applies. See inline.
Verification suggestions
# 1. Confirm validate_inputs is the only validation entry-point (no leftover hand-rolled checks):
rg -n "protocol not in|raise ValueError|valid_protocols" strands_robots/tools/gr00t_inference.py
# 2. Run the new test module in isolation and check timing:
hatch run test tests/policies/groot/test_gr00t_inference_validation.py -v
# 3. Confirm the host default flip is end-to-end correct by smoke-testing a real `start`:
# Without explicit host= → inside container should bind 0.0.0.0
# With host='127.0.0.1' → inside container should bind 127.0.0.1 (warns about reachability)
# 4. Confirm CodeQL has no new findings on the gr00t_inference module after this PR (Security tab).| host_was_explicit=_host_was_explicit, | ||
| ) | ||
| else: | ||
| return {"status": "error", "message": f"Unknown action: {action}"} |
There was a problem hiding this comment.
Dead code after centralised action validation.
validate_inputs() (line 180–181) now rejects any action not in _VALID_ACTIONS with a clear ValueError, which the wrapper at line 576–577 converts to {"status": "error", ...}. By the time control reaches this else, action is provably one of the 10 allowlisted values, all of which are handled in the preceding if/elif chain. This branch is unreachable.
Delete it (and the else: keyword on line 707, promoting the last elif to if/elif) per AGENTS.md > Key Conventions > 'No dead code — if it's not called and not part of base class, delete it'.
There was a problem hiding this comment.
Fixed in 6cbff72 — removed dead else branch. Replaced with a defensive return at the same indent level (satisfies mypy [return]) with # pragma: no cover since it's unreachable after validate_inputs() rejects unknown actions.
🤖 AI agent response. Strands Agents.
| ("policy_name", policy_name), | ||
| ]: | ||
| if param_value is not None and param_value.startswith("-"): | ||
| raise ValueError(f"{param_name} must not start with '-' (got {param_value!r})") |
There was a problem hiding this comment.
Asymmetric option-injection guard.
The full-validation branch (this block) checks repo_url, repo_tag, and policy_name for a leading -. The image-only branch (line 231–233) only checks repo_url and repo_tag. If start_container or any future image-only action ever forwards policy_name into argv (e.g. into a service registration step), it would bypass the dash-prefix guard.
Either (a) include policy_name in the image-only list for symmetry, or (b) factor the three-tuple into a module-level _OPTION_INJECTION_PARAMS constant and iterate it in both branches with a comment documenting the carve-out. The duplicated literal list is the bug-prone shape.
| if protocol not in valid_protocols: | ||
| raise ValueError(f"Unknown protocol {protocol!r}. Valid: {list(valid_protocols)}") | ||
| # Port range — always validated | ||
| if not (1 <= port <= 65535): |
There was a problem hiding this comment.
Docstring promises ValueError, but int ops on a str raise TypeError.
validate_inputs()'s docstring (line 165) says "Raises ValueError for any invalid input." If a caller passes port="5555" (str), this comparison raises TypeError: '<=' not supported between instances of 'int' and 'str' — not ValueError. The wrapper at line 576 happens to catch (ValueError, TypeError), so the user sees the right structured error, but the validator's contract is wrong.
Fix one of: (a) tighten the wrapper to except ValueError and add explicit isinstance(port, int) (and similar for host: str, etc.) checks at the top of validate_inputs(), or (b) update the docstring to Raises ValueError or TypeError. Option (a) is more defensible — it keeps the contract narrow and gives clearer error messages to agent callers.
There was a problem hiding this comment.
Fixed in 6cbff72 — chose option (a): added explicit isinstance(port, int) and isinstance(host, str) checks at the top of validate_inputs() that raise proper ValueError. The wrapper is narrowed to except ValueError as e: only. TypeError from programming errors now propagates as-is.
🤖 AI agent response. Strands Agents.
| # This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated | ||
| # in practice (procps-ng converts NUL → space when reading /proc/*/cmdline). | ||
| # Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server) | ||
| _PGREP_INFERENCE_PORT_FMT = "(inference_service\\.py|gr00t\\.eval\\.run_gr00t_server).*--port[= ]{port}( |$)" |
There was a problem hiding this comment.
_PGREP_INFERENCE_PORT_FMT uses .format(port=...) without re.escape.
Current callers always pass an int that's already validated to 1 <= port <= 65535, so this is safe today. But the template lives at module scope and _is_gr00t_process / _is_gr00t_host_process accept port: int | None = None — a future refactor that loosens the type to str (or a test that monkey-patches the call) would silently inject the value into a regex. ERE has fewer metachars than PCRE, but . and + and * would all do the wrong thing in a port slot.
Either (a) add an assert isinstance(port, int) and 1 <= port <= 65535 at each call site that does the .format(port=port), or (b) wrap with re.escape(str(port)) for defence-in-depth. AGENTS.md > Review Learnings (#92) > LLM Input Safety > 'Validate before subprocess interpolation' covers this exact pattern.
| if checkpoint_path is not None: | ||
| _validate_path(checkpoint_path, "checkpoint_path") | ||
| # Option-injection guard for params used by these actions | ||
| for param_name, param_value in [("repo_url", repo_url), ("repo_tag", repo_tag)]: |
There was a problem hiding this comment.
The two-line for-loop here is identical in shape to the three-line version at 280–286.
This is a small DRY nit, but with the asymmetric param list (see comment on line 286) it'd be cleaner to extract:
_OPTION_INJECTION_GUARDED = ("repo_url", "repo_tag", "policy_name")
def _check_option_injection(values: dict[str, str | None], allowed: tuple[str, ...]) -> None:
for name in allowed:
v = values.get(name)
if v is not None and v.startswith("-"):
raise ValueError(f"{name} must not start with '-' (got {v!r})")Then the image-only branch calls it with ("repo_url", "repo_tag") and the full branch with the full triple — single source of truth, asymmetry is loud rather than implicit.
CI was failing due to F811 (redefined class). The TestReviewRound8Fixes class was duplicated verbatim at lines 1407 and 1546. Removed the first copy (which used Unicode arrows/dashes in comments), keeping the second (ASCII-only comments, slightly tighter assertion for digest test). Also applied ruff format to gr00t_inference.py (minor f-string wrap). All ruff check + ruff format --check pass.
f1ca963 to
9def51f
Compare
CI Fix (commit
|
| Concern | Status |
|---|---|
host_was_explicit not threaded through restart |
✅ Fixed — restart now passes host_was_explicit=_host_was_explicit |
| Test coverage gap for restart path | ✅ test_restart_forwards_host_was_explicit + test_restart_default_host_not_explicit |
_DOCKER_IMAGE_RE rejects digest-pinned refs |
✅ Regex now accepts @sha256:<64-hex> (mutually exclusive with :tag) |
volumes validation doesn't reject : in paths |
✅ reject_colon=True in _validate_path for all volume paths |
Dead else branch |
Acknowledged — kept as defense-in-depth (_VALID_ACTIONS could drift from dispatch). Can remove if preferred. |
The CI failure was purely from the accidental class duplication in the test file — the substantive fixes from a9556807 are all correct.
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
This PR centralises input validation for gr00t_inference into a single validate_inputs() entry-point, expands the validated parameter surface (host, image_name, volumes, container_command, dtype allowlists, container name, paths), tightens process identification to require both an inference_service.py/gr00t.eval.run_gr00t_server marker AND a --port flag (closing N1.7 stop/status gaps), narrows over-broad except Exception clauses, and adds 80+ unit tests plus integration wiring tests. The design follows AGENTS.md § LLM Input Safety closely: single validator function, allowlists for enumerable values, regex-based path/shell-meta rejection, option-injection guard for repo_url/repo_tag/policy_name.
What's good
- Single
validate_inputs()entry-point matches the AGENTS.md "Centralise validation" pattern. Action-scoped validation keeps the surface tight. - Tightened process identification (
--port+ module marker + python/gr00t binary) directly addresses the "don't kill editors/log-tailers" review-learning. _PGREP_INFERENCE_PORT_FMTfactored into a single constant; both N1.5/N1.6 and N1.7 entry-points covered.- Narrowed
except Exceptionto(OSError, subprocess.SubprocessError, UnicodeDecodeError), withPermissionErrordifferentiated (WARNING vs DEBUG). - Test coverage is substantial (1543-line test file, 80+ cases across happy/error paths and N1.7 regression scenarios).
- Linux-only host fallback now returns a clear error instead of silently no-op'ing on macOS/Windows.
Concerns
- Loopback default is partially undone by silent auto-flip. The PR documents the new default as
host="127.0.0.1", but for the most common code path (user does not passhost=) the value is silently rewritten to0.0.0.0inside_start_service. Users who never read the implementation will believe loopback-only is in effect. This contradicts AGENTS.md PR #86 § Safety Defaults: "Bind to 127.0.0.1 by default, not 0.0.0.0. Users explicitly opt into network exposure." See inline comment at L1118. Consider either (a) requiring users to explicitly passhost="0.0.0.0"for container actions and erroring otherwise, or (b) keeping the auto-flip but logging at WARNING (not INFO) and documenting it as the primary surface in the docstring rather than a buried implementation detail. hf_local_diris unvalidated. The PR description correctly notes that argv-style invocation is not shell-injection-vulnerable, buthf_local_dirflows intoPath(hf_local_dir).expanduser()(line 1383) and then tosnapshot_download(local_dir=...), i.e. it is a write-target on the local filesystem. An agent prompt that resolves tohf_local_dir="/etc/cron.daily/exfil"would write there. "No injection risk" ≠ "no path-write risk." Either add_validate_path(hf_local_dir, "hf_local_dir")tovalidate_inputs(it's already importedre, no new deps), or document explicitly in the tool docstring thathf_local_diraccepts arbitrary writeable paths and should not be sourced from untrusted prompts.except (ValueError, TypeError)at the validator call site over-catches.validate_inputsonly raisesValueError. CatchingTypeErrorwill mask programmer errors (missing kwarg, bad type) as user errors and route them to the structured error dict, where they are easy to miss. See inline at L575.- Non-ASCII em-dashes in log message strings (lines 822, 860). AGENTS.md PR #86 § Unicode & String Hygiene is explicit: log messages and error messages should be plain ASCII. See inline at L822.
- Scope is consistent with the PR title ("stricter input validation") — the only collateral change I noticed is the loopback-default flip and the auto-flip mechanism, both of which the description calls out. CHANGELOG entry is present and detailed.
Verification suggestions
# 1. Confirm a default-host call really binds on all interfaces (the L1118 concern):
# Start a container action without passing host=, then `ss -tlnp | grep <port>`
# should show 0.0.0.0:<port>, not 127.0.0.1:<port>.
# 2. Confirm the cross-port-kill regression:
hatch run test tests/policies/groot/test_gr00t_inference_validation.py -k cross_port -v
# 3. Spot-check non-ASCII in source (catches future regressions of the L822 issue):
grep -nP '[^\x00-\x7F]' strands_robots/tools/gr00t_inference.py | grep -v '^[0-9]*:\s*#'
# 4. Static check for unbounded TypeError catches around validate_inputs:
rg 'except \(.*TypeError.*\)' strands_robots/tools/| # service to bind all interfaces inside the container. Only auto-flip if the | ||
| # user accepted the default (sentinel was None → resolved to 127.0.0.1). | ||
| # Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host). | ||
| if host == "127.0.0.1" and not host_was_explicit: |
There was a problem hiding this comment.
Silent auto-flip from 127.0.0.1 to 0.0.0.0 undermines the loopback-default that this PR advertises.
The PR description and the parameter docstring both state that host defaults to 127.0.0.1 (loopback-only, per AGENTS.md § LLM Input Safety). But for the dominant code path — user calls gr00t_inference(action="start", ...) without passing host= — _host_was_explicit is False, the validator approves 127.0.0.1, and then this branch silently rewrites it to 0.0.0.0. Net behaviour for every default-call container action: bound on all interfaces, just like before the PR.
AGENTS.md > Review Learnings (#86) > Safety Defaults says: "Bind to 127.0.0.1 by default, not 0.0.0.0. Users explicitly opt into network exposure." The current implementation only honours that for users who happen to pass host="127.0.0.1" explicitly, which is the inverse of the rule (the safe default should be the no-arg case).
Suggested resolutions, in decreasing order of strictness:
- Remove the auto-flip. If a container action is called with
host="127.0.0.1"(the default), return a structured error explaining that container port-publish requireshost="0.0.0.0"and ask the user to opt in. This makes the network exposure auditable in the agent transcript. - Keep the auto-flip but log at WARNING (not INFO), and prominently document in the tool docstring that container actions auto-broaden to all-interfaces — not as a buried note in the
host:parameter description. - At minimum, also surface
host_was_explicit=False+ auto-flip in the tool's return dict (e.g."host_resolved": "0.0.0.0", "host_auto_flipped": True) so the calling agent and the human reading logs see the network-exposure decision in-band.
| repo_tag=repo_tag, | ||
| policy_name=policy_name, | ||
| ) | ||
| except (ValueError, TypeError) as e: |
There was a problem hiding this comment.
except (ValueError, TypeError) over-catches.
validate_inputs only raises ValueError. Adding TypeError to the tuple means programming errors (missing required kwarg, wrong type passed by an upstream caller, dict iteration on a non-dict) are silently routed to the user-facing error dict instead of surfacing as a real exception. That's exactly the silent-failure mode AGENTS.md > Review Learnings > Error Handling Contracts warns against ("Fail-fast with strict=True"; "silent ... except clauses are forbidden").
Narrow this to:
except ValueError as e:
return {"status": "error", "message": str(e)}If a TypeError bubbling up from a future refactor is genuinely a concern, address it at the source rather than masking it here.
There was a problem hiding this comment.
Fixed in 6cbff72 — narrowed to except ValueError as e:. Type checks (isinstance(port, int), isinstance(host, str)) added inside validate_inputs() so wrong types raise ValueError at the source. TypeError from programming bugs now propagates cleanly.
🤖 AI agent response. Strands Agents.
|
|
||
| _logger = logging.getLogger(__name__) | ||
| if isinstance(exc, PermissionError): | ||
| _logger.warning("Permission denied probing container process %s — treating as non-GR00T", pid) |
There was a problem hiding this comment.
Non-ASCII em-dash (U+2014) in a log-message string.
AGENTS.md > Review Learnings (#86) > Unicode & String Hygiene is explicit:
"No emojis in user-facing strings — this is a project rule. Tool result dicts (
{"content": [{"text": ...}]}), log messages, error messages: plain ASCII only."
Same issue at L860 (Permission denied reading /proc/%s/cmdline — ...). Replace — with - (or --) in both. The grep recipe in AGENTS.md catches these:
grep -nP '[^\x00-\x7F]' strands_robots/tools/gr00t_inference.pyCode comments using em-dash are fine; only the runtime string literals need to change.
There was a problem hiding this comment.
Fixed in 6cbff72 — replaced em-dashes (U+2014) with -- in both log messages (lines 829, 867). Verified with grep -nP '[^\x00-\x7F]' — zero non-ASCII in runtime strings.
🤖 AI agent response. Strands Agents.
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Centralises input validation for gr00t_inference into a single validate_inputs() entry-point with action-aware scoping, swaps the default host to 127.0.0.1, tightens process identification to also match the N1.7 entrypoint, and adds ~80 unit tests + integration round-trips. Direction matches the AGENTS.md § LLM Input Safety guidance and the PR #90 review-learnings template (regex allowlists, _validate_path helper, dash-prefix option-injection guard, narrowed exceptions in the process-probe helpers). The cross-port-kill regression test in TestStopServiceCrossPortKill is a particularly nice piece of test discipline.
What's good
- Single validation entry-point with action scoping cleanly separates port-only / image-only / mutating-action surfaces.
- Pgrep pattern factored into
_PGREP_INFERENCE_PORT_FMT— four call sites now agree, and the comment explaining the ERE-vs-Python-reboundary discrepancy is exactly the kind of "don't regress this" inline doc that helps future readers. - Cross-port-kill regression tests are end-to-end on
_stop_service, not just unit-level on_is_gr00t_*_process— catches misuse at the dispatch layer too. - Default flip to 127.0.0.1 with sentinel +
host_was_explicitto preserve auto-flip semantics for container actions is the right design — honours explicit user input, defaults safely. - Option-injection guard for dash-prefixed
repo_url/repo_tag/policy_namecloses a real argv-flag-injection vector. - Platform guard for the host pgrep fallback returns a clear error on non-Linux instead of silently no-op'ing.
- CHANGELOG entry is unusually thorough — tracks Added / Changed / Fixed / Notes, calls out the loopback-default behaviour change, and documents validation scope vs. exclusions.
Concerns
restartdiscards_stop_service's return value (gr00t_inference.py L683). If the stop step fails (e.g. kill loop hits asubprocess.CalledProcessErrorswallowed at L977),_start_serviceruns anyway and produces a confusing "Failed to start" two seconds later instead of the real reason. AGENTS.md § Review Learnings (#86) calls this out explicitly: "Don't discard return values — if a step returns{"status": ...}, check it." This is pre-existing code so I didn't gate it inline, but it lives in the function this PR claims to harden, so fixing it in the same change would be cheap.- The dispatcher's terminal
else: return {"Unknown action"}at L706–707 is now dead code.validate_inputs()rejects unknown actions at L179–180 before the dispatch can run. Not a correctness bug, but a future contributor adding a new action will have two places to update (and may rely on the unreachable fallback). Drop the branch and letvalidate_inputsbe the sole gatekeeper. - Test file is 1,543 lines with several near-duplicate classes (
TestExpandedParamValidationvsTestExpandedParamValidationExtended,TestSingleLabelNumericHostnamevs cases already covered byTestHostNumericTypoRejection). Suggests review-round consolidation would help maintenance. Not blocking. - Four pre-existing
except Exceptionclauses elsewhere in this file (_list_running_servicesL754,_is_service_runningL766,_stop_servicetop-level L977,_start_serviceL1201) remain. Out of scope for this PR — the CHANGELOG explicitly only narrows the two process-probe helpers — but a follow-up issue would be reasonable since they violate AGENTS.md § Review Learnings (#86) "except Exceptionis forbidden for non-recovery code paths."
Verification suggestions
hatch run lint
hatch run test -- tests/policies/groot/test_gr00t_inference_validation.py -q
# Spot-check the regex boundaries that bypass test mocks:
python - <<'PY'
from strands_robots.tools.gr00t_inference import (
_DOCKER_IMAGE_RE, _validate_path,
)
print("port-99999 image accepted (concern):",
bool(_DOCKER_IMAGE_RE.match("localhost:99999/foo:bar")))
try:
_validate_path("/data/weird\\name/model", "checkpoint_path")
print("backslash path: accepted")
except ValueError as e:
print("backslash path: rejected:", e)
PY
# Smoke `restart` on a host with no GR00T container running
# to confirm it doesn't kill anything unexpected.| _DOCKER_IMAGE_RE = re.compile( | ||
| r"^[a-zA-Z0-9]" # must start with alnum | ||
| r"(?:[a-zA-Z0-9._\-]*[a-zA-Z0-9])?" # optional middle chars (host/path prefix) | ||
| r"(?::[0-9]{1,5})?" # optional registry port (:5000) |
There was a problem hiding this comment.
(?::[0-9]{1,5})? accepts registry ports up to 99999, which is outside the valid TCP port range (1–65535). Since this regex is the security boundary for image_name (it gates what can be interpolated into docker run argv), the precision matters: localhost:99999/img:tag passes today.
A pragmatic fix is to keep :[0-9]{1,5} here and range-check the captured port in Python where it's easier to review. Same observation applies to tag length — the OCI / Docker reference spec caps tags at 128 characters; the current pattern allows arbitrary length.
| # Narrowed per AGENTS.md review-learnings: quotes/bangs/parens/brackets | ||
| # appear in legitimate filesystem paths and all subprocess calls here are | ||
| # argv-style (no shell=True), so they pose no injection risk in path values. | ||
| _SHELL_META = re.compile(r"[;&|`$<>\\\n\r\x00]") |
There was a problem hiding this comment.
_SHELL_META rejects \ as a shell metacharacter, but on Linux \ is a perfectly legal byte in a filename (only / and NUL are forbidden by POSIX). The same regex is then applied to checkpoint_path, trt_engine_path, volumes keys/values, and container_command — so a user with a path like /data/weird\name/model will get a confusing rejection.
Since every subprocess call here is argv-style and \ carries no special meaning to argv, dropping \\ from the denylist is the principled fix. If the goal was to defend against PowerShell / cmd.exe execution, that's not a threat model this tool needs to cover. Keep \n / \r / \x00 — those genuinely cannot appear in legitimate paths.
There was a problem hiding this comment.
Fixed in 6cbff72 — removed \\ from _SHELL_META. New pattern: r"[;&|$<>\n\r\x00]". Comment documents that backslash is legal on Linux (only /` and NUL forbidden by POSIX) and carries no special meaning in argv-style subprocess calls.
🤖 AI agent response. Strands Agents.
| # This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated | ||
| # in practice (procps-ng converts NUL → space when reading /proc/*/cmdline). | ||
| # Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server) | ||
| _PGREP_INFERENCE_PORT_FMT = "(inference_service\\.py|gr00t\\.eval\\.run_gr00t_server).*--port[= ]{port}( |$)" |
There was a problem hiding this comment.
_PGREP_INFERENCE_PORT_FMT is a non-raw string with \\. sequences. That works ("\\." → \. at runtime), but mixing raw and non-raw is fragile — the line above this (_PYTHON_PORT_RE_FMT at L93) is correctly r"...". Promote this one to a raw string for symmetry:
_PGREP_INFERENCE_PORT_FMT = (
r"(inference_service\.py|gr00t\.eval\.run_gr00t_server)"
r".*--port[= ]{port}( |$)"
)Also worth a regression test pin: a cmdline like --port 8080 searched with port=80 must not match. The ( |$) boundary handles this, but no test in TestIsGr00tProcess directly exercises a substring-of-port collision.
There was a problem hiding this comment.
Fixed in 6cbff72 — promoted _PGREP_INFERENCE_PORT_FMT to a raw string (r"..."), consistent with _PYTHON_PORT_RE_FMT on the next line.
🤖 AI agent response. Strands Agents.
| _logging.getLogger(__name__).info( | ||
| "Auto-flipping host from 127.0.0.1 to 0.0.0.0 for container " | ||
| "port-publish (-p). Pass host='127.0.0.1' explicitly to keep loopback." | ||
| ) |
There was a problem hiding this comment.
Auto-flipping the bind from 127.0.0.1 to 0.0.0.0 is a security-relevant change (it expands network exposure from loopback-only to all interfaces). Logging at INFO means it's invisible at the default log level on most setups, which defeats the purpose of having a safe default in the first place — the user gets bind-all silently and only finds out from ss -lnt.
Bump to WARNING:
_logging.getLogger(__name__).warning(
"Auto-flipping host from 127.0.0.1 to 0.0.0.0 for container port-publish (-p). "
"Pass host='127.0.0.1' explicitly to keep loopback."
)This aligns with how PermissionError is logged in the process-probe helpers (WARNING) earlier in this same PR.
There was a problem hiding this comment.
Fixed in 6cbff72 — bumped to _logging.getLogger(__name__).warning(...). Security-relevant network exposure change now visible at default log level.
🤖 AI agent response. Strands Agents.
| _validate_path(vol_host, "volumes key (host path)", reject_colon=True) | ||
| _validate_path(vol_container, "volumes value (container path)", reject_colon=True) | ||
| if container_command is not None and _SHELL_META.search(container_command): | ||
| raise ValueError(f"container_command contains disallowed characters: {container_command!r}") |
There was a problem hiding this comment.
container_command is rejected when it contains _SHELL_META, but the default value is "tail -f /dev/null", which gr00t_inference.py later splits with container_command.split() at L1505 and passes argv-style to docker run. So the metacharacter rejection here is doing two different jobs:
- For the
start_containerargv path, it's already safe (no shell), so the check is over-strict — a user who genuinely wantsbash -c "sleep infinity && echo ready"will be rejected even though argv-style execution would handle it fine. - The actual containment is
.split()at the call site, which silently drops quoting nuances ("foo bar"becomes two argv elements).
Not a security bug — over-strict is on the right side of the safety/usability trade-off here — but worth flagging that the validator and the call site disagree about what container_command is allowed to look like. A shlex.split at the call site plus a documented "single-quoted-strings supported" note would let advanced users pass bash -c '...' without weakening the validator.
…mmetric option-injection guard Address 3 actionable concerns from @yinsong1986's 21:44/22:03 reviews: 1. **container_name not validated on image-only branch** — start_container interpolates container_name into docker run --name; now validated with _CONTAINER_NAME_RE before the early return. 2. **Asymmetric option-injection guard** — policy_name was only checked on full-validation branch. Now checked on image-only branch too for symmetry. 3. **Dead else branch documented** — added NOTE comment explaining it's unreachable (validate_inputs rejects unknown actions) but kept as defensive assertion per AgentTool error-handling contract. Tests: +3 in TestImageOnlyBranchValidation pinning these fixes. 95 tests pass, ruff clean, mypy clean.
Round 9 fixes pushed —
|
| Concern | Fix |
|---|---|
container_name not validated on image-only branch (#19:51 review) |
Added _CONTAINER_NAME_RE check to start_container/build_image/download_checkpoint branch |
| Asymmetric option-injection guard (#22:03 review) | policy_name now checked on image-only branch too |
Dead else branch (#22:03 review) |
Added NOTE comment documenting it's unreachable but kept as defensive assertion |
Tests
- +3 tests in
TestImageOnlyBranchValidation - 95 total tests pass, ruff clean, mypy clean
Remaining items (non-blocking per reviewer):
from excvsfrom None— reviewer flagged as non-blocking- Trailing-dot FQDN rejection — minor RFC nit
- Registry port
99999vs TCP max — minor inconsistency pgrepgreedy match documentation — comment-only- Dual regex alignment — documented with NOTE
All substantive code concerns are resolved. CI should go green now (previous failure was the duplicate class fixed in 9def51f).
🤖 AI agent response. Strands Agents.
Review Status UpdateCI fix + remaining review item addressed in
All unresolved threads from @yinsong1986 — resolution status:
Additional items addressed: N1.7 process identification, option-injection guard, PermissionError logging, action allowlist, 🤖 AI agent response. Strands Agents. Feedback welcome! |
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Centralises input validation for gr00t_inference into a single validate_inputs() function with action-scoped checks (port-only, image-only, full). Adds RFC-952 hostname validation, an option-injection guard (--prefix rejection), narrows process-identification cmdline matching, extends pgrep to N1.7's gr00t.eval.run_gr00t_server entrypoint, flips the default host to 127.0.0.1 with a sentinel-based auto-flip to 0.0.0.0 for container actions, and ships ~80 unit tests across 17 classes. The diff is consistent with what the PR description advertises.
What's good
- Single-entry-point validator matches the AGENTS.md > LLM Input Safety > "Centralise validation in one function" pattern.
gr00t_inference.validate_inputs()becomes the canonical example called out in AGENTS.md. - Sentinel
host=None->127.0.0.1-> auto-flip-to-0.0.0.0-only-when-implicit cleanly distinguishes "user accepted default" from "user explicitly chose loopback". The dedicatedhost_was_explicittest (TestHostAutoFlipForContainer) pins it. - N1.7 stop/status gap is closed:
_PGREP_INFERENCE_PORT_FMTand_is_gr00t_*_processnow matchgr00t.eval.run_gr00t_serveralongsideinference_service.py. Regression testTestN17ProcessIdentificationpins the fix. - Cross-port-kill regression test (
TestStopServiceCrossPortKill::test_stop_service_does_not_kill_wrong_port) is exactly the kind of pinned regression test AGENTS.md > #85 calls for. - Process-identification tightening (
--portflag required) prevents false-killing editors / log-tailers - good defense-in-depth. - Action allowlist + clear error message early in
validate_inputs()prevents silent dispatch on typos. - CHANGELOG entry is thorough and accurately maps to the diff.
- All subprocess calls remain argv-style; the narrowed
_SHELL_METAset is justified in the comment and matches the actual injection surface.
Concerns
- Scope: the PR description leads with "stricter input validation" but ~30% of the diff (process-identification tightening, N1.7 entrypoint matching,
host_was_explicitsentinel, host auto-flip) is behavioural change unrelated to validation. The CHANGELOG covers this; the PR title does not. Consider splitting in future PRs. - Test file size (1625 LOC, 80+ tests): dense and a maintenance liability. Several test classes (e.g.
TestExpandedParamValidationandTestExpandedParamValidationExtended) cover overlapping ground. A consolidation pass before merge would help future readers. Not blocking. policy_namedash-prefix guard:policy_nameis only stuffed into a response dict (line 1175) - it is never interpolated into subprocess argv. Adding it to the option-injection guard alongsiderepo_url/repo_tag(which ARE interpolated intogit/dockerargv) creates a false equivalence and a divergent rule. Either droppolicy_namefrom the guard or document the rationale.- Pre-1.0 breaking change: changing the default
hostfrom0.0.0.0to127.0.0.1is a behavioural break for any caller that relied on the prior default with a non-container deployment. The auto-flip masks it for container actions, but agents callingaction="status"from a remote box will now see different connect behaviour. CHANGELOG mentions it but it's worth flagging in the PR description / migration notes.
Verification suggestions
# 1. Validate that the new pgrep pattern matches N1.7 cmdlines (no Docker required)
python -c "
import re
from strands_robots.tools.gr00t_inference import _PGREP_INFERENCE_PORT_FMT, _PYTHON_PORT_RE_FMT
for cmd in [
'python /opt/Isaac-GR00T/scripts/inference_service.py --port 5555',
'python -m gr00t.eval.run_gr00t_server --port=5555 --host 0.0.0.0',
'vim /opt/Isaac-GR00T/scripts/inference_service.py', # must NOT match
]:
print(cmd, '->', bool(re.search(_PYTHON_PORT_RE_FMT.format(port=5555), cmd)))
"
# 2. Spot-check the Docker image regex against real-world references
python -c "
from strands_robots.tools.gr00t_inference import _DOCKER_IMAGE_RE
for ref in ['gr00t:latest', 'nvcr.io/nvidia/gr00t:n1.7',
'localhost:5000/x:tag', 'nvcr.io/nvidia/gr00t@sha256:' + 'a'*64,
'nvcr.io/nvidia/gr00t@sha256:' + 'A'*64, # uppercase digest - currently rejected
'localhost:99999/x:tag']: # port > 65535 - currently accepted
print(ref, '->', bool(_DOCKER_IMAGE_RE.match(ref)))
"
# 3. Confirm host auto-flip only triggers when user did not pass host=
hatch run pytest tests/policies/groot/test_gr00t_inference_validation.py::TestHostAutoFlipForContainer -v| r"(?::[0-9]{1,5})?" # optional registry port (:5000) | ||
| r"(?:/[a-zA-Z0-9][a-zA-Z0-9._\-]*)*" # path components (/org/img) | ||
| r"(?::[a-zA-Z0-9][a-zA-Z0-9._\-]*" # option A: :tag | ||
| r"|@sha256:[a-f0-9]{64})?$" # option B: @sha256:digest (mutually exclusive with tag) |
There was a problem hiding this comment.
Two small bugs in _DOCKER_IMAGE_RE worth fixing while it's fresh:
- Registry port allows
:99999-[0-9]{1,5}accepts ports above 65535, which are invalid TCP. Tighten to(?::(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{0,3}))?if you want strictness, or at least[0-9]{1,5}with a follow-up integer check. - Digest is case-sensitive (
[a-f0-9]{64}) - Docker accepts@sha256:DEADBEEF...(uppercase hex). Either accept[a-fA-F0-9]{64}or document that uppercase digests must be lowercased before passing in. As-is, a copy-pasted digest from a CI artifact log can fail validation.
Neither is exploitable; both are usability papercuts surfaced by the new validator.
| if len(host) > 253: | ||
| raise ValueError(f"host exceeds RFC 1035 maximum length of 253 chars (got {len(host)} chars)") | ||
| try: | ||
| ipaddress.ip_address(host) |
There was a problem hiding this comment.
ipaddress.ip_address(host) will accept 127.0.1 etc. as a packed IPv4 on some Python versions / platforms via socket.inet_aton-style coercion - but here you're calling ipaddress.ip_address, which is strict (rejects 127.0.1). Good.
However, the comment on lines 199-200 claims _ALL_NUMERIC_RE rejects 127.1 because it's a multi-label all-numeric string. That works, but the actual rejection path is: ipaddress.ip_address('127.1') raises ValueError -> hostname fallback runs -> _ALL_NUMERIC_RE.match('127.1') matches -> rejected. Good, but worth a one-line test asserting 127.1 is rejected to pin this behaviour against future refactors of the order of checks. (TestHostNumericTypoRejection covers 127.0.01 and 999.999.999.999 but not the Linux short-form 127.1.)
AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes."
| ("repo_tag", repo_tag), | ||
| ("policy_name", policy_name), | ||
| ]: | ||
| if param_value is not None and param_value.startswith("-"): |
There was a problem hiding this comment.
policy_name dash-prefix guard is performative. Grepping the file, policy_name only flows into the response dict on line 1175 - it is never passed to subprocess.run argv, never interpolated into a --flag position. Adding it to the option-injection guard alongside repo_url and repo_tag (which ARE interpolated into git clone / docker run argv) creates a false equivalence and gives readers the wrong mental model of which params are subprocess-bound.
Either drop policy_name from this list (and the duplicate at line 233) or add a comment explaining why a non-subprocess-bound param needs a subprocess-injection guard. The --flag confusion risk for fields that only appear in JSON responses is zero.
Cf. AGENTS.md > Review Learnings (#86) > "Don't conflate identity with schema."
| repo_tag=repo_tag, | ||
| policy_name=policy_name, | ||
| ) | ||
| except (ValueError, TypeError) as e: |
There was a problem hiding this comment.
except (ValueError, TypeError) - the docstring of validate_inputs() (line 164) explicitly says "Raises ValueError for any invalid input." Catching TypeError here means a programming bug (e.g. someone refactors validate_inputs and accidentally calls .match() on None) silently becomes a structured error dict returned to the agent, instead of surfacing as a crash in CI. That's the "silent default on error" anti-pattern AGENTS.md #5/#6 calls out.
If the TypeError catch is here because callers might pass wrong types (e.g. port="5555" as a string), front-load that with explicit isinstance checks at the top of validate_inputs and re-raise as ValueError. Then the dispatch can stay narrow: except ValueError as e:.
There was a problem hiding this comment.
Fixed in 6cbff72 — added isinstance(port, int) and isinstance(host, str) type checks at the top of validate_inputs(). These raise ValueError with clear messages (e.g. "port must be an integer, got str: '5555'"). The dispatch wrapper is now except ValueError as e: only.
🤖 AI agent response. Strands Agents.
| if container_name is not None and not _CONTAINER_NAME_RE.match(container_name): | ||
| raise ValueError(f"container_name must match Docker naming rules (got {container_name!r})") | ||
| # Validate image_name, volumes, container_command (relevant to these actions) | ||
| if image_name is not None and not _DOCKER_IMAGE_RE.match(image_name): |
There was a problem hiding this comment.
Image-only branch validates image_name for download_checkpoint, but _download_checkpoint() (line 1367) doesn't consume image_name at all - it only touches hf_repo, hf_subfolder, hf_local_dir, hf_token. Same for volumes and container_command: they're unused by the download path. Validating them here forces every download_checkpoint caller to either pass a regex-clean image_name or rely on the default - which surfaces as a confusing error message ("image_name must be a valid Docker image reference") on an action that doesn't use the image.
Consider per-action validation lists rather than a single _image_only_actions bucket:
build_image,start_container,lifecycle: validate image_name + volumes + container_commanddownload_checkpoint: validate hf_* params (or skip the bucket entirely; the current# Not validatedset already covers it)
… pgrep string, narrow except, remove dead else, WARNING log, em-dashes, type checks Changes: - Remove backslash from _SHELL_META (legal on Linux, no argv injection risk) - Make _PGREP_INFERENCE_PORT_FMT a raw string (consistent with _PYTHON_PORT_RE_FMT) - Narrow except (ValueError, TypeError) to except ValueError — type checks in validate_inputs() now raise proper ValueError for wrong types - Remove dead else branch (unreachable after validate_inputs action allowlist); replaced with defensive return + pragma: no cover - Bump auto-flip log from INFO to WARNING (security-relevant network change) - Replace em-dashes (U+2014) in log messages with -- (AGENTS.md ASCII rule) - Add isinstance checks for port/host in validate_inputs() to prevent TypeError leaking from comparison operators on wrong types
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
This PR centralises input validation for gr00t_inference into a single validate_inputs() function with action-aware scoping, hardens process identification for stop/status (including the previously broken N1.7 entry-point), tightens exception handling, and adds an option-injection guard for argv-interpolated parameters. The diff is large (+2135) but ~75% of that is the new test file (1625 lines), which is appropriate for a security-sensitive validation surface.
The shape matches AGENTS.md > Review Learnings (PR #92) > 'LLM Input Safety' almost line-for-line — single validate_inputs() entry point, regex allowlists for enumerable parameters, shell-metacharacter denylist for path-like values, option-injection (- prefix) guard. Good adherence to the documented pattern.
What's good
- Centralised
validate_inputs()with action-scoped checks — exactly the pattern AGENTS.md > 'Centralise validation in one function' calls out. - The N1.7 process-identification gap (
gr00t.eval.run_gr00t_servercmdline) is a real bug and the fix is well-targeted, with a regression test that fails on pre-fix code (TestN17ProcessIdentification). - Option-injection guard on
repo_url/repo_tag/policy_name(reject leading-) is the right defense even for argv-style subprocess. - Cross-port-kill regression test (
TestStopServiceCrossPortKill) pins the_is_gr00t_process(port=...)guard so a future refactor that drops it surfaces as a test failure — exactly the 'Pin regression tests for reviewed fixes' rule from AGENTS.md. - Exception clauses correctly narrowed to
(OSError, subprocess.SubprocessError, UnicodeDecodeError)—PermissionErroris anOSErrorsubclass so the WARNING/isinstancebranch is correct. - CHANGELOG entry is thorough and accurately describes the behavioural changes.
Concerns
- The
host=127.0.0.1default is misleading. The docstring advertises 'loopback default per AGENTS.md', but_start_serviceauto-flips the literal"127.0.0.1"to"0.0.0.0"when the user accepts the default — meaning the published port (docker -p PORT:PORT) reaches a service listening on every interface inside the container, which is what the host network sees. The end-to-end behaviour is identical to the pre-PRhost="0.0.0.0"default; the 'loopback' framing is illusory for the common code path. AGENTS.md > 'Bind to 127.0.0.1 by default, not 0.0.0.0' is satisfied in letter (the signature default is loopback) but not in spirit. Either tighten the docstring to describe the auto-flip transparently, or drop the auto-flip and require users to passhost="0.0.0.0"explicitly. - Test surface is heavily duplicated.
TestExpandedParamValidation(at the top of the file) andTestExpandedParamValidationExtended(further down) cover essentially the same cases (image_name shell-meta rejection, volumes traversal rejection, container_command shell-meta rejection, plus the corresponding happy-paths). The_VALID_KWARGSdict pattern is also re-implemented inline in many tests with the same field set repeated verbatim. Consolidating would shrink the file by ~30% without losing coverage. Not a merge blocker, but worth a follow-up cleanup. lifecycle="teardown"triggers full inference-time validation. Whenaction="lifecycle", validate_inputs falls through to the full branch and validatesdata_config,embodiment_tag, and TRT dtypes — none of which_lifecycle(phase="teardown")actually consumes. The tool defaults paper over this for normal calls, but a user who passeslifecycle="teardown"plus an invaliddata_config(e.g. from a stale config file) gets a confusing error from a code path that doesn't use the value. See inline comment on line 217.
Verification suggestions
hatch run test tests/policies/groot/test_gr00t_inference_validation.py -v
# Spot-check the auto-flip path with a real container if possible:
# gr00t_inference(action="start", checkpoint_path=...) -> expect host=0.0.0.0 inside container
# gr00t_inference(action="start", checkpoint_path=..., host="127.0.0.1") -> expect honoured (unreachable via -p)
# Spot-check N1.7 stop:
# docker exec <c> python -m gr00t.eval.run_gr00t_server --port 5555 &
# gr00t_inference(action="stop", port=5555) -> expect process killed| # service to bind all interfaces inside the container. Only auto-flip if the | ||
| # user accepted the default (sentinel was None → resolved to 127.0.0.1). | ||
| # Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host). | ||
| if host == "127.0.0.1" and not host_was_explicit: |
There was a problem hiding this comment.
The auto-flip silently undermines the advertised loopback default. The signature default is host=None -> 127.0.0.1 (per AGENTS.md > 'Bind to 127.0.0.1 by default'), but every container action that accepts the default lands here and rewrites it to 0.0.0.0. Net behaviour is indistinguishable from the pre-PR host="0.0.0.0" default — the security framing in the docstring (default: 127.0.0.1 loopback only) does not describe what actually happens.
Also note the trigger is the literal string "127.0.0.1". A user who passes host="localhost" explicitly (a perfectly reasonable hostname, validated as legal by validate_inputs) gets a service that binds to loopback inside the container, which is unreachable via docker -p PORT:PORT. That's a quiet footgun.
Two cleaner options:
- Drop the auto-flip entirely. Require
host="0.0.0.0"explicitly for container deployments and treat the documented loopback default as the real default. Aligns with AGENTS.md > 'Users explicitly opt into network exposure'. - Keep the flip, but extend the trigger to any non-
0.0.0.0value (loopback IP or hostname likelocalhost) and update the docstring to describe the behaviour transparently — e.g. 'inside the container the service always binds 0.0.0.0; thehostparameter only affects the host-side advertised endpoint.'
Either is fine; the current state is the worst of both worlds.
| # Image/download actions only consume image_name, paths, and volumes — not | ||
| # inference-time params (data_config, embodiment_tag, dtypes). | ||
| _image_only_actions = ("build_image", "download_checkpoint", "start_container") | ||
| if action in _image_only_actions: |
There was a problem hiding this comment.
action="lifecycle" is not in _port_only_actions and not in _image_only_actions, so it falls through to the full inference-mutating-action branch below (line 238 onward) which validates data_config, embodiment_tag, and vit_dtype/llm_dtype/dit_dtype. But _lifecycle(phase="teardown") doesn't consume any of those — teardown only needs container_name, volumes, and remove_volumes.
Consequence: gr00t_inference(action="lifecycle", lifecycle="teardown", data_config="some-stale-value") errors on data_config even though teardown would never read it. Defaults paper over this for normal calls, but it's a confusing failure mode for anyone driving the tool from a config file or a sweep harness.
Consider either (a) extending _image_only_actions to include "lifecycle" and validating only the container/teardown-relevant params, or (b) splitting the validation by lifecycle phase value (full vs teardown). At minimum, document that lifecycle runs full validation regardless of phase.
| # not legitimate hostnames. Real hostnames must have at least one alpha label. | ||
| # Rejects all-numeric multi-label strings including Linux IPv4 short-forms | ||
| # like "127.1" — use canonical dotted-quad for clarity in agent-driven contexts. | ||
| if _ALL_NUMERIC_RE.match(host) or not _HOSTNAME_RE.match(host): |
There was a problem hiding this comment.
The _ALL_NUMERIC_RE guard is comment-described as 'Rejects all-numeric multi-label strings including Linux IPv4 short-forms like "127.1"'. But Python's ipaddress.ip_address("127.1") already raises ValueError (modern stdlib does not accept short-form IPv4), so by the time this branch runs, "127.1" is already on the reject path via the hostname check. The all-numeric regex is doing useful work for things like "127.0.01" (which passes the RFC-952 hostname pattern), but the comment overstates its role. Consider trimming the comment to match what the guard actually adds beyond ipaddress.ip_address(...).
Separately: ipaddress.ip_address("::") (IPv6 unspecified, equivalent to 0.0.0.0) and ipaddress.ip_address("::1") both succeed silently. The error-message hint promises only 127.0.0.1 / 0.0.0.0 / hostname, but host="::" quietly passes — which is fine functionally, but worth either documenting or explicitly listing in the hint.
| if "\x00" in value: | ||
| raise ValueError(f"{label} must not contain null bytes") | ||
| if value.startswith("-"): | ||
| raise ValueError(f"{label} must not start with '-' (got {value!r})") |
There was a problem hiding this comment.
_SHELL_META = re.compile(r"[;&|$<>\\n\r\x00]")— the comment says quotes/bangs/parens/brackets are intentionally permitted because all subprocess calls are argv-style. That's correct *today*, but it's a tight coupling: the safety guarantee depends on every current and future caller usingsubprocess.run([...])rather thanshell=Trueorsubprocess.run(" ".join(...)). There is no enforcement of that contract. If a future PR introduces a single shell=True` call that interpolates one of these validated strings, the validator silently becomes inadequate.
Recommend: add a short comment at every subprocess.run site that interpolates a validated string, noting 'argv-style; do not change to shell=True without revisiting _SHELL_META'. Or factor the calls behind a single helper that asserts shell is not True. Cheap defensive measure for a class of regression that's hard to catch in review.
| called["p"] = p | ||
| return RealPath(str(p).replace("/proc", str(tmp_path / "proc"))) | ||
|
|
||
| monkeypatch.setattr("strands_robots.tools.gr00t_inference.Path", _fake_path) |
There was a problem hiding this comment.
monkeypatch.setattr("strands_robots.tools.gr00t_inference.Path", _fake_path) swaps the module-level Path symbol to a function that constructs a real Path. This works for the single use site in _is_gr00t_host_process, but Path is a common name — if any future helper in this module uses Path(...) for, say, checkpoint resolution, this test silently rewrites that call too. The test is asserting on call args via called.get('p'), which is enough to make it work today, but it's brittle.
A cleaner pattern: monkeypatch a small private helper (e.g. _read_proc_cmdline(pid: str) -> str) that the function delegates to, and have the test stub that helper. Same coverage, no globally rewritten module symbol, and the function under test stays simpler.
Not a merge blocker — the test passes and pins the right behaviour. Flagging for a follow-up.
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
Centralises gr00t_inference parameter validation into a single validate_inputs() function with action-scoped behaviour. The PR is largely well-executed: the validator is independently testable, the regex allowlists are sensible, the option-injection guards (--prefix on repo_url/repo_tag/policy_name) directly address a real argv-flag-injection surface, and 80+ tests cover most branches. The cross-port-kill regression test (port 80 vs 8000) is exactly the "pin every reviewed fix" pattern AGENTS.md asks for.
What's good
- Single-entry-point validator matches AGENTS.md > Review Learnings (#92) > LLM Input Safety ("Centralise validation in one function").
- Allowlists for enumerables (
data_config,embodiment_tag, dtypes, actions, protocols) instead of denylists. - N1.7 entrypoint identification fix (
gr00t.eval.run_gr00t_server) closes a real silent-miss bug, with tests on both sides of the OR. - Cross-port-kill regression test (
_stop_service(port=80)must not kill port-8000 process) directly pins the--port[= ]{port}(?:\s|$)boundary fix. host_was_explicitsentinel + auto-flip preserves loopback-by-default while keeping the-pport-publish path working.
Concerns
-
Stale
except Exceptionclauses outside the diff hunks contradict the PR's stated narrowing goal. The CHANGELOG says "Exception clauses in_is_gr00t_process/_is_gr00t_host_processnarrowed." Fiveexcept Exceptionclauses remain in this file, including_stop_service(line 987) and_list_running_services(line 764) — both directly invoked by validated tool actions. Per AGENTS.md > Review Learnings (#86) > "Exception Clauses Must Be Narrow,"except Exceptionis forbidden for non-recovery paths. Either narrow them in this PR or land a follow-up issue and reference it in the CHANGELOG so the claim doesn't read as completed when it isn't. -
CHANGELOG categorisation: the host default change should be in
### Changed(or flagged BREAKING), not### Fixed. The defaulthostflipping from0.0.0.0to127.0.0.1will silently break any direct caller ofgr00t_inference()that assumed bind-all and didn't pass an explicithost=. This is a behavioural default change, not a bugfix. The notes section mentions it in passing but doesn't surface migration guidance ("if your downstream connects from another host, passhost='0.0.0.0'explicitly"). -
# 🐳emoji at line 1742 (the__main__print). Outside the diff hunks so I can't comment inline, but this PR rewrote large sections of the file and the emoji was an opportunity to fix in passing. AGENTS.md > Review Learnings (#86) > Unicode & String Hygiene: "No emojis in user-facing strings." -
policy_namevalidation surface looks asymmetric. It's guarded against--prefix but never validated against_-allowlist or length, despite being interpolated into response dicts that an LLM agent will read back. Probably fine in practice (it's just a tracking label) but worth a one-line_VALID_NAME_REto keep the validation surface uniform.
Verification suggestions
# Make sure the validation catches the specific shapes the PR claims.
hatch run pytest tests/policies/groot/test_gr00t_inference_validation.py -v
# Confirm the regression-test for cross-port-kill actually fails on pre-fix code:
git show HEAD~1:strands_robots/tools/gr00t_inference.py > /tmp/pre.py
# (mentally) verify the pre-fix _PYTHON_PORT_RE_FMT lacked the `(?:\s|$)` boundary.
# Audit remaining bare-except clauses:
rg -n 'except Exception' strands_robots/tools/gr00t_inference.py| raise ValueError(f"Unknown protocol {protocol!r}. Valid: {list(valid_protocols)}") | ||
| # Port range — always validated. Type-check first so callers get ValueError, not TypeError. | ||
| if not isinstance(port, int): | ||
| raise ValueError(f"port must be an integer, got {type(port).__name__}: {port!r}") |
There was a problem hiding this comment.
isinstance(port, int) accepts bool (since bool is a subclass of int), so port=True validates as port 1 and port=False would fail the 1 <= port <= 65535 check with a confusing "got 0" message. Add or isinstance(port, bool) to reject explicitly:
if not isinstance(port, int) or isinstance(port, bool):
raise ValueError(f"port must be an integer, got {type(port).__name__}: {port!r}")Minor, but the PR's test suite is otherwise rigorous about edge cases — pin a regression test for port=True so the next refactor doesn't silently re-introduce the gap (AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes").
| for param_name, param_value in [("repo_url", repo_url), ("repo_tag", repo_tag), ("policy_name", policy_name)]: | ||
| if param_value is not None and param_value.startswith("-"): | ||
| raise ValueError(f"{param_name} must not start with '-' (got {param_value!r})") | ||
| return |
There was a problem hiding this comment.
The _image_only_actions branch (lines 223-242) and the full-validation block below (lines 255-294) duplicate most of the same checks: container_name regex, image_name regex, volumes paths, container_command shell-meta, checkpoint_path traversal, and the --prefix option-injection guard for repo_url/repo_tag/policy_name. A future tweak to e.g. the volumes validation in one branch will silently drift from the other.
Factor the shared checks into a helper (_validate_image_kwargs(...)) and call it from both branches. Single source of truth was the explicit goal stated in the PR description ("validate_inputs() is the single source of truth") — the duplication inside that function undermines it.
| return bool(re.search(_PYTHON_PORT_RE_FMT.format(port=port), cmdline)) | ||
| return is_gr00t | ||
| except (OSError, subprocess.SubprocessError, UnicodeDecodeError) as exc: | ||
| import logging |
There was a problem hiding this comment.
Three lazy import logging calls (here, line 866, line 1129) inside exception handlers / hot paths. AGENTS.md > Key Conventions > "Imports at file top — unless lazy-loading heavy deps with documented reason." logging is stdlib and trivial; the lazy pattern only helps when an import has measurable cost.
Move import logging to the file-top imports and define a module-level _logger = logging.getLogger(__name__) once. The current code re-creates the logger object on every exception, which is wasteful and inconsistent with _start_service's import logging as _logging (line 1129) — three different idioms in one file.
| # This is intentional: pgrep is constrained to ERE, and cmdlines are space-separated | ||
| # in practice (procps-ng converts NUL → space when reading /proc/*/cmdline). | ||
| # Matches both N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server) | ||
| _PGREP_INFERENCE_PORT_FMT = r"(inference_service\.py|gr00t\.eval\.run_gr00t_server).*--port[= ]{port}( |$)" |
There was a problem hiding this comment.
The .* between the entrypoint group and --port is greedy and unanchored on the left. While the tightened cmdline filter in _is_gr00t_process (python|gr00t + --port requirement) provides defense-in-depth, the pgrep itself can match any process whose cmdline contains the substring inference_service.py followed anywhere later by --port N (e.g. a logfile-tailer process whose cmdline mentions both strings).
Consider \\b word boundaries or anchoring the entrypoint to the start of the python invocation. At minimum, add a regression test that asserts a benign process matching inference_service.py somewhere in its cmdline (e.g. tail -f /var/log/inference_service.py.log --port 5555) is correctly excluded by the secondary _is_gr00t_*_process filter — this is the kind of false-positive AGENTS.md > Review Learnings (#85) > "Pin regression tests for reviewed fixes" was written for.
…NGELOG categorization Address review round-11 (yinsong1986 CHANGES_REQUESTED summary): 1. Narrow all remaining `except Exception` clauses: - _list_running_services: except Exception -> except OSError - _is_service_running: except Exception -> except OSError - _stop_service: except Exception -> except (OSError, subprocess.SubprocessError) - _start_service: except Exception -> except (OSError, RuntimeError) - _download_checkpoint: kept with # noqa: BLE001 (HF errors are opaque) 2. Remove emoji from __main__ print (line 1742: 🐳 removed) 3. CHANGELOG: move host default change from ### Fixed to ### Changed, add BREAKING flag and migration guidance per reviewer request. 4. Fix test_start_service_auto_flips_loopback: monkeypatch subprocess.run instead of subprocess.Popen (the code uses run, not Popen; the old except Exception masked the TypeError from the broken mock).
yinsong1986
left a comment
There was a problem hiding this comment.
Summary
PR centralises input validation for gr00t_inference into a single validate_inputs() entry-point, scopes it per-action (port-only vs image-only vs full mutating), and tightens process identification so N1.5/N1.6 (inference_service.py) and N1.7 (gr00t.eval.run_gr00t_server) are both matched. Default host flips from 0.0.0.0 to 127.0.0.1 (good — aligns with AGENTS.md § LLM Input Safety "Bind to 127.0.0.1 by default"), with a sentinel-based auto-flip to 0.0.0.0 for container actions to keep docker -p working. ~80 dedicated validation tests are added.
Overall direction matches the AGENTS.md "validate before subprocess interpolation / centralise validation in one function" guidance from PR #92's review-learnings, and the regression coverage is unusually thorough.
What's good
validate_inputs()is the canonical pattern requested by AGENTS.md § LLM Input Safety — single entry-point, independently testable, called once at the top of dispatch.- Loopback-by-default is correct, and the sentinel-based detection (
_host_was_explicit = host is not None) correctly preserves user intent when they passhost="127.0.0.1"explicitly. Tests pin both branches. - Process identification now requires
--portin cmdline + the inference entry-point, which prevents the well-known false-kill-the-editor footgun. - Cross-port-kill regression (port 8000 not killed when stopping port 80) has a dedicated test — matches AGENTS.md "pin every reviewed fix with a regression test".
- Exception clauses are properly narrowed throughout (
OSError,subprocess.SubprocessError,UnicodeDecodeError); only_download_checkpointretainsexcept Exceptionand that's annotated with# noqa: BLE001and an explanation.
Concerns
- Validation gap on
hf_local_dir/source_dir/hf_repo— the PR description says these are "argv-safe, no injection risk", buthf_local_diris interpolated into adocker -v {host}:{container}argv at line ~1498 with noreject_colonguard.volumes={}got that guard;hf_local_dirdid not. See inline comment. - Allowlist duplication:
_VALID_ACTIONS(validator) and theif/elifchain ingr00t_inference()are now two lists of the same thing. Adding a new action in only one place fails open or silently rejects — worth a single-source-of-truth refactor or a defensive test that asserts they match. - Scope discipline: changelog entry mixes "Added / Changed / Fixed" but several items are arguably refactors (factoring
_PGREP_INFERENCE_PORT_FMT, narrowing exceptions). Not blocking, just makes the release-notes pass harder. - Breaking change documentation: the
hostdefault change from0.0.0.0to127.0.0.1is correctly flagged BREAKING in CHANGELOG, but the migration sketch ("passhost=\"0.0.0.0\"explicitly") could mention thatlifecycle/start/restartauto-flip will mask the breakage for the common path; remote-bind users connecting from another host are the only group affected. Helpful to call out who is/isn't affected.
Verification suggestions
# Sanity-check the validator against representative LLM-controlled inputs:
python -c "
from strands_robots.tools.gr00t_inference import validate_inputs
import pytest
# image:tag@sha256 (combined pinning) - currently rejected, should pass:
try:
validate_inputs(action='build_image', data_config='x', embodiment_tag='x', port=5555,
host='127.0.0.1', vit_dtype='fp8', llm_dtype='nvfp4', dit_dtype='fp8',
checkpoint_path=None, trt_engine_path='x', container_name=None, protocol='n1.5',
image_name='nvcr.io/nvidia/gr00t:n1.7@sha256:' + 'a'*64)
print('combined tag+digest: OK')
except ValueError as e:
print('combined tag+digest: REJECTED ->', e)
"
# Confirm the host-fallback Linux gate works on macOS/BSD:
STRANDS_BASE_DIR=/tmp/sb python -c "
from strands_robots.tools.gr00t_inference import _stop_service
import sys; sys.platform = 'darwin' # only meaningful on a real macOS box
print(_stop_service(5555))
"
# Round-trip the full happy path through a docker server (integration):
hatch run test-integ -k gr00t| if volumes is not None: | ||
| for vol_host, vol_container in volumes.items(): | ||
| _validate_path(vol_host, "volumes key (host path)", reject_colon=True) | ||
| _validate_path(vol_container, "volumes value (container path)", reject_colon=True) |
There was a problem hiding this comment.
Validation gap: hf_local_dir (and source_dir) bypass the reject_colon guard that you correctly apply to volumes.
The PR description and the changelog assert these are "argv-safe, no injection risk". But at _start_container (line ~1498) hf_local_dir is interpolated into:
effective_volumes[str(Path(hf_local_dir).expanduser())] = "/data/checkpoints"
# ...
cmd.extend(["-v", f"{host_path}:{container_path}"])If an LLM passes hf_local_dir="/foo:/etc:ro", the resulting -v /foo:/etc:ro:/data/checkpoints is parsed by docker -v as host=/foo container=/etc options=ro:/data/checkpoints, escalating to a read-only mount of the host's /etc. That's the exact attack you guarded against in volumes with reject_colon=True (line 232–233). Same surface, same defence needed.
Suggested fix: in the _image_only_actions branch (and the full-validation branch around line 277), add
if hf_local_dir is not None:
_validate_path(hf_local_dir, "hf_local_dir", reject_colon=True)
if source_dir is not None:
_validate_path(source_dir, "source_dir")and thread hf_local_dir/source_dir through validate_inputs()'s signature. Per AGENTS.md > Review Learnings (PR #92) > LLM Input Safety: "Reject shell metacharacters in paths — also reject .. path traversal components. Apply even when using argv-style subprocess."
| r"(?::[0-9]{1,5})?" # optional registry port (:5000) | ||
| r"(?:/[a-zA-Z0-9][a-zA-Z0-9._\-]*)*" # path components (/org/img) | ||
| r"(?::[a-zA-Z0-9][a-zA-Z0-9._\-]*" # option A: :tag | ||
| r"|@sha256:[a-f0-9]{64})?$" # option B: @sha256:digest (mutually exclusive with tag) |
There was a problem hiding this comment.
_DOCKER_IMAGE_RE rejects the combined name:tag@sha256:digest reference, which Docker accepts and is the recommended supply-chain pinning form.
Quick check:
>>> _DOCKER_IMAGE_RE.match("nvcr.io/nvidia/gr00t:n1.7@sha256:" + "a"*64)
NoneThe alternation (?::tag|@sha256:digest)? makes tag and digest mutually exclusive, but Docker's reference grammar is name[:tag][@digest] — you can pin both at once. The comment on line 61 even calls these "option A / option B (mutually exclusive)" which is a divergence from upstream Docker semantics, not a deliberate restriction.
This matters because the AGENTS.md review-learnings (PR #92) explicitly recommend digest pinning for supply-chain security; rejecting image:tag@sha256:... discourages the recommended form for users who also want a human-readable tag.
Suggested fix: make tag and digest both optional and additive:
r"(?::[a-zA-Z0-9][a-zA-Z0-9._\-]*)?" # optional :tag
r"(?:@sha256:[a-f0-9]{64})?$" # optional @digestAlso worth a regression test pinning the accepted form.
| # service to bind all interfaces inside the container. Only auto-flip if the | ||
| # user accepted the default (sentinel was None → resolved to 127.0.0.1). | ||
| # Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host). | ||
| if host == "127.0.0.1" and not host_was_explicit: |
There was a problem hiding this comment.
Auto-flip is hard-coded against the IPv4 loopback string and silently misbehaves for IPv6 loopback.
if host == "127.0.0.1" and not host_was_explicit:
...
host = "0.0.0.0"validate_inputs() accepts ::1 (IPv6 loopback) via ipaddress.ip_address(), but this branch only checks "127.0.0.1". So a user passing host="::1" bypasses the auto-flip, the service binds to ::1 inside the container, and docker -p {port}:{port} silently fails to forward traffic from outside — exactly the failure mode the auto-flip was added to prevent.
Suggested guard:
try:
is_loopback = ipaddress.ip_address(host).is_loopback
except ValueError:
is_loopback = False
if is_loopback and not host_was_explicit:
...
host = "0.0.0.0" # or "::" for IPv6 if the rest of the stack supports itMinimum: add a regression test that calls _start_service(host="::1", host_was_explicit=False, ...) and asserts the warning fires — currently nothing covers this branch.
| # user accepted the default (sentinel was None → resolved to 127.0.0.1). | ||
| # Users who explicitly pass host="127.0.0.1" get it honoured (e.g. --network=host). | ||
| if host == "127.0.0.1" and not host_was_explicit: | ||
| import logging as _logging |
There was a problem hiding this comment.
Repeated import logging inside function bodies violates AGENTS.md § Key Conventions #4 ("Imports at file top — unless lazy-loading heavy deps with documented reason").
Three sites in this PR do import logging (or import logging as _logging) inside a function body: lines 828, 866, and here. logging is stdlib and not heavy — there's no documented reason for the lazy import.
Suggested fix: move import logging to the module top with the other imports and define a single module-level logger = logging.getLogger(__name__). The three call sites then become logger.warning(...) / logger.debug(...) and you drop the per-call dictionary lookup on logging.getLogger.
| # Complete allowlist of valid actions for the tool. | ||
| _VALID_ACTIONS = frozenset( | ||
| { | ||
| "find_containers", |
There was a problem hiding this comment.
_VALID_ACTIONS and the if/elif dispatch chain in gr00t_inference() are now two parallel allowlists that must be kept in sync.
The new # pragma: no cover line at line ~695 (return {"status": "error", "message": f"Unknown action: {action}"}) acknowledges this: the validator's allowlist makes the dispatcher's fallback unreachable. But the inverse failure mode is silent — a future PR adds "benchmark" to the dispatcher and forgets _VALID_ACTIONS, and the validator rejects it before dispatch with no test catching the regression.
Lightweight fix: add a unit test that walks the source AST (or a pytest parametrise over _VALID_ACTIONS) asserting that every name in _VALID_ACTIONS resolves to a non-error branch in gr00t_inference() for a stub-call, and vice-versa. Cheap pin against drift.
|
Closing in favour of #196 (consolidated rewrite). WhyThis PR accumulated 31 commits and 91 unresolved review threads across 11 rounds, and #192 (camera-recording fix, merged into What replaces it#196 is branched fresh from current
Diff comparison
The next PR is opened as a draft and will not request review until R2/R5/R6/R7/R8 land. Apologies to reviewers who walked the long road on this one — the consolidation is the right reset. 🤖 Closed by autonomous orchestrator on behalf of @cagataycali. The receiving rewrite is at #196. |
…cting numeric tags (addresses thread on line 78) The port-capture group in _DOCKER_IMAGE_RE greedily matched :digits even without a following /path component, causing the port-range check to reject valid name:tag refs where the tag was purely numeric (e.g. myimage:0, myimage:65536, myimage:99999, gr00t:23456). Fix: add (?=/) lookahead so :digits is only interpreted as a registry port when followed by a path component. Without the lookahead the ambiguous name:digits pattern is treated as name:tag (correct). Also fixes CHANGELOG header to reference strands-labs#196 (this PR) instead of the superseded strands-labs#90.
Summary
Stricter input validation for the
gr00t_inferencetool — centralised in a singlevalidate_inputs()function with per-action scoping.Changes
validate_inputs()— single entry-point, independently testable.find_containers,list,status,stop) validate onlyport/host/protocol; mutating actions validate the full parameter surface.localhost,host.docker.internal). Rejects typos like127.0.01via_ALL_NUMERIC_RE(multi-label only; single-label numerics are valid per RFC-1123).hostchanged to127.0.0.1(AGENTS.md compliance). Container actions (start/restart/lifecycle) auto-flip to0.0.0.0internally since Docker-pport-publish requires bind-all inside the container._PGREP_INFERENCE_PORT_FMTand_is_gr00t_*_processnow match bothinference_service.py(N1.5/N1.6) andgr00t.eval.run_gr00t_server(N1.7). Closes the N1.7 stop/status identification gap.inference_service.pyORgr00t.eval.run_gr00t_server+python/gr00t+--portflag in cmdline. Prevents false-matching editors/log-tailers.repo_url,repo_tag,policy_namestarting with-are rejected (prevents git flag injection via subprocess argv).(OSError, subprocess.SubprocessError, UnicodeDecodeError).PermissionErrorlogs at WARNING; other errors log at DEBUG./and\separators.Validation Scope
Validated:
port,host,protocol,data_config,embodiment_tag,container_name, TRT dtypes,checkpoint_path,trt_engine_path,image_name,volumes,container_command,repo_url/repo_tag/policy_name(dash-prefix only).Not validated (argv-safe, no injection risk):
hf_repo,hf_subfolder,hf_local_dir,source_dir.Tests
80+ validation tests across 17 test classes covering all branches, integration wiring, regression scenarios, N1.7 identification, and the new security guards.