support docker registry login#537
Conversation
2e9d867 to
7a53504
Compare
7a53504 to
0a88ac5
Compare
dengwx2026
left a comment
There was a problem hiding this comment.
Code Review: Support Docker Registry Login
Overall
This PR adds Docker registry login support via a hook mechanism, allowing users to pass registry credentials when starting a sandbox. The approach of using the existing DeploymentHook infrastructure is reasonable. However, there are several issues that should be addressed before merging.
Security Issues
1. Credentials stored in plain text in DockerDeploymentConfig (Critical)
registry_username and registry_password are stored as plain-text fields on DockerDeploymentConfig, which is a Pydantic model that can be serialized/logged. The project already has an encryption utility (_aes_encryptor used in SandboxManager) — credentials should be encrypted in transit and at rest, or at minimum the password field should be marked with repr=False / exclude=True to prevent accidental exposure in logs or serialized output.
# Current — password is visible in repr/json
registry_password: str | None = None
# Suggested
registry_password: str | None = Field(default=None, repr=False, exclude=True)2. docker login credentials may persist on disk
docker login writes credentials to ~/.docker/config.json by default. In a shared environment (especially admin service nodes), this means:
- Multiple sandboxes could share/overwrite each other's credentials
- Credentials persist after the sandbox is torn down
Consider using --config with a temporary directory per login, or using docker pull with --registry-mirror / credential helpers instead.
Design Issues
3. asyncio.new_event_loop() in DockerLoginHook — unnecessary and fragile
In docker_login.py:29-33, a new event loop is created just to call the async method ImageUtil.parse_registry_and_others(). This is problematic because:
_pull_image()runs in an executor thread (line 298 ofdocker.py), so there's no running event loop — creating one works but is wastefulparse_registry_and_othersis a pure string-parsing function that has no reason to be async- If the hook is ever called from an async context in the future,
new_event_loop()will cause issues
Suggestion: Either make parse_registry_and_others sync (it does no I/O), or extract the registry-parsing logic into a simple sync helper and call it directly.
4. Hook should handle errors gracefully
If DockerUtil.login() fails (bad credentials, network timeout), the exception will propagate and abort the entire sandbox start. The hook should either:
- Catch and re-raise as a domain-specific error (e.g.,
DockerLoginError) with a clear message - Or document that login failure is intentionally fatal
Currently, a subprocess.CalledProcessError or TimeoutExpired will bubble up with a generic stack trace.
5. No docker logout or cleanup
There's no corresponding cleanup when the sandbox is destroyed. The DeploymentHook interface only has on_custom_step — consider whether a lifecycle method like on_cleanup is needed, or if the credential persistence issue (point 2) makes this moot.
Code Quality
6. DockerUtil logger inconsistency
The new login method in rock/utils/docker.py uses the project logger:
logger = logging.getLogger(__name__)But the existing code in this file already uses this pattern. However, this differs from the project convention stated in CLAUDE.md:
from rock.logger import init_logger
logger = init_logger(__name__)The existing DockerUtil already used logging.getLogger, so this is a pre-existing issue, but worth noting.
7. SandboxConfig in SDK gains credential fields but no docs/validation
registry_username and registry_password are added to SandboxConfig but there's no validation that they must be provided together (both or neither). A model_validator should enforce this:
@model_validator(mode="after")
def validate_registry_credentials(self):
if bool(self.registry_username) != bool(self.registry_password):
raise ValueError("registry_username and registry_password must both be provided or both omitted")
return selfThe same validation should be on SandboxStartRequest and DockerDeploymentConfig.
Missing Tests
8. No tests for the new functionality
There are no unit tests for:
DockerLoginHook.on_custom_step()— should test that login is called only on the "Pulling docker image" stepDockerUtil.login()— should test success, failure, and timeout cases (with mocked subprocess)- Validation that credentials flow correctly from
SandboxStartRequest→DockerDeploymentConfig→DockerLoginHook - Edge cases: image with no registry prefix, image with port in registry
Minor
9. String matching for hook dispatch is fragile
The hook dispatches on exact string match "Pulling docker image". If this message ever changes in docker.py, the hook silently stops working. Consider using an enum or constant shared between the hook and DockerDeployment.
Summary
| Category | Items |
|---|---|
| Must fix | #1 (credential exposure), #3 (unnecessary event loop), #7 (paired validation), #8 (tests) |
| Should fix | #2 (credential persistence), #4 (error handling), #9 (fragile string matching) |
| Nice to have | #5 (cleanup), #6 (logger convention) |
The feature direction is good and the hook-based approach is clean. The main concerns are around security (credential handling) and robustness (error handling, validation, tests).
|
必须修复:
建议修复: |
99c1ddd to
b4886ad
Compare
b4886ad to
2607bb4
Compare
Release notes: - Add Docker registry login (#537), phase failure metrics (#552), SWE-bench evaluation demo (#508), count.total metric fix (#527) - Update contributor stats with new entries (Issac-Newton, updated jiaoliao/dengwx) README: - Fix broken Quick Start links (add Getting Started/User Guides/References path prefix) - Fix LICENSE badge link: main -> master (main branch 404) - Fix Configuration doc links: local path -> docs site URL - Add v1.3.0 to News and Latest Updates sections Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…551) * docs: update v1.3.0 release notes with missing changes and PR links - Add missing features: K8s Operator, Docker validation, Sandbox Client close_session, Kata runtime, namespace metrics, image_os, monitor endpoint, auto clear seconds - Add missing enhancements: architecture refactoring, deployment template syntax, proxy HTTP support, configuration changes - Add missing bug fixes: custom install workdir, model service warnings, local admin start, scheduler task detection - Add clickable PR links for all changelog items (github.com/alibaba/ROCK) - Add Contributors section with per-author commit and line change statistics - Fix Examples Directory link to point to alibaba/ROCK Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add recent PRs to v1.3.0 release notes and fix README broken links Release notes: - Add Docker registry login (#537), phase failure metrics (#552), SWE-bench evaluation demo (#508), count.total metric fix (#527) - Update contributor stats with new entries (Issac-Newton, updated jiaoliao/dengwx) README: - Fix broken Quick Start links (add Getting Started/User Guides/References path prefix) - Fix LICENSE badge link: main -> master (main branch 404) - Fix Configuration doc links: local path -> docs site URL - Add v1.3.0 to News and Latest Updates sections Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* support docker login * hidden password and add ut
refs #535