Skip to content

support docker registry login#537

Merged
dengwx2009 merged 2 commits intoalibaba:masterfrom
zhongwen666:dev/support_docker_registry_login_0227
Mar 3, 2026
Merged

support docker registry login#537
dengwx2009 merged 2 commits intoalibaba:masterfrom
zhongwen666:dev/support_docker_registry_login_0227

Conversation

@zhongwen666
Copy link
Collaborator

refs #535

@zhongwen666 zhongwen666 force-pushed the dev/support_docker_registry_login_0227 branch from 2e9d867 to 7a53504 Compare February 28, 2026 08:16
@zhongwen666 zhongwen666 force-pushed the dev/support_docker_registry_login_0227 branch from 7a53504 to 0a88ac5 Compare February 28, 2026 08:21
Copy link
Collaborator

@dengwx2026 dengwx2026 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Support Docker Registry Login

Overall

This PR adds Docker registry login support via a hook mechanism, allowing users to pass registry credentials when starting a sandbox. The approach of using the existing DeploymentHook infrastructure is reasonable. However, there are several issues that should be addressed before merging.


Security Issues

1. Credentials stored in plain text in DockerDeploymentConfig (Critical)

registry_username and registry_password are stored as plain-text fields on DockerDeploymentConfig, which is a Pydantic model that can be serialized/logged. The project already has an encryption utility (_aes_encryptor used in SandboxManager) — credentials should be encrypted in transit and at rest, or at minimum the password field should be marked with repr=False / exclude=True to prevent accidental exposure in logs or serialized output.

# Current — password is visible in repr/json
registry_password: str | None = None

# Suggested
registry_password: str | None = Field(default=None, repr=False, exclude=True)

2. docker login credentials may persist on disk

docker login writes credentials to ~/.docker/config.json by default. In a shared environment (especially admin service nodes), this means:

  • Multiple sandboxes could share/overwrite each other's credentials
  • Credentials persist after the sandbox is torn down

Consider using --config with a temporary directory per login, or using docker pull with --registry-mirror / credential helpers instead.


Design Issues

3. asyncio.new_event_loop() in DockerLoginHook — unnecessary and fragile

In docker_login.py:29-33, a new event loop is created just to call the async method ImageUtil.parse_registry_and_others(). This is problematic because:

  • _pull_image() runs in an executor thread (line 298 of docker.py), so there's no running event loop — creating one works but is wasteful
  • parse_registry_and_others is a pure string-parsing function that has no reason to be async
  • If the hook is ever called from an async context in the future, new_event_loop() will cause issues

Suggestion: Either make parse_registry_and_others sync (it does no I/O), or extract the registry-parsing logic into a simple sync helper and call it directly.

4. Hook should handle errors gracefully

If DockerUtil.login() fails (bad credentials, network timeout), the exception will propagate and abort the entire sandbox start. The hook should either:

  • Catch and re-raise as a domain-specific error (e.g., DockerLoginError) with a clear message
  • Or document that login failure is intentionally fatal

Currently, a subprocess.CalledProcessError or TimeoutExpired will bubble up with a generic stack trace.

5. No docker logout or cleanup

There's no corresponding cleanup when the sandbox is destroyed. The DeploymentHook interface only has on_custom_step — consider whether a lifecycle method like on_cleanup is needed, or if the credential persistence issue (point 2) makes this moot.


Code Quality

6. DockerUtil logger inconsistency

The new login method in rock/utils/docker.py uses the project logger:

logger = logging.getLogger(__name__)

But the existing code in this file already uses this pattern. However, this differs from the project convention stated in CLAUDE.md:

from rock.logger import init_logger
logger = init_logger(__name__)

The existing DockerUtil already used logging.getLogger, so this is a pre-existing issue, but worth noting.

7. SandboxConfig in SDK gains credential fields but no docs/validation

registry_username and registry_password are added to SandboxConfig but there's no validation that they must be provided together (both or neither). A model_validator should enforce this:

@model_validator(mode="after")
def validate_registry_credentials(self):
    if bool(self.registry_username) != bool(self.registry_password):
        raise ValueError("registry_username and registry_password must both be provided or both omitted")
    return self

The same validation should be on SandboxStartRequest and DockerDeploymentConfig.


Missing Tests

8. No tests for the new functionality

There are no unit tests for:

  • DockerLoginHook.on_custom_step() — should test that login is called only on the "Pulling docker image" step
  • DockerUtil.login() — should test success, failure, and timeout cases (with mocked subprocess)
  • Validation that credentials flow correctly from SandboxStartRequestDockerDeploymentConfigDockerLoginHook
  • Edge cases: image with no registry prefix, image with port in registry

Minor

9. String matching for hook dispatch is fragile

The hook dispatches on exact string match "Pulling docker image". If this message ever changes in docker.py, the hook silently stops working. Consider using an enum or constant shared between the hook and DockerDeployment.


Summary

Category Items
Must fix #1 (credential exposure), #3 (unnecessary event loop), #7 (paired validation), #8 (tests)
Should fix #2 (credential persistence), #4 (error handling), #9 (fragile string matching)
Nice to have #5 (cleanup), #6 (logger convention)

The feature direction is good and the hook-based approach is clean. The main concerns are around security (credential handling) and robustness (error handling, validation, tests).

@dengwx2026
Copy link
Collaborator

必须修复:

  1. 凭证明文暴露 — registry_password 存储在 Pydantic model 中,可能被序列化/日志打印,应加
    repr=False, exclude=True
  2. 不必要的 asyncio.new_event_loop() — parse_registry_and_others 是纯字符串解析,没有
    I/O,不应该是 async 的;在 hook 中创建新 event loop 是多余且脆弱的做法
  3. 缺少配对校验 — registry_username 和 registry_password 必须同时提供或同时为空,需要
    model_validator
  4. 缺少测试 — 没有任何单元测试覆盖新功能

建议修复:
5. docker login 凭证会持久化到磁盘 (~/.docker/config.json),在共享环境中有安全风险
6. Hook 错误处理不足 — login 失败时异常会以 CalledProcessError
形式直接上抛,缺少领域特定的错误处理
7. 字符串匹配调度脆弱 — hook 依赖精确匹配 "Pulling docker image"
字符串,建议用枚举或共享常量

@zhongwen666 zhongwen666 requested a review from dengwx2026 March 2, 2026 12:21
@zhongwen666 zhongwen666 force-pushed the dev/support_docker_registry_login_0227 branch 2 times, most recently from 99c1ddd to b4886ad Compare March 3, 2026 02:42
@zhongwen666 zhongwen666 force-pushed the dev/support_docker_registry_login_0227 branch from b4886ad to 2607bb4 Compare March 3, 2026 03:23
@dengwx2009 dengwx2009 merged commit 78ab209 into alibaba:master Mar 3, 2026
6 checks passed
dengwx2026 added a commit that referenced this pull request Mar 3, 2026
Release notes:
- Add Docker registry login (#537), phase failure metrics (#552),
  SWE-bench evaluation demo (#508), count.total metric fix (#527)
- Update contributor stats with new entries (Issac-Newton, updated jiaoliao/dengwx)

README:
- Fix broken Quick Start links (add Getting Started/User Guides/References path prefix)
- Fix LICENSE badge link: main -> master (main branch 404)
- Fix Configuration doc links: local path -> docs site URL
- Add v1.3.0 to News and Latest Updates sections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dengwx2009 pushed a commit that referenced this pull request Mar 3, 2026
…551)

* docs: update v1.3.0 release notes with missing changes and PR links

- Add missing features: K8s Operator, Docker validation, Sandbox Client close_session,
  Kata runtime, namespace metrics, image_os, monitor endpoint, auto clear seconds
- Add missing enhancements: architecture refactoring, deployment template syntax,
  proxy HTTP support, configuration changes
- Add missing bug fixes: custom install workdir, model service warnings, local admin start,
  scheduler task detection
- Add clickable PR links for all changelog items (github.com/alibaba/ROCK)
- Add Contributors section with per-author commit and line change statistics
- Fix Examples Directory link to point to alibaba/ROCK

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add recent PRs to v1.3.0 release notes and fix README broken links

Release notes:
- Add Docker registry login (#537), phase failure metrics (#552),
  SWE-bench evaluation demo (#508), count.total metric fix (#527)
- Update contributor stats with new entries (Issac-Newton, updated jiaoliao/dengwx)

README:
- Fix broken Quick Start links (add Getting Started/User Guides/References path prefix)
- Fix LICENSE badge link: main -> master (main branch 404)
- Fix Configuration doc links: local path -> docs site URL
- Add v1.3.0 to News and Latest Updates sections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
zhongwen666 added a commit to zhongwen666/ROCK that referenced this pull request Mar 3, 2026
* support docker login

* hidden password and add ut
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants