docs(integrations): add LlamaIndex integration guide #285

Open
Johnny-zbb wants to merge 3 commits into TencentCloud:master from Johnny-zbb:docs/add-llamaindex-integration

Conversation

@Johnny-zbb commented May 16, 2026

Closes #244

@fslongjin (Member)

/cubebot review

github-actions (bot) commented May 16, 2026

Claude finished @fslongjin's task in 57s.


Review: LlamaIndex Integration Guide

This is a well-structured documentation PR that adds a bilingual (EN/ZH) LlamaIndex integration guide following the project's contribution guidelines. The code examples are clear and the guide covers the full lifecycle from setup to advanced use cases.

What's good

  • Bilingual compliance: Both English and Chinese files submitted as required by the contribution guidelines.
  • Follows the template: Covers all required sections (Integration Target, Prerequisites, Steps, Key Code, Caveats, References).
  • Index pages updated: Both docs/guide/integrations/index.md and docs/zh/guide/integrations/index.md have the new entry appended.
  • Useful Before/After comparison: The exec() → Sandbox.run_code() diff clearly communicates the value proposition.
  • Practical advanced scenarios: Network isolation, persistent sandbox, and pause/resume sections add real depth.
  • Good caveats section: Documents cold start latency, SDK scope limits, and network policy constraints honestly.

Issues to address

  1. Unused import json in the create_cube_tool snippet (docs/guide/integrations/llamaindex.md:53 and the Chinese equivalent):

    import json          # ← unused in this function
    from llama_index.core.tools import FunctionTool
    from cubesandbox import Sandbox, Config

    The json module is only used in the "Going Further" network policy snippet (a separate code block), not in create_cube_tool itself. Remove this import from the create_cube_tool block, or move it to the network policy snippet.

  2. Sandbox.create(metadata=...) without config (llamaindex.md:192-196 and ZH equivalent):

    with Sandbox.create(
        metadata={"network-policy": "deny-all"}
    ) as sb:

    Earlier examples always pass config=cfg to Sandbox.create(). If the sandbox requires Config (template_id, api_url, etc.), this code would fail at runtime. Either:

    • Add a note that Config defaults are used, or
    • Include a config=cfg parameter alongside metadata.
  3. Optional: template_id example value — The doc uses python:3.12-slim as the example template ID. If Cube Sandbox uses a different ID format than Docker image tags, this could mislead users. Worth verifying the actual template ID format.

  4. Optional: Redundant stop-word filter in the WordFreq example (llamaindex.md:157,161):

    self.skip = {'the','a','an','is','are','was','were','in','on','at','to','of','for'}
    # ...
    if w and w not in self.skip and len(w) > 3:

    Since len(w) > 3 already excludes every word of three or fewer characters, all entries in self.skip except were (the, a, an, is, are, was, in, on, at, to, of, for) will never match. Only were (length 4) provides actual filtering. Consider either removing the len(w) > 3 check or trimming the skip set to stop words of four or more letters, whichever better reflects your intent.
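A minimal sketch of a filter where the two checks no longer silently overlap. It is written as a standalone function rather than the guide's WordFreq class (whose tokenization is not shown in this review), keeps the original stop-word set, and turns the length cutoff into an explicit min_len parameter:

```python
import re
from collections import Counter

# Same stop words as the guide's self.skip set.
STOP_WORDS = frozenset({"the", "a", "an", "is", "are", "was", "were",
                        "in", "on", "at", "to", "of", "for"})


def word_freq(text: str, min_len: int = 1) -> Counter:
    """Count lowercase words, dropping stop words and words shorter than min_len.

    Hypothetical standalone version of the WordFreq logic. With min_len=1 every
    stop word can match; with min_len=4 (the guide's len(w) > 3) only "were" can.
    """
    # Letters only: digits and punctuation act as word separators.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS and len(w) >= min_len)


if __name__ == "__main__":
    print(word_freq("The cat and the hat were on a mat"))
```

With min_len=1 the stop-word set does all the filtering; if a four-letter cutoff is what the example intends, the set can shrink to just were.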


Overall, this is a solid contribution. The content is accurate, well-organized, and follows project conventions. The issues above are minor and relate mostly to code quality in the examples.

@kinwin-ustc (Collaborator)

Can you share your environment? A 1-2 second cold start is too slow for us. Better stateful sandbox processing and domain-based network access control are on our roadmap and will be released soon.

@fslongjin (Member)

@Johnny-zbb Thanks for submitting this LlamaIndex integration guide — the document structure is solid, and the bilingual (EN/ZH) coverage follows our contribution guidelines.

I want to emphasize that integration tutorials must be verified against a real CubeSandbox deployment before submission, otherwise incorrect examples can mislead other users and cause significant confusion. After reviewing the code examples in this PR, I found several issues that suggest they haven't been run against an actual CubeSandbox instance:

Critical: Template ID format is incorrect

The document uses template_id="python:3.12-slim" in multiple places, but CubeSandbox template IDs are auto-generated in the format tpl-<hex> (e.g. tpl-748094d2f2374b0a8a37e6ec), not Docker image references. Users need to create a template via cubemastercli tpl create-from-image first, then copy the actual template_id from the command output. Passing "python:3.12-slim" would cause the API to return a 404 (TemplateNotFoundError).

See: Creating Templates from OCI Images — Step 3 shows the correct usage: export CUBE_TEMPLATE_ID=tpl-xxx.
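A small fail-fast sketch under two assumptions. First, the ID comes from an explicit argument or from the CUBE_TEMPLATE_ID variable used in the template docs. Second, valid IDs match the tpl-<hex> shape of the example ID above. resolve_template_id is a hypothetical helper, not part of the CubeSandbox SDK, and this thread does not confirm whether the SDK reads CUBE_TEMPLATE_ID by itself:

```python
import os
import re
from typing import Mapping, Optional

# Assumption: IDs follow the tpl-<hex> shape quoted in this thread
# (e.g. tpl-748094d2f2374b0a8a37e6ec). Length and casing are not documented
# here, so any non-empty run of lowercase hex digits is accepted.
TEMPLATE_ID_RE = re.compile(r"tpl-[0-9a-f]+")


def resolve_template_id(explicit: Optional[str] = None,
                        env: Mapping[str, str] = os.environ) -> str:
    """Pick a template ID and reject obviously wrong values before any API call.

    Hypothetical helper: prefers an explicit value, then CUBE_TEMPLATE_ID.
    """
    template_id = explicit or env.get("CUBE_TEMPLATE_ID")
    if not template_id:
        raise ValueError("set CUBE_TEMPLATE_ID to the ID printed by "
                         "`cubemastercli tpl create-from-image`")
    if not TEMPLATE_ID_RE.fullmatch(template_id):
        # Docker image references such as python:3.12-slim end up here.
        raise ValueError(f"{template_id!r} is not a tpl-<hex> template ID")
    return template_id
```

The result can be passed into Config the same way the guide already passes template_id. A value like "python:3.12-slim" then fails locally with a readable message instead of surfacing later as a 404.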

Critical: Bare image lacks envd daemon

Even if someone tried to create a template from the python:3.12-slim image, it doesn't include envd — the CubeSandbox protocol endpoint — and would fail the readiness probe (:49983/health → 204). The correct approach is to:

  • Build on top of cubesandbox-base (FROM ghcr.io/tencentcloud/cubesandbox-base:2026.16), or
  • Inject envd and cube-entrypoint.sh via COPY --from=cubesandbox-base in your Dockerfile

See: Bring Your Own Image (envd)
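When debugging a custom image, you can reproduce the readiness check described above from the client side. This sketch uses only the Python standard library and polls /health on the envd port (49983) until it answers 204. wait_for_envd is a hypothetical helper. The base URL, and how that port is reachable from where you run it, depend on your deployment and are assumptions:

```python
import http.client
import time
import urllib.request


def wait_for_envd(base_url: str, timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Poll <base_url>/health until it answers 204 No Content or the timeout expires.

    base_url is something like "http://<sandbox-host>:49983" (hypothetical host).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval_s * 4) as resp:
                if resp.status == 204:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError, refused connections and socket timeouts are all
            # OSError subclasses: envd is not up yet, or absent from the image.
            pass
        time.sleep(interval_s)
    return False
```

A bare python:3.12-slim image has nothing listening on that port, so this function would simply return False once the timeout expires.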

Other code issues

  1. Sandbox.create(metadata=...) missing config — The network isolation examples in "Going Further" call Sandbox.create(metadata={"network-policy": "deny-all"}) without passing template or config, which raises ValueError when env vars aren't set.
  2. Unused import json — The create_cube_tool function imports json but never uses it (it's only used in a later network policy snippet).
  3. Prefer the SDK's native network parameter — Use Sandbox.create(network={"allow_out": [...], "deny_out": [...]}) instead of metadata={"network-policy": ...}. The network parameter is type-checked and more explicit.
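One way to apply the first and third fixes together is to build the Sandbox.create() arguments in one place. The helper below is hypothetical. The config and network={"allow_out": [...], "deny_out": [...]} keyword names come from this review. This thread does not state the accepted format of the individual entries, so check that against the SDK:

```python
from typing import Any, Iterable, Optional


def sandbox_create_kwargs(cfg: Any,
                          allow_out: Optional[Iterable[str]] = None,
                          deny_out: Optional[Iterable[str]] = None) -> dict:
    """Hypothetical helper: keyword arguments for Sandbox.create().

    Always includes config=cfg and uses the typed `network` parameter
    instead of metadata={"network-policy": ...}.
    """
    if cfg is None:
        raise ValueError("pass the same Config object the other examples use")
    kwargs: dict = {"config": cfg}
    network: dict = {}
    if allow_out is not None:
        network["allow_out"] = list(allow_out)
    if deny_out is not None:
        network["deny_out"] = list(deny_out)
    if network:
        kwargs["network"] = network
    return kwargs


# Usage against the SDK (not executed here):
# with Sandbox.create(**sandbox_create_kwargs(cfg, allow_out=[...], deny_out=[...])) as sb:
#     ...
```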

Suggestions

To ensure document quality, I'd recommend:

  • Running the examples end-to-end against a real CubeSandbox deployment
  • Replacing all instances of "python:3.12-slim" with either a real tpl-* template ID or the <your-template-id> placeholder, and noting that readers must create a template first
  • Adding a template creation step to the Prerequisites section
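To follow the second suggestion, a quick scan can flag leftover image-style IDs before re-submitting. This sketch assumes template IDs appear as template_id="..." literals. It treats only tpl-<hex> values and the <your-template-id> placeholder as acceptable. The script name in the usage comment is hypothetical:

```python
import re

# Matches template_id="..." or template_id='...' in prose and code samples.
TEMPLATE_ARG_RE = re.compile(r"""template_id\s*=\s*["']([^"']+)["']""")
# Assumption: acceptable values look like tpl-<hex> or are the review's placeholder.
ACCEPTED_RE = re.compile(r"tpl-[0-9a-f]+|<your-template-id>")


def bad_template_ids(text: str) -> list[tuple[int, str]]:
    """Return (line number, value) for every template_id literal that is not acceptable."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TEMPLATE_ARG_RE.finditer(line):
            value = match.group(1)
            if not ACCEPTED_RE.fullmatch(value):
                problems.append((lineno, value))
    return problems


if __name__ == "__main__":
    import pathlib
    import sys

    # e.g. python check_template_ids.py docs/guide/integrations/llamaindex.md
    for path in sys.argv[1:]:
        for lineno, value in bad_template_ids(pathlib.Path(path).read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: template_id={value!r}")
```

Run it on both the English and Chinese files; each reported line still needs a real tpl-* ID or the placeholder.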

Happy to help if you have questions about any of these points!

- Replace invalid template_id='python:3.12-slim' with <your-template-id> placeholder
- Add template creation instructions to Prerequisites section
- Use SDK native network parameter for network isolation (network={"allow_out": [...]})
- Add config=cfg to Sandbox.create() calls in network examples
- Remove unused 'import json' statement
- Fix metadata={} usage to SDK-native parameters
- Complete Python example for LlamaIndex + CubeSandbox integration
- Demonstrates RAG workflow with secure code execution
- Includes network isolation examples
- Bilingual README (English + Chinese)
- Environment configuration templates
@Johnny-zbb (Author)

> Can you share your environment? A 1-2 second cold start is too slow for us. Better stateful sandbox processing and domain-based network access control are on our roadmap and will be released soon.

Do you mean that you also wrote the guide on adding LlamaIndex integration?

Development

Successfully merging this pull request may close these issues.

[good first issue] docs: Help us build integration guides with mainstream AI agent frameworks
