Skip to content

feat(multimodal): chat attachments end-to-end (schema → upload → LLM & tool forwarding)#1029

Open
dylan-savage wants to merge 21 commits into
feat/chat-sessionsfrom
feat/multimodal-chat-sessions
Open

feat(multimodal): chat attachments end-to-end (schema → upload → LLM & tool forwarding)#1029
dylan-savage wants to merge 21 commits into
feat/chat-sessionsfrom
feat/multimodal-chat-sessions

Conversation

@dylan-savage
Copy link
Copy Markdown
Collaborator

@dylan-savage dylan-savage commented May 29, 2026

Summary

Adds image/file attachments to chat, carried end-to-end: from the chat-ui paperclip → uploaded to the per-user FileStore → through the engine → onto the LLM call (provider-native multimodal blocks) and tool calls. Spans 2 client SDKs (TypeScript + Python), the agent base, the rocketlib dispatcher, and all 13 llm_* provider nodes.

Stacked on feat/chat-sessions — review/merge that first (see base-branch note below).

What's included (by area)

  • Schema — new Attachment type + Question.attachments, mirrored across the TS and Python SDKs and the chat-ui types; Attachment re-exported at the top level of rocketride.
  • Chat send / persistChat.send / client.chat carry attachments; persisted and rehydrated (TS + Python).
  • LLM pathllm_translate.py turns attachments into provider-native content blocks; chat.py dispatches by provider_shape (openai / anthropic / gemini / bedrock); each of the 13 llm_* nodes gets a _chat_blocks override.
  • Agent tool-call pathQuestion.attachments flow onto AgentContext; an attachment→tool-slot picker fills format: rocketride-attachment inputs; rocketlib resolves those paths to {path, mime, bytes} at tool.invoke.
  • chat-ui — paperclip button, chunked upload helper, pending pills, render on the user bubble.

Key design decisions

  • LLM provider forwarding — attachments are forwarded to the LLM call only on perception-capable frameworks (DeepAgent, LangChain) and dropped-and-warned on blind ones (CrewAI, Wave), where the framework can't carry multimodal blocks.
  • provider_shape dispatch — read off the live Chat driver, not node instances; only anthropic/gemini/bedrock declare non-default shapes, the other 10 inherit openai.
  • Tool-side resolution in rocketlib — the dispatcher transparently swaps a format: rocketride-attachment path string for {path, mime, bytes} (top-level properties only), so tool authors write no FileStore boilerplate.
  • No global file-store factory — nodes that accept attachments build their own store in beginGlobal() (the tool_filesystem pattern); the resolver reads self._file_store. Keeps rocketlib dependency-free.

Testing

Kept Python contract/unit tests: test_llm_translate (21), test_attachment_binding (5), test_attachments_routing (3, fixed), test_dispatch_tool_attachments (3, fixed), plus the client SDK tests.

⚠️ The filters.py / chat.py runtime changes — and the canonical (frozen-engine) run of test_dispatch_tool_attachments — only take effect after an engine rebuild (./builder).

Base-branch note

This PR targets feat/chat-sessions (not develop) so the diff is exactly the multimodal feature (51 files). feat/chat-sessions is not yet merged — merge it first, then this.

🤖 Generated with Claude Code

dylan-savage and others added 21 commits May 26, 2026 10:33
Introduce Attachment as a first-class schema type (peer file pattern,
matching Doc.ts). Add optional Attachment[] to QuestionHistory and a
default-empty Attachment[] field to Question, wired through toDict /
fromDict for round-trip serialization. TDD §6.1–§6.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the TS schema addition: introduce Attachment as a peer Pydantic
model (size_bytes >= 0), add an optional List[Attachment] to
QuestionHistory and a default-empty List[Attachment] on Question, and
re-export Attachment from rocketride.schema. TDD §6.1–§6.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…c.ts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Python)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a 2 MB-chunked upload helper that writes via fsOpen/fsWrite/fsClose
and returns an Attachment record. ChatInput grows a paperclip button,
pending pills, and a second optional arg on onSend. Message renders pill
rows on user bubbles.

Activation gated on optional client + chatId props on ChatInput so the
existing parent components remain untouched; user wires the props through
when they're ready.

Adds a minimal Jest+jsdom setup (test runner did not exist for chat-ui
before this slice) so the upload helper unit tests can run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `attachments` field to AgentContext (frozen tuple) and stamp it from
`Question.attachments` at run_agent entry. Thread it onto any synthesized
Question in `call_llm` / `call_llm_json` so the provider-side translators
in LLMBase can auto-forward to the underlying API.

TDD §8.1. Always-assign per Q-D1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLMBase declares provider_shape ('openai'|'anthropic'|'gemini'|'bedrock')
and provider_name (telemetry tag). ChatBase.chat() routes Questions with
attachments through the per-shape translator; text-only Questions use
the existing single-string fast path unchanged.

Ships the openai shape (covers most providers per TDD §9). Drop-and-warn
for unsupported MIMEs; FileStore IO errors propagate (Q-E1).

TDD §7.1, §7.2, §7.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the three remaining provider block shapes to llm_translate plus their
dispatch branches in ChatBase.chat(). Same drop-and-warn discipline for
unsupported MIMEs; FileStore IO errors propagate per Q-E1.

TDD §7.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Set the class attributes added in Slice E on each in-tree LLM node so
ChatBase.chat() routes attachments through the right translator shape:
* anthropic: llm_anthropic
* gemini: llm_gemini
* bedrock: llm_bedrock (Nova; Claude-path is a follow-up)
* openai-compat: llm_openai + 9 others sharing the openai block API

Vision-only LLMs (llm_vision_*) intentionally not wired — they don't
accept a Question input field today (Q-G1).

LangChain HumanMessage(content=blocks) pre-translation in _chat_blocks
is deferred — every node still has the default NotImplementedError
guard until the per-node override lands; class-attribute set unblocks
that follow-up.

TDD §7.4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cache @tool_function descriptors at tool.query time and walk the
top-level inputSchema.properties at tool.invoke to resolve any
string-typed property with format: "rocketride-attachment" to a
{path, mime, bytes} dict via the per-account FileStore.

Tool authors who declare attachment-typed inputs receive already-
loaded bytes without writing FileStore boilerplate per tool.

Top-level walk only per Q-H2 (nested/array shapes deferred). The
FileStore is sourced from self._file_store (preferred, injected
by host) or self.instance.fileStore — we do NOT auto-construct a
per-account FileStore here; the IInstance host wires it.

Flag for engine-team review at PR time: this edit lives in
rocketlib-python which the engine vendors.

TDD §10.3, Q-H1, Q-H2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ships tool_sha256 as a fixture for upcoming agent integration tests
(Slices I and J) and a contract test that the dispatcher's path→bytes
resolution works end-to-end through the @tool_function decorator.

The tool declares one format: "rocketride-attachment" input slot and
relies on the rocketlib dispatcher to swap the path string for a
{path, mime, bytes} dict before sha256() runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks the first Attachment from AgentContext.attachments matching each
top-level format: rocketride-attachment property of a tool's
inputSchema. Returns path-by-reference (the dispatcher resolves to
bytes per Slice H). x-rocketride-mimes patterns gate eligibility
(fnmatch); absent patterns accept any mime.

TDD §6.5, §10.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
At the HostTool._run construction site, fill any unset attachment-typed
slot with the picker's path-by-reference choice. LLM-decided values
win on collision (setdefault semantics). The dispatcher (Slice H) then
resolves the path to bytes before calling the underlying tool method.

TDD §8.1, §10.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CrewAI's Task(description=...) and Wave's q.addInstruction() are
plain-string at the upstream API.  Forwarding multimodal blocks
through prompt assembly requires upstream framework work deferred per
TDD §16.  v1: emit a single warn-once-per-run log warning on the LLM
side and drop the attachments from the LLM call.

Tool-call attachment forwarding via TDD §6.5 is unaffected — tools
still receive path-by-reference when the picker matches:
  - CrewAI: HostTool._run now stashes _rr_input_schema and runs the
    picker before delegating to call_tool.
  - Wave: the parallel executor looks up the descriptor by name in
    context.tools.list and merges picker output into args.

TDD §8.2, §8.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contract-level integration covering both layers of the path-by-reference
flow: (a) pick_for_tool_call output shape for tool_sha256's declared
inputSchema, and (b) IInstance.sha256 returning the correct digest
given the dispatcher's {path, mime, bytes} dict.

Full per-framework live-engine e2e is deferred to Slice J smoke tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the per-slice TODOs with structured logger.info('METRIC <name>
key=value ...') lines so the TDD §13 signal set is grep-able from
production logs until a metrics adapter wires them into the
MetricsManager surface. Privacy invariant: MIME and counts only — never
filenames or paths.

Engine-side counters wired (Python logger.info):
- attachment.count_per_message (ChatBase.chat, every send)
- attachment.dropped_unsupported (ChatBase translator drop path)
- llm.attachment_forwarded (ChatBase, per surviving attachment)
- attachment.dropped_agent_unsupported (CrewAI + RocketRide Wave)
- tool.call_with_attachment (LangChain/DeepAgent/CrewAI/Wave picker fill)
- tool.attachment_resolved (rocketlib filters _resolve_attachment_inputs)

Client-side (console.info from uploadAttachment.ts):
- attachment.upload_start, attachment.bytes, attachment.upload_error

Orphan reaper deferred per Q-J1; upload_start + count_per_message
provide the volume-visibility signal (delta = orphan volume) in v1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Integration test sketches the per-FileStore-backend round-trip for the
Feature 2 happy path (upload → persist → reopen → read-back → delete).
Runs as skip-by-default pending a live FileStore + engine harness in
CI; the docstring captures the steps to fill in when the fixture
lands. s3:// and azure:// variants gate on credential env vars so the
matrix stays green in offline contributor environments.

Orphan reaper deferred per Q-J1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two reviewer-checklist items so future PRs that touch agent
integrations or @tool_function authoring are forced to think about the
attachment forwarding contract:

- New Question(...) inside an agent integration must inherit
  ``attachments`` from the entry-point question (TDD §8.1, Slice D).
- New ``format: "rocketride-attachment"`` tools must keep the marker
  at top-level inputSchema.properties (no nesting, arrays, or
  oneOf/anyOf — TDD §10.3, Q-H2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug fixes surfaced during manual provider/agent testing:
- add _chat_blocks overrides on all 13 llm_* chat drivers so the
  attachment dispatch path completes end-to-end instead of raising
- make the CrewAI/Wave attachment drop real (forward_attachments=False);
  previously the drop was logged but attachments still reached the model
- register a sync account file-store factory so attachment-typed tool
  inputs resolve path->bytes at dispatch time
- let the picker own attachment-typed tool slots, overriding any
  model-supplied (often hallucinated) path

Cleanup to reach steady state:
- limit tool-attachment forwarding to perception-capable frameworks;
  remove the picker from CrewAI and RocketRide Wave, keep it on
  DeepAgent/LangChain plus the shared resolver/file-store plumbing
- drop the tool_sha256 reference fixture and its only test consumer
- strip dead provider_shape/provider_name declarations from the 13
  llm_* IInstance classes (dispatch reads provider_shape off the chat
  driver, not the node instance; the decoy decls caused misrouting)
- remove the provider_shape contract test that asserted those decoys
- revert feature-specific additions to the PR template

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cleanup pass over the multimodal-chat-sessions diff (footprint 59 -> 51
files, +3020 -> +2264) before review. No behaviour change beyond the two
fixes noted.

- Restore to baseline: llm_base.py, ai/common/__init__.py, .gitignore
  (stray local ignores incl. checked-in CLAUDE.md/claude/), apps/chat-ui
  package.json + pnpm-lock.yaml.
- Remove the attachment file-store global factory; nodes build their own
  store (tool_filesystem pattern), resolver reads self._file_store.
- Agent drivers: drop the parallel stdlib logging logger for rocketlib
  debug()/warning() (deepagent, langchain, crewai, executor, planner);
  executor's logger was unused.
- rocketlib filters.py: derive attachment MIME from the path, not the
  schema accept-list; drop the overclaiming metrics-adapter comment.
- Fold attachment_picker into agent/_internal/utils.py with no public-API
  growth; delete the standalone module.
- chat-ui: fix attachment pills never rendering on a freshly-sent message
  (userMessage now carries attachments); cut the unwired jest harness
  (test + config + devDeps + lockfile).
- Tests: fix test_attachments_routing IInvokeLLM.Ask stub and
  test_dispatch import so both run under the canonical engine runner; cut
  the unconditionally-skipped filestore placeholder + its package marker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (2)
  • develop
  • release/.*

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f3706a27-4f37-4cdb-b25e-0a665a700d91

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/multimodal-chat-sessions

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added module:server C++ engine and server components module:nodes Python pipeline nodes module:client-python Python SDK and MCP client module:ai AI/ML modules labels May 29, 2026
@github-actions
Copy link
Copy Markdown

No description provided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ai AI/ML modules module:client-python Python SDK and MCP client module:client-typescript module:nodes Python pipeline nodes module:server C++ engine and server components module:ui Chat UI and Dropper UI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant