feat(multimodal): chat attachments end-to-end (schema → upload → LLM & tool forwarding)#1029
feat(multimodal): chat attachments end-to-end (schema → upload → LLM & tool forwarding)#1029dylan-savage wants to merge 21 commits into
Conversation
Introduce Attachment as a first-class schema type (peer file pattern, matching Doc.ts). Add optional Attachment[] to QuestionHistory and a default-empty Attachment[] field to Question, wired through toDict / fromDict for round-trip serialization. TDD §6.1–§6.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the TS schema addition: introduce Attachment as a peer Pydantic model (size_bytes >= 0), add an optional List[Attachment] to QuestionHistory and a default-empty List[Attachment] on Question, and re-export Attachment from rocketride.schema. TDD §6.1–§6.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…c.ts Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Python) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a 2 MB-chunked upload helper that writes via fsOpen/fsWrite/fsClose and returns an Attachment record. ChatInput grows a paperclip button, pending pills, and a second optional arg on onSend. Message renders pill rows on user bubbles. Activation gated on optional client + chatId props on ChatInput so the existing parent components remain untouched; user wires the props through when they're ready. Adds a minimal Jest+jsdom setup (test runner did not exist for chat-ui before this slice) so the upload helper unit tests can run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add `attachments` field to AgentContext (frozen tuple) and stamp it from `Question.attachments` at run_agent entry. Thread it onto any synthesized Question in `call_llm` / `call_llm_json` so the provider-side translators in LLMBase can auto-forward to the underlying API. TDD §8.1. Always-assign per Q-D1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLMBase declares provider_shape ('openai'|'anthropic'|'gemini'|'bedrock')
and provider_name (telemetry tag). ChatBase.chat() routes Questions with
attachments through the per-shape translator; text-only Questions use
the existing single-string fast path unchanged.
Ships the openai shape (covers most providers per TDD §9). Drop-and-warn
for unsupported MIMEs; FileStore IO errors propagate (Q-E1).
TDD §7.1, §7.2, §7.3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add the three remaining provider block shapes to llm_translate plus their dispatch branches in ChatBase.chat(). Same drop-and-warn discipline for unsupported MIMEs; FileStore IO errors propagate per Q-E1. TDD §7.2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Set the class attributes added in Slice E on each in-tree LLM node so ChatBase.chat() routes attachments through the right translator shape: * anthropic: llm_anthropic * gemini: llm_gemini * bedrock: llm_bedrock (Nova; Claude-path is a follow-up) * openai-compat: llm_openai + 9 others sharing the openai block API Vision-only LLMs (llm_vision_*) intentionally not wired — they don't accept a Question input field today (Q-G1). LangChain HumanMessage(content=blocks) pre-translation in _chat_blocks is deferred — every node still has the default NotImplementedError guard until the per-node override lands; class-attribute set unblocks that follow-up. TDD §7.4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cache @tool_function descriptors at tool.query time and walk the
top-level inputSchema.properties at tool.invoke to resolve any
string-typed property with format: "rocketride-attachment" to a
{path, mime, bytes} dict via the per-account FileStore.
Tool authors who declare attachment-typed inputs receive already-
loaded bytes without writing FileStore boilerplate per tool.
Top-level walk only per Q-H2 (nested/array shapes deferred). The
FileStore is sourced from self._file_store (preferred, injected
by host) or self.instance.fileStore — we do NOT auto-construct a
per-account FileStore here; the IInstance host wires it.
Flag for engine-team review at PR time: this edit lives in
rocketlib-python which the engine vendors.
TDD §10.3, Q-H1, Q-H2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ships tool_sha256 as a fixture for upcoming agent integration tests
(Slices I and J) and a contract test that the dispatcher's path→bytes
resolution works end-to-end through the @tool_function decorator.
The tool declares one format: "rocketride-attachment" input slot and
relies on the rocketlib dispatcher to swap the path string for a
{path, mime, bytes} dict before sha256() runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks the first Attachment from AgentContext.attachments matching each top-level format: rocketride-attachment property of a tool's inputSchema. Returns path-by-reference (the dispatcher resolves to bytes per Slice H). x-rocketride-mimes patterns gate eligibility (fnmatch); absent patterns accept any mime. TDD §6.5, §10.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
At the HostTool._run construction site, fill any unset attachment-typed slot with the picker's path-by-reference choice. LLM-decided values win on collision (setdefault semantics). The dispatcher (Slice H) then resolves the path to bytes before calling the underlying tool method. TDD §8.1, §10.3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CrewAI's Task(description=...) and Wave's q.addInstruction() are
plain-string at the upstream API. Forwarding multimodal blocks
through prompt assembly requires upstream framework work deferred per
TDD §16. v1: emit a single warn-once-per-run log warning on the LLM
side and drop the attachments from the LLM call.
Tool-call attachment forwarding via TDD §6.5 is unaffected — tools
still receive path-by-reference when the picker matches:
- CrewAI: HostTool._run now stashes _rr_input_schema and runs the
picker before delegating to call_tool.
- Wave: the parallel executor looks up the descriptor by name in
context.tools.list and merges picker output into args.
TDD §8.2, §8.3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contract-level integration covering both layers of the path-by-reference
flow: (a) pick_for_tool_call output shape for tool_sha256's declared
inputSchema, and (b) IInstance.sha256 returning the correct digest
given the dispatcher's {path, mime, bytes} dict.
Full per-framework live-engine e2e is deferred to Slice J smoke tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the per-slice TODOs with structured logger.info('METRIC <name>
key=value ...') lines so the TDD §13 signal set is grep-able from
production logs until a metrics adapter wires them into the
MetricsManager surface. Privacy invariant: MIME and counts only — never
filenames or paths.
Engine-side counters wired (Python logger.info):
- attachment.count_per_message (ChatBase.chat, every send)
- attachment.dropped_unsupported (ChatBase translator drop path)
- llm.attachment_forwarded (ChatBase, per surviving attachment)
- attachment.dropped_agent_unsupported (CrewAI + RocketRide Wave)
- tool.call_with_attachment (LangChain/DeepAgent/CrewAI/Wave picker fill)
- tool.attachment_resolved (rocketlib filters _resolve_attachment_inputs)
Client-side (console.info from uploadAttachment.ts):
- attachment.upload_start, attachment.bytes, attachment.upload_error
Orphan reaper deferred per Q-J1; upload_start + count_per_message
provide the volume-visibility signal (delta = orphan volume) in v1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Integration test sketches the per-FileStore-backend round-trip for the Feature 2 happy path (upload → persist → reopen → read-back → delete). Runs as skip-by-default pending a live FileStore + engine harness in CI; the docstring captures the steps to fill in when the fixture lands. s3:// and azure:// variants gate on credential env vars so the matrix stays green in offline contributor environments. Orphan reaper deferred per Q-J1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds two reviewer-checklist items so future PRs that touch agent integrations or @tool_function authoring are forced to think about the attachment forwarding contract: - New Question(...) inside an agent integration must inherit ``attachments`` from the entry-point question (TDD §8.1, Slice D). - New ``format: "rocketride-attachment"`` tools must keep the marker at top-level inputSchema.properties (no nesting, arrays, or oneOf/anyOf — TDD §10.3, Q-H2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug fixes surfaced during manual provider/agent testing: - add _chat_blocks overrides on all 13 llm_* chat drivers so the attachment dispatch path completes end-to-end instead of raising - make the CrewAI/Wave attachment drop real (forward_attachments=False); previously the drop was logged but attachments still reached the model - register a sync account file-store factory so attachment-typed tool inputs resolve path->bytes at dispatch time - let the picker own attachment-typed tool slots, overriding any model-supplied (often hallucinated) path Cleanup to reach steady state: - limit tool-attachment forwarding to perception-capable frameworks; remove the picker from CrewAI and RocketRide Wave, keep it on DeepAgent/LangChain plus the shared resolver/file-store plumbing - drop the tool_sha256 reference fixture and its only test consumer - strip dead provider_shape/provider_name declarations from the 13 llm_* IInstance classes (dispatch reads provider_shape off the chat driver, not the node instance; the decoy decls caused misrouting) - remove the provider_shape contract test that asserted those decoys - revert feature-specific additions to the PR template Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cleanup pass over the multimodal-chat-sessions diff (footprint 59 -> 51 files, +3020 -> +2264) before review. No behaviour change beyond the two fixes noted. - Restore to baseline: llm_base.py, ai/common/__init__.py, .gitignore (stray local ignores incl. checked-in CLAUDE.md/claude/), apps/chat-ui package.json + pnpm-lock.yaml. - Remove the attachment file-store global factory; nodes build their own store (tool_filesystem pattern), resolver reads self._file_store. - Agent drivers: drop the parallel stdlib logging logger for rocketlib debug()/warning() (deepagent, langchain, crewai, executor, planner); executor's logger was unused. - rocketlib filters.py: derive attachment MIME from the path, not the schema accept-list; drop the overclaiming metrics-adapter comment. - Fold attachment_picker into agent/_internal/utils.py with no public-API growth; delete the standalone module. - chat-ui: fix attachment pills never rendering on a freshly-sent message (userMessage now carries attachments); cut the unwired jest harness (test + config + devDeps + lockfile). - Tests: fix test_attachments_routing IInvokeLLM.Ask stub and test_dispatch import so both run under the canonical engine runner; cut the unconditionally-skipped filestore placeholder + its package marker. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. 🗂️ Base branches to auto review (2)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
No description provided. |
Summary
Adds image/file attachments to chat, carried end-to-end: from the chat-ui paperclip → uploaded to the per-user FileStore → through the engine → onto the LLM call (provider-native multimodal blocks) and tool calls. Spans 2 client SDKs (TypeScript + Python), the agent base, the rocketlib dispatcher, and all 13
llm_*provider nodes.Stacked on
feat/chat-sessions— review/merge that first (see base-branch note below).What's included (by area)
Attachmenttype +Question.attachments, mirrored across the TS and Python SDKs and the chat-ui types;Attachmentre-exported at the top level ofrocketride.Chat.send/client.chatcarryattachments; persisted and rehydrated (TS + Python).llm_translate.pyturns attachments into provider-native content blocks;chat.pydispatches byprovider_shape(openai / anthropic / gemini / bedrock); each of the 13llm_*nodes gets a_chat_blocksoverride.Question.attachmentsflow ontoAgentContext; an attachment→tool-slot picker fillsformat: rocketride-attachmentinputs; rocketlib resolves those paths to{path, mime, bytes}attool.invoke.Key design decisions
openai.format: rocketride-attachmentpath string for{path, mime, bytes}(top-level properties only), so tool authors write no FileStore boilerplate.beginGlobal()(thetool_filesystempattern); the resolver readsself._file_store. Keeps rocketlib dependency-free.Testing
Kept Python contract/unit tests:
test_llm_translate(21),test_attachment_binding(5),test_attachments_routing(3, fixed),test_dispatch_tool_attachments(3, fixed), plus the client SDK tests.Base-branch note
This PR targets
feat/chat-sessions(notdevelop) so the diff is exactly the multimodal feature (51 files).feat/chat-sessionsis not yet merged — merge it first, then this.🤖 Generated with Claude Code