# Feature Request: Multimodal Memory (Image + Text)

## Use Cases

1. **Inventory / asset management**: Take a photo of an item and store it as a memory with a description. Later, retrieve it by text query ("where's the blue storage box?") or by image similarity.

2. **People & faces**: Store photos associated with people or other entities, so the agent can "remember what someone looks like."

3. **Visual notes**: Screenshot a UI, diagram, or whiteboard and store it as a searchable memory alongside text annotations.

## Why this fits memory-lancedb-pro

- LanceDB natively supports multimodal data and CLIP-style embedding functions
- The plugin already supports "any OpenAI-compatible embedding provider", and CLIP models (e.g. jina-clip-v2) can be served through the same `/v1/embeddings` endpoint for both text and images (although the image input format may be provider-specific)
- With CLIP, text and image vectors share one embedding space, so cross-modal retrieval (text query → image result) works out of the box at the vector level
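To make the shared-endpoint point concrete, here is a minimal sketch of a request body for a CLIP-style provider. It assumes a Jina-style `input` shape, where each item is an object with either a `text` or an `image` key (the image given as a URL or base64 string). Plain OpenAI-compatible providers expect `input` to be a string or an array of strings, so the image format should be checked against the chosen provider's documentation. `EmbeddingItem`, `EmbeddingRequest`, and `buildEmbeddingRequest` are hypothetical names, not part of the plugin.

```typescript
// Hypothetical input item (Jina-style): either a text snippet or an image.
type EmbeddingItem = { text: string } | { image: string };

interface EmbeddingRequest {
  model: string;
  input: EmbeddingItem[];
}

// Build one request body that embeds a caption and/or an image.
// The image may be an http(s) URL or a raw base64 string (provider-dependent).
function buildEmbeddingRequest(
  model: string,
  caption: string | undefined,
  image: string | undefined,
): EmbeddingRequest {
  const input: EmbeddingItem[] = [];
  if (caption !== undefined && caption.length > 0) {
    input.push({ text: caption });
  }
  if (image !== undefined && image.length > 0) {
    input.push({ image });
  }
  if (input.length === 0) {
    throw new Error("nothing to embed: provide a caption, an image, or both");
  }
  return { model, input };
}

const body = buildEmbeddingRequest(
  "jina-clip-v2",
  "blue storage box on the top shelf",
  "https://example.com/box.jpg",
);
console.log(JSON.stringify(body));
```

Each item in `input` yields its own vector in the response's `data` array, so a text query and a stored image embedded by the same model can be compared directly.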

## Possible Approach (incremental)

**Phase 1: Image attachment**
- Allow `memory_store` to accept an optional image (URL or base64)
- Store image reference/data alongside the text in the memories table
- Embed using a CLIP-compatible model; fall back to text-only embedding if the configured model doesn't support images
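A minimal sketch of how Phase 1 could normalize the optional image and apply the text-only fallback. The `MemoryStoreParams` shape, the `supportsImages` config flag, and the helper names are all hypothetical; the real `memory_store` signature and config keys would come from the plugin.

```typescript
// Hypothetical memory_store arguments once an optional image is allowed.
interface MemoryStoreParams {
  text: string;
  image?: string; // http(s) URL, data: URI, or raw base64
}

// Hypothetical config: whether the configured embedding model accepts images.
interface EmbeddingConfig {
  model: string;
  supportsImages: boolean;
}

type ImageRef = { kind: "url"; value: string } | { kind: "base64"; value: string };

// Classify the image argument so it is stored and forwarded consistently.
function parseImage(image: string): ImageRef {
  if (/^https?:\/\//i.test(image)) {
    return { kind: "url", value: image };
  }
  // Strip a data-URI prefix such as "data:image/png;base64," if present.
  const match = /^data:[^;,]+;base64,([\s\S]*)$/.exec(image);
  return { kind: "base64", value: match ? match[1] : image };
}

interface EmbedPlan {
  embedText: string;
  embedImage: ImageRef | null; // null means text-only embedding
  storedImage: ImageRef | null; // kept even when the model can't embed it
}

// Decide what to embed, falling back to text-only if the model lacks image support.
function planEmbedding(params: MemoryStoreParams, cfg: EmbeddingConfig): EmbedPlan {
  const ref = params.image ? parseImage(params.image) : null;
  return {
    embedText: params.text,
    embedImage: cfg.supportsImages ? ref : null,
    storedImage: ref,
  };
}
```

Keeping `storedImage` separate from `embedImage` means a text-only setup still preserves the attachment, so switching to a CLIP model later could allow existing rows to be re-embedded.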

**Phase 2: Image-aware retrieval**
- `memory_recall` includes the stored image reference in results for memories that have one
- Support image-as-query (pass an image to find similar memories)
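Because CLIP puts text and images in one space, text-as-query and image-as-query can share a single retrieval path. The brute-force cosine ranking below is only an illustration; in the plugin, LanceDB's own vector search (which supports cosine distance) would replace the loop. `MemoryRow`, `RecallHit`, and `recall` are hypothetical names, and the three-dimensional vectors are made-up toy values, not real embeddings.

```typescript
// Hypothetical row: one vector per memory plus the optional image reference.
interface MemoryRow {
  id: string;
  text: string;
  image: string | null;
  vector: number[];
}

interface RecallHit {
  id: string;
  text: string;
  image: string | null; // surfaced so the caller can display or fetch the picture
  score: number;
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank memories against a query vector. The query may come from embedding
// text or an image with the same CLIP model; the ranking code is identical.
function recall(query: number[], rows: MemoryRow[], k: number): RecallHit[] {
  return rows
    .map((r) => ({ id: r.id, text: r.text, image: r.image, score: cosine(query, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: illustrative vectors only.
const rows: MemoryRow[] = [
  { id: "a", text: "blue storage box", image: "https://example.com/box.jpg", vector: [0.9, 0.1, 0] },
  { id: "b", text: "meeting notes", image: null, vector: [0, 0.2, 0.98] },
];
const hits = recall([1, 0, 0], rows, 1);
console.log(hits[0].id, hits[0].image); // prints "a https://example.com/box.jpg"
```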

## Notes

- This doesn't need to replace the current text-only flow — it's additive. Users who don't configure a multimodal embedding model would see zero behavior change.
- The schema change is modest: an optional `image` column (storing a URL/path/base64) in the memories table.
- The main open question is whether this aligns with the plugin's scope, or if multimodal memory is better handled as a separate plugin/module.
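The "additive, zero behavior change" claim can be sketched at the row level: one nullable column, written as `null` by existing text-only callers and read as `null` for rows created before the column existed. The field names are simplified and hypothetical; the real memories table has more columns, and how the column would be added to existing tables (schema evolution versus re-creating the table) is left open here.

```typescript
// Hypothetical existing row fields (simplified).
interface BaseMemoryRow {
  id: string;
  text: string;
  vector: number[];
}

// Additive change: a single nullable column.
interface MemoryRowV2 extends BaseMemoryRow {
  image: string | null; // URL, file path, or base64; null for text-only memories
}

// Writers: existing text-only callers pass no image and get image = null.
function toRow(base: BaseMemoryRow, image?: string): MemoryRowV2 {
  return { ...base, image: image ?? null };
}

// Readers: treat a missing field (rows written before the change) as null.
function readImage(row: BaseMemoryRow & { image?: string | null }): string | null {
  return row.image ?? null;
}
```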

Would love to hear thoughts on whether this direction is interesting for the project!

Feature Request: Multimodal Memory (Image + Text) #290