anotherai-dev · anyacherniss · Aug 14, 2025 · Aug 14, 2025
diff --git a/docs/content/docs/examples/image-input-agents.mdx b/docs/content/docs/examples/image-input-agents.mdx
@@ -0,0 +1,124 @@
+# Creating Agents with Image Input
+
+## Overview
+
+This guide explains how image inputs differ from text inputs, gives context for why those differences exist, and walks through a basic example of an agent that takes an image as input.
+
+## The Difference Between Text and Image Inputs
+
+When building agents that take an image as input, images are passed directly in the message content using the `image_url` type, not as input variables. This is because images require special handling in the message format that differs from regular text or JSON inputs.
+
+## Why Images Are Handled Differently
+
+[TODO: explain this better]
+
+Traditional input variables in AnotherAI are designed for text, numbers, and structured data that can be templated into prompts using Jinja2 syntax. For example, you might have `{{ user_name }}` or `{{ email_content }}` in your prompt template.
+
+[The following list was generated by Claude and has not been verified for accuracy.]
+
+Images, however, cannot be templated this way because:
+- An image is binary data, or a URL that points to it, rather than text the model should read as part of the prompt string
+- AI models expect images to be provided in a specific format within the message structure
+- The models need to know explicitly that they're receiving image data, not text
+
+## Example: Image Description Agent
+
+Let's explore a complete example that shows how to correctly pass images to an agent:
+
+```python
+import openai
+
+
+def image_description(image_url: str) -> str:
+    """Return a text description of the image at `image_url`."""
+    res = openai.chat.completions.create(
+        model="gpt-4o-mini",
+        messages=[
+            {
+                "role": "system",
+                "content": """You are an image description specialist who provides detailed and accurate descriptions of images. Your task is to analyze the provided image and generate a comprehensive description that captures the key elements, context, and details visible in the image.
+
+                Your description should be:
+                - Clear and concise
+                - Factual and objective
+                - Detailed enough to help someone visualize the image
+                - Well-structured and easy to understand""",
+            },
+            {
+                "role": "user",
+                "content": [
+                    {"type": "image_url", "image_url": {"url": image_url}},
+                ],
+            },
+        ],
+    )
+    if not res.choices[0].message.content:
+        raise ValueError("No image description found")
+    return res.choices[0].message.content
+```
+
+Note how the image content part differs from a text input:
+- The content part uses `"type": "image_url"` instead of `"type": "text"`
+- The URL is nested inside an object: `"image_url": {"url": image_url}`
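+
+The example above passes a hosted URL. OpenAI-compatible chat APIs generally also accept a base64 data URL in the same `url` field, which is one way to send a local file. The sketch below assumes AnotherAI accepts data URLs as well, which this guide does not confirm; `image_file_to_data_url` is a hypothetical helper name, not part of any SDK.
+
+```python
+import base64
+import mimetypes
+from pathlib import Path
+
+
+def image_file_to_data_url(path: str) -> str:
+    """Encode a local image file as a base64 data URL (hypothetical helper)."""
+    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png".
+    mime_type, _ = mimetypes.guess_type(path)
+    if mime_type is None or not mime_type.startswith("image/"):
+        raise ValueError(f"Not a recognized image file: {path}")
+    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
+    # Assumption: the API accepts this string wherever a hosted image URL is accepted.
+    return f"data:{mime_type};base64,{encoded}"
+```
+
+The returned string would then be passed as `image_url` to `image_description`, in place of an `https://` URL.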
+
+### Combining Image Input with Text
+
+#### Mixing Static Text and Images
+
+[TODO: confirm if this is correct]
+You can combine text and image content in the same message. For example:
+
+```python
+{
+    "role": "user",
+    "content": [
+        {"type": "text", "text": "How many cats are in the image?"},
+        {"type": "image_url", "image_url": {"url": image_url}}
+    ],
+}
+```
+
+#### Mixing Input Variables and Images
+
+[TODO: confirm if this is correct]
+When your input includes both an image and an input variable, use Jinja2 templating in the text content part and keep the image in its structured `image_url` part. The variable values are passed separately through `extra_body`. For example:
+
+```python
+import openai
+from pydantic import BaseModel
+
+
+class ImageQuestionAnswer(BaseModel):
+    answer: str
+
+def answer_image_question(
+    image_url: str,
+    question: str,
+) -> ImageQuestionAnswer:
+    res = openai.beta.chat.completions.parse(
+        model="gpt-4o-mini",
+        messages=[
+            {
+                "role": "system",
+                "content": """You are an image analyst who provides detailed and accurate answers to questions about images. Your task is to analyze the provided image and question about the image and generate a comprehensive answer.
+
+                Your answer should be:
+                - Clear and concise
+                - Factual and objective
+                - Well-structured and easy to understand""",
+            },
+            {
+                "role": "user",
+                "content": [
+                    {"type": "text", "text": "{{question}}"},
+                    {"type": "image_url", "image_url": {"url": image_url}}
+                ]
+            }
+        ],
+        response_format=ImageQuestionAnswer,
+        extra_body={
+            "input": {
+                "variables": {
+                    "question": question,
+                }
+            }
+        },
+    )
+    if not res.choices[0].message.parsed:
+        raise ValueError("No image question answer found")
+    return res.choices[0].message.parsed
+```
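+
+To make the split between template and variables easier to see, the request can be assembled as plain data before any API call. This sketch assumes, based on the example above, that the `{{question}}` placeholder is sent unrendered and that AnotherAI substitutes the value from `extra_body`; `build_image_question_request` is a hypothetical name used only for illustration.
+
+```python
+def build_image_question_request(image_url: str, question: str) -> dict:
+    """Return request keyword arguments that mix a templated question with an image."""
+    return {
+        "messages": [
+            {
+                "role": "user",
+                "content": [
+                    # The template placeholder is sent as-is; it is not rendered here.
+                    {"type": "text", "text": "{{question}}"},
+                    # The image URL is inserted directly, not through a variable.
+                    {"type": "image_url", "image_url": {"url": image_url}},
+                ],
+            }
+        ],
+        # Variable values travel separately, mirroring the `extra_body` shape used above.
+        "extra_body": {"input": {"variables": {"question": question}}},
+    }
+```
+
+Note that the question text appears only under `extra_body`, while the image URL appears only in the message content.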