Moondream Image Analysis

This application uses the Moondream model from Hugging Face to analyze and describe images supplied by URL or by upload. It is built on Modal for serverless deployment and GPU acceleration.

Features

Image URL analysis - analyze any image from the web
Image upload analysis - analyze images from your device
GPU-accelerated image analysis using the Moondream model
Custom prompting capability

How It Works

The application accepts images via URL or direct upload
Images are processed by the Moondream model on a GPU
The model's description is returned as a JSON response

Requirements

Modal account and CLI installed
Python 3.10+

Setup and Deployment

Install project dependencies:
```
pipenv install
```
Set up Modal:
```
modal setup
```
Activate the virtual environment:
```
pipenv shell
```
Deploy the application:
```
modal deploy moondream_inf.py
```
Alternatively, run it once as an ephemeral app without deploying (the functions still execute on Modal, not on your machine):
```
modal run moondream_inf.py
```

API Usage

The application provides a single API endpoint:

`POST /analyze_image`

Analyzes an image and returns a description.

Request Body:

{
  "image_url": "https://images.unsplash.com/photo-1554797589-7241bb691973?q=80&w=1336&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
  "prompt": "Describe the image composition and any text"
}

Response:

{
  "answer": "A narrow alleyway at night, lined with shops and restaurants, is illuminated by warm-toned lights and hanging lanterns. The lanterns are decorated with cherry blossoms, adding a delicate touch to the scene. The buildings are dark, with signs in Japanese characters that are partially visible. People are walking through the alleyway, adding a sense of activity to the scene. The overall atmosphere is one of a bustling, vibrant city at night."
}
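
A minimal client sketch for the request and response shown above, using only the Python standard library. The endpoint URL is a placeholder (use the URL Modal reports for your deployment), and the helper names `build_request`, `parse_answer`, and `analyze` are illustrative assumptions, not part of this project.

```python
import json
import urllib.request

# Placeholder: replace with the endpoint URL reported for your deployment.
ENDPOINT_URL = "https://example.invalid/analyze_image"


def build_request(endpoint_url, image_url, prompt):
    """Build a JSON POST request matching the request body documented above."""
    body = json.dumps({"image_url": image_url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_answer(raw_response):
    """Extract the "answer" field from the JSON response body."""
    return json.loads(raw_response)["answer"]


def analyze(image_url, prompt, endpoint_url=ENDPOINT_URL, timeout=120):
    """Send the request and return the description (performs network I/O)."""
    request = build_request(endpoint_url, image_url, prompt)
    # The generous timeout is an assumption that leaves room for a cold start.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_answer(response.read())
```

Calling `analyze("https://example.com/photo.jpg", "Describe the image")` would return the description string once `ENDPOINT_URL` points at a live deployment.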

Testing

You can also run a quick test directly through Modal:

modal run moondream_inf.py

This will run a test with a default image and show the results.

Technical Details

The application uses the vikhyatk/moondream2 model from Hugging Face
Images are processed on a T4 GPU for faster inference
The model weights are cached in a Modal volume for faster startup
Uses torch.cuda.amp.autocast() for mixed-precision inference, which reduces GPU memory use
Includes comprehensive error handling and debugging
Uses Modal's fastapi_endpoint decorator for simplified API deployment

Limitations

Processing time depends on Modal cold start and model initialization, so the first request after a period of inactivity is slower
URL images must be publicly accessible and in a supported format (JPEG, PNG, etc.)
Maximum image size is limited by available GPU memory
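
Because image size is bounded by GPU memory, one possible client-side mitigation is to downscale very large images before sending them. The sketch below only computes the target dimensions; the resize itself would be done with an image library of your choice. The function name `fit_within` and the default of 1024 pixels are illustrative assumptions, not limits defined by this project.

```python
def fit_within(width, height, max_side=1024):
    """Return (width, height) scaled down so the longer side is at most max_side.

    The aspect ratio is preserved and images that already fit are unchanged.
    max_side=1024 is an arbitrary illustrative default, not a project limit.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("width, height and max_side must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels, never going below 1 pixel on either side.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `fit_within(4000, 3000, max_side=1000)` gives `(1000, 750)`, while an 800x600 image is returned unchanged.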

Troubleshooting

If you encounter issues:

Check the Modal logs for detailed error messages
Ensure your image is in a supported format (JPEG, PNG)
Try using a sample image to verify the model is working correctly
Check that your image URL is publicly accessible
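
The format and accessibility checks above can be done locally before calling the API. The sketch below inspects the first bytes of a file or download for the JPEG and PNG signatures. It covers only those two formats, and the helper names `sniff_image_format` and `check_url` are hypothetical, not part of this project.

```python
import urllib.request

# Leading "magic" bytes that identify each format.
_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def sniff_image_format(data):
    """Return "jpeg" or "png" if data starts with that signature, else None."""
    for name, signature in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def check_url(url, timeout=10):
    """Fetch the first bytes of url and report the detected format.

    Performs network I/O. A None result suggests the URL returned something
    other than a JPEG or PNG, such as an HTML login or error page.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return sniff_image_format(response.read(16))
```

If `check_url` raises an HTTP or connection error, the URL is probably not publicly accessible from where the check runs.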

Credits

Moondream model by Vik Korrapati (https://github.com/vikhyat/moondream)
Built with Modal (https://modal.com)
