edgarrt/modal-moondream

Moondream Image Analysis

This application uses the Moondream model from Hugging Face to analyze and describe images, either fetched from a URL or uploaded directly. It is built on Modal for serverless deployment and GPU acceleration.

Features

  • Image URL analysis - analyze any image from the web
  • Image upload analysis - analyze images from your device
  • GPU-accelerated image analysis using the Moondream model
  • Custom prompting capability

How It Works

  1. The application accepts images via URL or direct upload
  2. Images are processed by the Moondream model on a GPU
  3. The model's description is returned as a JSON response
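The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the app's actual code: handle_request, fetch_image_bytes, the uploaded parameter, and the fallback prompt are all hypothetical, and answer_fn stands in for the Moondream call that the real app runs on a GPU inside Modal.

```python
import json
import urllib.request
from typing import Callable, Optional

# Hypothetical stand-in for the model call: takes raw image bytes and a prompt,
# returns the model's text answer. In the real app this would wrap Moondream.
AnswerFn = Callable[[bytes, str], str]


def fetch_image_bytes(image_url: str, timeout: float = 10.0) -> bytes:
    """Download the image at image_url (it must be publicly accessible)."""
    with urllib.request.urlopen(image_url, timeout=timeout) as resp:
        return resp.read()


def handle_request(
    body: dict,
    answer_fn: AnswerFn,
    fetch: Callable[[str], bytes] = fetch_image_bytes,
    uploaded: Optional[bytes] = None,
) -> str:
    # Step 1: accept the image, preferring an upload over a URL.
    if uploaded is not None:
        image_bytes = uploaded
    elif "image_url" in body:
        image_bytes = fetch(body["image_url"])
    else:
        raise ValueError("request needs an image_url or an uploaded image")

    # Step 2: run the model. The fallback prompt here is hypothetical.
    prompt = body.get("prompt", "Describe this image.")
    answer = answer_fn(image_bytes, prompt)

    # Step 3: return the description as a JSON response body.
    return json.dumps({"answer": answer})
```

Passing the model and the fetcher in as arguments keeps the request flow testable without a GPU or network access.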

Requirements

  • A Modal account, with the Modal CLI installed
  • Python 3.10+
  • pipenv

Setup and Deployment

  1. Install project dependencies:

    pipenv install
    
  2. Set up Modal:

    modal setup
    
  3. Activate the virtual environment:

    pipenv shell
    
  4. Deploy the application:

    modal deploy moondream_inf.py
    
  5. Alternatively, run it as a temporary app without deploying (the model still runs on Modal's GPUs):

    modal run moondream_inf.py
    

API Usage

The application provides a single API endpoint:

POST /analyze_image

Analyzes an image and returns a description. After deployment, Modal prints the endpoint's public URL; send requests to that URL.

Request Body:

{
  "image_url": "https://images.unsplash.com/photo-1554797589-7241bb691973?q=80&w=1336&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
  "prompt": "Describe the image composition and any text"
}

Response:

{
  "answer": "A narrow alleyway at night, lined with shops and restaurants, is illuminated by warm-toned lights and hanging lanterns. The lanterns are decorated with cherry blossoms, adding a delicate touch to the scene. The buildings are dark, with signs in Japanese characters that are partially visible. People are walking through the alleyway, adding a sense of activity to the scene. The overall atmosphere is one of a bustling, vibrant city at night."
}
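A minimal client sketch using only the Python standard library. It assumes the endpoint accepts the JSON request body shown above and returns an object with an "answer" field; ENDPOINT_URL is a placeholder to replace with the URL Modal prints on deploy, and the helper names are hypothetical.

```python
import json
import urllib.request

# Placeholder: replace with the URL printed by `modal deploy` (hypothetical value).
ENDPOINT_URL = "https://YOUR-WORKSPACE--YOUR-APP-analyze-image.modal.run"


def build_request(endpoint_url: str, image_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the documented schema."""
    body = json.dumps({"image_url": image_url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_answer(raw: bytes) -> str:
    """Extract the "answer" field from the JSON response body."""
    payload = json.loads(raw)
    answer = payload.get("answer")
    if not isinstance(answer, str):
        raise ValueError(f"unexpected response: {payload!r}")
    return answer


def analyze_image(endpoint_url: str, image_url: str, prompt: str, timeout: float = 120.0) -> str:
    # Generous timeout: the first call may include a cold start and model load.
    req = build_request(endpoint_url, image_url, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_answer(resp.read())
```

For example, analyze_image(ENDPOINT_URL, "<your image URL>", "Describe the image composition and any text") returns the answer string, or raises if the endpoint responds with an HTTP error or an unexpected body.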

Testing

You can also run a quick test directly through Modal:

modal run moondream_inf.py

This will run a test with a default image and show the results.

Technical Details

  • The application uses the vikhyatk/moondream2 model from Hugging Face
  • Images are processed on a T4 GPU for faster inference
  • The model weights are cached in a Modal volume for faster startup
  • Uses torch.cuda.amp.autocast() for improved GPU memory efficiency
  • Includes comprehensive error handling and debugging
  • Uses Modal's fastapi_endpoint decorator for simplified API deployment

Limitations

  • The first request after an idle period is slower, because it includes the Modal container cold start and model initialization
  • URL images must be publicly accessible and in a supported format (JPEG, PNG, etc.)
  • Maximum image size is limited by available GPU memory
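If very large uploads run into GPU memory limits, one option is to downscale on the client before sending. The sketch below only computes aspect-preserving target dimensions; fit_within and the max_side value you pass are hypothetical, since this README does not state a size limit.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so neither side exceeds max_side.

    Aspect ratio is preserved; images that already fit are returned unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels, but never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, fit_within(4000, 3000, 1024) gives (1024, 768). With Pillow installed, Image.thumbnail((max_side, max_side)) performs a comparable aspect-preserving downscale in place.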

Troubleshooting

If you encounter issues:

  1. Check the Modal logs for detailed error messages
  2. Ensure your image is in a supported format (JPEG, PNG)
  3. Try using a sample image to verify the model is working correctly
  4. Check that your image URL is publicly accessible
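Steps 2 and 4 can be checked from your own machine before calling the API. This sketch downloads the first bytes of the URL without cookies or credentials and matches them against the JPEG and PNG file signatures; sniff_format and preflight are hypothetical helpers, and recognizing only JPEG and PNG is an assumption based on the formats named in this README.

```python
import urllib.request
from typing import Optional

# File signatures ("magic bytes") for the formats named in this README.
# Treating only JPEG and PNG as supported is an assumption.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}


def sniff_format(header: bytes) -> Optional[str]:
    """Return "jpeg" or "png" if header starts with a known signature, else None."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def preflight(image_url: str, timeout: float = 10.0) -> str:
    """Check that image_url is reachable without credentials and looks like an image.

    urlopen raises urllib.error.HTTPError or URLError if the URL is private,
    missing, or unreachable.
    """
    with urllib.request.urlopen(image_url, timeout=timeout) as resp:
        header = resp.read(16)  # only the first bytes are needed for the signature
    fmt = sniff_format(header)
    if fmt is None:
        raise ValueError(f"{image_url} does not start with a JPEG or PNG signature")
    return fmt
```

Some hosts reject urllib's default User-Agent, so an HTTP error here does not always mean the image is private.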

Credits