This application uses the Moondream model from Hugging Face to analyze and describe images supplied by URL or direct upload. It is built on Modal for serverless deployment and GPU acceleration.
- Image URL analysis - analyze any image from the web
- Image upload analysis - analyze images from your device
- GPU-accelerated image analysis using the Moondream model
- Custom prompting capability
- The application accepts images via URL or direct upload
- Images are processed by the Moondream model on a GPU
- The model's description is returned as a JSON response
- Modal account and CLI installed
- Python 3.10+
- Install project dependencies:
    pipenv install
- Set up Modal:
    modal setup
- Enter the virtual environment:
    pipenv shell
- Deploy the application:
    modal deploy moondream_inf.py
- Alternatively, run it once from your machine without deploying:
    modal run moondream_inf.py
The application provides a single API endpoint:
Analyzes an image and returns a description.
Request Body:
{
"image_url": "https://images.unsplash.com/photo-1554797589-7241bb691973?q=80&w=1336&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
"prompt": "Describe the image composition and any text"
}
Response:
{
"answer": "A narrow alleyway at night, lined with shops and restaurants, is illuminated by warm-toned lights and hanging lanterns. The lanterns are decorated with cherry blossoms, adding a delicate touch to the scene. The buildings are dark, with signs in Japanese characters that are partially visible. People are walking through the alleyway, adding a sense of activity to the scene. The overall atmosphere is one of a bustling, vibrant city at night."
}
You can also run a quick test directly through Modal:
    modal run moondream.py
This will run a test with a default image and show the results.
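To call the endpoint from a script, a client along the following lines could work. This is a minimal sketch using only the Python standard library, not code from this project: `ENDPOINT_URL` is a hypothetical placeholder for the URL that Modal prints after deployment, and the use of POST is an assumption based on the JSON request body shown above. The helper names `build_request` and `describe_image` are illustrative.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the URL printed by `modal deploy`.
ENDPOINT_URL = "https://example.modal.run"


def build_request(image_url: str, prompt: str,
                  endpoint: str = ENDPOINT_URL) -> urllib.request.Request:
    """Build a request carrying the JSON body documented above.

    POST is an assumption; adjust if the endpoint expects another method.
    """
    body = json.dumps({"image_url": image_url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_image(image_url: str, prompt: str,
                   endpoint: str = ENDPOINT_URL, timeout: float = 120.0) -> str:
    """Send the request and return the "answer" field of the JSON response."""
    request = build_request(image_url, prompt, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["answer"]
```

The generous default timeout is a guess meant to leave room for a cold start and model initialization; tune it to what you observe in practice.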
- The application uses the vikhyatk/moondream2 model from Hugging Face
- Images are processed on a T4 GPU for faster inference
- The model weights are cached in a Modal volume for faster startup
- Uses torch.cuda.amp.autocast() for improved GPU memory efficiency
- Includes comprehensive error handling and debugging
- Uses Modal's fastapi_endpoint decorator for simplified API deployment
- Processing time depends on the Modal cold start and model initialization
- URL images must be publicly accessible and in a supported format (JPEG, PNG, etc.)
- Maximum image size is limited by available GPU memory
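Because unsupported formats and oversized images only fail once they reach the model, it can help to check an image on the client before sending it. The sketch below is one possible pre-flight check, not part of the application: it recognizes only JPEG and PNG by their leading signature bytes, and the `max_bytes` default is an arbitrary illustrative value, since the real limit depends on GPU memory and is not stated here.

```python
from typing import Optional

# Leading signature bytes for the two formats named above.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def detect_image_format(data: bytes) -> Optional[str]:
    """Return "jpeg" or "png" based on the file signature, else None."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    return None


def check_image(data: bytes, max_bytes: int = 10 * 1024 * 1024) -> str:
    """Validate raw image bytes and return the detected format.

    max_bytes is a hypothetical client-side cap, not a documented limit.
    Raises ValueError if the format is unrecognized or the data is too large.
    """
    image_format = detect_image_format(data)
    if image_format is None:
        raise ValueError("unsupported image format (expected JPEG or PNG)")
    if len(data) > max_bytes:
        raise ValueError(f"image is {len(data)} bytes, over the {max_bytes}-byte cap")
    return image_format
```

A signature check only confirms the file header, so a truncated or corrupt image can still pass it and fail later during decoding.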
If you encounter issues:
- Check the Modal logs for detailed error messages
- Ensure your image is in a supported format (JPEG, PNG)
- Try using a sample image to verify the model is working correctly
- Check that your image URL is publicly accessible
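The last check can be scripted. The following is a rough sketch, assuming the image host answers HEAD requests (some do not, in which case a normal GET in a browser is the better test). The function name `check_image_url` and its `opener` parameter are illustrative; `opener` exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def check_image_url(url: str, timeout: float = 10.0,
                    opener=urllib.request.urlopen):
    """Return (ok, detail) describing whether the URL serves an image.

    Sends a HEAD request without credentials, so a URL that needs a login
    or a signed cookie will be reported as inaccessible.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 403 or 404.
        return False, f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # DNS failure, refused connection, timeout, and similar.
        return False, f"unreachable: {exc.reason}"
    if not content_type.startswith("image/"):
        return False, f"unexpected content type: {content_type or 'missing'}"
    return True, content_type
```

For example, `check_image_url("https://example.com/photo.jpg")` returns a pair such as `(False, "HTTP 404")` when the server reports that the file does not exist.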
- Moondream model by Vik Korrapati (https://github.com/vikhyat/moondream)
- Built with Modal (https://modal.com)