Turn your sketches into stunning 3D magic — inspired by Shakalaka Boom Boom.
Vision Crafter is an innovative and experimental SnapAR experience that lets you point your Snapchat Spectacles at a sketch or doodle and watch it come alive as a 3D model, just like the magical pencil from Shakalaka Boom Boom.
It combines voice control, camera input, AI vision, and 3D generation to convert drawings into 3D assets in real time.
Pipeline stages illustrated: Drawing Input, Vision Processing, 3D Processing, Final Output, Example Output.
- 🎨 Drawing-first AI understanding
  Detects and prioritizes sketches and doodles as the main object. Prompt guidance tells the model to ignore the drawing surface (paper, iPad, notebook, wall) and any drawing tools.
- 🗣️ Voice-activated scanning
  Users trigger the entire vision scan through speech recognition via the Voice ML Module.
- 📷 Spectacles camera access
  Camera frames are captured with the Camera Module.
- ⚓ Object anchoring
  Instant World Hit Test anchors the 3D model immediately to wherever you're looking.
- 🧠 Smart prompt generation using Vision
  Frame data is processed by the OpenAI Vision API to generate a precise, 3D-ready text prompt for asset generation.
- 🌐 3D asset creation using Meshy
  The Meshy API converts the text prompt into a 3D model, which is streamed back to the Lens.
- 🧊 Remote 3D asset injection
  The final 3D model is injected into the scene using the Remote Media Module.
- ✨ Edge-fade overlay trick
  A visual UX trick that fades out the edges of the overlay to avoid harsh cutoffs, for a more seamless AR experience.
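The "drawing-first" prompt guidance can be sketched as a request body for OpenAI's Chat Completions endpoint. This is a minimal sketch, not the project's actual code: the helper name `buildVisionRequest`, the prompt wording, and the `max_tokens` value are all assumptions, since the README does not show the real prompt.

```typescript
// Hypothetical helper (not from the project): builds a Chat Completions
// request body asking a vision-capable model to describe only the drawing.

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

type Message =
  | { role: "system"; content: string }
  | { role: "user"; content: ContentPart[] };

interface VisionRequestBody {
  model: string;
  max_tokens: number;
  messages: Message[];
}

// Assumed wording; the project's real prompt is not shown in this README.
const DRAWING_FIRST_PROMPT =
  "The camera frame contains a hand-drawn sketch or doodle. " +
  "Describe only the drawn subject as a short, concrete prompt for a text-to-3D generator. " +
  "Ignore the surface it is drawn on (paper, tablet, notebook, wall) and any drawing tools.";

function buildVisionRequest(base64Jpeg: string, model: string): VisionRequestBody {
  return {
    model,
    max_tokens: 100, // assumed cap; a 3D prompt should be short
    messages: [
      { role: "system", content: DRAWING_FIRST_PROMPT },
      {
        role: "user",
        content: [
          { type: "text", text: "Generate the 3D prompt for this drawing." },
          // Images are passed inline as a base64 data URL.
          { type: "image_url", image_url: { url: `data:image/jpeg;base64,${base64Jpeg}` } },
        ],
      },
    ],
  };
}
```

The resulting object would be serialized with `JSON.stringify` and sent as a POST to `https://api.openai.com/v1/chat/completions` with an `Authorization: Bearer <key>` header. The model name is passed in by the caller because the exact vision model the project uses is not specified beyond "GPT-4 Vision".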
- Configure API Keys
  Navigate to the `APIConfig` script inside the `LensControllers` SceneObject and replace the placeholder values with your OpenAI Vision and Meshy API keys.
- Enable Scene Understanding
  The experience begins by capturing a frame through Spectacles using the Camera Module. The frame is sent to OpenAI GPT-4 Vision, which interprets it and generates a 3D-friendly text prompt.
- Generate 3D Assets
  The generated prompt is passed to the Meshy API, which returns a corresponding 3D model. This model is streamed and loaded using the Remote Media Module.
- Apply Material (Optional)
  Because the Meshy API is currently used in preview mode, which does not generate textures, a placeholder material is automatically applied to the imported model.
- Anchor the Model in the Real World
  The asset is positioned using Instant World Hit Test, allowing immediate placement at the center of the user's field of view.
- Initiate Scan via Voice Command
  Scanning is triggered by speech recognition. The default keyword is “boom”; both the trigger phrase and the hint can be customized in the `VoiceCommandHandler` script.
- Enhance Visual UX
  A custom edge-fade mask softens the periphery and avoids sharp cutoffs in the AR display, giving a smoother, more immersive experience.
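Two of the steps above, the voice trigger and the Meshy preview task, can be sketched as pure helper functions. This is an illustrative sketch under stated assumptions, not the project's code: the function names are hypothetical, and the Meshy field names (`mode`, `prompt`, `status`, `model_urls.glb`) and status strings reflect my understanding of Meshy's text-to-3D API and should be checked against its current documentation.

```typescript
// Hypothetical helpers (not from the project) for the voice trigger and the
// Meshy preview task. No Lens Studio APIs are used here.

// Lowercase the text and collapse punctuation so "Boom!" compares as "boom".
function normalize(text: string): string {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0)
    .join(" ");
}

// True when the transcript contains the trigger phrase as whole words.
function matchesTrigger(transcript: string, phrase: string): boolean {
  const needle = normalize(phrase);
  if (needle.length === 0) return false;
  return (" " + normalize(transcript) + " ").indexOf(" " + needle + " ") !== -1;
}

// Assumed shape of a Meshy text-to-3D request in preview mode (no textures).
function buildMeshyPreviewBody(prompt: string): { mode: "preview"; prompt: string } {
  return { mode: "preview", prompt };
}

// Assumed subset of the task object returned when polling a Meshy task.
interface MeshyTask {
  status: string;
  model_urls?: { glb?: string };
}

type NextStep =
  | { kind: "load"; url: string }
  | { kind: "wait" }
  | { kind: "fail"; reason: string };

// Decide what to do after each poll of the task status.
function nextStep(task: MeshyTask): NextStep {
  if (task.status === "SUCCEEDED") {
    const url = task.model_urls && task.model_urls.glb;
    return url
      ? { kind: "load", url }
      : { kind: "fail", reason: "task succeeded without a GLB URL" };
  }
  if (task.status === "PENDING" || task.status === "IN_PROGRESS") {
    return { kind: "wait" };
  }
  return { kind: "fail", reason: "task ended with status " + task.status };
}
```

In a Lens, `matchesTrigger` would run on each transcription delivered by the Voice ML Module, and a `load` result would hand the URL to the Remote Media Module. Matching whole words keeps a transcript such as "boomerang" from firing the default "boom" keyword.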
| Feature | Technology Used |
|---|---|
| Voice Trigger | Voice ML Module |
| Frame Capture | Camera Module |
| Internet API Calls | Remote Service Module (Fetch) |
| Frame Analysis | OpenAI GPT-4 Vision API |
| 3D Model Generation | Meshy API |
| Model Injection | Remote Media Module |
| Anchoring | Instant World Hit Test |
| Platform | Lens Studio + Spectacles |
Inspired by the Indian TV show Shakalaka Boom Boom, where anything you drew with a magic pencil came to life. Vision Crafter brings that fantasy to life using today's cutting-edge tech.
This project is licensed under the MIT License.
© 2025 Krunal MB Gediya
Open to improvements, issues, and community collabs. Feel free to fork, play, and create some krazyy AR with us.




