This project converts video recordings of people into 3D animated mannequins that you can view and rotate in a web browser. It uses Google's MediaPipe AI to track body movements from videos and displays them as a 3D model.
- Takes a video file (MP4, AVI, MOV, etc.)
- Analyzes each frame to find body landmarks (joints like shoulders, elbows, knees)
- Saves the landmark data as a JSON file
- Displays the data as an animated 3D mannequin in your browser
project/
├── batch_process_videos.py # Python script to process videos
├── viewer.html # Web page to view 3D animations
├── requirements.txt # Python dependencies
├── videos/ # Put your input videos here
└── json_output/ # Processed JSON files go here
- Python 3.8 or higher
- A modern web browser (Chrome, Firefox, Edge)
    pip install -r requirements.txt
    mkdir videos
    mkdir json_output

Place your video files in the videos/ folder, then run:

    python batch_process_videos.py --input_dir ./videos --output_dir ./json_output

Optional Parameters:
- --min_detection_confidence 0.5: How confident the AI needs to be (0.0 to 1.0)
- --min_tracking_confidence 0.5: How confident when tracking between frames
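As a rough illustration of how these flags could be wired up, here is a minimal argparse sketch. The real argument handling in batch_process_videos.py is not shown in this README, so the names build_parser and validate and the help strings are assumptions; only the flag names and the 0.5 defaults come from the usage above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in this README."""
    parser = argparse.ArgumentParser(description="Batch-convert videos to pose JSON")
    parser.add_argument("--input_dir", required=True, help="Folder containing input videos")
    parser.add_argument("--output_dir", required=True, help="Folder for JSON output")
    parser.add_argument("--min_detection_confidence", type=float, default=0.5)
    parser.add_argument("--min_tracking_confidence", type=float, default=0.5)
    return parser


def validate(args: argparse.Namespace) -> argparse.Namespace:
    """Reject confidence values outside the documented 0.0 to 1.0 range."""
    for name in ("min_detection_confidence", "min_tracking_confidence"):
        value = getattr(args, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"--{name} must be between 0.0 and 1.0, got {value}")
    return args


# Parse an explicit argument list so the sketch runs without a real command line.
args = validate(build_parser().parse_args(
    ["--input_dir", "./videos", "--output_dir", "./json_output",
     "--min_detection_confidence", "0.7"]
))
print(args.min_detection_confidence, args.min_tracking_confidence)
```

Raising a threshold generally trades missed detections for fewer false ones, so values well above the default are worth trying only on clean, well-lit footage.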
- Open viewer.html in your browser (double-click it)
- Click "Choose File" button
- Select a JSON file from json_output/ folder
- Watch your 3D mannequin!
- Mouse Drag: Rotate camera around mannequin
- Mouse Wheel: Zoom in/out
- Play Button: Start animation
- Pause Button: Stop animation
- Slider: Scrub through frames manually
- Video Input → Read video file frame by frame
- MediaPipe Analysis → AI detects 33 body landmarks per frame
- Data Extraction → Each landmark has X, Y, Z coordinates
- JSON Output → All frames saved as array of coordinate arrays
- 3D Rendering → Browser reads JSON and draws 3D mannequin
- Animation → Playback through frames creates movement
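The data extraction and JSON output steps above can be sketched in plain Python. This is not the code from batch_process_videos.py: the Landmark type is a stand-in for whatever MediaPipe returns per joint, and skipping frames with no detected person is an assumption about how gaps might be handled.

```python
import json
from typing import Iterable, List, NamedTuple, Optional


class Landmark(NamedTuple):
    """Hypothetical stand-in for one detected landmark."""
    x: float
    y: float
    z: float


def flatten_frame(landmarks: Iterable[Landmark]) -> List[float]:
    """Turn 33 landmarks into one flat [x0, y0, z0, x1, y1, z1, ...] list."""
    flat: List[float] = []
    for lm in landmarks:
        flat.extend([lm.x, lm.y, lm.z])
    return flat


def frames_to_json(per_frame: Iterable[Optional[List[Landmark]]]) -> str:
    """Serialize all frames as an array of coordinate arrays."""
    frames = []
    for landmarks in per_frame:
        if landmarks is None:
            continue  # Assumption: frames without a detection are dropped.
        frames.append(flatten_frame(landmarks))
    return json.dumps(frames)


# Two fake frames with a detection and one without, in place of real video.
fake = [Landmark(i * 0.01, i * 0.02, -i * 0.001) for i in range(33)]
payload = frames_to_json([fake, None, fake])
decoded = json.loads(payload)
print(len(decoded), len(decoded[0]))
```

In the real script the per-frame landmark lists would come from MediaPipe running on frames read with OpenCV; only the flattening and serialization are shown here.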
MediaPipe tracks these points:
- Face: Nose, eyes, ears, mouth
- Torso: Shoulders, hips
- Arms: Elbows, wrists
- Legs: Knees, ankles
- Hands: Pinky, index, thumb points
- Feet: Heel, foot index
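For reference when working with the JSON data, the sketch below lists the 33 MediaPipe Pose landmarks in index order and groups them as in the list above. The GROUPS dictionary and the landmark_xyz helper are illustrative names, and the lookup assumes the flat 99-number frame layout described in the JSON format section of this README.

```python
# MediaPipe Pose landmark names in index order (0 to 32).
POSE_LANDMARK_NAMES = [
    "nose", "left_eye_inner", "left_eye", "left_eye_outer",
    "right_eye_inner", "right_eye", "right_eye_outer",
    "left_ear", "right_ear", "mouth_left", "mouth_right",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_pinky", "right_pinky",
    "left_index", "right_index", "left_thumb", "right_thumb",
    "left_hip", "right_hip", "left_knee", "right_knee",
    "left_ankle", "right_ankle", "left_heel", "right_heel",
    "left_foot_index", "right_foot_index",
]

# Hypothetical grouping that follows the categories in this README.
GROUPS = {
    "face": list(range(0, 11)),
    "torso": [11, 12, 23, 24],
    "arms": [13, 14, 15, 16],
    "hands": [17, 18, 19, 20, 21, 22],
    "legs": [25, 26, 27, 28],
    "feet": [29, 30, 31, 32],
}


def landmark_xyz(frame, name):
    """Return the (x, y, z) triple for a named landmark in a flat frame."""
    index = POSE_LANDMARK_NAMES.index(name)
    return tuple(frame[3 * index: 3 * index + 3])


# A dummy frame whose values equal their own positions makes the lookup easy to check.
dummy_frame = list(range(99))
print(landmark_xyz(dummy_frame, "left_knee"))
```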
- X: Left (-) to Right (+)
- Y: Up (-) to Down (+) in video, flipped in 3D view
- Z: Away from camera (+) to toward camera (-)
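A small sketch of how a landmark might be mapped into the viewer's space, including the Y flip mentioned above. The function name to_view_space is hypothetical, and the default scale of 2 simply mirrors the const scale = 2 setting described in the customization notes; the actual getPosition() logic in viewer.html may differ.

```python
def to_view_space(x, y, z, scale=2.0):
    """Scale a landmark and flip Y for display (illustrative only)."""
    view_x = x * scale
    view_y = -y * scale  # Negate y to match the flipped orientation of the 3D view.
    view_z = z * scale   # Assumption: z is scaled but not flipped.
    return (view_x, view_y, view_z)


print(to_view_space(0.5, 0.25, -0.1))
```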
In viewer.html, find:
    let frameSkip = 2; // Change this number

- 1 = Full speed
- 2 = Half speed
- 3 = Third speed
- Higher = Slower
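To get a feel for the effect of frameSkip, the arithmetic can be sketched as follows. This assumes the viewer advances one pose frame every frameSkip render ticks and renders at the 60 FPS quoted elsewhere in this README; both function names are illustrative.

```python
def frame_at_tick(tick, frame_skip):
    """Pose frame shown at a given render tick, assuming one advance per frame_skip ticks."""
    return tick // frame_skip


def playback_seconds(num_frames, frame_skip, render_fps=60.0):
    """Approximate time to play num_frames pose frames."""
    return num_frames * frame_skip / render_fps


for skip in (1, 2, 3):
    print(skip, playback_seconds(300, skip))
```

Under these assumptions a 300-frame recording plays in 5 seconds at frameSkip = 1, 10 seconds at 2, and 15 seconds at 3.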
Find:
    ground.position.y = -2; // Change this number

- Lower number = Floor goes down
- Higher number = Floor goes up
Find:
    x -= 1; // Change this number

- Positive = Move left
- Negative = Move right
- 0 = No horizontal shift
Find:
    if (x > 0) x *= 1.1; // Change 1.1

- 1.0 = No spreading
- 1.1 = 10% wider
- 1.3 = 30% wider
Find:
    color: 0x00d9ff, // Cyan color

Use hex color codes:
- 0xff0000 = Red
- 0x00ff00 = Green
- 0x0000ff = Blue
- 0xffffff = White
- 0x000000 = Black
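For colors that are not in the list, a 0xRRGGBB literal can be built from ordinary 0 to 255 red, green, and blue values. The helper below is a small convenience sketch (the name rgb_to_hex_literal is made up for this example) whose output can be pasted into viewer.html.

```python
def rgb_to_hex_literal(red, green, blue):
    """Pack 0-255 channel values into a 0xRRGGBB string."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    packed = (red << 16) | (green << 8) | blue
    return f"0x{packed:06x}"


print(rgb_to_hex_literal(0, 217, 255))
```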
Find:
    const scale = 2; // Change this number

- Higher = Bigger mannequin
- Lower = Smaller mannequin
Find in createMannequin():

    mannequinParts.head = createSphere(0.12, skinMaterial);

First number = radius/thickness
Examples:
- Head: 0.12 (default)
- Arms: 0.05 (default)
- Legs: 0.07 (default)
Solution: Use shorter videos or lower resolution videos
Solution: This is normal - MediaPipe estimates depth from 2D video. View from front for best results.
Solution: This happens when MediaPipe loses tracking. Use videos with clear, well-lit subjects.
Solution: Adjust const scale = 2; in the getPosition() function
Solution: Lower the floor position: ground.position.y = -3;
Each JSON file contains an array of frames:
[
[x0, y0, z0, x1, y1, z1, ... x32, y32, z32], // Frame 0
[x0, y0, z0, x1, y1, z1, ... x32, y32, z32], // Frame 1
...
]
Each frame has 99 numbers (33 landmarks × 3 coordinates).
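If you want to inspect or post-process a file outside the viewer, the sketch below loads it and regroups each frame into 33 (x, y, z) triples, checking the 99-number length along the way. The load_frames name and the demo file are illustrative; a temporary file stands in for a real one from json_output/.

```python
import json
import tempfile
from pathlib import Path

LANDMARKS_PER_FRAME = 33
VALUES_PER_FRAME = LANDMARKS_PER_FRAME * 3  # 99 numbers per frame


def load_frames(path):
    """Read a pose JSON file and return a list of frames of (x, y, z) tuples."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    frames = []
    for index, flat in enumerate(raw):
        if len(flat) != VALUES_PER_FRAME:
            raise ValueError(
                f"frame {index} has {len(flat)} values, expected {VALUES_PER_FRAME}"
            )
        frames.append(
            [tuple(flat[i:i + 3]) for i in range(0, VALUES_PER_FRAME, 3)]
        )
    return frames


# Write a tiny two-frame file so the sketch runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.json"
    demo_frame = [float(n) for n in range(VALUES_PER_FRAME)]
    demo_path.write_text(json.dumps([demo_frame, demo_frame]), encoding="utf-8")
    frames = load_frames(demo_path)

print(len(frames), len(frames[0]), frames[0][32])
```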
- Processing: ~30-60 FPS depending on CPU
- Playback: 60 FPS in browser
- File size: ~1KB per second of video
- MediaPipe: Google's pose estimation library
- Three.js: 3D rendering library
- OpenCV: Video processing
This is an educational project. MediaPipe and Three.js have their own licenses.