This project teaches you how robots understand the 3D position and orientation (pose) of objects using a simple, reliable method: ArUco markers + camera calibration + OpenCV pose estimation (solvePnP).
Even if you're a total beginner, this repo guides you step-by-step through:
- What object pose estimation means
- Why robots need it
- How ArUco markers work
- How to calibrate your camera
- How to detect markers in live video
- How to compute 3D pose (x, y, z + rotation)
- How to record the output as a video
- How to debug issues
Everything is explained clearly and practically.
Robots cannot pick, place, move, or interact with objects unless they know:
Position in 3D
Orientation (angles / axis direction)
This is called 6-DoF Pose Estimation (x, y, z, roll, pitch, yaw).
ArUco markers are widely used in robotics because they are:
- easy to detect
- robust
- work even with normal webcams
- ideal for beginners
This project gives you a fully working pose estimation pipeline, similar to what real industrial robots use in:
- object grasping
- visual servoing
- pick-and-place
- robotic navigation
- AR applications
✔ Detect ArUco markers in images or webcam video
✔ Estimate 3D pose (translation + rotation)
✔ Draw 3D axes aligned with marker orientation
✔ Save annotated images
✔ Save MP4 recordings of live detection
✔ Works with any webcam
✔ Beginner-friendly explanations
✔ Supports camera calibration
object-pose-estimation-opencv/
├── detect_and_pose_aruco.py # Main script – detection + 3D pose + live view + recording
├── utils.py # Helper functions (camera loading, drawing 3D axes)
├── cam.py # Creates a synthetic camera calibration file
├── calibrate_aruco.py # (Optional) ArUco-based calibration
├── samples/
│ ├── aruco_sample.jpg # Provided sample marker image
├── outputs/ # Auto-generated outputs (created on first run)
│ ├── camera.npz # Synthetic OR real camera intrinsics
│ ├── camera_aruco.npz # Optional calibration output
│ ├── aruco_out.jpg # Annotated sample output (image mode)
│ ├── aruco_test_out.jpg
│ └── aruco_snap_*.jpg # Snapshots saved during webcam runs
├── requirements.txt # OpenCV + numpy + matplotlib
└── README.md # (THIS FILE)
Check your version:
python --versionpip install -r requirements.txtIf ArUco gives errors, install contrib build:
pip install opencv-contrib-pythonThis project uses a synthetic camera calibration file for simplicity.
Run:
python cam.pyThis generates:
outputs/camera.npzIt contains:
K– camera intrinsic matrixdist– distortion coefficientsreproj_err– calibration error
This file is required for accurate 3D pose estimation.
You can either:
(or show it on another phone/laptop screen)
OR
samples/aruco_sample.jpgHold it in front of the webcam.
python detect_and_pose_aruco.py --camera-file outputs/camera.npz --source samples/aruco_sample.jpg --out outputs/aruco_out.jpgOutput saved to:
outputs/aruco_out.jpgpython detect_and_pose_aruco.py --camera-file outputs/camera.npz --source 0q→ quits→ save a snapshot tooutputs/
python detect_and_pose_aruco.py --camera-file outputs/camera.npz --source 0 --out outputs/aruco_webcam_record.mp4This will:
- Show live detection
- Draw 3D axes
- Record MP4 video to:
outputs/aruco_webcam_record.mp4Use this README as your final documentation for GitHub.
Your outputs/ folder will look like:
outputs/
├── camera.npz # Saved camera intrinsics
├── camera_aruco.npz # Optional real calibration
├── aruco_out.jpg # Annotated sample output
├── aruco_test_out.jpg
└── aruco_snap_*.jpg # Snapshots captured from webcam
1️⃣ Detect the marker corners
OpenCV finds the 4 corners of each ArUco marker.
2️⃣ Use camera calibration
This tells OpenCV how your camera maps 3D → 2D.
3️⃣ Estimate 3D pose
OpenCV uses solvePnP to compute:
- Translation (x, y, z)
- Rotation (rvec)
4️⃣ Draw 3D axes
Red = X, Green = Y, Blue = Z — shows orientation visually.
Imagine a robot looking at a cube. It needs to know:
- how far the cube is
- whether it's tilted
- how to grab it
- where its corners are
Your webcam = robot eye
ArUco marker = robot "cheat code" to understand the object
OpenCV = robot brain
This repo gives that exact capability.
If your printed marker side = 5 cm, run with:
--marker-size 0.05If your marker side = 10 cm, run with:
--marker-size 0.10If marker side = 20 cm, run with:
--marker-size 0.20If marker size is wrong → the 3D pose will be wrong.
- Use good, even lighting
- Print marker at high quality (matte paper if possible)
- Avoid strong reflections or gloss
- Don’t tilt the marker too extremely (keep it visible)
- Keep at least 20–40 cm from webcam for standard phone/webcams
- Use
DICT_6X6_250for robustness:
--dict-name DICT_6X6_250-
❌ “No markers detected”
-
Show the printed image to the webcam (or screen)
-
Increase lighting
-
Increase marker size on screen or print larger
-
Try
DICT_6X6_250 -
❌ Webcam opens but nothing is shown
-
You must show a physical marker or the sample image to the camera.
-
❌ Recorded video file is 0 bytes or corrupt
-
Ensure
opencv-contrib-pythoninstalled (some builds required for video codecs):pip install opencv-contrib-python -
❌ Pose looks wrong
-
Verify
--marker-sizematches real marker size -
Re-check camera calibration (
outputs/camera.npz)
Webcam snapshot(s) saved during run

If any of the images above show as broken links in GitHub, run the detection once (image or webcam mode) to generate the corresponding file under
outputs/and commit it (or replace preview images with your generated outputs).
Generate synthetic camera intrinsics (quick start):
python cam.pyRun on sample image:
python detect_and_pose_aruco.py --camera-file outputs/camera.npz --source samples/aruco_sample.jpg --out outputs/aruco_out.jpgRun live webcam:
python detect_and_pose_aruco.py --camera-file outputs/camera.npz --source 0Run live webcam + record:
python detect_and_pose_aruco.py --camera-file outputs/camera.npz --source 0 --out outputs/aruco_webcam_record.mp4


