MathVizAI is an automated end-to-end system designed to generate polished, educational mathematical videos. It accepts a mathematical problem as input and autonomously produces a complete video explanation featuring synchronized visualizations, audio narration, and detailed step-by-step proofs. The system integrates advanced Large Language Models (LLMs) for reasoning, a self-correcting evaluation loop for accuracy, Manim for high-quality mathematical animations, and Microsoft VibeVoice for natural-sounding speech synthesis.
- Automated Problem Solving: Utilizes LLMs to solve complex mathematical problems with rigorous proofs and logical steps.
- Self-Correcting Verification: Includes a dedicated evaluation stage that validates the solution's accuracy and iterates until a correct proof is generated.
- Dynamic Visualizations: Automatically generates Python code for Manim (Mathematical Animation Engine) to create precise and aesthetically pleasing mathematical animations.
- High-Quality Neural TTS: Integrates Microsoft's VibeVoice (via `vibevoice`) to generate natural, expressive audio narration.
- RAG-Enhanced Generation: Employs a Retrieval-Augmented Generation (RAG) system with a "Golden Set" of high-quality examples to ensure reliable Manim code generation.
- Audio-Visual Synchronization: Automatically aligns generated audio segments with their corresponding video animations for a seamless viewing experience.
- Pipeline Architecture: Modular design separating solving, script writing, video generation, and rendering for robustness and maintainability.
Check out a sample video generated by MathVizAI explaining Taylor Series:
Note: If the video doesn't play in your viewer, you can download it here.
MathVizAI is built on a sophisticated Multi-Agent and RAG architecture designed for autonomy and self-correction.
The system is managed by a central PipelineOrchestrator that coordinates specialized autonomous agents:
- Solver Agent: Uses pure reasoning to generate mathematical proofs.
- Evaluator Agent (Self-Correction): Acts as a critic, reviewing the Solver's output. If errors are found, it rejects the solution and provides specific feedback, triggering a retry loop until accuracy is verified.
- Script Agent: Transforms rigorous proofs into engaging, conversational scripts suitable for audio narration.
- Visual Developer Agent: A specialized code-generation agent that writes Manim Python scripts.
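The Solver-Evaluator interaction can be sketched as a simple retry loop. The function below is illustrative only; `solver` and `evaluator` stand in for LLM-backed agents, and their signatures are assumptions rather than MathVizAI's actual interfaces:

```python
# Hypothetical sketch of the Solver-Evaluator self-correction loop.
# `solver` and `evaluator` stand in for LLM-backed agents; their
# signatures are assumptions, not MathVizAI's real API.
def solve_with_verification(problem, solver, evaluator, max_retries=3):
    feedback = None
    for _ in range(max_retries):
        solution = solver(problem, feedback)    # generate (or revise) a proof
        verdict = evaluator(problem, solution)  # critic reviews the proof
        if verdict["approved"]:
            return solution
        feedback = verdict["feedback"]          # feed criticism into the retry
    raise RuntimeError("no verified solution within retry budget")
```

The key design point is that the Evaluator's feedback is passed back into the next Solver call, so each retry is informed rather than blind.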
To ensure high-quality visual output, the Visual Developer Agent employs an advanced RAG pipeline:
- Golden Set: A curated vector database of high-quality Manim animations (e.g., from 3Blue1Brown).
- ReAct Loop: Before writing code, the agent enters a "Reasoning + Acting" loop. It actively searches the Golden Set for relevant visualization techniques (e.g., "how to animate a riemann sum") and retrieves proven code snippets to inform its generation, significantly reducing syntax errors and hallucinations.
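As a rough illustration of the retrieval step, the toy function below ranks Golden Set entries against a query. It uses naive keyword overlap in place of a real vector database, and the `{"description", "code"}` record layout is an assumption:

```python
def retrieve_golden_examples(query, golden_set, top_k=2):
    """Rank Golden Set snippets against a query (toy sketch).

    A real Golden Set would use embedding similarity over a vector
    database; keyword overlap and the record layout are stand-ins.
    """
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc["description"].lower().split())), doc)
        for doc in golden_set
    ]
    # Highest-overlap snippets first; drop entries with no overlap at all.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]
```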
The system leverages state-of-the-art prompt engineering techniques to maximize LLM performance:
- Chain of Thought (CoT): The Solver agent is explicitly prompted to "think step-by-step" and provide intuitive overviews before attempting formal proofs, mirroring human mathematical reasoning.
- Role Prompting: Agents are assigned specific personas (e.g., "Meticulous Validator", "3Blue1Brown-style Developer") to align their tone, rigor, and output style with expert standards.
- Structured Output: The Evaluator agent uses strict schema enforcement (JSON-like structures) to provide parseable, quantitative feedback (scores 0-10) rather than vague text.
- Constraint-Based Prompting: The Video Generator operates under strict, well-defined constraints, such as hard "Frame Boundaries" (X = [-7.1, 7.1]) and rigid "Timing Contracts", to ensure perfect audio-visual sync.
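The structured-output contract can be made concrete with a small validator. The field names and 0-10 score range below follow the description above, but the exact schema is an assumption:

```python
import json

# Assumed field names for the Evaluator's verdict; the real schema may differ.
REQUIRED_KEYS = {"rigor_score", "correctness_score", "approved", "feedback"}

def parse_verdict(raw):
    """Parse the Evaluator's JSON verdict and enforce the 0-10 score range."""
    verdict = json.loads(raw)
    missing = REQUIRED_KEYS - verdict.keys()
    if missing:
        raise ValueError(f"verdict missing keys: {sorted(missing)}")
    for key in ("rigor_score", "correctness_score"):
        if not 0 <= verdict[key] <= 10:
            raise ValueError(f"{key} out of range 0-10")
    return verdict
```

Enforcing a schema like this is what lets the orchestrator branch mechanically on `approved` instead of re-parsing free-form critique.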
Beyond standard error handling, the pipeline implements agent-native reliability features:
- Self-Correction Loops: The "Solver-Evaluator" loop acts as an autonomous feedback mechanism, catching and fixing logical errors before they reach the final output.
- Defensive Generation: The system uses a library of "safe" wrapper functions (`visual_utils`) that abstract away complex or fragile Manim operations, preventing common runtime crashes during code generation.
- Dry-Run Verification: Generated Manim code undergoes a syntax check and a "dry-run" execution phase to detect runtime errors (such as LaTeX compilation failures) before committing to full-quality rendering.
The pipeline processes a query through the following sequential stages:
- Solver: The system receives a math problem and generates a detailed text solution.
- Evaluator: A secondary LLM reviews the solution for errors. If issues are found, the Solver is triggered to retry.
- Script Writer: Converts the verified solution into a conversational script, segmented for optimal pacing (approx. 15-20 seconds per segment).
- Video Generator: Generates Manim Python code for each script segment, visualizing the concepts described.
- TTS Generator: Synthesizes audio for each segment using VibeVoice.
- Renderer: Executes the Manim code to render video segments (supports parallel execution).
- Synchronizer & Assembly: Combines audio and video streams, adjusting playback logic as needed, and concatenates all segments into the final output file.
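The sequential flow above can be sketched as a minimal orchestrator. The class below is illustrative only and does not mirror `pipeline/orchestrator.py`:

```python
class PipelineOrchestrator:
    """Toy orchestrator sketch: runs named stages in order, threading the
    artifact from one stage into the next (problem -> proof -> script ->
    code -> audio -> rendered video)."""

    def __init__(self, stages):
        self.stages = stages  # ordered list of (name, callable) pairs

    def run(self, problem):
        artifact, completed = problem, []
        for name, stage in self.stages:
            artifact = stage(artifact)  # each stage consumes the previous output
            completed.append(name)
        return artifact, completed
```

Keeping stages as plain callables in an ordered list is what makes the pipeline modular: a stage can be swapped, stubbed, or parallelized without touching its neighbors.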
- Python: Version 3.8 or higher.
- FFmpeg: Required for Manim video rendering and audio processing.
- System Dependencies: Building VibeVoice and Manim may require system-level development tools (e.g., `build-essential`, `espeak-ng`).
- Clone the Repository:

  ```bash
  git clone https://github.com/anirudhsengar/MathVizAI.git
  cd MathVizAI
  ```

- Install Python Dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  Note: The `vibevoice` dependency is installed directly from its GitHub repository as specified in `requirements.txt`.

- Configure Environment Variables:

  Create a `.env` file in the project root directory and add your API keys:

  ```
  OPENAI_API_KEY=your_openai_api_key
  TAVILY_API_KEY=your_tavily_api_key
  ```

  - `OPENAI_API_KEY`: Required for the LLM (GPT-4o).
  - `TAVILY_API_KEY`: Required for web research capabilities (optional but recommended).
The system is highly configurable via `config.py`. Key settings include:

- `DEBUG_MODE`: Set to `True` to retain all intermediate files (logs, individual video segments, audio files). Set to `False` to keep only the final video.
- `DEEP_DIVE_MODE`: When `True`, generates more comprehensive and detailed explanations.
- `MANIM_QUALITY`: Controls the rendering resolution (`low`, `medium`, `high`, `production`).
- `RAG_ENABLED`: Toggles the use of the Golden Set RAG system for improved code generation.
- `MAX_TOKENS` & `TEMPERATURE`: Fine-tune the behavior of the LLMs for different pipeline stages.
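A `config.py` along these lines might look like the following; the values shown are illustrative defaults, not the project's shipped settings:

```python
# Illustrative config.py fragment; values are example defaults, not
# MathVizAI's shipped settings.
DEBUG_MODE = False        # True: keep all intermediate files; False: final video only
DEEP_DIVE_MODE = True     # generate more comprehensive explanations
MANIM_QUALITY = "medium"  # one of: "low", "medium", "high", "production"
RAG_ENABLED = True        # use the Golden Set RAG system
MAX_TOKENS = 4096         # per-stage LLM generation cap
TEMPERATURE = 0.2         # lower values favor deterministic reasoning
```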
To start the interactive CLI:
```bash
python main.py
```

Follow the prompts to enter a mathematical problem (e.g., "Prove that the square root of 2 is irrational").
All generated content is saved to the `output/` directory, organized by timestamp and query name. A typical session folder includes:

- `final/`: Contains the final compiled video.
- `solver/`: Solutions and proof attempts.
- `script/`: Generated audio scripts and segmentation data.
- `video/`: Manim Python scripts (`.py`).
- `render/`: Raw rendered video segments (`.mp4`).
- `audio/`: Synthesized audio files (`.wav`).
The codebase is organized into modular components under the `pipeline/` directory:

- `pipeline/orchestrator.py`: Manages the overall data flow and stage execution.
- `pipeline/solver.py`: Handles mathematical reasoning.
- `pipeline/evaluator.py`: Validates solutions.
- `pipeline/video_generator.py`: Generates visualization code.
- `pipeline/video_renderer.py`: Handles Manim rendering.
- `pipeline/tts_generator.py`: Interfaces with the VibeVoice model.
This project is licensed under the MIT License.
