MathVizAI is an automated end-to-end system designed to generate polished, educational mathematical videos. It accepts a mathematical problem as input and autonomously produces a complete video explanation featuring synchronized visualizations, audio narration, and detailed step-by-step proofs. The system integrates advanced Large Language Models (LLMs) for reasoning, a self-correcting evaluation loop for accuracy, Manim for high-quality mathematical animations, and Microsoft VibeVoice for natural-sounding speech synthesis.
- Automated Problem Solving: Utilizes LLMs to solve complex mathematical problems with rigorous proofs and logical steps.
- Self-Correcting Verification: Includes a dedicated evaluation stage that validates the solution's accuracy and iterates until a correct proof is generated.
- Dynamic Visualizations: Automatically generates Python code for Manim (Mathematical Animation Engine) to create precise and aesthetically pleasing mathematical animations.
- High-Quality Neural TTS: Integrates Microsoft's VibeVoice (via `vibevoice`) to generate natural, expressive audio narration.
- RAG-Enhanced Generation: Employs a Retrieval-Augmented Generation (RAG) system with a "Golden Set" of high-quality examples to ensure reliable Manim code generation.
- Audio-Visual Synchronization: Automatically aligns generated audio segments with their corresponding video animations for a seamless viewing experience.
- Pipeline Architecture: Modular design separating solving, script writing, video generation, and rendering for robustness and maintainability.
Check out a sample video generated by MathVizAI explaining Taylor Series:
Note: If the video doesn't play in your viewer, you can download it here.
MathVizAI is built on a sophisticated Multi-Agent and RAG architecture designed for autonomy and self-correction.
The system is managed by a central PipelineOrchestrator that coordinates specialized autonomous agents:
- Solver Agent: Uses pure reasoning to generate mathematical proofs.
- Evaluator Agent (Self-Correction): Acts as a critic, reviewing the Solver's output. If errors are found, it rejects the solution and provides specific feedback, triggering a retry loop until accuracy is verified.
- Script Agent: Transforms rigorous proofs into engaging, conversational scripts suitable for audio narration.
- Visual Developer Agent: A specialized code-generation agent that writes Manim Python scripts.
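The Solver-Evaluator interaction can be sketched as a simple retry loop. The function below is illustrative only; `solver` and `evaluator` stand in for LLM-backed agents, and their signatures are assumptions rather than MathVizAI's actual interfaces:

```python
# Hypothetical sketch of the Solver-Evaluator self-correction loop.
# `solver` and `evaluator` stand in for LLM-backed agents; their
# signatures are assumptions, not MathVizAI's real API.
def solve_with_verification(problem, solver, evaluator, max_retries=3):
    feedback = None
    for _ in range(max_retries):
        solution = solver(problem, feedback)    # generate (or revise) a proof
        verdict = evaluator(problem, solution)  # critic reviews the proof
        if verdict["approved"]:
            return solution
        feedback = verdict["feedback"]          # feed criticism into the retry
    raise RuntimeError("no verified solution within retry budget")
```

The key design point is that the Evaluator's feedback is passed back into the next Solver call, so each retry is informed rather than blind.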
To ensure high-quality visual output, the Visual Developer Agent employs an advanced RAG pipeline:
- Golden Set: A curated vector database of high-quality Manim animations (e.g., from 3Blue1Brown).
- ReAct Loop: Before writing code, the agent enters a "Reasoning + Acting" loop. It actively searches the Golden Set for relevant visualization techniques (e.g., "how to animate a riemann sum") and retrieves proven code snippets to inform its generation, significantly reducing syntax errors and hallucinations.
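As a rough illustration of the retrieval step, the toy function below ranks Golden Set entries against a query. It uses naive keyword overlap in place of a real vector database, and the `{"description", "code"}` record layout is an assumption:

```python
def retrieve_golden_examples(query, golden_set, top_k=2):
    """Rank Golden Set snippets against a query (toy sketch).

    A real Golden Set would use embedding similarity over a vector
    database; keyword overlap and the record layout are stand-ins.
    """
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(doc["description"].lower().split())), doc)
        for doc in golden_set
    ]
    # Highest-overlap snippets first; drop entries with no overlap at all.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]
```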
The system leverages state-of-the-art prompt engineering techniques to maximize LLM performance:
- Chain of Thought (CoT): The Solver agent is explicitly prompted to "think step-by-step" and provide intuitive overviews before attempting formal proofs, mirroring human mathematical reasoning.
- Role Prompting: Agents are assigned specific personas (e.g., "Meticulous Validator", "3Blue1Brown-style Developer") to align their tone, rigor, and output style with expert standards.
- Structured Output: The Evaluator agent uses strict schema enforcement (JSON-like structures) to provide parseable, quantitative feedback (scores 0-10) rather than vague text.
- Constraint-Based Prompting: The Video Generator operates under strict, well-defined constraints, such as hard "Frame Boundaries" (X = [-7.1, 7.1]) and rigid "Timing Contracts", to ensure perfect audio-visual sync.
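The structured-output contract can be made concrete with a small validator. The field names and 0-10 score range below follow the description above, but the exact schema is an assumption:

```python
import json

# Assumed field names for the Evaluator's verdict; the real schema may differ.
REQUIRED_KEYS = {"rigor_score", "correctness_score", "approved", "feedback"}

def parse_verdict(raw):
    """Parse the Evaluator's JSON verdict and enforce the 0-10 score range."""
    verdict = json.loads(raw)
    missing = REQUIRED_KEYS - verdict.keys()
    if missing:
        raise ValueError(f"verdict missing keys: {sorted(missing)}")
    for key in ("rigor_score", "correctness_score"):
        if not 0 <= verdict[key] <= 10:
            raise ValueError(f"{key} out of range 0-10")
    return verdict
```

Enforcing a schema like this is what lets the orchestrator branch mechanically on `approved` instead of re-parsing free-form critique.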
Beyond standard error handling, the pipeline implements agent-native reliability features:
- Self-Correction Loops: The "Solver-Evaluator" loop acts as an autonomous feedback mechanism, catching and fixing logical errors before they reach the final output.
- Defensive Generation: The system uses a library of "safe" wrapper functions (`visual_utils`) that abstract away complex or fragile Manim operations, preventing common runtime crashes during code generation.
- Dry-Run Verification: Generated Manim code undergoes a syntax check and a "dry-run" execution phase to detect runtime errors (such as LaTeX compilation failures) before committing to full-quality rendering.
The pipeline processes a query through the following sequential stages:
- Solver: The system receives a math problem and generates a detailed text solution.
- Evaluator: A secondary LLM reviews the solution for errors. If issues are found, the Solver is triggered to retry.
- Script Writer: Converts the verified solution into a conversational script, segmented for optimal pacing (approx. 15-20 seconds per segment).
- Video Generator: Generates Manim Python code for each script segment, visualizing the concepts described.
- TTS Generator: Synthesizes audio for each segment using VibeVoice.
- Renderer: Executes the Manim code to render video segments (supports parallel execution).
- Synchronizer & Assembly: Combines audio and video streams, adjusting playback logic as needed, and concatenates all segments into the final output file.
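The sequential flow above can be sketched as a minimal orchestrator. The class below is illustrative only and does not mirror `pipeline/orchestrator.py`:

```python
class PipelineOrchestrator:
    """Toy orchestrator sketch: runs named stages in order, threading the
    artifact from one stage into the next (problem -> proof -> script ->
    code -> audio -> rendered video)."""

    def __init__(self, stages):
        self.stages = stages  # ordered list of (name, callable) pairs

    def run(self, problem):
        artifact, completed = problem, []
        for name, stage in self.stages:
            artifact = stage(artifact)  # each stage consumes the previous output
            completed.append(name)
        return artifact, completed
```

Keeping stages as plain callables in an ordered list is what makes the pipeline modular: a stage can be swapped, stubbed, or parallelized without touching its neighbors.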
- Python: Version 3.8 or higher.
- FFmpeg: Required for Manim video rendering and audio processing.
- System Dependencies: Building VibeVoice and Manim may require system-level development tools (e.g., `build-essential`, `espeak-ng`).
- Clone the Repository:

  ```bash
  git clone https://github.com/anirudhsengar/MathVizAI.git
  cd MathVizAI
  ```

- Install Python Dependencies:

  ```bash
  pip install -r requirements.txt
  ```

  Note: The `vibevoice` dependency is installed directly from its GitHub repository as specified in `requirements.txt`.

- Configure Environment Variables:

  Create a `.env` file in the project root directory and add your API keys:

  ```
  OPENAI_API_KEY=your_openai_api_key
  TAVILY_API_KEY=your_tavily_api_key
  ```

  - `OPENAI_API_KEY`: Required for the LLM (GPT-4o).
  - `TAVILY_API_KEY`: Required for web research capabilities (optional but recommended).
The system is highly configurable via `config.py`. Key settings include:

- `DEBUG_MODE`: Set to `True` to retain all intermediate files (logs, individual video segments, audio files). Set to `False` to keep only the final video.
- `DEEP_DIVE_MODE`: When `True`, generates more comprehensive and detailed explanations.
- `MANIM_QUALITY`: Controls the rendering resolution (`low`, `medium`, `high`, `production`).
- `RAG_ENABLED`: Toggles the use of the Golden Set RAG system for improved code generation.
- `MAX_TOKENS` & `TEMPERATURE`: Fine-tune the behavior of the LLMs for different pipeline stages.
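A `config.py` along these lines might look like the following; the values shown are illustrative defaults, not the project's shipped settings:

```python
# Illustrative config.py fragment; values are example defaults, not
# MathVizAI's shipped settings.
DEBUG_MODE = False        # True: keep all intermediate files; False: final video only
DEEP_DIVE_MODE = True     # generate more comprehensive explanations
MANIM_QUALITY = "medium"  # one of: "low", "medium", "high", "production"
RAG_ENABLED = True        # use the Golden Set RAG system
MAX_TOKENS = 4096         # per-stage LLM generation cap
TEMPERATURE = 0.2         # lower values favor deterministic reasoning
```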
To start the interactive CLI:
```bash
python main.py
```

Follow the prompts to enter a mathematical problem (e.g., "Prove that the square root of 2 is irrational").
All generated content is saved to the `output/` directory, organized by timestamp and query name. A typical session folder includes:

- `final/`: Contains the final compiled video.
- `solver/`: Solutions and proof attempts.
- `script/`: Generated audio scripts and segmentation data.
- `video/`: Manim Python scripts (`.py`).
- `render/`: Raw rendered video segments (`.mp4`).
- `audio/`: Synthesized audio files (`.wav`).
The codebase is organized into modular components under the `pipeline/` directory:

- `pipeline/orchestrator.py`: Manages the overall data flow and stage execution.
- `pipeline/solver.py`: Handles mathematical reasoning.
- `pipeline/evaluator.py`: Validates solutions.
- `pipeline/video_generator.py`: Generates visualization code.
- `pipeline/video_renderer.py`: Handles Manim rendering.
- `pipeline/tts_generator.py`: Interfaces with the VibeVoice model.
This project is licensed under the MIT License.
