anirudhsengar/MathVizAI

MathVizAI

Overview

MathVizAI is an automated end-to-end system designed to generate polished, educational mathematical videos. It accepts a mathematical problem as input and autonomously produces a complete video explanation featuring synchronized visualizations, audio narration, and detailed step-by-step proofs. The system integrates advanced Large Language Models (LLMs) for reasoning, a self-correcting evaluation loop for accuracy, Manim for high-quality mathematical animations, and Microsoft VibeVoice for natural-sounding speech synthesis.

Key Features

  • Automated Problem Solving: Utilizes LLMs to solve complex mathematical problems with rigorous proofs and logical steps.
  • Self-Correcting Verification: Includes a dedicated evaluation stage that validates the solution's accuracy and iterates until a correct proof is generated.
  • Dynamic Visualizations: Automatically generates Python code for Manim (Mathematical Animation Engine) to create precise and aesthetically pleasing mathematical animations.
  • High-Quality Neural TTS: Integrates Microsoft's VibeVoice (via vibevoice) to generate natural, expressive audio narration.
  • RAG-Enhanced Generation: Employs a Retrieval-Augmented Generation (RAG) system with a "Golden Set" of high-quality examples to make Manim code generation more reliable.
  • Audio-Visual Synchronization: Automatically aligns generated audio segments with their corresponding video animations for a seamless viewing experience.
  • Pipeline Architecture: Modular design separating solving, script writing, video generation, and rendering for robustness and maintainability.

Sample Output

Check out a sample video generated by MathVizAI explaining Taylor Series:

Note: If the video doesn't play in your viewer, you can download it here.

Technical Architecture

MathVizAI is built on a multi-agent, RAG-based architecture designed for autonomy and self-correction.

Multi-Agent Orchestration

The system is managed by a central PipelineOrchestrator that coordinates specialized autonomous agents:

  • Solver Agent: Uses pure reasoning to generate mathematical proofs.
  • Evaluator Agent (Self-Correction): Acts as a critic, reviewing the Solver's output. If errors are found, it rejects the solution and provides specific feedback, triggering a retry loop until accuracy is verified.
  • Script Agent: Transforms rigorous proofs into engaging, conversational scripts suitable for audio narration.
  • Visual Developer Agent: A specialized code-generation agent that writes Manim Python scripts.
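
The Solver/Evaluator retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's PipelineOrchestrator: the names `Verdict` and `solve_with_review`, the callable signatures, and the `max_attempts` cap are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Hypothetical evaluator result: approval flag plus feedback text."""
    approved: bool
    feedback: str = ""


def solve_with_review(
    problem: str,
    solve: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str, str], Verdict],
    max_attempts: int = 3,  # assumed cap so the loop always terminates
) -> str:
    """Request a solution and retry with evaluator feedback until approved."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        solution = solve(problem, feedback)
        verdict = evaluate(problem, solution)
        if verdict.approved:
            return solution
        # Pass the critic's feedback into the next solver attempt.
        feedback = verdict.feedback
    raise RuntimeError(f"No approved solution after {max_attempts} attempts")
```

Passing the solver and evaluator in as plain callables keeps the loop testable without any LLM calls.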

Retrieval-Augmented Generation (RAG) & Golden Set

To ensure high-quality visual output, the Visual Developer Agent employs an advanced RAG pipeline:

  • Golden Set: A curated vector database of high-quality Manim animations (e.g., from 3Blue1Brown).
  • ReAct Loop: Before writing code, the agent enters a "Reasoning + Acting" loop. It actively searches the Golden Set for relevant visualization techniques (e.g., "how to animate a riemann sum") and retrieves proven code snippets to inform its generation, significantly reducing syntax errors and hallucinations.
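
To illustrate the retrieval step, here is a toy sketch that ranks stored snippets against a query. It swaps the vector database for a plain bag-of-words cosine similarity over an in-memory dictionary, so it shows the idea rather than the project's RAG pipeline. `GOLDEN_SET`, its entries, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical stand-in for the Golden Set: description -> Manim snippet.
GOLDEN_SET = {
    "animate a riemann sum with rectangles under a curve": "# snippet: riemann rectangles",
    "plot a taylor series approximation of a function": "# snippet: taylor polynomials",
    "draw a unit circle and trace sine": "# snippet: unit circle",
}


def _vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 1) -> List[str]:
    """Return the k snippets whose descriptions best match the query."""
    q = _vectorize(query)
    ranked = sorted(GOLDEN_SET, key=lambda d: _cosine(q, _vectorize(d)), reverse=True)
    return [GOLDEN_SET[d] for d in ranked[:k]]
```

A real deployment would replace `_vectorize` with embedding vectors, but the ranking step has the same shape.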

Advanced Prompt Engineering

The system uses several prompt engineering techniques to get more reliable output from the LLMs:

  • Chain of Thought (CoT): The Solver agent is explicitly prompted to "think step-by-step" and provide intuitive overviews before attempting formal proofs, mirroring human mathematical reasoning.
  • Role Prompting: Agents are assigned specific personas (e.g., "Meticulous Validator", "3Blue1Brown-style Developer") to align their tone, rigor, and output style with expert standards.
  • Structured Output: The Evaluator agent uses strict schema enforcement (JSON-like structures) to provide parseable, quantitative feedback (scores 0-10) rather than vague text.
  • Constraint-Based Prompting: The Video Generator operates under strictly defined constraints, such as hard "Frame Boundaries" (X=[-7.1, 7.1]) and rigid "Timing Contracts", to keep audio and video in sync.
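
As an example of why structured output helps, the sketch below parses an evaluator reply and validates its score. The JSON field names (`score`, `feedback`), the `Evaluation` type, and the `pass_mark` threshold are assumptions for illustration. Only the 0-10 range comes from the description above.

```python
import json
from dataclasses import dataclass


@dataclass
class Evaluation:
    score: int
    approved: bool
    feedback: str


def parse_evaluation(raw: str, pass_mark: int = 8) -> Evaluation:
    """Parse a JSON evaluator reply and reject scores outside 0-10."""
    data = json.loads(raw)
    score = int(data["score"])
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    # pass_mark is a hypothetical approval threshold.
    return Evaluation(score, score >= pass_mark, str(data.get("feedback", "")))
```

Because the reply is parsed rather than read as free text, a malformed or out-of-range answer fails loudly instead of silently passing review.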

Reliability & Best Practices

Beyond standard error handling, the pipeline implements agent-native reliability features:

  • Self-Correction Loops: The "Solver-Evaluator" loop acts as an autonomous feedback mechanism, catching and fixing logical errors before they reach the final output.
  • Defensive Generation: The system uses a library of "safe" wrapper functions (visual_utils) that abstract away complex or fragile Manim operations, preventing common runtime crashes during code generation.
  • Dry-Run Verification: Generated Manim code undergoes a syntax check and a "dry-run" execution phase to detect runtime errors (like LaTeX compilation failures) before committing to full-quality rendering.
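
The syntax-check half of this verification can be approximated with Python's standard `ast` module, as in the sketch below. The function name `syntax_error_in` is hypothetical, and the dry-run execution phase (which would catch LaTeX failures) is not covered here.

```python
import ast
from typing import Optional


def syntax_error_in(source: str) -> Optional[str]:
    """Return a short description of the first syntax error, or None if the code parses."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None
```

Parsing is cheap, so a check like this can run on every generated script before any rendering time is spent.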

System Architecture

The pipeline processes a query through the following sequential stages:

  1. Solver: The system receives a math problem and generates a detailed text solution.
  2. Evaluator: A secondary LLM reviews the solution for errors. If issues are found, the Solver is triggered to retry.
  3. Script Writer: Converts the verified solution into a conversational script, segmented for optimal pacing (approx. 15-20 seconds per segment).
  4. Video Generator: Generates Manim Python code for each script segment, visualizing the concepts described.
  5. TTS Generator: Synthesizes audio for each segment using VibeVoice.
  6. Renderer: Executes the Manim code to render video segments (supports parallel execution).
  7. Synchronizer & Assembly: Combines audio and video streams, adjusting playback logic as needed, and concatenates all segments into the final output file.
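
One plausible way to reconcile audio and video durations in the final stage is sketched below. The policy shown (hold the last frame when the video is short, speed it up when it is long) is an assumption, as are the names `SyncPlan` and `plan_sync`. The project's actual synchronization logic may differ.

```python
from dataclasses import dataclass


@dataclass
class SyncPlan:
    hold_seconds: float  # extra time to hold the last video frame
    speed_factor: float  # values above 1.0 mean the video must play faster


def plan_sync(audio_seconds: float, video_seconds: float) -> SyncPlan:
    """Decide how to make one video segment last as long as its narration."""
    if audio_seconds <= 0 or video_seconds <= 0:
        raise ValueError("durations must be positive")
    if video_seconds <= audio_seconds:
        # Video ends early: keep normal speed and hold the final frame.
        return SyncPlan(audio_seconds - video_seconds, 1.0)
    # Video runs long: speed it up so it ends with the narration.
    return SyncPlan(0.0, video_seconds / audio_seconds)
```

For instance, an 18-second narration over a 15-second animation yields a 3-second hold at normal speed.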

Installation

Prerequisites

  • Python: Version 3.8 or higher.
  • FFmpeg: Required for Manim video rendering and audio processing.
  • System Dependencies: Building VibeVoice and Manim may require system-level development tools (e.g., build-essential, espeak-ng).

Setup

  1. Clone the Repository

    git clone https://github.com/anirudhsengar/MathVizAI.git
    cd MathVizAI

  2. Install Python Dependencies

    pip install -r requirements.txt

    Note: The vibevoice dependency is installed directly from its GitHub repository as specified in requirements.txt.

  3. Configure Environment Variables

    Create a .env file in the project root directory and add your API keys:

    OPENAI_API_KEY=your_openai_api_key
    TAVILY_API_KEY=your_tavily_api_key

    • OPENAI_API_KEY: Required for the LLM (GPT-4o).
    • TAVILY_API_KEY: Enables web research capabilities (optional but recommended).
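
A small startup check can catch a missing key early. The sketch below assumes the values from .env have already been loaded into the process environment (how the project does this is not stated here). `REQUIRED_KEYS`, `OPTIONAL_KEYS`, and `missing_keys` are hypothetical names.

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("OPENAI_API_KEY",)
OPTIONAL_KEYS = ("TAVILY_API_KEY",)  # web research is optional


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required keys that are unset or blank in the given environment."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

Calling `missing_keys()` before starting the pipeline lets the CLI print a clear message instead of failing on the first LLM request.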

Configuration

The system is highly configurable via config.py. Key settings include:

  • DEBUG_MODE: Set to True to retain all intermediate files (logs, individual video segments, audio files). Set to False to keep only the final video.
  • DEEP_DIVE_MODE: When True, generates more comprehensive and detailed explanations.
  • MANIM_QUALITY: Controls the rendering resolution (low, medium, high, production).
  • RAG_ENABLED: Toggles the use of the Golden Set RAG system for improved code generation.
  • MAX_TOKENS & TEMPERATURE: Fine-tune the behavior of the LLMs for different pipeline stages.
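
As a sketch of how MANIM_QUALITY could translate into a render command, the example below maps the four listed values to Manim Community's quality flags. Whether the project invokes the `manim` CLI this way is an assumption, and `QUALITY_FLAGS` and `manim_command` are hypothetical names.

```python
from typing import List

# Manim Community quality flags for the four settings listed above.
QUALITY_FLAGS = {"low": "-ql", "medium": "-qm", "high": "-qh", "production": "-qp"}


def manim_command(script_path: str, scene_name: str, quality: str) -> List[str]:
    """Build the argument list for rendering one scene at the given quality."""
    try:
        flag = QUALITY_FLAGS[quality]
    except KeyError:
        raise ValueError(f"unknown MANIM_QUALITY: {quality!r}") from None
    return ["manim", flag, script_path, scene_name]
```

The resulting list can be handed to `subprocess.run` without shell quoting concerns.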

Usage

Running the Application

To start the interactive CLI:

python main.py

Follow the prompts to enter a mathematical problem (e.g., "Prove that the square root of 2 is irrational").

Output

All generated content is saved to the output/ directory, organized by timestamp and query name. A typical session folder includes:

  • final/: Contains the final compiled video.
  • solver/: Solutions and proof attempts.
  • script/: Generated audio scripts and segmentation data.
  • video/: Manim Python scripts (.py).
  • render/: Raw rendered video segments (.mp4).
  • audio/: Synthesized audio files (.wav).
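
The session layout above could be created with a helper along these lines. The folder naming scheme (timestamp followed by a slug of the query) is a guess at "organized by timestamp and query name", and `create_session_dir` is a hypothetical function, not part of the codebase.

```python
import re
from datetime import datetime
from pathlib import Path

STAGE_DIRS = ("final", "solver", "script", "video", "render", "audio")


def create_session_dir(root: Path, query: str, now: datetime) -> Path:
    """Create root/<timestamp>_<slug>/ with one subfolder per pipeline stage."""
    # Reduce the query to a short, filesystem-safe slug (assumed 40-char limit).
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_")[:40]
    session = root / f"{now:%Y%m%d_%H%M%S}_{slug}"
    for name in STAGE_DIRS:
        (session / name).mkdir(parents=True, exist_ok=True)
    return session
```

Taking `now` as a parameter rather than calling `datetime.now()` inside keeps the function deterministic and easy to test.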

Development

The codebase is organized into modular components under the pipeline/ directory:

  • pipeline/orchestrator.py: Manages the overall data flow and stage execution.
  • pipeline/solver.py: Handles mathematical reasoning.
  • pipeline/evaluator.py: Validates solutions.
  • pipeline/video_generator.py: Generates visualization code.
  • pipeline/video_renderer.py: Handles Manim rendering.
  • pipeline/tts_generator.py: Interfaces with the VibeVoice model.

License

This project is licensed under the MIT License.
