
qatre-ai/CodeAlpha-AI-Music-Generator


AI Music Generator

CodeAlpha Internship — Task 3: Music Generation with AI

A full-stack web application that trains an LSTM-based Recurrent Neural Network on MIDI data and provides an interactive UI to generate and play new music. The system consists of a Python ML pipeline (data collection, preprocessing, training, generation), a FastAPI backend, and a modern single-page frontend.


Table of Contents

  1. Project Overview
  2. Architecture
  3. Prerequisites
  4. Installation
  5. Training the Model
  6. Running the Application
  7. Using the Frontend
  8. API Reference
  9. Project Structure
  10. Troubleshooting

Project Overview

This project demonstrates end-to-end AI music generation:

  • Data Collection: Automatically downloads the JSB Chorales dataset (229 Bach chorales) or generates a synthetic fallback.
  • Preprocessing: Uses music21 to parse MIDI files, extract notes and chords, and create numerical sequences.
  • Model: A two-layer LSTM network built with TensorFlow/Keras that predicts the next note in a sequence.
  • Generation: Autoregressively produces new melodies from a random seed, with controllable temperature and length.
  • Output: Saves generated music as both .mid and .wav files.
  • Backend: FastAPI server with /generate and /download endpoints.
  • Frontend: A polished, responsive single-page UI with parameter controls, audio playback, and file download.
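The autoregressive loop behind the Generation step can be sketched in plain Python. This is a hypothetical, framework-free illustration rather than the project's generate_notes: `predict_next` stands in for the trained LSTM (it returns one probability per vocabulary entry), the seed is assumed to already be a list of integer tokens, and temperature scaling is omitted for brevity.

```python
import random
from typing import Callable, List, Sequence


def generate_sequence(
    seed: Sequence[int],
    num_notes: int,
    predict_next: Callable[[List[int]], List[float]],
    rng: random.Random,
) -> List[int]:
    """Autoregressively extend `seed` by `num_notes` integer tokens.

    `predict_next` is a hypothetical stand-in for the LSTM: it receives the
    current window and returns one probability per vocabulary entry.
    """
    window = list(seed)  # fixed-length context, like seq_length during training
    output: List[int] = []
    for _ in range(num_notes):
        probs = predict_next(window)
        # Draw the next token from the predicted distribution.
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        output.append(token)
        # Slide the window: drop the oldest token and append the new one.
        window = window[1:] + [token]
    return output


# Usage with a dummy "model" that predicts a uniform distribution over 4 tokens:
# tokens = generate_sequence([0, 1, 2], 8, lambda w: [0.25] * 4, random.Random(0))
```

Feeding each prediction back into the window is what makes the process autoregressive: the model only ever sees its own recent output plus whatever remains of the seed.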

Architecture

┌───────────────────────────────────────────────────────────────────┐
│                        Frontend (HTML/JS)                         │
│   index.html: TailwindCSS UI with audio player + download         │
└───────────────────────────┬───────────────────────────────────────┘
                            │  POST /generate
                            │  GET  /download/{filename}
                            ▼
┌───────────────────────────────────────────────────────────────────┐
│                     FastAPI Backend (main.py)                     │
│   CORS middleware · Pydantic validation · File serving            │
└───────────────────────────┬───────────────────────────────────────┘
                            │
                            ▼
┌───────────────────────────────────────────────────────────────────┐
│                       ML Pipeline (model/)                        │
│                                                                   │
│   train.py                    generate.py                         │
│   ├─ download_midi_dataset    ├─ load_model_and_metadata          │
│   ├─ extract_notes_from_midi  ├─ generate_notes (autoregressive)  │
│   ├─ prepare_sequences        ├─ create_midi (music21)            │
│   ├─ build_model (LSTM)       ├─ midi_to_wav (FluidSynth)         │
│   └─ train + save artefacts   └─ generate_music (orchestrator)    │
│                                                                   │
│   saved/                      outputs/                            │
│   ├─ final_weights.keras      ├─ generated_xxxxx.mid              │
│   ├─ metadata.pkl             └─ generated_xxxxx.wav              │
│   └─ notes.pkl                                                    │
└───────────────────────────────────────────────────────────────────┘

Prerequisites

Python

  • Python 3.9 – 3.11 (TensorFlow compatibility)
  • pip package manager

FluidSynth (required for MIDI → WAV conversion)

FluidSynth is a real-time software synthesizer that renders MIDI files into audio using SoundFont files.

Windows

  1. Download the latest FluidSynth release from GitHub Releases.
  2. Extract the archive and add the bin/ directory to your system PATH environment variable.
  3. Verify installation:
    fluidsynth --version

Linux (Ubuntu/Debian)

sudo apt-get update
sudo apt-get install fluidsynth

macOS

brew install fluidsynth

SoundFont File

A SoundFont (.sf2) file is required by FluidSynth to synthesize audio. The recommended file is FluidR3_GM.sf2.

  • Download: FluidR3_GM.sf2 or search for it in your distribution's package manager.
  • Linux shortcut:
    sudo apt-get install fluid-soundfont-gm
    # Installed to: /usr/share/sounds/sf2/FluidR3_GM.sf2
  • Placement: Copy the .sf2 file to the project root directory. The code looks for FluidR3_GM.sf2 in the project root by default.
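The lookup order described above (project root first, then the system-directory fallback mentioned under Troubleshooting) might look like the sketch below. `find_soundfont` and the fallback directory list are hypothetical illustrations, not the project's actual code; only /usr/share/sounds/sf2 comes from this README (the Debian/Ubuntu fluid-soundfont-gm location).

```python
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical fallback locations. Only the first one is taken from this README
# (Debian/Ubuntu fluid-soundfont-gm); the second is an assumed extra location.
DEFAULT_FALLBACK_DIRS = (
    Path("/usr/share/sounds/sf2"),
    Path("/usr/share/soundfonts"),
)


def find_soundfont(
    project_root: Path,
    name: str = "FluidR3_GM.sf2",
    fallback_dirs: Iterable[Path] = DEFAULT_FALLBACK_DIRS,
) -> Optional[Path]:
    """Return the first usable .sf2 file, preferring the project root."""
    candidate = project_root / name
    if candidate.is_file():
        return candidate
    for directory in fallback_dirs:
        exact = directory / name
        if exact.is_file():
            return exact
        # Otherwise accept any SoundFont in that directory (sorted for stability).
        others = sorted(directory.glob("*.sf2")) if directory.is_dir() else []
        if others:
            return others[0]
    return None
```

Returning None instead of raising lets the caller still write the .mid file and report that WAV rendering was skipped.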

Installation

1. Clone or Download the Project

git clone https://github.com/qatre-ai/CodeAlpha-AI-Music-Generator.git codealpha-music-gen
cd codealpha-music-gen
# If you downloaded a ZIP instead, extract it and cd into the extracted directory.

2. Create a Virtual Environment (recommended)

# Windows
python -m venv venv
venv\Scripts\activate

# Linux / macOS
python3 -m venv venv
source venv/bin/activate

3. Install Python Dependencies

pip install -r backend/requirements.txt

This installs:

  • tensorflow — LSTM model training & inference
  • music21 — MIDI parsing & creation
  • fastapi + uvicorn — Backend server
  • numpy, requests, tqdm — Utilities

4. Verify music21 MIDI Configuration (optional)

music21 parses MIDI files natively, so for this project no extra configuration is strictly required. The helper-application paths below are only used when you call music21's .show() methods (for example, to open a score in MuseScore or to play MIDI). To save them persistently:

# Run once in Python:
from music21 import environment
environment.set('musicxmlPath', '/usr/bin/musescore')  # or your MuseScore path
environment.set('midiPath', '/usr/bin/timidity')       # or your MIDI player


Training the Model

Training is required before you can generate music. The training script will:

  1. Download the JSB Chorales dataset automatically (or use existing MIDI files in data/).
  2. Parse all MIDI files and extract notes/chords.
  3. Build the LSTM model architecture.
  4. Train for the specified number of epochs.
  5. Save model weights and metadata to backend/model/saved/.
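Steps 2 to 4 depend on turning note and chord strings into fixed-length integer windows. The sketch below shows one common way to do that and is an illustration, not the project's prepare_sequences. It assumes notes are already extracted as strings. Chord tokens such as "4.7.11" (dot-joined pitch classes) follow a convention from popular LSTM music tutorials and are an assumption about this project. The note_to_int name mirrors the metadata.pkl key listed below.

```python
from typing import Dict, List, Tuple


def build_vocab(notes: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct note/chord string to an integer and back."""
    pitchnames = sorted(set(notes))  # sorted so the mapping is reproducible
    note_to_int = {name: i for i, name in enumerate(pitchnames)}
    int_to_note = {i: name for name, i in note_to_int.items()}
    return note_to_int, int_to_note


def prepare_sequences(
    notes: List[str], seq_length: int
) -> Tuple[List[List[float]], List[int], Dict[str, int]]:
    """Slide a window of `seq_length` tokens over the corpus.

    Each input window is paired with the token that follows it (the target).
    Inputs are scaled into [0, 1) by the vocabulary size.
    """
    note_to_int, _ = build_vocab(notes)
    vocab_size = len(note_to_int)
    inputs: List[List[float]] = []
    targets: List[int] = []
    for i in range(len(notes) - seq_length):
        window = notes[i : i + seq_length]
        inputs.append([note_to_int[n] / vocab_size for n in window])
        targets.append(note_to_int[notes[i + seq_length]])
    return inputs, targets, note_to_int
```

Before training a Keras LSTM, the inputs would typically be reshaped to (num_sequences, seq_length, 1) and the targets one-hot encoded over vocab_size. That step is left out here to keep the sketch dependency-free.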

Quick Start (default settings)

# From the project root:
python -m backend.model.train

Custom Training Parameters

python -m backend.model.train --epochs 100 --batch-size 128 --seq-length 100
Parameter      Default               Description
--epochs       50                    Number of training epochs
--batch-size   64                    Training batch size
--seq-length   100                   Input sequence length (notes)
--data-dir     data/                 Directory containing .mid files
--output-dir   backend/model/saved/  Where to save model artefacts
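If you are extending the training script, a command-line parser with the flags and defaults listed in the table could be declared as follows. This sketch mirrors the documented table and is not a copy of train.py. `parse_train_args` is a hypothetical name.

```python
import argparse
from typing import List, Optional


def parse_train_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the documented training flags (defaults taken from the table)."""
    parser = argparse.ArgumentParser(description="Train the LSTM music model.")
    parser.add_argument("--epochs", type=int, default=50,
                        help="Number of training epochs")
    parser.add_argument("--batch-size", type=int, default=64,
                        help="Training batch size")
    parser.add_argument("--seq-length", type=int, default=100,
                        help="Input sequence length (notes)")
    parser.add_argument("--data-dir", default="data/",
                        help="Directory containing .mid files")
    parser.add_argument("--output-dir", default="backend/model/saved/",
                        help="Where to save model artefacts")
    # argparse turns "--batch-size" into the attribute args.batch_size, etc.
    return parser.parse_args(argv)
```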

Using Your Own MIDI Dataset

Place any .mid or .midi files in the data/ directory. The training script will automatically detect them and skip the download step.
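The auto-detection rule (use local MIDI files if any exist, otherwise download) amounts to a directory scan. This is a sketch with hypothetical helper names. The README does not say whether subdirectories of data/ are searched, so this version's recursive search is an assumption.

```python
from pathlib import Path
from typing import List

MIDI_SUFFIXES = {".mid", ".midi"}


def find_midi_files(data_dir: Path) -> List[Path]:
    """Return all .mid/.midi files under data_dir (case-insensitive suffix)."""
    if not data_dir.is_dir():
        return []
    return sorted(
        p for p in data_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in MIDI_SUFFIXES
    )


def needs_download(data_dir: Path) -> bool:
    """Download the dataset only when no local MIDI files are present."""
    return not find_midi_files(data_dir)
```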

Training Time

  • JSB Chorales (~229 pieces, ~5 MB): ~5–15 minutes on CPU, ~2–5 minutes on GPU.
  • Larger datasets: Scales linearly with data size. For a dataset like MAESTRO, expect 1–4 hours on GPU.

What Gets Saved

After training, the following files appear in backend/model/saved/:

File                   Purpose
final_weights.keras    The trained Keras model (architecture + weights)
metadata.pkl           Vocabulary mappings (note_to_int, int_to_note, seq_length, vocab_size)
notes.pkl              The full notes corpus (used for seed selection during generation)
weights-XX-LOSS.keras  Checkpoint files from training (best loss)
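To inspect the saved vocabulary outside the app, metadata.pkl can be read with the standard pickle module. The sketch assumes the file holds a dict with the four keys listed in the table. That structure is inferred from the table, not verified against train.py, and `load_metadata` is a hypothetical helper. Only unpickle files you generated yourself, because loading a pickle can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Any, Dict

# Assumed structure, based on the "What Gets Saved" table.
REQUIRED_KEYS = {"note_to_int", "int_to_note", "seq_length", "vocab_size"}


def load_metadata(path: Path) -> Dict[str, Any]:
    """Load metadata.pkl and sanity-check the assumed structure."""
    with path.open("rb") as fh:
        meta = pickle.load(fh)  # trusted, locally generated file only
    missing = REQUIRED_KEYS - set(meta)
    if missing:
        raise ValueError(f"metadata.pkl is missing keys: {sorted(missing)}")
    if meta["vocab_size"] != len(meta["note_to_int"]):
        raise ValueError("vocab_size does not match the note_to_int mapping")
    return meta


# Usage (after training):
# meta = load_metadata(Path("backend/model/saved/metadata.pkl"))
# print(meta["vocab_size"], meta["seq_length"])
```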

Running the Application

1. Start the Backend

# From the project root:
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

You should see:

INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started reloader process

2. Verify the Backend

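Open http://localhost:8000/health in your browser. A JSON status object (see GET /health in the API Reference) confirms that the server is up.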

3. Open the Frontend

Simply open frontend/index.html in your web browser:

  • Windows: Double-click the file in File Explorer
  • Linux: xdg-open frontend/index.html; macOS: open frontend/index.html
  • Or: Use a local dev server like the VS Code Live Server extension

The frontend will automatically check if the backend is running and display the model status.


Using the Frontend

  1. Adjust Parameters — Use the sliders to configure:

    • Number of Notes (50–2000): How many note events to generate. More notes = longer piece.
    • Temperature (0.1–2.0): Controls randomness. Lower = more repetitive/conservative; higher = more creative/unpredictable.
    • Tempo (40–300 BPM): Speed of the generated music.
  2. Click "Generate Music" — The button shows a loading spinner while the model generates. This typically takes 10–30 seconds depending on the number of notes and your hardware.

  3. Listen — Once generation completes, the HTML5 audio player automatically starts playing the .wav file.

  4. Download — Use the "Download MIDI" or "Download WAV" buttons to save the files to your computer. MIDI files can be opened in any DAW or notation software (MuseScore, Ableton, FL Studio, etc.).
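The Temperature slider usually works by rescaling the model's output distribution before sampling. A standard formulation raises each probability to the power 1/T and renormalizes: p_i^(1/T) / Σ_j p_j^(1/T). This sketch implements that standard technique in log space. Whether generate.py applies temperature exactly this way is an assumption, and the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def apply_temperature(probs: Sequence[float], temperature: float) -> List[float]:
    """Rescale a distribution to p_i ** (1/T), then renormalize.

    T < 1 sharpens the distribution (more repetitive output);
    T > 1 flattens it (more unpredictable output).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    eps = 1e-12  # avoid log(0) for zero-probability entries
    logits = [math.log(max(p, eps)) / temperature for p in probs]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def sample_with_temperature(probs: Sequence[float], temperature: float,
                            rng: random.Random) -> int:
    """Draw one token index from the temperature-adjusted distribution."""
    scaled = apply_temperature(probs, temperature)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]
```

At T = 1.0 the distribution is unchanged. As T approaches the 0.1 end of the slider, sampling approaches always picking the most likely note, which is why low settings sound repetitive.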


API Reference

POST /generate

Generate a new music sequence using the trained LSTM model.

Request Body:

{
  "num_notes": 500,
  "temperature": 1.0,
  "tempo": 120
}
Field        Type   Range     Default  Description
num_notes    int    50–5000   500      Number of note events to generate
temperature  float  0.1–2.0   1.0      Sampling temperature
tempo        int    40–300    120      Tempo in BPM

Response (200):

{
  "success": true,
  "midi_filename": "generated_a1b2c3d4.mid",
  "wav_filename": "generated_a1b2c3d4.wav",
  "midi_url": "/download/generated_a1b2c3d4.mid",
  "wav_url": "/download/generated_a1b2c3d4.wav",
  "num_notes": 500,
  "temperature": 1.0,
  "message": "Music generated successfully!"
}

Error (503): Model not trained yet.
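As a scripted alternative to the frontend, here is a small standard-library client. It assumes the backend runs at http://localhost:8000 as started above. The helper names are hypothetical; the field names come from the request and response examples. It also assumes that a failed WAV render may leave wav_url empty, in which case that download is skipped.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Dict

BASE_URL = "http://localhost:8000"  # assumption: host/port from the uvicorn command


def build_generate_request(num_notes: int = 500, temperature: float = 1.0,
                           tempo: int = 120,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request with a JSON body."""
    body = json.dumps({"num_notes": num_notes, "temperature": temperature,
                       "tempo": tempo}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate", data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def generate_and_download(dest_dir: str = ".", base_url: str = BASE_URL,
                          **params: Any) -> Dict[str, Any]:
    """Call /generate, then fetch the returned MIDI and WAV files."""
    req = build_generate_request(base_url=base_url, **params)
    # Generation can take tens of seconds, so allow a generous timeout.
    # A 503 (model not trained) surfaces as urllib.error.HTTPError.
    with urllib.request.urlopen(req, timeout=120) as resp:
        result = json.load(resp)
    for kind in ("midi", "wav"):
        url_path = result.get(f"{kind}_url")  # e.g. /download/generated_a1b2c3d4.mid
        if not url_path:  # assumption: missing if WAV conversion failed
            continue
        dest = Path(dest_dir) / result[f"{kind}_filename"]
        urllib.request.urlretrieve(base_url + url_path, str(dest))
    return result


# Usage (backend running, model trained):
# info = generate_and_download(num_notes=300, temperature=0.8, tempo=100)
```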


GET /download/{filename}

Download a generated file.

Parameter  Type    Description
filename   string  The filename returned by /generate

Returns the file with the appropriate Content-Type header (audio/wav or audio/midi).
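Serving files by a client-supplied name needs a guard against path traversal (for example ../../etc/passwd). This framework-independent sketch shows how a download handler could resolve the name and choose the Content-Type. `resolve_download` and the suffix table are hypothetical, and this is not necessarily how main.py does it.

```python
from pathlib import Path
from typing import Tuple

# Content types named in the description above.
MEDIA_TYPES = {".wav": "audio/wav", ".mid": "audio/midi", ".midi": "audio/midi"}


def resolve_download(output_dir: Path, filename: str) -> Tuple[Path, str]:
    """Map a requested filename to (path, media type), or raise."""
    base = output_dir.resolve()
    path = (base / filename).resolve()
    # Reject anything that does not sit directly inside output_dir.
    if path.parent != base:
        raise ValueError("invalid filename")
    media_type = MEDIA_TYPES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError("unsupported file type")
    if not path.is_file():
        raise FileNotFoundError(filename)
    return path, media_type
```

In a FastAPI route, ValueError would naturally map to a 400 response and FileNotFoundError to a 404, with the resolved path passed to a file response along with the media type.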


GET /health

Health check endpoint.

{
  "status": "ok",
  "model_loaded": true
}
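The frontend's backend check can be reproduced from a script by polling /health. This is a standard-library sketch; `backend_ready` is a hypothetical helper, and the default URL assumes the uvicorn command shown earlier.

```python
import json
import urllib.request


def backend_ready(base_url: str = "http://localhost:8000",
                  timeout: float = 2.0) -> bool:
    """True only if /health reports status "ok" and a loaded model."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers urllib.error.URLError (server down or unreachable)
        # and timeouts; ValueError covers a non-JSON response body.
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("status") == "ok" and bool(payload.get("model_loaded"))
```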

Project Structure

codealpha-music-gen/
├── backend/
│   ├── main.py                  # FastAPI server
│   ├── requirements.txt         # Python dependencies
│   ├── model/
│   │   ├── train.py             # Data prep & model training
│   │   ├── generate.py          # Music generation logic
│   │   └── saved/               # Model artefacts (created after training)
│   │       ├── final_weights.keras
│   │       ├── metadata.pkl
│   │       └── notes.pkl
│   └── outputs/                 # Generated audio files (created at runtime)
├── frontend/
│   └── index.html               # Single-page UI
├── data/                        # MIDI dataset (auto-downloaded)
├── FluidR3_GM.sf2              # SoundFont file (user-provided)
└── README.md                    # This file

Troubleshooting

"Model is not trained yet"

Run the training script before starting the backend:

python -m backend.model.train --epochs 50

"SoundFont file not found"

  • Download FluidR3_GM.sf2 and place it in the project root.
  • On Linux, install via: sudo apt-get install fluid-soundfont-gm
  • The code will auto-detect SoundFonts in common system directories as a fallback.

"WAV conversion failed"

  • Ensure FluidSynth is installed and on your system PATH:
    fluidsynth --version
  • On Windows, you may need to add the FluidSynth bin/ directory to your PATH manually.
  • The .mid file will still be generated even if WAV conversion fails — you can open it in any MIDI player or DAW.
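If WAV conversion fails inside the app, running the same conversion by hand shows FluidSynth's own error messages. The sketch below calls the fluidsynth CLI (-n: no MIDI input driver, -i: no interactive shell, -F: render to the given file, -r: sample rate). It is an illustration rather than how midi_to_wav is necessarily implemented. The helper names and the 44100 Hz rate are assumptions.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_fluidsynth_cmd(soundfont: Path, midi: Path, wav: Path,
                         sample_rate: int = 44100) -> List[str]:
    """Command line for offline MIDI -> WAV rendering (options before files)."""
    return ["fluidsynth", "-ni", "-F", str(wav), "-r", str(sample_rate),
            str(soundfont), str(midi)]


def midi_to_wav_cli(soundfont: Path, midi: Path, wav: Path) -> Path:
    """Render `midi` to `wav`, raising with FluidSynth's stderr on failure."""
    if shutil.which("fluidsynth") is None:
        raise FileNotFoundError("fluidsynth is not on PATH")
    proc = subprocess.run(build_fluidsynth_cmd(soundfont, midi, wav),
                          capture_output=True, text=True)
    if proc.returncode != 0 or not wav.is_file():
        raise RuntimeError(f"FluidSynth failed:\n{proc.stderr}")
    return wav
```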

"Cannot connect to the backend"

  • Make sure the FastAPI server is running: uvicorn backend.main:app --host 0.0.0.0 --port 8000
  • Check that nothing else is using port 8000.
  • If you open the frontend from a file:// URL, some browsers may block its requests to localhost. Serve it from a local dev server instead, for example python -m http.server 5500 --directory frontend, then browse to http://localhost:5500.

TensorFlow GPU Acceleration

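If you have an NVIDIA GPU on Linux, installing TensorFlow with its CUDA extra pulls in matching CUDA and cuDNN libraries and can significantly speed up training:

pip install 'tensorflow[and-cuda]'

The quotes stop shells such as zsh from interpreting the brackets. TensorFlow 2.10 was the last release with GPU support on native Windows; for newer versions, use WSL2. Refer to the TensorFlow GPU guide for detailed setup instructions.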

Out of Memory During Training

If you encounter OOM errors:

  • Reduce --batch-size (try 32 or 16)
  • Reduce --seq-length (try 50)
  • Use a smaller dataset

Built as part of the CodeAlpha Internship Program — Task 3: Music Generation with AI

About

LSTM-based Music Generation Web App | CodeAlpha Task 3
