AI Music Generator

CodeAlpha Internship — Task 3: Music Generation with AI

A full-stack web application that trains an LSTM-based Recurrent Neural Network on MIDI data and provides an interactive UI to generate and play new music. The system consists of a Python ML pipeline (data collection, preprocessing, training, generation), a FastAPI backend, and a modern single-page frontend.

Project Overview

This project demonstrates end-to-end AI music generation:

- Data Collection: Automatically downloads the JSB Chorales dataset (229 Bach chorales) or generates a synthetic fallback.
- Preprocessing: Uses music21 to parse MIDI files, extract notes and chords, and create numerical sequences.
- Model: A two-layer LSTM network built with TensorFlow/Keras that predicts the next note in a sequence.
- Generation: Autoregressively produces new melodies from a random seed, with controllable temperature and length.
- Output: Saves generated music as both .mid and .wav files.
- Backend: FastAPI server with /generate and /download endpoints.
- Frontend: A polished, responsive single-page UI with parameter controls, audio playback, and file download.

Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        Frontend (HTML/JS)                       │
│   index.html - TailwindCSS UI with audio player + download      │
└───────────────────────────┬─────────────────────────────────────┘
                            │  POST /generate
                            │  GET  /download/{file}
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                    FastAPI Backend (main.py)                    │
│   CORS middleware · Pydantic validation · File serving          │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                  ML Pipeline (model/)                           │
│                                                                 │
│   train.py                  generate.py                         │
│   ├─ download_midi_dataset  ├─ load_model_and_metadata          │
│   ├─ extract_notes_from_midi├─ generate_notes (autoregressive)  │
│   ├─ prepare_sequences      ├─ create_midi (music21)            │
│   ├─ build_model (LSTM)     ├─ midi_to_wav (FluidSynth)         │
│   └─ train + save artefacts └─ generate_music (orchestrator)    │
│                                                                 │
│   saved/                    outputs/                            │
│   ├─ final_weights.keras    ├─ generated_xxxxx.mid              │
│   ├─ metadata.pkl           └─ generated_xxxxx.wav              │
│   └─ notes.pkl                                                  │
└─────────────────────────────────────────────────────────────────┘
```

Prerequisites

Python

Python 3.9 – 3.11 (TensorFlow compatibility)
pip package manager

FluidSynth (required for MIDI → WAV conversion)

FluidSynth is a real-time software synthesizer that renders MIDI files into audio using SoundFont files.

Windows

Download the latest FluidSynth release from GitHub Releases.
Extract the archive and add the bin/ directory to your system PATH environment variable.
Verify installation:
```
fluidsynth --version
```

Linux (Ubuntu/Debian)

```
sudo apt-get update
sudo apt-get install fluidsynth
```

macOS

```
brew install fluidsynth
```

SoundFont File

A SoundFont (.sf2) file is required by FluidSynth to synthesize audio. The recommended file is FluidR3_GM.sf2.

Download: FluidR3_GM.sf2 or search for it in your distribution's package manager.

Linux shortcut:

```
sudo apt-get install fluid-soundfont-gm
# Installed to: /usr/share/sounds/sf2/FluidR3_GM.sf2
```

Placement: Copy the .sf2 file to the project root directory. The code looks for FluidR3_GM.sf2 in the project root by default.
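
The lookup described above (project root first, then a system directory) could be sketched as follows. This is a minimal illustration, not the project's actual code: `find_soundfont` and its parameters are hypothetical names, and the only system directory searched is the Debian/Ubuntu path mentioned above.

```python
from pathlib import Path
from typing import Iterable, Optional

DEFAULT_NAME = "FluidR3_GM.sf2"
# Assumption: the only fallback searched here is the apt install location above.
SYSTEM_DIRS = (Path("/usr/share/sounds/sf2"),)


def find_soundfont(project_root: Path,
                   extra_dirs: Iterable[Path] = SYSTEM_DIRS) -> Optional[Path]:
    """Return the first existing SoundFont path, or None if none is found."""
    candidates = [project_root / DEFAULT_NAME]
    candidates += [directory / DEFAULT_NAME for directory in extra_dirs]
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Calling `find_soundfont(Path.cwd())` from the project root returns the local copy if present, otherwise the system copy, otherwise `None`.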

Installation

1. Clone or Download the Project

```
cd /path/to/your/projects
# Place the project files in a directory, e.g.:
mkdir codealpha-music-gen && cd codealpha-music-gen
```

2. Create a Virtual Environment (recommended)

```
# Windows
python -m venv venv
venv\Scripts\activate

# Linux / macOS
python3 -m venv venv
source venv/bin/activate
```

3. Install Python Dependencies

```
pip install -r backend/requirements.txt
```

This installs:

- tensorflow: LSTM model training and inference
- music21: MIDI parsing and creation
- fastapi + uvicorn: Backend server
- numpy, requests, tqdm: Utilities

4. Verify music21 MIDI Configuration (optional)

music21 uses external programs only to display scores or play MIDI back, and it may prompt for their locations on first use. To set them ahead of time:

```
# Run once in Python:
from music21 import environment
env = environment.Environment()
env['musicxmlPath'] = '/usr/bin/musescore'  # or your MuseScore path
env['midiPath'] = '/usr/bin/timidity'        # or your MIDI player
```

For this project, music21 only needs to parse MIDI files, which it does natively, so no extra configuration is strictly required.

Training the Model

Training is required before you can generate music. The training script will:

1. Download the JSB Chorales dataset automatically (or use existing MIDI files in data/).
2. Parse all MIDI files and extract notes/chords.
3. Build the LSTM model architecture.
4. Train for the specified number of epochs.
5. Save model weights and metadata to backend/model/saved/.
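
Between parsing and model building, the extracted note and chord tokens have to become numeric training pairs. The sketch below shows one common way to do that with a sliding window. It is an illustration under assumptions: the function names are hypothetical (the project's own `prepare_sequences` is not shown here), and the dot-joined chord token `"C4.E4.G4"` is an assumed encoding.

```python
from typing import Dict, List, Tuple


def build_vocab(notes: List[str]) -> Dict[str, int]:
    """Map each distinct token to an integer ID (sorted for stable IDs)."""
    return {token: idx for idx, token in enumerate(sorted(set(notes)))}


def make_sequences(notes: List[str], note_to_int: Dict[str, int],
                   seq_length: int) -> Tuple[List[List[int]], List[int]]:
    """Slide a window over the corpus: seq_length tokens in, next token out."""
    inputs, targets = [], []
    for start in range(len(notes) - seq_length):
        window = notes[start:start + seq_length]
        inputs.append([note_to_int[token] for token in window])
        targets.append(note_to_int[notes[start + seq_length]])
    return inputs, targets


# Toy corpus; "C4.E4.G4" stands in for a chord token (assumed format).
notes = ["C4", "E4", "G4", "C4.E4.G4", "E4", "C4"]
vocab = build_vocab(notes)
X, y = make_sequences(notes, vocab, seq_length=3)
```

With a six-token corpus and `seq_length=3` this yields three input windows, each paired with the token that follows it. A real pipeline would typically also reshape and normalize `X` and one-hot encode `y` before feeding them to Keras.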

Quick Start (default settings)

# From the project root:
```
python -m backend.model.train
```

Custom Training Parameters

```
python -m backend.model.train --epochs 100 --batch-size 128 --seq-length 100
```

| Parameter | Default | Description |
|---|---|---|
| `--epochs` | 50 | Number of training epochs |
| `--batch-size` | 64 | Training batch size |
| `--seq-length` | 100 | Input sequence length (notes) |
| `--data-dir` | `data/` | Directory containing `.mid` files |
| `--output-dir` | `backend/model/saved/` | Where to save model artefacts |
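
For reference, the flags and defaults in the table map naturally onto an `argparse` parser. The sketch below mirrors the table only; how `train.py` actually parses its arguments is not shown here, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose flags and defaults mirror the table above."""
    parser = argparse.ArgumentParser(description="Train the LSTM music model")
    parser.add_argument("--epochs", type=int, default=50,
                        help="Number of training epochs")
    parser.add_argument("--batch-size", type=int, default=64,
                        help="Training batch size")
    parser.add_argument("--seq-length", type=int, default=100,
                        help="Input sequence length (notes)")
    parser.add_argument("--data-dir", default="data/",
                        help="Directory containing .mid files")
    parser.add_argument("--output-dir", default="backend/model/saved/",
                        help="Where to save model artefacts")
    return parser


# Parsing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--epochs", "100", "--batch-size", "128"])
```

`argparse` converts dashes to underscores, so the values are read as `args.batch_size` and `args.seq_length`.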

Using Your Own MIDI Dataset

Place any .mid or .midi files in the data/ directory. The training script will automatically detect them and skip the download step.
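
A minimal sketch of that detection step, assuming a recursive, case-insensitive search (the real script may search differently; `find_midi_files` and `needs_download` are hypothetical names):

```python
from pathlib import Path
from typing import List

MIDI_SUFFIXES = {".mid", ".midi"}


def find_midi_files(data_dir: Path) -> List[Path]:
    """Return all MIDI files under data_dir, sorted for reproducibility."""
    if not data_dir.is_dir():
        return []
    return sorted(
        path for path in data_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in MIDI_SUFFIXES
    )


def needs_download(data_dir: Path) -> bool:
    """The download step is only needed when no local MIDI files exist."""
    return not find_midi_files(data_dir)
```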

Training Time

JSB Chorales (~229 pieces, ~5 MB): roughly 5–15 minutes on CPU or 2–5 minutes on GPU.
Larger datasets: Training time grows roughly in proportion to the amount of training data. For a dataset like MAESTRO, expect 1–4 hours on GPU.

What Gets Saved

After training, the following files appear in backend/model/saved/:

| File | Purpose |
|---|---|
| `final_weights.keras` | The trained Keras model (architecture + weights) |
| `metadata.pkl` | Vocabulary mappings (`note_to_int`, `int_to_note`, `seq_length`, `vocab_size`) |
| `notes.pkl` | The full notes corpus (used for seed selection during generation) |
| `weights-XX-LOSS.keras` | Checkpoint files from training (best loss) |
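
To inspect the saved metadata, something like the sketch below may help. It assumes `metadata.pkl` holds a single dict with the four keys listed in the table; `load_metadata` is a hypothetical helper, not part of the project.

```python
import pickle
from pathlib import Path

# Keys taken from the table above; the dict layout itself is an assumption.
REQUIRED_KEYS = {"note_to_int", "int_to_note", "seq_length", "vocab_size"}


def load_metadata(path: Path) -> dict:
    """Load the pickled metadata and check that the expected keys are present."""
    with path.open("rb") as handle:
        metadata = pickle.load(handle)
    missing = REQUIRED_KEYS - set(metadata)
    if missing:
        raise KeyError(f"metadata is missing keys: {sorted(missing)}")
    return metadata
```

Pickle files can execute arbitrary code when loaded, so only open artefacts produced by your own training runs.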

Running the Application

1. Start the Backend

# From the project root:
```
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
```

You should see:

```
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     Started reloader process
```

2. Verify the Backend

Open your browser and navigate to:

API docs: http://localhost:8000/docs (Swagger UI)
Health check: http://localhost:8000/health

3. Open the Frontend

Simply open frontend/index.html in your web browser:

Windows: Double-click the file in File Explorer
Linux: xdg-open frontend/index.html
macOS: open frontend/index.html
Or: Use a local dev server like the VS Code Live Server extension

The frontend will automatically check if the backend is running and display the model status.

Using the Frontend

1. Adjust Parameters: Use the sliders to configure:
   - Number of Notes (50–2000): How many note events to generate. More notes = longer piece. (The API itself accepts up to 5000; see the API Reference.)
   - Temperature (0.1–2.0): Controls randomness. Lower = more repetitive/conservative; higher = more creative/unpredictable.
   - Tempo (40–300 BPM): Speed of the generated music.
2. Click "Generate Music": The button shows a loading spinner while the model generates. This typically takes 10–30 seconds, depending on the number of notes and your hardware.
3. Listen: Once generation completes, the HTML5 audio player automatically starts playing the .wav file.
4. Download: Use the "Download MIDI" or "Download WAV" buttons to save the files to your computer. MIDI files can be opened in any DAW or notation software (MuseScore, Ableton, FL Studio, etc.).
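
The effect of the temperature slider can be illustrated with a small sampling sketch. This shows the standard temperature-scaling technique in plain Python; it is not taken from `generate.py`, and the function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def apply_temperature(probs: List[float], temperature: float) -> List[float]:
    """Rescale a distribution as p_i ** (1 / T), then renormalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [weight / total for weight in weights]


def sample_index(probs: List[float], temperature: float,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one index from the temperature-adjusted distribution."""
    rng = rng or random.Random()
    scaled = apply_temperature(probs, temperature)
    return rng.choices(range(len(scaled)), weights=scaled, k=1)[0]


probs = [0.7, 0.2, 0.1]                 # toy next-note probabilities
cold = apply_temperature(probs, 0.1)    # almost all mass on the top note
hot = apply_temperature(probs, 2.0)     # flatter, more varied choices
```

At low temperature the most likely note is picked almost every time, which explains the repetitive output. At high temperature the distribution flattens, so unlikely notes are sampled more often.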

API Reference

`POST /generate`

Generate a new music sequence using the trained LSTM model.

Request Body:

```
{
  "num_notes": 500,
  "temperature": 1.0,
  "tempo": 120
}
```

| Field | Type | Range | Default | Description |
|---|---|---|---|---|
| `num_notes` | int | 50–5000 | 500 | Number of note events to generate |
| `temperature` | float | 0.1–2.0 | 1.0 | Sampling temperature |
| `tempo` | int | 40–300 | 120 | Tempo in BPM |

Response (200):

```
{
  "success": true,
  "midi_filename": "generated_a1b2c3d4.mid",
  "wav_filename": "generated_a1b2c3d4.wav",
  "midi_url": "/download/generated_a1b2c3d4.mid",
  "wav_url": "/download/generated_a1b2c3d4.wav",
  "num_notes": 500,
  "temperature": 1.0,
  "message": "Music generated successfully!"
}
```

Error (503): Model not trained yet.
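
A small standard-library client for this endpoint might look like the sketch below. The field names, ranges, and defaults come from the table above, and the base URL assumes the backend runs on port 8000 as described in this README. The helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # assumed: default uvicorn address above


def build_generate_request(num_notes: int = 500, temperature: float = 1.0,
                           tempo: int = 120,
                           base_url: str = API_BASE) -> urllib.request.Request:
    """Check parameters against the documented ranges and build the POST."""
    if not 50 <= num_notes <= 5000:
        raise ValueError("num_notes must be between 50 and 5000")
    if not 0.1 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.1 and 2.0")
    if not 40 <= tempo <= 300:
        raise ValueError("tempo must be between 40 and 300")
    body = json.dumps({"num_notes": num_notes, "temperature": temperature,
                       "tempo": tempo}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate", data=body,
        headers={"Content-Type": "application/json"}, method="POST")


def generate(**params) -> dict:
    """Send the request and decode the JSON reply (backend must be running)."""
    request = build_generate_request(**params)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server up, `generate(num_notes=200, temperature=0.8)` should return a dict shaped like the response shown above. A 503 reply surfaces as `urllib.error.HTTPError`.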

`GET /download/{filename}`

Download a generated file.

| Parameter | Type | Description |
|---|---|---|
| `filename` | string | The filename returned by `/generate` |

Returns the file with the appropriate Content-Type header (audio/wav or audio/midi).
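
Because the filename comes from the URL, a download handler usually needs to reject path tricks and choose the media type from the extension. The sketch below is one possible approach and not necessarily what `main.py` does; `resolve_download` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional, Tuple

# Media types as documented above.
MEDIA_TYPES = {".wav": "audio/wav", ".mid": "audio/midi"}


def resolve_download(output_dir: Path,
                     filename: str) -> Optional[Tuple[Path, str]]:
    """Return (path, media_type) for a servable file, or None if rejected."""
    # Accept bare filenames only, which rules out "../x.mid" and "a/b.mid".
    if Path(filename).name != filename:
        return None
    media_type = MEDIA_TYPES.get(Path(filename).suffix.lower())
    if media_type is None:
        return None
    path = output_dir / filename
    if not path.is_file():
        return None
    return path, media_type
```

In a FastAPI route, a `None` result would typically become a 404 response, and a valid result would be passed to a file response with the returned media type.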

`GET /health`

Health check endpoint.

```
{
  "status": "ok",
  "model_loaded": true
}
```

Project Structure

```
codealpha-music-gen/
├── backend/
│   ├── main.py                  # FastAPI server
│   ├── requirements.txt         # Python dependencies
│   ├── model/
│   │   ├── train.py             # Data prep & model training
│   │   ├── generate.py          # Music generation logic
│   │   └── saved/               # Model artefacts (created after training)
│   │       ├── final_weights.keras
│   │       ├── metadata.pkl
│   │       └── notes.pkl
│   └── outputs/                 # Generated audio files (created at runtime)
├── frontend/
│   └── index.html               # Single-page UI
├── data/                        # MIDI dataset (auto-downloaded)
├── FluidR3_GM.sf2               # SoundFont file (user-provided)
└── README.md                    # This file
```

Troubleshooting

"Model is not trained yet"

Run the training script before starting the backend:

```
python -m backend.model.train --epochs 50
```

"SoundFont file not found"

Download FluidR3_GM.sf2 and place it in the project root.
On Linux, install via: sudo apt-get install fluid-soundfont-gm
The code will auto-detect SoundFonts in common system directories as a fallback.

"WAV conversion failed"

Ensure FluidSynth is installed and on your system PATH:
```
fluidsynth --version
```
On Windows, you may need to add the FluidSynth bin/ directory to your PATH manually.
The .mid file will still be generated even if WAV conversion fails — you can open it in any MIDI player or DAW.
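
To reproduce the conversion step by hand, the sketch below builds a FluidSynth command line and reports failure instead of raising, which matches the "MIDI still succeeds" behaviour described above. It assumes the conversion goes through the `fluidsynth` CLI; the project may use a wrapper library instead, and the function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_fluidsynth_command(soundfont: Path, midi_path: Path, wav_path: Path,
                             sample_rate: int = 44100) -> List[str]:
    """Build a FluidSynth call that renders MIDI to WAV without live audio."""
    return [
        "fluidsynth",
        "-ni",                    # -n: no MIDI input, -i: no interactive shell
        "-F", str(wav_path),      # render to this file instead of playing
        "-r", str(sample_rate),   # output sample rate in Hz
        str(soundfont),           # options go first, then SoundFont, then MIDI
        str(midi_path),
    ]


def midi_to_wav(soundfont: Path, midi_path: Path, wav_path: Path) -> bool:
    """Return True if a WAV file was produced, False if conversion failed."""
    if shutil.which("fluidsynth") is None:
        return False  # FluidSynth is not on PATH
    command = build_fluidsynth_command(soundfont, midi_path, wav_path)
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0 and wav_path.is_file()
```

Running the printed command manually in a terminal is a quick way to see FluidSynth's own error output when conversion fails.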

"Cannot connect to the backend"

Make sure the FastAPI server is running: uvicorn backend.main:app --host 0.0.0.0 --port 8000
Check that nothing else is using port 8000.
If you opened the frontend from a file:// URL, some browsers may block its requests to http://localhost:8000. Serving the frontend from a local dev server avoids this.

TensorFlow GPU Acceleration

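If you have an NVIDIA GPU, GPU acceleration can significantly speed up training. On Linux, recent TensorFlow releases can install the matching CUDA and cuDNN libraries through pip (an up-to-date NVIDIA driver is still required):

```
pip install "tensorflow[and-cuda]"
```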

Refer to the TensorFlow GPU guide for detailed setup instructions.

Out of Memory During Training

If you encounter OOM errors:

Reduce --batch-size (try 32 or 16)
Reduce --seq-length (try 50)
Use a smaller dataset

Built as part of the CodeAlpha Internship Program — Task 3: Music Generation with AI
