CodeAlpha Internship — Task 3: Music Generation with AI
A full-stack web application that trains an LSTM-based Recurrent Neural Network on MIDI data and provides an interactive UI to generate and play new music. The system consists of a Python ML pipeline (data collection, preprocessing, training, generation), a FastAPI backend, and a modern single-page frontend.
- Project Overview
- Architecture
- Prerequisites
- Installation
- Training the Model
- Running the Application
- Using the Frontend
- API Reference
- Project Structure
- Troubleshooting
This project demonstrates end-to-end AI music generation:
- Data Collection: Automatically downloads the JSB Chorales dataset (229 Bach chorales) or generates a synthetic fallback.
- Preprocessing: Uses music21 to parse MIDI files, extract notes and chords, and create numerical sequences.
- Model: A two-layer LSTM network built with TensorFlow/Keras that predicts the next note in a sequence.
- Generation: Autoregressively produces new melodies from a random seed, with controllable temperature and length.
- Output: Saves generated music as both .mid and .wav files.
- Backend: FastAPI server with /generate and /download endpoints.
- Frontend: A polished, responsive single-page UI with parameter controls, audio playback, and file download.
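The preprocessing step above ("create numerical sequences") can be sketched as a sliding window over the note corpus. This is a minimal illustration, not the project's prepare_sequences code: the helper names build_vocab and make_sequences are hypothetical, and encoding chords as dot-joined pitch classes (for example 0.4.7) is an assumption.

```python
def build_vocab(notes):
    """Map each distinct note/chord token to an integer id (sorted for stability)."""
    pitch_names = sorted(set(notes))
    note_to_int = {name: i for i, name in enumerate(pitch_names)}
    int_to_note = {i: name for name, i in note_to_int.items()}
    return note_to_int, int_to_note


def make_sequences(notes, note_to_int, seq_length):
    """Slide a window over the corpus: seq_length inputs, then one target."""
    inputs, targets = [], []
    for start in range(len(notes) - seq_length):
        window = notes[start:start + seq_length]
        inputs.append([note_to_int[n] for n in window])
        targets.append(note_to_int[notes[start + seq_length]])
    return inputs, targets


# Toy corpus: single notes by pitch name, a chord as dot-joined pitch classes (assumed encoding).
notes = ["C4", "E4", "G4", "0.4.7", "C4", "E4"]
note_to_int, int_to_note = build_vocab(notes)
inputs, targets = make_sequences(notes, note_to_int, seq_length=3)
print(inputs)   # [[1, 2, 3], [2, 3, 0], [3, 0, 1]]
print(targets)  # [0, 1, 2]
```

Each input window is paired with the token that follows it, which is the "predict the next note" framing the model is trained on.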
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (HTML/JS) │
│ index.html — TailwindCSS UI with audio player + download │
└───────────────────────────┬─────────────────────────────────────┘
│ POST /generate
│ GET /download/{file}
▼
┌─────────────────────────────────────────────────────────────────┐
│ FastAPI Backend (main.py) │
│ CORS middleware · Pydantic validation · File serving │
└───────────────────────────┬─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ ML Pipeline (model/) │
│ │
│ train.py generate.py │
│ ├─ download_midi_dataset ├─ load_model_and_metadata │
│ ├─ extract_notes_from_midi├─ generate_notes (autoregressive) │
│ ├─ prepare_sequences ├─ create_midi (music21) │
│ ├─ build_model (LSTM) ├─ midi_to_wav (FluidSynth) │
│ └─ train + save artefacts └─ generate_music (orchestrator) │
│ │
│ saved/ outputs/ │
│ ├─ final_weights.keras ├─ generated_xxxxx.mid │
│ ├─ metadata.pkl └─ generated_xxxxx.wav │
│ └─ notes.pkl │
└─────────────────────────────────────────────────────────────────┘
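The "autoregressive" label on generate_notes in the diagram means each predicted note is fed back in as input for the next prediction. The loop below is a sketch under stated assumptions: predict_next is a hypothetical callable standing in for the trained LSTM, and toy_predictor is a deterministic stand-in so the example runs without TensorFlow.

```python
import random


def generate_notes(predict_next, seed, num_notes, rng=None):
    """Autoregressive loop: predict one token, append it, slide the window."""
    rng = rng or random.Random()
    window = list(seed)
    generated = []
    for _ in range(num_notes):
        probs = predict_next(window)            # one probability per vocabulary id
        next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        generated.append(next_id)
        window = window[1:] + [next_id]         # drop the oldest id, keep length fixed
    return generated


def toy_predictor(window, vocab_size=4):
    """Stand-in for the model: puts all probability on (last id + 1) mod vocab_size."""
    probs = [0.0] * vocab_size
    probs[(window[-1] + 1) % vocab_size] = 1.0
    return probs


melody = generate_notes(toy_predictor, seed=[0, 1, 2], num_notes=5, rng=random.Random(0))
print(melody)  # [3, 0, 1, 2, 3]
```

With a real model, predict_next would return the softmax output for the current window, and the sampled ids would be mapped back to note names before writing the MIDI file.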
- Python 3.9 – 3.11 (TensorFlow compatibility)
- pip package manager
FluidSynth is a real-time software synthesizer that renders MIDI files into audio using SoundFont files.
- Download the latest FluidSynth release from GitHub Releases.
- Extract the archive and add the bin/ directory to your system PATH environment variable.
- Verify installation:
fluidsynth --version
# Linux (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install fluidsynth
# macOS (Homebrew)
brew install fluidsynth
A SoundFont (.sf2) file is required by FluidSynth to synthesize audio. The recommended file is FluidR3_GM.sf2.
- Download: FluidR3_GM.sf2 or search for it in your distribution's package manager.
- Linux shortcut:
sudo apt-get install fluid-soundfont-gm # Installed to: /usr/share/sounds/sf2/FluidR3_GM.sf2
- Placement: Copy the .sf2 file to the project root directory. The code looks for FluidR3_GM.sf2 in the project root by default.
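The lookup order described above (project root first, then system locations) might be sketched like this. The function name find_soundfont and the directory list are assumptions for illustration; only /usr/share/sounds/sf2 is taken from this README, and the project's real search list may differ.

```python
from pathlib import Path

# Assumed search locations; only the first path is mentioned in this README.
SYSTEM_SOUNDFONT_DIRS = [Path("/usr/share/sounds/sf2"), Path("/usr/share/soundfonts")]


def find_soundfont(project_root, name="FluidR3_GM.sf2", extra_dirs=SYSTEM_SOUNDFONT_DIRS):
    """Return the first existing SoundFont path, or None if nothing is found."""
    candidates = [Path(project_root) / name]
    candidates += [Path(d) / name for d in extra_dirs]
    for path in candidates:
        if path.is_file():
            return path
    return None


soundfont = find_soundfont(".")
print(soundfont if soundfont else "No SoundFont found; WAV rendering would be skipped")
```

Returning None rather than raising lets the caller decide whether a missing SoundFont is fatal or whether it should fall back to MIDI-only output.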
cd /path/to/your/projects
# Place the project files in a directory, e.g.:
mkdir codealpha-music-gen && cd codealpha-music-gen
# Windows
python -m venv venv
venv\Scripts\activate
# Linux / macOS
python3 -m venv venv
source venv/bin/activate
pip install -r backend/requirements.txt
This installs:
- tensorflow: LSTM model training & inference
- music21: MIDI parsing & creation
- fastapi + uvicorn: Backend server
- numpy, requests, tqdm: Utilities
music21 needs to know where your MIDI reader is. On first use it may prompt you. To pre-configure:
# Run once in Python:
from music21 import environment
env = environment.Environment()
env['musicxmlPath'] = '/usr/bin/musescore' # or your MuseScore path
env['midiPath'] = '/usr/bin/timidity' # or your MIDI player
For this project, music21 only needs to parse MIDI files, which it does natively, so no extra configuration is strictly required.
Training is required before you can generate music. The training script will:
- Download the JSB Chorales dataset automatically (or use existing MIDI files in data/).
- Parse all MIDI files and extract notes/chords.
- Build the LSTM model architecture.
- Train for the specified number of epochs.
- Save model weights and metadata to backend/model/saved/.
# From the project root:
python -m backend.model.train
# With custom parameters:
python -m backend.model.train --epochs 100 --batch-size 128 --seq-length 100

| Parameter | Default | Description |
|---|---|---|
| --epochs | 50 | Number of training epochs |
| --batch-size | 64 | Training batch size |
| --seq-length | 100 | Input sequence length (notes) |
| --data-dir | data/ | Directory containing .mid files |
| --output-dir | backend/model/saved/ | Where to save model artefacts |
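For reference, the flags and defaults in the table could be declared with argparse roughly as follows. This is a sketch of one plausible parser, not the actual contents of train.py; build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """CLI flags and defaults as listed in the parameter table."""
    parser = argparse.ArgumentParser(prog="backend.model.train")
    parser.add_argument("--epochs", type=int, default=50, help="Number of training epochs")
    parser.add_argument("--batch-size", type=int, default=64, help="Training batch size")
    parser.add_argument("--seq-length", type=int, default=100, help="Input sequence length (notes)")
    parser.add_argument("--data-dir", default="data/", help="Directory containing .mid files")
    parser.add_argument("--output-dir", default="backend/model/saved/", help="Where to save model artefacts")
    return parser


# Flags that are not passed keep their defaults.
args = build_parser().parse_args(["--epochs", "100", "--batch-size", "128"])
print(args.epochs, args.batch_size, args.seq_length)  # 100 128 100
```

argparse converts the dashes in flag names to underscores, so --batch-size is read back as args.batch_size.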
Place any .mid or .midi files in the data/ directory. The training script will automatically detect them and skip the download step.
- JSB Chorales (~229 pieces, ~5 MB): ~5–15 minutes on CPU, ~2–5 minutes on GPU.
- Larger datasets: Scales linearly with data size. For a dataset like MAESTRO, expect 1–4 hours on GPU.
After training, the following files appear in backend/model/saved/:
| File | Purpose |
|---|---|
| final_weights.keras | The trained Keras model (architecture + weights) |
| metadata.pkl | Vocabulary mappings (note_to_int, int_to_note, seq_length, vocab_size) |
| notes.pkl | The full notes corpus (used for seed selection during generation) |
| weights-XX-LOSS.keras | Checkpoint files from training (best loss) |
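As an illustration of what metadata.pkl might hold, the sketch below writes and reads a dictionary with the four keys named in the table. It assumes the file is a plain pickled dict; save_metadata and load_metadata are hypothetical helpers, and the real file layout may differ.

```python
import pickle
from pathlib import Path


def save_metadata(path, note_to_int, seq_length):
    """Write the vocabulary mappings using the keys listed in the table."""
    metadata = {
        "note_to_int": note_to_int,
        "int_to_note": {i: n for n, i in note_to_int.items()},
        "seq_length": seq_length,
        "vocab_size": len(note_to_int),
    }
    Path(path).write_bytes(pickle.dumps(metadata))


def load_metadata(path):
    """Read the mappings back and check that vocab_size matches the vocabulary."""
    metadata = pickle.loads(Path(path).read_bytes())
    if metadata["vocab_size"] != len(metadata["note_to_int"]):
        raise ValueError("vocab_size does not match note_to_int")
    return metadata
```

Pickle files can execute arbitrary code when loaded, so only load artefacts produced by your own training runs.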
# From the project root:
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
You should see:
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO: Started reloader process
Open your browser and navigate to:
- API docs: http://localhost:8000/docs (Swagger UI)
- Health check: http://localhost:8000/health
Simply open frontend/index.html in your web browser:
- Windows: Double-click the file in File Explorer
- Linux/macOS: xdg-open frontend/index.html (Linux) or open frontend/index.html (macOS)
- Or: Use a local dev server like the VS Code Live Server extension
The frontend will automatically check if the backend is running and display the model status.
- Adjust Parameters: Use the sliders to configure the following.
  - Number of Notes (50–2000): How many note events to generate. More notes = longer piece.
  - Temperature (0.1–2.0): Controls randomness. Lower = more repetitive/conservative; higher = more creative/unpredictable.
  - Tempo (40–300 BPM): Speed of the generated music.
- Click "Generate Music": The button shows a loading spinner while the model generates. This typically takes 10–30 seconds depending on the number of notes and your hardware.
- Listen: Once generation completes, the HTML5 audio player automatically starts playing the .wav file.
- Download: Use the "Download MIDI" or "Download WAV" buttons to save the files to your computer. MIDI files can be opened in any DAW or notation software (MuseScore, Ableton, FL Studio, etc.).
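To make the Temperature setting concrete, here is a sketch of the usual way a sampling temperature reshapes a probability distribution: each probability is raised to the power 1/T and the result is renormalised. Whether generate.py uses exactly this formula is an assumption; apply_temperature is a hypothetical helper.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution: p_i ** (1/T), then renormalise."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space; zero-probability entries stay at zero.
    logits = [math.log(p) / temperature if p > 0 else float("-inf") for p in probs]
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]  # subtract the peak for stability
    total = sum(weights)
    return [w / total for w in weights]


probs = [0.6, 0.3, 0.1]
cool = apply_temperature(probs, 0.5)  # sharper: the most likely note dominates more
warm = apply_temperature(probs, 2.0)  # flatter: rare notes become more likely
print([round(p, 3) for p in cool])
print([round(p, 3) for p in warm])
```

At a temperature of 1.0 the distribution is unchanged, which is why 1.0 is a natural default.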
POST /generate: Generate a new music sequence using the trained LSTM model.
Request Body:
{
"num_notes": 500,
"temperature": 1.0,
"tempo": 120
}

| Field | Type | Range | Default | Description |
|---|---|---|---|---|
| num_notes | int | 50–5000 | 500 | Number of note events to generate |
| temperature | float | 0.1–2.0 | 1.0 | Sampling temperature |
| tempo | int | 40–300 | 120 | Tempo in BPM |
Response (200):
{
"success": true,
"midi_filename": "generated_a1b2c3d4.mid",
"wav_filename": "generated_a1b2c3d4.wav",
"midi_url": "/download/generated_a1b2c3d4.mid",
"wav_url": "/download/generated_a1b2c3d4.wav",
"num_notes": 500,
"temperature": 1.0,
"message": "Music generated successfully!"
}
Error (503): Model not trained yet.
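A client could call the endpoint from Python using only the standard library. The sketch below builds the request and mirrors the documented ranges on the client side. The helper name build_generate_request is hypothetical, and the base URL assumes the default host and port used elsewhere in this README.

```python
import json
import urllib.request


def build_generate_request(base_url, num_notes=500, temperature=1.0, tempo=120):
    """Build (but do not send) a POST /generate request with a JSON body."""
    if not 50 <= num_notes <= 5000:
        raise ValueError("num_notes must be between 50 and 5000")
    if not 0.1 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.1 and 2.0")
    if not 40 <= tempo <= 300:
        raise ValueError("tempo must be between 40 and 300")
    body = json.dumps({"num_notes": num_notes, "temperature": temperature, "tempo": tempo})
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("http://localhost:8000", num_notes=200, temperature=0.8)

# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["midi_url"], result["wav_url"])
```

If the model has not been trained, urlopen raises urllib.error.HTTPError with status code 503, matching the error case above.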
GET /download/{filename}: Download a generated file.

| Parameter | Type | Description |
|---|---|---|
| filename | string | The filename returned by /generate |
Returns the file with the appropriate Content-Type header (audio/wav or audio/midi).
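One simple way to choose the Content-Type is to map the file extension to a media type. The mapping below is an illustrative assumption built from the two media types named above; media_type_for is a hypothetical helper, not the backend's actual code.

```python
from pathlib import Path

# Media types named in this README; the mapping itself is an illustrative assumption.
MEDIA_TYPES = {".wav": "audio/wav", ".mid": "audio/midi", ".midi": "audio/midi"}


def media_type_for(filename):
    """Pick the Content-Type from the file extension, case-insensitively."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    return MEDIA_TYPES[suffix]


print(media_type_for("generated_a1b2c3d4.wav"))  # audio/wav
print(media_type_for("generated_a1b2c3d4.mid"))  # audio/midi
```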
GET /health: Health check endpoint.
{
"status": "ok",
"model_loaded": true
}
codealpha-music-gen/
├── backend/
│ ├── main.py # FastAPI server
│ ├── requirements.txt # Python dependencies
│ ├── model/
│ │ ├── train.py # Data prep & model training
│ │ ├── generate.py # Music generation logic
│ │ └── saved/ # Model artefacts (created after training)
│ │ ├── final_weights.keras
│ │ ├── metadata.pkl
│ │ └── notes.pkl
│ └── outputs/ # Generated audio files (created at runtime)
├── frontend/
│ └── index.html # Single-page UI
├── data/ # MIDI dataset (auto-downloaded)
├── FluidR3_GM.sf2 # SoundFont file (user-provided)
└── README.md # This file
Run the training script before starting the backend:
python -m backend.model.train --epochs 50

- Download FluidR3_GM.sf2 and place it in the project root.
- On Linux, install via: sudo apt-get install fluid-soundfont-gm
- The code will auto-detect SoundFonts in common system directories as a fallback.

- Ensure FluidSynth is installed and on your system PATH:
fluidsynth --version
- On Windows, you may need to add the FluidSynth bin/ directory to your PATH manually.
- The .mid file will still be generated even if WAV conversion fails; you can open it in any MIDI player or DAW.
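The "MIDI survives even if WAV conversion fails" behaviour can be sketched by checking for the fluidsynth executable before rendering. This assumes the conversion shells out to the FluidSynth command-line tool; the project may use a Python binding instead, and both function names here are hypothetical.

```python
import shutil
import subprocess


def fluidsynth_command(soundfont, midi_path, wav_path, sample_rate=44100):
    """Offline render: no MIDI input driver (-n), no interactive shell (-i), output file (-F)."""
    return ["fluidsynth", "-ni", "-F", str(wav_path), "-r", str(sample_rate),
            str(soundfont), str(midi_path)]


def midi_to_wav(soundfont, midi_path, wav_path):
    """Return True on success, False if FluidSynth is missing or the render fails."""
    if shutil.which("fluidsynth") is None:
        return False  # the caller can still serve the .mid file
    result = subprocess.run(fluidsynth_command(soundfont, midi_path, wav_path),
                            capture_output=True)
    return result.returncode == 0


print(" ".join(fluidsynth_command("FluidR3_GM.sf2", "song.mid", "song.wav")))
```

Running the printed command by hand is a quick way to check whether a failure comes from FluidSynth itself or from the application.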
- Make sure the FastAPI server is running:
uvicorn backend.main:app --host 0.0.0.0 --port 8000
- Check that nothing else is using port 8000.
- If running the frontend from a file:// URL, some browsers may block requests to localhost. Use a local dev server or allow mixed content.
If you have an NVIDIA GPU, install CUDA and cuDNN to significantly speed up training:
pip install tensorflow[and-cuda]
Refer to the TensorFlow GPU guide for detailed setup instructions.
If you encounter OOM errors:
- Reduce --batch-size (try 32 or 16)
- Reduce --seq-length (try 50)
- Use a smaller dataset
Built as part of the CodeAlpha Internship Program — Task 3: Music Generation with AI