MusicBrainz Data Dumps to PostgreSQL, in one beat
musicbrainz-database-setup is a Python CLI that sets up a full MusicBrainz database in PostgreSQL. Point it at a connection string and a dump, and it handles the rest: resumable, SHA256-verified downloads from the MetaBrainz mirror, schema creation via the upstream admin/sql/*.sql files, and streaming COPY of every table with live progress.
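The SHA256 verification step amounts to streaming each archive through a digest and comparing against the published checksum. A minimal sketch of that idea, not the tool's actual code (`verify_sha256` and its arguments are illustrative names):

```python
import hashlib
from pathlib import Path

def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Stream the file in chunks so a multi-GB archive never loads fully into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex

# Demo with a throwaway file standing in for a dump archive:
demo = Path("demo-archive.tar.bz2")
demo.write_bytes(b"not a real dump")
print(verify_sha256(demo, hashlib.sha256(b"not a real dump").hexdigest()))  # True
demo.unlink()
```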
This project originated from RAGtime, where it powers the entity-resolution step that maps extracted mentions to canonical MusicBrainz entities.
- PostgreSQL 16 or later — any official `postgres:*` Docker image satisfies every server-side requirement out of the box.
- A role with `SUPERUSER` on the target DB (the default `postgres` user works).
- `psql` on your `$PATH` on the machine running the tool.
- `pbzip2` (or `lbzip2`) on `$PATH` (optional, recommended) — parallelises bz2 decompression during COPY, the single biggest phase of a fresh import. Without it the tool falls back to CPython's single-threaded stdlib `bz2`.
See docs/README.md for the full setup guide — extensions, ICU, server-side tuning, disk/memory, and managed-PostgreSQL notes.
Any official postgres:* image works. --name is the Docker container name (for docker exec / docker stop); the database name used by the tool is postgres, the default created by the image's entrypoint.
```
docker run -d \
  --name musicbrainz-postgres \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  postgres:17-alpine
```

💡 Want a faster import? Run two short `psql` blocks before and after the import — `ALTER SYSTEM SET` four reload-able knobs (`max_wal_size`, `checkpoint_timeout`, …) and `ALTER SYSTEM RESET` them after — to roughly halve post-import DDL time without restarting Postgres. See Server-side tuning in the reference guide for the statements and per-setting rationale.
Use uvx to run the CLI straight from GitHub:
```
uvx --from git+https://github.com/rafacm/musicbrainz-database-setup \
  musicbrainz-database-setup run \
  --db postgresql://postgres:postgres@localhost:5432/postgres \
  --modules core \
  --latest
```

The connection string format is `postgresql://<user>:<password>@<host>:<port>/<database>`. If neither `--latest` nor `--date YYYYMMDD-HHMMSS` is passed, `run` interactively prompts for a dump directory from the mirror.
The main configuration options have MUSICBRAINZ_DATABASE_SETUP_* env-var equivalents, and standard PG* libpq vars apply to the connection string. See Configuration in the reference guide for the full list and how to keep the password out of the URL.
Open a psql session against the database and run a couple of sanity queries:
```
psql postgresql://postgres:postgres@localhost:5432/postgres
```

```sql
-- Row counts of the top-level entities
SELECT
  (SELECT count(*) FROM musicbrainz.artist) AS artists,
  (SELECT count(*) FROM musicbrainz.release) AS releases,
  (SELECT count(*) FROM musicbrainz.recording) AS recordings;
```

Look up Django Reinhardt by his MBID (his Wikidata ID is Q44122) — joining `artist` to `gender`, `artist_type`, and `area` for the human-readable details:
```sql
SELECT
  a.gid,
  a.name,
  a.sort_name,
  g.name AS gender,
  at.name AS type,
  a.begin_date_year,
  a.end_date_year,
  birth_area.name AS birth_area,
  a.comment
FROM musicbrainz.artist a
LEFT JOIN musicbrainz.gender g ON a.gender = g.id
LEFT JOIN musicbrainz.artist_type at ON a.type = at.id
LEFT JOIN musicbrainz.area birth_area ON a.area = birth_area.id
WHERE a.gid = '650bf385-6f6d-4992-a3b9-779d144920a4';
```

…and his external links via the URL relationship system (`l_artist_url` → `link` → `link_type` → `url`) — Wikidata, Discogs, allmusic, last.fm, IMDb, and many more:
```sql
SELECT lt.name AS link_type, u.url
FROM musicbrainz.l_artist_url lau
JOIN musicbrainz.artist a ON lau.entity0 = a.id
JOIN musicbrainz.link l ON lau.link = l.id
JOIN musicbrainz.link_type lt ON l.link_type = lt.id
JOIN musicbrainz.url u ON lau.entity1 = u.id
WHERE a.gid = '650bf385-6f6d-4992-a3b9-779d144920a4'
ORDER BY lt.name;
```

For the full entity model and table-by-table reference, see the MusicBrainz Database and Database Schema docs.
Pass --modules core,derived,… (comma-separated) to select which .tar.bz2 archives to download and import. Default is core; the others are opt-in. See available modules for the full list and what each contains.
Run uvx --from git+https://github.com/rafacm/musicbrainz-database-setup musicbrainz-database-setup --help (or uv run musicbrainz-database-setup --help from a checkout) to see the available commands and options at any time.
- `list-dumps`: print the dated dump directories on the mirror.
- `download`: fetch the selected archives (SHA256-verified, resumable).
- `schema create`: fetch `admin/sql/*.sql` at `--ref` and apply pre- and/or post-import DDL.
- `import`: stream TSVs from a local `--dump-dir` through `COPY FROM STDIN`.
- `run`: end-to-end — pick or resolve a dump, download, pre-DDL, import, post-DDL, `VACUUM ANALYZE`.
- `verify`: print `SCHEMA_SEQUENCE` / `REPLICATION_SEQUENCE` for each local archive.
- `clean`: remove cached downloads.
Pass -v / --verbose on any of these commands to surface the underlying DEBUG-level output (httpx HTTP requests and the rest of the chatter); the default transcript stays focused on high-level phase progress.
This project is under active development.
- Dump discovery — mirror listing, `LATEST` resolution, and an interactive picker when `--date` / `--latest` / `--dump-dir` are omitted.
- Resumable, SHA256-verified downloads.
- Upstream admin SQL fetched at a configurable git ref and applied via `psql` in the canonical `admin/InitDb.pl` phase order.
- Streaming TSV → `COPY FROM STDIN` with per-archive and per-table progress bars, across all ten dump modules.
- Idempotent reruns; pre-flights for required extensions, ICU support, and `psql` on the client.
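Resumable downloads are typically built on HTTP Range requests: stat the partial file on disk, ask the server for only the remaining bytes, and append. A hedged sketch of just the header logic (one common approach, not necessarily this tool's implementation):

```python
from pathlib import Path

def resume_range_header(partial: Path) -> dict[str, str]:
    """Ask the server for only the bytes a previous run didn't finish."""
    offset = partial.stat().st_size if partial.exists() else 0
    return {"Range": f"bytes={offset}-"} if offset else {}

part = Path("mbdump.tar.bz2.part")
part.write_bytes(b"x" * 1024)      # pretend 1 KiB is already on disk
print(resume_range_header(part))   # {'Range': 'bytes=1024-'}
part.unlink()
```

A server that honours the header replies `206 Partial Content`; one that doesn't replies `200` with the full body, in which case the client should discard the partial file and start over.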
See CHANGELOG.md for the full list of implemented features, fixes, implementation plans, feature documentation and session transcripts.
`SCHEMA_SEQUENCE` cross-check — compare the dump archive's `SCHEMA_SEQUENCE` file against the `current_schema_sequence` value in the fetched `CreateTables.sql`, and fail hard on mismatch so that silently pointing the tool at an incompatible upstream `--ref` can't corrupt an import. `--allow-schema-mismatch` is the escape hatch.
For running the test suite, linting, or type-checking, install with the dev extra:
```
uv sync --extra dev
uv run pytest
uv run ruff check src tests
uv run mypy src
```

This tool stands on the work of the MusicBrainz, MetaBrainz, and acoustid communities. It automates the steps documented across the MusicBrainz wiki and the upstream musicbrainz-server/admin Perl scripts into a single command you can point at any PostgreSQL connection — local Docker, managed (RDS, Cloud SQL), or anything else that speaks libpq.
See docs/README.md → References for the full list of upstream sources.
MIT — © 2026 Rafael Cordones.

