Visualizing relationships between music subgenres from Discogs.
- Discogs categorizes releases and artists into 15 genres and 760 styles (subgenres).
- Many subgenres co-occur in the same release—similar to how words co-occur in sentences.
- Example: Post-Rock and Shoegaze often appear together on the same album, which suggests that the two styles are closely related.
- Can we capture these relationships in a vector space?
- Represent the 760 subgenres as low-dimensional vectors using:
  - Word2Vec (treating co-occurring styles like co-occurring words)
  - Node2Vec (treating styles as nodes in a co-occurrence graph)
- Visualize embeddings with dimensionality reduction (UMAP) in 2D and 3D.
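
To make the co-occurrence idea concrete, here is a minimal sketch of how style pairs could be counted. It is not the code in this repository: the `releases` sample and the `count_cooccurrences` helper are hypothetical, and it uses only the Python standard library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: each inner list holds the styles tagged on one release.
releases = [
    ["Post-Rock", "Shoegaze"],
    ["Post-Rock", "Shoegaze", "Ambient"],
    ["Deep House", "Tech House"],
    ["Ambient", "Drone"],
]


def count_cooccurrences(style_lists):
    """Count how often each unordered pair of styles shares a release."""
    pair_counts = Counter()
    for styles in style_lists:
        # Deduplicate and sort so each pair has one canonical key.
        unique = sorted(set(styles))
        pair_counts.update(combinations(unique, 2))
    return pair_counts


pair_counts = count_cooccurrences(releases)

# Weighted edge list (style_a, style_b, count), most frequent pair first.
edges = [(a, b, n) for (a, b), n in pair_counts.most_common()]
```

The same per-release style lists can serve as "sentences" for a Word2Vec implementation, while the weighted `edges` describe the kind of co-occurrence graph that Node2Vec walks over.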
| Method | 2D | 3D |
|---|---|---|
| Word2Vec | View | View |
| Node2Vec | View | View |
- Discogs monthly data dump: https://data.discogs.com/
- Current version: 2026-01-01
- Note: `Word2Vec.py` and `counter.py` use `raw_data/discogs_20260101_masters.xml.gz`. This file is not included due to its large size.
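
If you download the dump yourself, the styles can be streamed out of the gzipped XML without loading the whole file into memory. The sketch below is an illustration, not the logic of `Word2Vec.py` or `counter.py`: `iter_master_styles` is a hypothetical helper, and it assumes each `<master>` element contains its styles as `<style>` elements, so check the element names against the file you actually download.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_master_styles(source):
    """Yield one list of style names per <master> element.

    `source` is a path to the gzipped dump or a binary file object.
    Assumes styles are stored as <style> elements inside each <master>.
    """
    with gzip.open(source, "rb") as fh:
        # Stream the document and handle each element once it is complete.
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag == "master":
                styles = [s.text for s in elem.iter("style") if s.text]
                if styles:
                    yield styles
                # Drop the element's children to limit memory use.
                elem.clear()
```

For example, `iter_master_styles("raw_data/discogs_20260101_masters.xml.gz")` would yield one style list per master release that has at least one style.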