
Bug / Feature: Balanced Playlist Selection — equal representation regardless of playlist size #525

@maxlin1


Summary

When multiple playlists are selected, songs are drawn from a single merged pool with equal probability per song. This means large playlists dominate the game: a 290-song playlist combined with a 30-song playlist will produce ~91% of rounds from the large one and only ~9% from the small one. Players who selected the small playlist for variety will barely hear any of its songs.


Current Behavior (Code Analysis)

Songs are merged into a flat list with no playlist tracking

views.py — all songs from all playlists are concatenated into one list:

songs: list[dict] = []

for playlist_path in playlist_paths:
    # file_content: raw JSON text of playlist_path (the read step is elided here)
    playlist_data = json.loads(file_content)
    for song in playlist_data.get("songs", []):
        songs.append(song)   # ← flat merge, no playlist origin tracked

# Result: one big list passed to create_game(songs=songs)

playlist.py: PlaylistManager.get_next_song() picks uniformly at random from the merged pool:

def get_next_song(self) -> dict | None:
    available = [
        s for s in self._songs
        if get_song_uri(s, self._provider) not in self._played_uris
    ]
    if not available:
        return None
    return random.choice(available)   # ← uniform random from ALL songs
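To see the bias in isolation, here is a small simulation of a single `random.choice` draw from a merged pool built from a 290-song and a 30-song playlist. The playlist names, the `simulate_merged_pool` helper, and the trial count are illustrative and not part of the project; the sketch models one pick, not a full game played without replacement.

```python
import random
from collections import Counter


def simulate_merged_pool(sizes: dict[str, int], trials: int, seed: int = 0) -> dict[str, float]:
    """Estimate each playlist's share of picks when one song is drawn
    uniformly from a single merged pool (as random.choice(available) does)."""
    rng = random.Random(seed)
    # Flat pool: one entry per song, labelled with its playlist only for counting.
    pool = [name for name, n in sizes.items() for _ in range(n)]
    counts = Counter(rng.choice(pool) for _ in range(trials))
    return {name: counts[name] / trials for name in sizes}


shares = simulate_merged_pool({"large": 290, "small": 30}, trials=100_000)
print(shares)
```

With 100,000 trials the estimated shares land very close to 290/320 ≈ 0.906 and 30/320 ≈ 0.094, matching the ~91% / ~9% split described in the summary.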

Real-world impact with actual playlist sizes

| Playlist | Songs | Share of combined pool |
|---|---:|---:|
| Cologne Carnival 🎭 | 290 | 42 % |
| 80s Hits | 208 | 30 % |
| 100 Greatest Movie Themes 🎬 | 162 | 23 % |
| Gen Z Anthems | 30 | 4 % |
| Combined (all 4) | 690 | 100 % |

→ In a 15-round game drawn from all four, Gen Z Anthems gets ~0.65 songs on average (15 × 30 / 690), while Cologne Carnival gets ~6.3 (15 × 290 / 690).
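These expectations can be checked directly. Under uniform selection from the merged pool, each round's song is marginally a uniform draw from all 690 songs (this holds even when sampling without replacement), so a playlist with n songs expects 15 · n / 690 rounds. Under playlist-first selection, each of the four playlists expects 15 / 4 = 3.75 rounds as long as none runs out, which cannot happen here because the smallest playlist has 30 songs. `expected_rounds` is a hypothetical helper for this arithmetic only.

```python
def expected_rounds(sizes: dict[str, int], rounds: int) -> dict[str, tuple[float, float]]:
    """Expected rounds per playlist as (merged pool, balanced), assuming no
    playlist is exhausted before `rounds` rounds have been played."""
    total = sum(sizes.values())
    return {
        # Merged pool: proportional to size. Balanced: equal share per playlist.
        name: (rounds * n / total, rounds / len(sizes))
        for name, n in sizes.items()
    }


sizes = {
    "Cologne Carnival": 290,
    "80s Hits": 208,
    "100 Greatest Movie Themes": 162,
    "Gen Z Anthems": 30,
}
for name, (merged, balanced) in expected_rounds(sizes, rounds=15).items():
    print(f"{name:26} merged={merged:5.2f}  balanced={balanced:5.2f}")
```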


Proposed Fix — Playlist-Aware Weighted Selection

Instead of picking from a flat pool, the PlaylistManager should:

  1. Track songs per source playlist
  2. Pick a playlist first (each playlist gets equal weight = 1/N), then pick a random unplayed song from that playlist
  3. Skip playlists that are exhausted

This gives every selected playlist the same expected share of rounds (1/N each, as long as none is exhausted), regardless of its size.

Implementation

views.py — tag each song with its source playlist before merging:

songs: list[dict] = []

for playlist_path in playlist_paths:
    # file_content: raw JSON text of playlist_path (the read step is elided here)
    playlist_data = json.loads(file_content)
    for song in playlist_data.get("songs", []):
        song = dict(song)
        song["_playlist_source"] = playlist_path  # ← track origin
        songs.append(song)
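The tagging loop can be factored into a small pure function, which also makes the purpose of `dict(song)` explicit: the tag goes on a shallow copy, so parsed playlist data that might be cached or reused elsewhere is not mutated. `merge_with_source`, its input shape (a mapping from playlist path to that playlist's parsed "songs" list), and the example paths are hypothetical; the views.py code above parses each JSON file inside the loop instead.

```python
def merge_with_source(playlists: dict[str, list[dict]]) -> list[dict]:
    """Flatten playlists into one list, tagging each song copy with its origin."""
    songs: list[dict] = []
    for playlist_path, playlist_songs in playlists.items():
        for song in playlist_songs:
            tagged = dict(song)  # shallow copy: the input dicts stay untouched
            tagged["_playlist_source"] = playlist_path
            songs.append(tagged)
    return songs


playlists = {
    "playlists/a.json": [{"title": "Song A"}],
    "playlists/b.json": [{"title": "Song B"}, {"title": "Song C"}],
}
songs = merge_with_source(playlists)
```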

playlist.py: restructure PlaylistManager to group songs by playlist:

from collections import defaultdict

class PlaylistManager:
    def __init__(self, songs: list[dict], provider: str = PROVIDER_DEFAULT) -> None:
        self._provider = provider
        self._played_uris: set[str] = set()

        # Group songs by source playlist
        buckets: dict[str, list[dict]] = defaultdict(list)
        for song in songs:
            uri = get_song_uri(song, provider)
            if not uri:
                continue
            source = song.get("_playlist_source", "__default__")
            buckets[source].append(song)

        self._buckets: dict[str, list[dict]] = dict(buckets)
        self._single_pool = len(self._buckets) <= 1  # fallback: single playlist

        total = sum(len(v) for v in self._buckets.values())
        _LOGGER.info(
            "PlaylistManager: %d songs across %d playlist(s) for %s",
            total, len(self._buckets), provider,
        )

    def get_next_song(self) -> dict | None:
        if self._single_pool:
            return self._get_random_unplayed()

        # Balanced selection: pick a random non-exhausted playlist, then a song
        active_buckets = {
            k: [s for s in v if get_song_uri(s, self._provider) not in self._played_uris]
            for k, v in self._buckets.items()
        }
        active_buckets = {k: v for k, v in active_buckets.items() if v}

        if not active_buckets:
            return None  # all playlists exhausted

        # Equal weight per playlist regardless of size
        chosen_key = random.choice(list(active_buckets.keys()))  # noqa: S311
        song = random.choice(active_buckets[chosen_key])         # noqa: S311
        song_copy = song.copy()
        song_copy["_resolved_uri"] = get_song_uri(song, self._provider)
        return song_copy

    def _get_random_unplayed(self) -> dict | None:
        """Fallback: uniform random from single merged pool."""
        all_songs = [s for bucket in self._buckets.values() for s in bucket]
        available = [
            s for s in all_songs
            if get_song_uri(s, self._provider) not in self._played_uris
        ]
        if not available:
            return None
        song = random.choice(available)  # noqa: S311
        song_copy = song.copy()
        song_copy["_resolved_uri"] = get_song_uri(song, self._provider)
        return song_copy

Deduplication — How Duplicate Songs Are Prevented

Within a game session: _played_uris set (already exists)

The existing mark_played(uri) mechanism adds the resolved URI to a shared set[str]. In the proposed bucket-based selection, each bucket is filtered before picking:

active_buckets = {
    k: [s for s in v if get_song_uri(s, self._provider) not in self._played_uris]
    for k, v in self._buckets.items()
}

Because _played_uris is shared across all buckets, a song played from Bucket A is automatically excluded from Bucket B in the next round — even if that same song exists in both playlists. No song can play twice. ✅
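A minimal illustration of why one shared set is enough: once a URI is in the played set, filtering drops it from every bucket at once. The URIs are made up, and `unplayed_by_bucket` is a hypothetical helper mirroring the `active_buckets` comprehension above.

```python
def unplayed_by_bucket(buckets: dict[str, list[str]], played: set[str]) -> dict[str, list[str]]:
    """Filter every bucket against the same shared played set."""
    return {name: [u for u in uris if u not in played] for name, uris in buckets.items()}


buckets = {
    "Disco & Funk Classics": ["uri:shared", "uri:disco-only"],
    "80s Hits": ["uri:shared", "uri:80s-only"],
}
played: set[str] = set()
played.add("uri:shared")  # played once, whichever bucket it was reached through
remaining = unplayed_by_bucket(buckets, played)
```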

Cross-playlist duplicates: 308 songs appear in multiple playlists

A scan of all bundled playlists reveals 308 songs that appear in 2 or more playlists (e.g. a song in both "Disco & Funk Classics" and "80s Hits"). With the naive bucket approach, such a song sits in two buckets and can be reached from either, so its per-round selection probability is the sum of its chances in each bucket, noticeably higher than that of a song found in only one playlist.
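The size of that bias follows from the two-stage pick. A song's chance in one round is the sum, over each bucket that holds it, of P(bucket) × P(song | bucket) = 1/N × 1/size. The sketch evaluates this with exact fractions; the 200- and 100-song sizes are illustrative, not measured from the bundled playlists, and `pick_probability` is a hypothetical helper.

```python
from fractions import Fraction


def pick_probability(sizes_of_buckets_containing_song: list[int], n_playlists: int) -> Fraction:
    """Chance that one specific unplayed song is chosen in a single round of
    playlist-first selection, summed over every bucket that contains it."""
    return sum(
        (Fraction(1, n_playlists) * Fraction(1, size) for size in sizes_of_buckets_containing_song),
        Fraction(0),
    )


# Illustrative sizes: two selected playlists with 200 and 100 songs.
only_in_a = pick_probability([200], n_playlists=2)
in_both = pick_probability([200, 100], n_playlists=2)
print(only_in_a, in_both)
```

Here a song present in both playlists is three times as likely to be picked as one found only in the 200-song playlist (3/400 versus 1/400).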

Fix: deduplicate at PlaylistManager init time by URI

def __init__(self, songs: list[dict], provider: str = PROVIDER_DEFAULT) -> None:
    self._provider = provider
    self._played_uris: set[str] = set()

    seen_uris: set[str] = set()          # ← global dedup across all playlists
    buckets: dict[str, list[dict]] = defaultdict(list)

    for song in songs:
        uri = get_song_uri(song, provider)
        if not uri:
            continue
        if uri in seen_uris:
            continue                     # ← skip: already in another bucket
        seen_uris.add(uri)
        source = song.get("_playlist_source", "__default__")
        buckets[source].append(song)

    self._buckets = dict(buckets)
    self._single_pool = len(self._buckets) <= 1  # still needed by get_next_song()

Result: each unique URI appears in exactly one bucket (the first playlist, in load order, that contains it). Any of the 308 cross-playlist duplicates that occur among the selected playlists are silently dropped from every later playlist. Every song plays at most once. ✅

Alternative dedup strategy: instead of "first playlist wins", assign each duplicate to the playlist where it is most "at home" (e.g. by playlist name matching the song's genre tag). But "first wins" is simple, deterministic, and sufficient.


Edge Cases

| Scenario | Behavior |
|---|---|
| Single playlist selected | Falls back to existing uniform random (unchanged) |
| One playlist exhausted mid-game | Remaining rounds drawn from the other playlists |
| All playlists exhausted | get_next_song() returns None → game ends (existing behavior) |
| Combined with num_rounds limit | Works transparently: balanced selection applies to however many rounds are played |
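The exhaustion and `num_rounds` rows can be exercised with a toy game loop. This is a sketch, not the project's game loop: `play_rounds` is hypothetical, it keeps its own played set, and it stops either after `num_rounds` rounds or when every playlist is exhausted (the `None` case of `get_next_song()`).

```python
import random


def play_rounds(
    buckets: dict[str, list[str]], num_rounds: int, rng: random.Random
) -> list[tuple[str, str]]:
    """Balanced picks until num_rounds is reached or all playlists run dry."""
    played: set[str] = set()
    history: list[tuple[str, str]] = []
    for _ in range(num_rounds):
        active = {
            name: [u for u in uris if u not in played]
            for name, uris in buckets.items()
        }
        active = {name: uris for name, uris in active.items() if uris}
        if not active:  # every playlist exhausted: the game ends early
            break
        name = rng.choice(list(active))  # exhausted playlists are never chosen
        uri = rng.choice(active[name])
        played.add(uri)
        history.append((name, uri))
    return history


buckets = {"small": ["s1", "s2"], "large": [f"l{i}" for i in range(8)]}
history = play_rounds(buckets, num_rounds=20, rng=random.Random(7))
```

With 20 requested rounds but only 10 songs in total, the loop ends after 10 rounds; once either playlist runs out, the remaining rounds come from the other.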

Optional Enhancement — Admin UI: Playlist Balance Mode

For power users, an optional toggle could expose the behavior:

Playlist Mix:
  ● Balanced  — equal rounds per playlist (proposed default)
  ○ Random    — current behavior, proportional to playlist size

This gives users who explicitly want more songs from a large playlist the option to opt out.


Affected Files

| File | Change |
|---|---|
| server/views.py | Tag each song with _playlist_source before merging |
| game/playlist.py | PlaylistManager.__init__: build per-playlist buckets; get_next_song(): playlist-first selection |
| www/admin.html | (optional) Balanced/Random toggle |
| www/js/admin.js | (optional) Send balanced_playlists flag in startGame() payload |
