Fix init_index to create shards in audio/ subdirectory #44

Merged
tlebryk merged 3 commits into feat/shard-from-audio from copilot/sub-pr-40-another-one
Feb 11, 2026

Copilot AI commented Feb 10, 2026

list_all_shards() only discovers shards inside subdirectories of the dataset root, not shards at the root itself. With init_index=True, shards were written directly to dataset_root, so the resulting index was empty.
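To make the failure mode concrete, here is a minimal sketch of a discovery routine that only looks one directory level below the root. The real list_all_shards() is not shown in this PR, so the function name `discover_shards_in_subdirs`, the `*/*.wsds` glob pattern, and the `demo` helper are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def discover_shards_in_subdirs(dataset_root):
    """Hypothetical stand-in for list_all_shards().

    Assumption: discovery only matches <root>/<subdir>/<name>.wsds,
    so shards sitting directly in <root> are never returned.
    """
    root = Path(dataset_root)
    return sorted(p.relative_to(root).as_posix() for p in root.glob("*/*.wsds"))


def demo():
    """Build a throwaway dataset and compare the two layouts."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Old layout: shard written directly at the dataset root.
        (root / "audio-00000.wsds").touch()
        before = discover_shards_in_subdirs(root)

        # New layout: shard written under the audio/ subdirectory.
        (root / "audio").mkdir()
        (root / "audio" / "audio-00000.wsds").touch()
        after = discover_shards_in_subdirs(root)

    return before, after
```

Under that assumption, the root-level shard is invisible to discovery, which would explain an index that gets created but stays empty.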

Changes

  • Redirect shard output when init_index=True: If output_dir is not already named "audio", shards are automatically written to the output_dir/audio/ subdirectory
  • Preserve existing behavior: When output_dir already ends with "audio", shards are written directly there (avoids a nested audio/audio/)
  • Set dataset_root correctly: dataset_root is computed before any redirection, so key_mapping.json and index.sqlite3 are placed at the proper root level
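The three rules above can be sketched as a small path-resolution helper. This is not the code from the diff: the helper name `resolve_index_layout` is hypothetical, it covers only the init_index=True case, and it assumes "named audio" means the final path component equals "audio".

```python
from pathlib import Path


def resolve_index_layout(output_dir):
    """Return (dataset_root, shard_dir) for the init_index=True case.

    Hypothetical helper; assumes the check is on the last path component.
    """
    output_dir = Path(output_dir)

    if output_dir.name == "audio":
        # Caller already pointed at the audio/ directory: the dataset root
        # is its parent, and shards go directly into output_dir.
        return output_dir.parent, output_dir

    # Otherwise output_dir is the dataset root; compute it first, then
    # redirect shard output into an audio/ subdirectory beneath it.
    dataset_root = output_dir
    return dataset_root, dataset_root / "audio"
```

With this sketch, both `"my_dataset/"` and `"my_dataset/audio/"` resolve to the same layout, so index.sqlite3 and key_mapping.json would sit in `my_dataset/` in either case and no `audio/audio/` nesting occurs.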

Example

# Before: shards written to dataset_root, empty index created
shard_from_audio_dir("input/", "my_dataset/", init_index=True)
# my_dataset/audio-00000.wsds  ← not discoverable
# my_dataset/index.sqlite3      ← empty

# After: shards written to audio/ subdirectory, index populated correctly  
shard_from_audio_dir("input/", "my_dataset/", init_index=True)
# my_dataset/audio/audio-00000.wsds  ← discoverable
# my_dataset/index.sqlite3            ← populated

# Already-named audio/ dir works as before
shard_from_audio_dir("input/", "my_dataset/audio/", init_index=True)
# my_dataset/audio/audio-00000.wsds  ← no nested audio/audio/
# my_dataset/index.sqlite3

Co-authored-by: tlebryk <43556997+tlebryk@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Address feedback on shard from audio implementation" to "Fix init_index to create shards in audio/ subdirectory" Feb 10, 2026
Copilot AI requested a review from tlebryk February 10, 2026 23:58
tlebryk marked this pull request as ready for review February 11, 2026 02:22
tlebryk merged commit 0917788 into feat/shard-from-audio Feb 11, 2026