Refactor Classification Model_I and Model_II dataset loading for portability and deterministic class mapping #139

Open
Panchadip-128 wants to merge 3 commits into ML4SCI:main from Panchadip-128:refactor-classification-dataloader

@Panchadip-128
Summary

This PR refactors the dataset loading logic in the Classification pipeline (Model_I and Model_II) to improve portability, reproducibility, and contributor usability.

Motivation

Previously:

  • Dataset paths depended on user-specific directory structures.
  • The glob pattern used to locate data files was loosely defined.
  • Class index mapping relied on dictionary iteration order.
  • No validation existed for empty dataset directories.

These issues reduced portability and made onboarding difficult for new contributors.

Changes

  • Standardized dataset structure to:
    root_dir/class_name/*.npy

  • Replaced ambiguous glob usage with:
    os.path.join(root_dir, "*", "*.npy")

  • Added validation to raise a clear error if no .npy files are found.

  • Made class mapping deterministic using sorted class names to ensure consistent label indices across runs.

  • Improved path handling using os.path utilities for cross-platform compatibility.
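
The changes listed above can be illustrated with a minimal sketch. This is not the code from the PR: the function name `index_npy_dataset` and its return shape are hypothetical, and it assumes the `root_dir/class_name/*.npy` layout described above. It shows the glob pattern built with `os.path.join`, the empty-directory check, and a class mapping derived from sorted folder names.

```python
import glob
import os


def index_npy_dataset(root_dir):
    """Index a root_dir/class_name/*.npy tree (hypothetical helper).

    Returns (samples, class_to_idx), where samples is a list of
    (file_path, class_index) pairs.
    """
    # Build the pattern with os.path.join so separators are platform-correct.
    pattern = os.path.join(root_dir, "*", "*.npy")
    paths = sorted(glob.glob(pattern))

    # Fail early with a clear message instead of yielding an empty dataset.
    if not paths:
        raise FileNotFoundError(f"No .npy files found matching {pattern!r}")

    # The class name is the parent folder of each file; sorting the names
    # makes the label indices identical on every run and every machine.
    class_names = sorted({os.path.basename(os.path.dirname(p)) for p in paths})
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}

    samples = [
        (p, class_to_idx[os.path.basename(os.path.dirname(p))]) for p in paths
    ]
    return samples, class_to_idx
```

With class folders named `axion`, `cdm`, and `no_sub`, sorting gives the mapping `axion` to 0, `cdm` to 1, and `no_sub` to 2, regardless of the order in which the filesystem lists the folders.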

Impact

  • No changes to model architecture
  • No changes to training logic
  • No changes to transform behavior
  • No changes to output format

This is purely a structural and usability improvement.

Expected Dataset Structure

Example:

data/
    Model_I/
        axion/
        cdm/
        no_sub/
    Model_I_test/
        axion/
        cdm/
        no_sub/
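
A contributor could confirm that a local copy matches this layout before training with a small check like the one below. It is only a sketch: `missing_class_dirs` and `EXPECTED_CLASSES` are hypothetical names that do not come from the PR, and the class list is taken from the example folders above.

```python
import glob
import os

# Class folders taken from the example layout; adjust for other datasets.
EXPECTED_CLASSES = ("axion", "cdm", "no_sub")


def missing_class_dirs(split_dir, classes=EXPECTED_CLASSES):
    """Return the class names with no .npy files under split_dir."""
    return [
        name
        for name in classes
        if not glob.glob(os.path.join(split_dir, name, "*.npy"))
    ]
```

Calling `missing_class_dirs(os.path.join("data", "Model_I"))` returns an empty list when every class folder contains at least one `.npy` file, and otherwise lists the folders that are absent or empty.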
