refactor: simplify validator CLI by removing --files parameter#11
Conversation
The validator now progressively loads .mjson files until it has enough valid hora events to meet the --samples target. This removes the confusion of needing to specify both --files and --samples, where too few files would yield fewer samples than requested. Usage is now simply: python agari_validator.py /path/to/data --samples 5000
Pull request overview
Refactors the validator CLI to remove the --files parameter and instead progressively load .mjson files until enough valid hora events are collected to satisfy the --samples target.
Changes:
- Removed the `--files` CLI argument and the associated file-sampling helper.
- Shuffles discovered `.mjson` files and incrementally parses them until `--samples` valid `hora` events are collected (then samples down if needed).
- Updates the summary/returned stats to report `files_processed` instead of skipped invalid horas.
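The progressive-loading behavior described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code: `collect_horas` and `extract_horas` are hypothetical names standing in for the validator's internals.

```python
import random


def collect_horas(files, extract_horas, num_samples, seed=None):
    """Parse shuffled files until enough valid hora events are collected.

    `extract_horas` stands in for the validator's per-file parser and
    returns a list of hora events for one file.
    """
    files = list(files)
    if seed is not None:
        random.seed(seed)
    random.shuffle(files)

    all_horas = []
    files_processed = 0
    for path in files:
        if len(all_horas) >= num_samples:
            break  # stop early once the target is met
        all_horas.extend(extract_horas(path))
        files_processed += 1

    # Sample down if we got more than requested
    if len(all_horas) > num_samples:
        all_horas = random.sample(all_horas, num_samples)
    return all_horas, files_processed
```

Because the loop breaks as soon as the target is met, only as many files are parsed as necessary, which is the point of dropping `--files`.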
```diff
  print(
      f"Processed {files_processed} files, extracted {len(all_horas)} valid hora events"
  )

- # Sample hora events
+ # Sample down if we got more than requested
```
If there aren't enough valid hora events across the dataset to reach `num_samples`, the script will silently validate fewer than requested. Consider adding an explicit warning when `len(all_horas) < num_samples` after extraction so users understand the target could not be met.
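A minimal sketch of such a warning, assuming the `all_horas` and `num_samples` names from the diff (the helper name is hypothetical):

```python
import warnings


def check_sample_target(all_horas: list, num_samples: int) -> None:
    """Warn when extraction could not reach the requested sample count."""
    if len(all_horas) < num_samples:
        warnings.warn(
            f"Only {len(all_horas)} valid hora events found; "
            f"{num_samples} were requested. Validating all available events.",
            stacklevel=2,
        )
```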
```python
except Exception as e:
    files_processed += 1
    if verbose:
```
`files_processed` is incremented in both the success and exception paths, so it effectively counts files attempted rather than successfully parsed. The summary text ("Files processed") and earlier message ("Processed X files…") can be misleading; consider renaming to `files_attempted` or tracking separate `files_parsed`/`files_failed` counters.
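One way to track the two counters separately, sketched with a hypothetical `parse_files` helper (`parse_one` stands in for the validator's per-file parser):

```python
def parse_files(paths, parse_one, verbose=False):
    """Parse each path, counting successes and failures separately.

    Returns (results, files_parsed, files_failed) so the summary can
    report both instead of one ambiguous "processed" count.
    """
    results = []
    files_parsed = 0
    files_failed = 0
    for path in paths:
        try:
            results.append(parse_one(path))
            files_parsed += 1
        except Exception as e:
            files_failed += 1
            if verbose:
                print(f"Failed to parse {path}: {e}")
    return results, files_parsed, files_failed
```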
```python
# Shuffle files for random sampling
if seed is not None:
    random.seed(seed)
random.shuffle(all_files)
```
`--seed` is advertised as providing reproducibility, but `Path.rglob()` order is filesystem-dependent. With the current implementation, the same seed can still yield different file orders across OS/filesystems. Consider sorting `all_files` (e.g., `all_files.sort()`) before `random.shuffle(all_files)` so seeded runs are deterministic.
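The fix suggested above could look like this (the `discover_files` helper is illustrative, not the PR's actual code):

```python
import random
from pathlib import Path


def discover_files(root, seed=None):
    """Collect .mjson files in a deterministic order, then shuffle.

    Sorting first makes the seeded shuffle reproducible regardless of
    the filesystem's rglob() iteration order.
    """
    all_files = sorted(Path(root).rglob("*.mjson"))  # stable across platforms
    if seed is not None:
        random.seed(seed)
    random.shuffle(all_files)
    return all_files
```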