Add structured task playground sandbox and resilient model fallback for task runs #4
Agent-Logs-Url: https://github.com/Grumpified-OGGVCT/hat_stack/sessions/5e1526c3-bd4e-48fc-a842-28b9c5065464 Co-authored-by: AccidentalJedi <92951150+AccidentalJedi@users.noreply.github.com>
Pull request overview
Adds a first-class “playground” sandbox structure for task runs (with manifests/indexes) and makes task execution more resilient by retrying comparable configured models when the primary model fails.
Changes:
- Introduces structured workspace preparation, safe file write constraints, per-run manifests, and workspace index generation for task runs.
- Adds model fallback sequencing across comparable configured tiers and records attempted models/fallback usage.
- Extends workflow + CLI plumbing to pass playground metadata and upload both per-run outputs and the full playground artifact.
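The "safe file write constraints" item can be pictured with a small path check. This is only a sketch under assumptions: `resolve_safe_output_path` is a hypothetical name, not a function confirmed to exist in `scripts/hats_task_runner.py`, and the real runner may enforce its constraints differently.

```python
from pathlib import Path


def resolve_safe_output_path(run_dir: Path, relative_name: str) -> Path:
    """Return an absolute path inside run_dir, or raise ValueError.

    Hypothetical helper for illustration; not taken from the PR.
    """
    base = run_dir.resolve()
    # Joining an absolute name replaces the base entirely, so resolve
    # the joined path and compare it against the run directory.
    candidate = (base / relative_name).resolve()
    # Reject anything that is not strictly below the run directory,
    # which covers "..", absolute paths, and the directory itself.
    if base not in candidate.parents:
        raise ValueError(f"unsafe output path: {relative_name!r}")
    return candidate
```

A caller would resolve every model-suggested file name through a check like this before writing, so a name such as `../escape.txt` fails instead of landing outside the run directory.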
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/hats_task_runner.py | Implements playground workspace structure, safe output paths, run manifests/indexes, and model fallback logic. |
| scripts/hat | Adds --category/--genre/--project flags and forwards them in repository_dispatch payloads. |
| .github/workflows/hats-task.yml | Accepts playground metadata inputs, passes them into the task runner, and uploads output + playground artifacts. |
| README.md | Documents the new playground layout, artifact behavior, and fallback semantics. |
| FORK_SETUP.md | Updates task-mode examples and explains the new playground/fallback behavior for forks. |
def build_run_id(explicit_run_id: str | None = None) -> str:
    """Build a deterministic run id for workspace storage."""
    if explicit_run_id:
        return slugify_path_component(explicit_run_id, "run")

    github_run_id = os.environ.get("GITHUB_RUN_ID", "").strip()
    github_attempt = os.environ.get("GITHUB_RUN_ATTEMPT", "").strip()
    if github_run_id:
        attempt_suffix = f"-attempt-{github_attempt}" if github_attempt else ""
        return f"run-{slugify_path_component(github_run_id, 'run')}{attempt_suffix}"

    return time.strftime("run-%Y%m%d-%H%M%S", time.gmtime())
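`build_run_id` relies on `slugify_path_component`, whose body is not shown in this excerpt. The following is a guess at what such a helper might do, keeping the two-argument call shape used above (value, fallback); the character rules are assumptions, not the PR's actual implementation.

```python
import re


def slugify_path_component(value: str, fallback: str) -> str:
    """Reduce value to a single filesystem-safe path component.

    Illustrative stand-in; the real helper may use different rules.
    """
    # Lowercase, then collapse every run of characters outside
    # [a-z0-9._-] (including path separators) into one hyphen.
    slug = re.sub(r"[^a-z0-9._-]+", "-", value.strip().lower())
    # Trim leading/trailing hyphens and dots so the result can never
    # be "." or "..".
    slug = slug.strip("-.")
    # Fall back when nothing usable survives.
    return slug or fallback
```

With these rules, `slugify_path_component("Auth Service!", "project")` yields `auth-service`, and an input made only of separators and dots falls back to the supplied default.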

    return {
        "schema_version": 1,
        "created_at_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "status": "completed",
        "task_type": task_result["task_type"],
        "prompt": prompt,

runs = sorted(p for p in project_dir.iterdir() if p.is_dir())
if not runs:
    continue
latest = runs[-1]
scripts/hats_task_runner.py
root_lines.append(f"| {category_name} | {project_count} |")
root_lines.append("")
root_lines.append(
    "Folder layout: `playground/<category>/<genre>/<project>/<run-id>/`"
.github/workflows/hats-task.yml
echo "category=${{ inputs.category }}" >> "$GITHUB_OUTPUT"
echo "genre=${{ inputs.genre }}" >> "$GITHUB_OUTPUT"
echo "project=${{ inputs.project }}" >> "$GITHUB_OUTPUT"
echo "workspace_root=${{ inputs.workspace_root }}" >> "$GITHUB_OUTPUT"
🎩 Hats Team Review Report
✅ Verdict: ALLOW (Risk Score: 0/100)
Severity Summary
Hat Execution Summary
Pipeline Stats
@copilot apply changes based on the comments in this thread and address the hats workflow failures
Agent-Logs-Url: https://github.com/Grumpified-OGGVCT/hat_stack/sessions/4d97ddbd-e0d9-4fe7-b2b0-42f70dc12d65 Co-authored-by: AccidentalJedi <92951150+AccidentalJedi@users.noreply.github.com>
Applied the review-thread fixes in |
Task mode only wrote outputs to a flat temp directory and had no first-class sandbox structure for organizing generated work. This change adds a managed playground workspace for task runs and makes task execution retry comparable configured models when the initial Ollama model fails.
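The retry behavior described here can be sketched as a loop over an ordered list of candidate models. This is a minimal illustration under assumptions: `run_with_fallback`, `call_model`, and the result keys are hypothetical names, and how the runner decides which configured models count as "comparable" is not shown.

```python
from typing import Callable


def run_with_fallback(models: list[str], call_model: Callable[[str], str]) -> dict:
    """Try each model in order and return the first successful result.

    Hypothetical sketch; call_model stands in for the real model client.
    """
    attempted: list[str] = []
    last_error: Exception | None = None
    for model in models:
        attempted.append(model)
        try:
            output = call_model(model)
        except Exception as exc:  # any failure moves on to the next model
            last_error = exc
            continue
        return {
            "output": output,
            "model": model,
            "attempted_models": attempted,
            # True only when the primary (first) model did not succeed.
            "fallback_used": len(attempted) > 1,
        }
    raise RuntimeError(f"all models failed: {attempted}") from last_error
```

Recording the attempted models alongside the output keeps a run auditable: a reader can see whether the result came from the primary model or from a fallback.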
Structured playground workspace
- `playground/<category>/<genre>/<project>/<run-id>/`
- `category`, `genre`, `project`, and `run-id`
Run manifests and indexes
- `PLAYGROUND_MANIFEST.json` with task/source metadata, hats used, summary, status, and generated files
- `PLAYGROUND_INDEX.md`
- `CATEGORY_INDEX.md`
Workflow and CLI plumbing
- `hats-task.yml` to accept playground metadata and pass source repo / PR / issue context into task runs
- `scripts/hat` with:
  - `--category`
  - `--genre`
  - `--project`
Model fallback behavior
Docs
Example:
hat task generate_code "Build a FastAPI auth module with JWT" \
  --repo myorg/app \
  --pr 42 \
  --category code \
  --genre api \
  --project auth-service

This produces a structured run directory like: