Add pyspark-test-runner asset (v1.10.0) #14

Merged
vmariiechko merged 2 commits into main from feature/pyspark-test-runner-asset on Jun 20, 2026
@vmariiechko (Owner)

Related Issue

N/A (no tracking issue)

Summary

Adds the pyspark-test-runner asset to the Asset Library and cuts the v1.10.0 release. The asset is a single-file Python wrapper around pytest for local PySpark suites: it runs pytest, writes the full output to a log file, and prints only a bounded, agent-friendly digest built from the JUnit XML. When many tests fail with the same error, the digest collapses them into one signature block instead of flooding a coding agent's context window.
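
The collapsing step described above can be illustrated with a minimal sketch. It is not the asset's actual code: the function names, the normalization rules (masking hex addresses and digits in the first line of the message), and the sample XML are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: two tests fail with the same cause, one passes.
SAMPLE = """<testsuites><testsuite>
<testcase classname="tests.test_etl" name="test_a"><failure message="AnalysisException: column id_1 not found"/></testcase>
<testcase classname="tests.test_etl" name="test_b"><failure message="AnalysisException: column id_2 not found"/></testcase>
<testcase classname="tests.test_etl" name="test_c"/>
</testsuite></testsuites>"""


def normalize_signature(message: str) -> str:
    """Reduce an error message to a stable signature (illustrative rules only)."""
    stripped = message.strip()
    first_line = stripped.splitlines()[0] if stripped else "<no message>"
    first_line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", first_line)  # memory addresses
    return re.sub(r"\d+", "<n>", first_line)  # counters, ids, line numbers


def group_failures(junit_xml: str) -> dict[str, list[str]]:
    """Map each normalized signature to the tests that failed with it."""
    groups = defaultdict(list)
    for case in ET.fromstring(junit_xml).iter("testcase"):
        for tag in ("failure", "error"):
            node = case.find(tag)
            if node is not None:
                name = f"{case.get('classname', '')}::{case.get('name', '')}"
                message = node.get("message") or node.text or ""
                groups[normalize_signature(message)].append(name)
                break  # count each test case once
    return dict(groups)


if __name__ == "__main__":
    for signature, tests in group_failures(SAMPLE).items():
        print(f"{len(tests)} test(s): {signature}")
```

Grouping on a normalized first line is only one possible choice; a real runner might also fold in the exception type or the innermost traceback frame.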

Changes

  • New asset assets/pyspark-test-runner/ (schema + README + template/{{.target_dir}}/skills/pyspark-test-runner/ with SKILL.md and scripts/run-pyspark-tests.py), mirroring the dbx-ro-query skill+script layout. Single target_dir prompt (default .agents).
  • Digest is driven by the JUnit XML rather than scraped stdout, so heavy Spark/JVM stderr cannot corrupt it. Output is bounded on every axis: capped failing-tests list, top signatures, head+tail traceback trim, and a hard total-output backstop. Failures are deduplicated by a normalized signature.
  • Edge cases handled: collection/import errors (exit 2), no tests collected (exit 5), other exit codes mapped to a readable result, hung runs killed after --timeout-sec with a process-tree reap guard, and malformed/missing JUnit XML falling back to a bounded log tail.
  • Per-asset tests at tests/assets/test_pyspark_test_runner.py (JUnit parsing, node-id reconstruction, signature dedup, excerpt trim, digest budget, interpreter/log-dir resolution) and test config tests/configs/assets/pyspark_test_runner.json.
  • ASSETS.md catalog and ROADMAP.md "Shipped" list updated.
  • Release cut to 1.10.0: CHANGELOG.md finalized and both version markers bumped.
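
The exit-code handling listed above could look roughly like the following sketch. The numeric codes are pytest's documented exit codes; the result labels and the `describe_exit` helper are hypothetical names, not taken from run-pyspark-tests.py.

```python
# pytest's documented exit codes, mapped to illustrative labels.
PYTEST_RESULTS = {
    0: "PASSED",
    1: "FAILED",
    2: "COLLECTION_ERROR_OR_INTERRUPTED",  # collection/import errors land here
    3: "INTERNAL_ERROR",
    4: "USAGE_ERROR",
    5: "NO_TESTS_COLLECTED",
}


def describe_exit(code: int, timed_out: bool = False) -> str:
    """Turn a pytest exit code into a readable result line."""
    if timed_out:
        # A killed run has no meaningful pytest exit code.
        return "TIMEOUT"
    return PYTEST_RESULTS.get(code, f"UNKNOWN_EXIT_{code}")


if __name__ == "__main__":
    for code in (0, 2, 5, 137):
        print(code, describe_exit(code))
```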

Change Area

  • Asset Library (assets/<name>/)

Configuration Axes Affected

  • Template schema (databricks_template_schema.json)
  • Asset Library (new asset, asset schema, or framework changes)

Testing

  • All tests pass (pytest tests/ -v): 2415 passed, 163 skipped
  • New tests added for new functionality (if applicable)

Additionally validated against a real PySpark suite (PySpark 4.1.2, OpenJDK 17): all-pass, a repetitive-failure flood (797 raw pytest lines collapsed to a ~70-line digest with one signature block), collection/import error, no-tests, timeout (clean process-tree reap, no orphaned JVM workers), and a long Py4J traceback (head+tail trimmed).
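
For the timeout case, one common way to avoid orphaned workers is to start the test process in its own process group and signal the whole group on expiry. The sketch below shows that general technique under stated assumptions; `run_with_timeout` is a hypothetical helper, and the asset's actual reap guard may work differently.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_sec):
    """Run cmd; on timeout, kill its whole process group.

    Returns (exit_code, timed_out).
    """
    posix = os.name == "posix"
    # On POSIX, start_new_session makes the child a session and group leader,
    # so grandchildren (for example JVM workers) share its process group id.
    proc = subprocess.Popen(cmd, start_new_session=posix)
    try:
        return proc.wait(timeout=timeout_sec), False
    except subprocess.TimeoutExpired:
        if posix:
            try:
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the group exited between the timeout and the kill
        else:
            proc.kill()  # fallback: terminates only the direct child
        return proc.wait(), True


if __name__ == "__main__":
    # Demo: a child that sleeps far longer than the timeout.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1.0))
```

On Windows this fallback does not reach grandchildren; a full implementation would need a platform-specific tree kill there.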

Asset Changes (if applicable)

  • Asset installs standalone via databricks bundle init . --template-dir assets/<name> --output-dir <dir>
  • Asset is self-contained (no references to library/helpers.tmpl or other assets)
  • tests/configs/assets/<name>.json added
  • Asset appears in ASSETS.md catalog

Release (if this PR cuts a release)

  • CHANGELOG.md finalized: [Unreleased] renamed to [1.10.0] - 2026-06-20, fresh empty [Unreleased] added above
  • Both version markers bumped to 1.10.0 (pyproject.toml and template/{{.project_name}}/bundle_init_config.json.tmpl)
  • Version guard test passes (tests/test_release_metadata.py)
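
The contents of tests/test_release_metadata.py are not shown in this PR. As a rough illustration of what such a guard checks, the sketch below compares the newest released changelog heading against the pyproject.toml version. It assumes "## [x.y.z] - date" headings and a plain `version = "..."` line; both helper names are hypothetical.

```python
import re


def latest_release(changelog: str):
    """Return the first versioned heading, skipping [Unreleased]."""
    match = re.search(r"^## \[(\d+\.\d+\.\d+)\]", changelog, flags=re.MULTILINE)
    return match.group(1) if match else None


def pyproject_version(pyproject: str):
    """Return the value of the first top-of-line version = "..." entry."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject, flags=re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    changelog = "# Changelog\n\n## [Unreleased]\n\n## [1.10.0] - 2026-06-20\n"
    pyproject = '[project]\nname = "example"\nversion = "1.10.0"\n'
    # The guard passes only when both markers agree.
    print(latest_release(changelog) == pyproject_version(pyproject))
```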

Checklist

  • Go template syntax is valid (no unclosed {{ }} blocks)
  • No .tmpl files appear in generated output
  • Generated YAML files are valid
  • Documentation updated (if behavior changed)

Single-file Python wrapper around pytest for local PySpark suites,
packaged as an agentskills.io-style skill (SKILL.md + the runner script)
mirroring dbx-ro-query. The wrapper runs pytest, writes full output to a
log file, and prints only a bounded digest built from the JUnit XML:
result, exit code, counts, runnable failing node ids, and failures
deduplicated by a normalized signature so many tests failing with one
cause collapse to a single block. Output is bounded on every axis
(failing-list cap, top signatures, head+tail excerpt trim, hard total
backstop), and it handles collection/import errors, no-tests, timeouts
(with a process-tree reap guard), and malformed or missing XML.
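
A minimal sketch of the head+tail excerpt trim and the hard total backstop mentioned above, assuming line-based trimming and a character budget. The function names and default limits are illustrative, not the runner's real values.

```python
def trim_excerpt(text: str, head: int = 5, tail: int = 10) -> str:
    """Keep the first `head` and last `tail` lines, marking what was dropped."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    # Slice from an explicit index so tail=0 keeps no tail lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])


def cap_output(text: str, max_chars: int = 8000) -> str:
    """Last-resort backstop: never emit more than max_chars characters."""
    if len(text) <= max_chars:
        return text
    note = "\n[output truncated]"
    return text[: max(0, max_chars - len(note))] + note


if __name__ == "__main__":
    traceback = "\n".join(f"frame {i}" for i in range(100))
    print(cap_output(trim_excerpt(traceback, head=2, tail=3)))
```

Keeping both ends reflects that the top of a long Py4J traceback usually names the failing test code while the bottom carries the root cause.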

Includes per-asset tests (JUnit parsing, node-id reconstruction,
signature dedup, excerpt trim, digest budget, interpreter and log-dir
resolution), the test config, and ASSETS.md plus ROADMAP.md catalog
entries.

Finalize the changelog for the v1.10.0 release: rename [Unreleased] to
[1.10.0] - 2026-06-20 and add a fresh empty [Unreleased]. Bump both
version markers (pyproject.toml version and the generated bundle's
_template_version) to 1.10.0 so they agree with the changelog, per the
release-metadata guard test.

@vmariiechko merged commit e24f33f into main on Jun 20, 2026
1 check passed
@vmariiechko deleted the feature/pyspark-test-runner-asset branch on June 20, 2026 06:23