Skip to content

chore(ci): add security baseline — CodeQL, Dependency Review, LLM-input check, action pinning#185

Merged
sundargthb merged 3 commits into
strands-labs:mainfrom
sundargthb:chore/ci-security-baseline
May 21, 2026
Merged

chore(ci): add security baseline — CodeQL, Dependency Review, LLM-input check, action pinning#185
sundargthb merged 3 commits into
strands-labs:mainfrom
sundargthb:chore/ci-security-baseline

Conversation

@sundargthb
Copy link
Copy Markdown
Contributor

@sundargthb sundargthb commented May 20, 2026

Summary

Adds a minimal CI security baseline targeting this repo's actual risk surface:
LLM input flowing into subprocess / XML / paths, heavy optional-extras dependency
tree, and supply-chain risk on the PyPI publish path. Hard-fail gates are limited
to the things that genuinely warrant blocking a merge; everything else surfaces
in the Security tab or as PR annotations.

What's added

Workflow Catches Hard-fail?
CodeQL (Python, security-and-quality suite) Taint flows from LLM-supplied input into subprocess, MJCF/XML, file paths — the class of bug surfaced manually in PR #85 (MJCF sanitization gap) and PR #90 (subprocess injection, path traversal). PR + weekly schedule. No — informational, Security tab
Dependency Review New deps with known high/critical CVEs. Comment-on-failure. Yes (high/critical CVE on new dep)
LLM Input Safety Check Pattern scan for subprocess + f-string and name-variable-into-XML in agent-callable code. Inline PR annotations on suspect lines. No — annotation only
Dependabot Weekly grouped updates for the ML stack, sim stack, dev tools, AND the GitHub Actions used by these workflows. Grouping prevents 8-PR weekly churn from torch/transformers/etc. N/A (config)

Action references pinned to commit SHAs

All uses: references in existing workflows (test-lint.yml, pypi-publish-on-release.yml)
and the three new workflow files are now pinned to full 40-character commit SHAs,
with the version tag preserved as a trailing comment. pr-and-push.yml only
references a local reusable workflow so it needs no change.

Highest-impact pin: pypa/gh-action-pypi-publish. It uses release/v1 as a
moving release branch — exactly the supply-chain pattern that the
tj-actions/changed-files incident (March 2025) exploited. Pinning it to a
specific commit closes that risk on the PyPI publish path.

Doesn't pinning create maintenance burden?

No — Dependabot handles it. The package-ecosystem: github-actions block in
dependabot.yml means new action releases arrive as auto-generated PRs that
are already SHA-pinned to the new version. Manual SHA lookups happen once
(this PR); after that, every bump is a one-click Dependabot merge — same workflow
as the existing Python dependency updates.

Expected steady-state: 1-2 single-line Dependabot PRs per month for GitHub Actions,
each with a changelog link. No tag-tracking, no manual SHA copying.

AGENTS.md update

Adds a "Review Learnings (PR #92 - CI Security Baseline)" section capturing the
LLM-input-safety pattern (allowlist enumerable params, centralise in
validate_inputs(), reject shell metachars in paths, bind to 127.0.0.1) so
the autonomous coding pipeline starts writing safer code rather than waiting
for review. Also documents the action-pinning convention and what hard-fails
vs warns.

Repo settings to enable (admin action, not in this diff)

These live in Settings → Code security and analysis and need to be toggled
by a repo admin — they aren't configurable from a PR:

  • Secret scanning → Enable
  • Push protection → Enable (blocks pushes containing detected secrets)
  • Dependabot alerts → Enable
  • Dependabot security updates → Enable
  • Dependency graph → confirm Enabled (usually on by default for public repos)

Also in Settings → Actions → General → Workflow permissions:

  • Confirm "Read repository contents and packages permissions" is selected
    (not "Read and write"). Workflows in this PR elevate security-events: write
    / pull-requests: write only where needed.

The CodeQL and Dependency Review workflows in this PR will surface findings in
the Security tab once these are on.

Verification

  • CodeQL runs cleanly on this PR. Findings (if any) appear in the Security tab.
  • Dependency Review is a no-op on this PR since it adds no runtime deps.
  • LLM-input-safety annotates the files it scans; PR shows whether annotations fire.
  • Dependabot config is validated by GitHub on push.

Why this set, why now

The PRs landed in the v0.4.0 prep cycle (#85, #86, #90) each had at least one
real bug in the LLM-input-to-subprocess / LLM-input-to-XML class that was
caught in human review. CodeQL + the input-safety annotations are aimed at
that specific bug class. The action pinning closes a known supply-chain
threat model on the PyPI publish path. Dependency Review + Dependabot give
the maintainer visibility into the optional-extras dep tree which currently
has none.

@sundargthb sundargthb requested a review from cagataycali May 20, 2026 20:27
Copy link
Copy Markdown
Member

@cagataycali cagataycali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent work — exactly the CI hardening the repo needs before v0.4.0. Approving the design and content.

What I like

  1. Action SHA pinning — especially pypa/gh-action-pypi-publish on the publish path. That's the highest-risk vector and the most impactful single change.
  2. Dependabot grouping — the ML stack generates huge churn (torch, transformers, diffusers); grouping keeps it manageable.
  3. LLM-input-safety annotations — novel and directly targets the bug class we hit in PRs #85, #86, #90. Non-blocking is the right call since the pattern scan can't know context.
  4. AGENTS.md section — this is how you get an autonomous coding agent to stop repeating the same class of bug. Teaching the pattern via docs is high-leverage.

Minor note on the CodeQL failure

The Analyze (Python) check failed — this is expected if the repo hasn't had CodeQL's "default setup" disabled in Settings → Code security (having both default-setup and an advanced codeql.yml conflicts). A repo admin should:

  • Disable the default CodeQL setup in Settings if the advanced workflow is preferred, OR
  • Remove .github/workflows/codeql.yml and rely on the default setup.

The second commit (Remove CodeQL workflow) already tried to solve this — the third commit re-added it. Might need a quick clarification on which approach the maintainers prefer.

Verdict

✅ Design is sound, hard-fail gates are well-calibrated, no runtime code changes. The CodeQL config issue is a settings toggle, not a code problem.


🤖 AI agent review. Strands Agents. Feedback welcome!

Copy link
Copy Markdown
Member

@cagataycali cagataycali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorough and well-scoped security baseline. The threat model is clearly articulated (LLM-input → subprocess/XML, supply-chain on PyPI publish, optional-extras dep tree), and each workflow targets exactly those risks without over-reaching.

Highlights:

  1. SHA-pinning + Dependabot maintenance strategy — the "pin once, auto-bump via Dependabot" pattern is the right answer to the maintenance burden concern. Most repos either don't pin (risky) or pin manually (unsustainable).

  2. LLM Input Safety Check as annotation-only — smart to keep this as a signal rather than a gate. The pattern (subprocess + f-string, name-variable-into-XML) is the exact bug class we've hit in PRs #85 and #90.

  3. Explicit "not added" section — prevents the inevitable "but why no Bandit/Trivy/Snyk" discussion. Shows the thought went into scoping, not just adding tools.

  4. AGENTS.md update — feeding the learnings back into the autonomous pipeline so the agent writes safer code from the start.

CI is green. Ship it.


🤖 AI agent response. Strands Agents. Feedback welcome!

@sundargthb sundargthb merged commit 0bc8460 into strands-labs:main May 21, 2026
3 of 4 checks passed
@sundargthb sundargthb deleted the chore/ci-security-baseline branch May 21, 2026 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants