diff --git a/README.md b/README.md
index 708e9f2..04257bc 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ APTS is not a testing methodology. It complements PTES, OWASP WSTG, and OSSTMM b
 - **Tier 2 (Verified)**: 85 additional (157 cumulative). Full transparency, tamper-proof audit trails, and independently verifiable findings.
 - **Tier 3 (Comprehensive)**: 16 additional (173 cumulative). Highest assurance for critical infrastructure and L4 autonomous operations.

-Eighteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.
+Nineteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.

 APTS has no certification body, no mandatory third-party audit, and no fee. Platforms are assessed against the requirements and conformance is documented. The standard does not prescribe who performs the assessment; internal self-assessment, independent internal review, and external third-party assessment are all valid approaches, and the choice is left to the reader.

diff --git a/index.md b/index.md
index ca8d883..3810740 100644
--- a/index.md
+++ b/index.md
@@ -45,7 +45,7 @@ APTS is not a testing methodology. It complements PTES, OWASP WSTG, and OSSTMM b
 - **Tier 2 (Verified)**: 85 additional (157 cumulative). Full transparency, tamper-proof audit trails, and independently verifiable findings.
 - **Tier 3 (Comprehensive)**: 16 additional (173 cumulative). Highest assurance for critical infrastructure and L4 autonomous operations.

-Eighteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.
+Nineteen additional advisory practices live exclusively in the [Advisory Requirements appendix](./standard/appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern. Advisory practices are not counted toward any tier and do not affect conformance.

 APTS has no certification body, no mandatory third-party audit, and no fee. Platforms are assessed against the requirements and conformance is documented. The standard does not prescribe who performs the assessment; internal self-assessment, independent internal review, and external third-party assessment are all valid approaches, and the choice is left to the reader.

diff --git a/standard/2_Safety_Controls/README.md b/standard/2_Safety_Controls/README.md
index 5b19472..5c4fb78 100644
--- a/standard/2_Safety_Controls/README.md
+++ b/standard/2_Safety_Controls/README.md
@@ -52,7 +52,7 @@ The 20 requirements in this domain fall into seven thematic groups:

 A platform claims conformance with this domain by implementing every MUST requirement assigned to the compliance tier it targets and to all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim (see the [Conformance Claim Template](../appendix/Conformance_Claim_Template.md)). An unimplemented MUST requirement or an undocumented SHOULD deviation is a conformance gap. APTS defines three cumulative compliance tiers (Tier 1 Foundation, Tier 2 Verified, Tier 3 Comprehensive) in the [Introduction](../Introduction.md); a Tier 2 platform satisfies every Tier 1 SC requirement plus every Tier 2 SC requirement, and a Tier 3 platform satisfies all three tiers.

-Three appendix-only advisory practices for this domain (APTS-SC-A01 Platform Health Monitoring and Anomaly Detection, APTS-SC-A02 Context Window Safety and Constraint Preservation, and APTS-SC-A03 Tool Invocation Parameter and Chaining Governance) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier.
+Four appendix-only advisory practices for this domain (APTS-SC-A01 Platform Health Monitoring and Anomaly Detection, APTS-SC-A02 Context Window Safety and Constraint Preservation, APTS-SC-A03 Tool Invocation Parameter and Chaining Governance, and APTS-SC-A04 Inference Spend and Compute Budget Containment) are documented in the [Advisory Requirements appendix](../appendix/Advisory_Requirements.md). They are not required for conformance at any tier.

 Every requirement in this domain includes a Verification subsection listing the verification procedures a reviewer uses to confirm implementation.

diff --git a/standard/Frontispiece.md b/standard/Frontispiece.md
index 70519bc..48fd5f2 100644
--- a/standard/Frontispiece.md
+++ b/standard/Frontispiece.md
@@ -74,5 +74,5 @@ Licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).

 | Version | Date | Notes |
 |---------|------|-------|
-| 0.1.0 | April 2026 | Initial release. Eight domains, 173 tier-required requirements across three compliance tiers, plus 18 advisory practices in the appendix. |
+| 0.1.0 | April 2026 | Initial release. Eight domains, 173 tier-required requirements across three compliance tiers, plus 19 advisory practices in the appendix. |

diff --git a/standard/Getting_Started.md b/standard/Getting_Started.md
index b7b1da7..e3c65fe 100644
--- a/standard/Getting_Started.md
+++ b/standard/Getting_Started.md
@@ -94,7 +94,7 @@ Depending on your role:
 ## Common Questions

 **Q: Do I need to implement all 173 requirements?**
-No. Start with Tier 1 (72 requirements). Tier 2 and Tier 3 add requirements progressively for cumulative totals of 157 and 173. An additional 18 advisory practices live in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern; advisory practices are not required for conformance at any tier. See [Introduction: Compliance Tiers](Introduction.md#compliance-tiers) for details.
+No. Start with Tier 1 (72 requirements). Tier 2 and Tier 3 add requirements progressively for cumulative totals of 157 and 173. An additional 19 advisory practices live in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md) under the `APTS--A0x` identifier pattern; advisory practices are not required for conformance at any tier. See [Introduction: Compliance Tiers](Introduction.md#compliance-tiers) for details.

 **Q: What if my platform meets most but not all Tier 1 requirements?**
 APTS does not award partial credit. A tier claim requires every MUST requirement at the claimed tier and all lower tiers to be implemented, with no deviation. Every SHOULD requirement at those tiers must be either implemented or covered by a documented justification in the conformance claim. Address MUST gaps before claiming a tier.
diff --git a/standard/Introduction.md b/standard/Introduction.md
index 00af415..9c8ebea 100644
--- a/standard/Introduction.md
+++ b/standard/Introduction.md
@@ -44,7 +44,7 @@ APTS does not prescribe who performs the assessment. The choice of internal self
 | 7 | Third-Party & Supply Chain Trust | TP | 22 | AI providers, cloud dependencies, data handling, foundation model disclosure |
 | 8 | Reporting | RP | 15 | Finding validation, confidence scoring, coverage disclosure |

-**Total: 173 tier-required requirements** (Tier 1 + Tier 2 + Tier 3) across the eight domains. An additional **18 advisory practices** live exclusively in the [Advisory Requirements](appendix/Advisory_Requirements.md) appendix using the `APTS--A0x` identifier pattern; advisory practices are not counted toward any tier and do not affect conformance.
+**Total: 173 tier-required requirements** (Tier 1 + Tier 2 + Tier 3) across the eight domains. An additional **19 advisory practices** live exclusively in the [Advisory Requirements](appendix/Advisory_Requirements.md) appendix using the `APTS--A0x` identifier pattern; advisory practices are not counted toward any tier and do not affect conformance.

 ---

diff --git a/standard/README.md b/standard/README.md
index 1ea3ae1..eb8312c 100644
--- a/standard/README.md
+++ b/standard/README.md
@@ -1,6 +1,6 @@
 # OWASP Autonomous Penetration Testing Standard

-This is the full OWASP Autonomous Penetration Testing Standard. It defines 173 tier-required requirements across 8 domains (plus 18 advisory practices in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md)) that autonomous penetration testing platforms must meet to operate safely, transparently, and within defined boundaries, whether delivered by vendors, operated as a service, or built in-house by enterprise security teams.
+This is the full OWASP Autonomous Penetration Testing Standard. It defines 173 tier-required requirements across 8 domains (plus 19 advisory practices in the [Advisory Requirements appendix](appendix/Advisory_Requirements.md)) that autonomous penetration testing platforms must meet to operate safely, transparently, and within defined boundaries, whether delivered by vendors, operated as a service, or built in-house by enterprise security teams.

 ## Getting Started

diff --git a/standard/appendix/Advisory_Requirements.md b/standard/appendix/Advisory_Requirements.md
index f9ff84e..9c06b01 100644
--- a/standard/appendix/Advisory_Requirements.md
+++ b/standard/appendix/Advisory_Requirements.md
@@ -141,6 +141,28 @@ Platforms operating with multi-step agents should also define a bounded set of d

 ---

+### APTS-SC-A04: Inference Spend and Compute Budget Containment (Advisory)
+
+**Applicability:** This practice applies to platforms whose agent consumes metered compute or inference (model API tokens, GPU time, or per-call tool and service costs) during an engagement.
+
+**Rationale:** APTS contains the agent along several axes but not its own compute consumption. SC-004 limits traffic to the target (connection, bandwidth, and payload constraints), SC-011 terminates on host resource exhaustion (CPU, memory), SC-007 halts on cumulative risk, and SC-013 halts on wall-clock duration. None treats inference or compute spend as a quantity to monitor or a condition to halt on. The closest reference, TP-008, raises a cloud billing alert, but only as a signal of account compromise, and it alerts rather than stops the agent. The unaddressed case is a runaway agent: a planning loop, a retry storm, or a degenerate tool-call chain that consumes tokens or compute far beyond the engagement's intended envelope, with no spend ceiling to arrest it. This is a containment concern rather than a cost-management nicety, because uncontrolled consumption is one observable signature of an agent operating outside its mandate.
+
+**Value:** A spend ceiling that halts the agent turns an open-ended runaway into a bounded, reviewable event, and burn-rate anomaly detection surfaces degenerate behavior early, before it exhausts a budget or masks a deeper fault.
+
+**Practice Description:**
+
+Treat per-engagement inference and compute spend as a first-class containment quantity:
+
+1. **Set a spend ceiling that halts.** Define a per-engagement budget for inference and metered compute, and halt the agent through the existing kill-switch and termination path (APTS-SC-009, APTS-SC-011) when the ceiling is reached, rather than only logging the overage.
+2. **Monitor burn rate, not just totals.** Track consumption rate against an expected envelope and treat an abnormal spike (for example, a retry or planning loop) as an escalation signal, since the rate anomaly precedes budget exhaustion.
+3. **Record spend in the audit trail.** Log per-engagement consumption and any spend-triggered halt alongside the other termination conditions, so a halt and its cause are reconstructable.
+
+**Recommendation:** Start with a hard per-engagement ceiling and a simple rate threshold on the most expensive resource (usually model tokens), wired into the existing halt path rather than a separate mechanism. Treat a spend-triggered halt like any other automated termination for review purposes, and treat repeated spend halts as a reason to investigate the agent's planning behavior, since a runtime that burns its budget on degenerate loops is behaving outside its mandate (APTS-MR-023).
+
+**Related normative requirements:** APTS-SC-004, APTS-SC-007, APTS-SC-009, APTS-SC-011, APTS-AR-003, APTS-TP-008.
+
+---
+
 ### APTS-MR-A03: Multi-Turn Adversarial Conversation Resilience (Advisory)

 **Applicability:** This practice applies to platforms that use LLM-based or agentic runtimes with conversational state spanning multiple turns, tool calls, or planning steps.
diff --git a/standard/appendix/Glossary.md b/standard/appendix/Glossary.md
index 999bc1b..fe1ac59 100644
--- a/standard/appendix/Glossary.md
+++ b/standard/appendix/Glossary.md
@@ -82,7 +82,7 @@ Notation for specifying IP address ranges using a base address and prefix length
 Alternative security measures that mitigate vulnerability when the primary control is missing. Example: Two-factor authentication compensates for weak passwords.

 **Compliance Tier**
-One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). A platform claims a tier by implementing every MUST requirement at that tier and all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim. An additional 18 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier.
+One of three progressive levels of APTS conformance. Tier 1 (Foundation) requires 72 core requirements (MUST | Tier 1). Tier 2 (Verified) adds 85 requirements for a cumulative 157 (MUST | Tier 2 + SHOULD | Tier 2). Tier 3 (Comprehensive) adds 16 requirements for a cumulative 173 (MUST | Tier 3 + SHOULD | Tier 3). A platform claims a tier by implementing every MUST requirement at that tier and all lower tiers, with no deviation, and by either implementing every SHOULD requirement at those tiers or recording a documented justification for each deviation in its conformance claim. An additional 19 advisory practices in the Advisory Requirements appendix are recommended for highest-assurance engagements but are not counted toward any tier.

 **Confidence Score**
 A numeric value on a 0-100% scale indicating the platform's certainty in a scope boundary determination, target legitimacy assessment, asset classification, or finding validity. Scores below 75% for scope-related decisions trigger mandatory human escalation. See APTS-HO-013, APTS-RP-003.
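Reviewer sketch for the APTS-SC-A04 hunk in `Advisory_Requirements.md`: the three steps of the Practice Description (a ceiling that halts, a burn-rate check that escalates, and audit logging) could look roughly like the Python below. This is only an illustration under stated assumptions. `SpendGuard`, its fields, the token-based budget, the sliding-window rate check, and the `halt` and `escalate` callbacks are all hypothetical names and choices that do not appear in the standard; a real platform would wire the callbacks into its existing halt path (APTS-SC-009, APTS-SC-011).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SpendGuard:
    """Hypothetical per-engagement spend guard; every name here is illustrative."""

    ceiling_tokens: int              # hard per-engagement budget (step 1)
    window_seconds: float            # sliding window used for the burn-rate check
    max_tokens_per_window: int       # assumed burn-rate envelope (step 2)
    halt: Callable[[str], None]      # stand-in for the platform's existing halt path
    escalate: Callable[[str], None]  # stand-in for the platform's escalation hook
    total_tokens: int = 0
    halted: bool = False
    recent: List[Tuple[float, int]] = field(default_factory=list)
    audit_log: List[Dict[str, object]] = field(default_factory=list)

    def record(self, now: float, tokens: int) -> bool:
        """Account for `tokens` consumed at time `now`; return False once halted."""
        if self.halted:
            return False
        self.total_tokens += tokens
        # Keep only consumption that falls inside the sliding window.
        self.recent = [(t, n) for t, n in self.recent if now - t < self.window_seconds]
        self.recent.append((now, tokens))
        window_tokens = sum(n for _, n in self.recent)
        self._log(now, "consume", tokens=tokens, window_tokens=window_tokens)

        if self.total_tokens >= self.ceiling_tokens:
            # Step 1: reaching the ceiling halts the agent instead of only logging.
            self.halted = True
            self._log(now, "halt", reason="spend_ceiling_reached")
            self.halt("spend_ceiling_reached")
            return False
        if window_tokens > self.max_tokens_per_window:
            # Step 2: a rate spike is escalated before the budget is exhausted.
            self._log(now, "escalate", reason="burn_rate_anomaly")
            self.escalate("burn_rate_anomaly")
        return True

    def _log(self, now: float, event: str, **details: object) -> None:
        # Step 3: consumption, escalations, and halts all land in the audit trail.
        self.audit_log.append(
            {"time": now, "event": event, "total_tokens": self.total_tokens, **details}
        )
```

An agent loop would call `guard.record(time.monotonic(), tokens_used)` after each model or tool call and stop issuing further calls once it returns `False`. Routing the halt through a callback rather than raising inside the guard mirrors the Recommendation's point about reusing the existing halt path instead of adding a separate mechanism.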
diff --git a/standard/appendix/Vendor_Evaluation_Guide.md b/standard/appendix/Vendor_Evaluation_Guide.md
index 5da39e6..a8d9eb5 100644
--- a/standard/appendix/Vendor_Evaluation_Guide.md
+++ b/standard/appendix/Vendor_Evaluation_Guide.md
@@ -14,7 +14,7 @@ Decide your minimum compliance tier based on your risk tolerance:

 - **Tier 2 (Verified):** 157 cumulative requirements (72 + 85). The platform is fully transparent about what it did and why, protects your data with tamper-proof audit trails, handles incidents with formal response procedures, and provides independently verifiable findings. **Choose Tier 2 when:** you are testing production environments, operating in regulated industries, or need full accountability for audit or compliance purposes. This is the recommended minimum for most production deployments.

-- **Tier 3 (Comprehensive):** 173 cumulative requirements (157 + 16). The platform meets the highest assurance bar for critical infrastructure, fully autonomous (L4) operations, and the strictest regulatory requirements. **Choose Tier 3 when:** you are deploying fully autonomous testing against critical infrastructure, financial systems, or healthcare environments with minimal human oversight. An additional 18 advisory practices in the [Advisory Requirements appendix](Advisory_Requirements.md) are recommended for highest-assurance engagements but are not counted toward any tier.
+- **Tier 3 (Comprehensive):** 173 cumulative requirements (157 + 16). The platform meets the highest assurance bar for critical infrastructure, fully autonomous (L4) operations, and the strictest regulatory requirements. **Choose Tier 3 when:** you are deploying fully autonomous testing against critical infrastructure, financial systems, or healthcare environments with minimal human oversight. An additional 19 advisory practices in the [Advisory Requirements appendix](Advisory_Requirements.md) are recommended for highest-assurance engagements but are not counted toward any tier.

 > **Minimum tier guidance:** Tier 1 is appropriate for supervised testing of non-critical systems in non-regulated environments. Organizations in financial services, healthcare, critical infrastructure, or any regulated industry SHOULD require Tier 2 as a minimum. Tier 3 is recommended for critical infrastructure, fully autonomous (L4) operations, and environments with the strictest regulatory requirements.

diff --git a/standard/apts_requirements.json b/standard/apts_requirements.json
index 3bbb32d..8a53077 100644
--- a/standard/apts_requirements.json
+++ b/standard/apts_requirements.json
@@ -1,7 +1,7 @@
 {
 "version": "0.1.0",
 "source": "OWASP Autonomous Penetration Testing Standard",
-"last_updated": "2026-04-26T10:41:06Z",
+"last_updated": "2026-06-17T14:08:37Z",
 "requirements": [
 {
 "id": "APTS-SE-001",