diff --git a/docs/security/README.md b/docs/security/README.md new file mode 100644 index 0000000000..e68c9b1a77 --- /dev/null +++ b/docs/security/README.md @@ -0,0 +1,321 @@ +# ToolHive Security Documentation + +This directory contains comprehensive security documentation for the ToolHive platform, including threat models, attack trees, and security best practices. + +## ๐Ÿ“š Documentation Index + +### [Attack Tree](./attack-tree.md) +Visual representation of potential attack vectors against ToolHive across all deployment modes. Includes: +- **Attack chains** showing step-by-step compromise paths +- **Risk classifications** (High/Medium/Low) for each attack vector +- **Cost estimates** for attacker effort and prerequisites +- **Threat actor profiles** from script kiddies to nation-state actors +- **Key attack chains** with detailed mitigation strategies + +**Use this when**: Planning defense-in-depth strategies, prioritizing security investments, or assessing threat exposure. + +### [Threat Model](./threat-model.md) +STRIDE-based threat analysis of all ToolHive components. Includes: +- **Data flow diagrams** (DFDs) for Local, Kubernetes, and Remote MCP scenarios +- **STRIDE analysis** (Spoofing, Tampering, Repudiation, Information Disclosure, DoS, Privilege Escalation) +- **Critical asset inventory** with sensitivity classifications +- **Trust boundaries** between system components +- **Top 10 critical threats** with immediate action items +- **Security control recommendations** for each component + +**Use this when**: Designing new features, reviewing architectural changes, or conducting security assessments. + +## ๐ŸŽฏ Quick Reference + +### Critical Security Assets (Priority Order) + +| Asset | Location | Protection Mechanism | +|-------|----------|---------------------| +| 1. **Secrets (API Keys, Tokens)** | OS Keyring, K8s Secrets | AES-256-GCM, RBAC | +| 2. **Container Runtime Socket** | `/var/run/docker.sock` | Socket authentication, rootless mode | +| 3. **OAuth Access Tokens** | Memory, optional cache | PKCE, short TTL, HTTPS | +| 4. **JWT Signing Keys** | Config files, K8s Secrets | Strong algorithms (RS256/ES256), rotation | +| 5. **etcd Cluster** | Kubernetes control plane | Encryption at rest, network isolation | + +### Top 5 Attack Vectors to Mitigate First + +1. **Secrets Exposure** (Attack Tree: `SECRETS_LOCAL`, `K8S_SECRETS`) + - Implement file permissions 0600 on encrypted secrets + - Enable etcd encryption at rest + - Audit secret access patterns + +2. **Container Runtime Abuse** (Attack Tree: `DOCKER_SOCKET`) + - Never mount Docker socket into containers + - Use rootless containers where possible + - Implement runtime authentication + +3. **Privilege Escalation** (Threat Model: Elevation of Privilege) + - Drop all container capabilities by default + - Enforce Pod Security Standards in Kubernetes + - Never allow privileged containers + +4. **Supply Chain Compromise** (Attack Tree: `SUPPLY`) + - Implement image signature verification + - Scan dependencies and images regularly + - Use registry allow-lists + +5. **Authentication Bypass** (Threat Model: Spoofing) + - Enforce strong JWT signing (RS256/ES256) + - Implement PKCE for all OAuth flows + - Validate issuer, audience, and signature + +## ๐Ÿ›ก๏ธ Security by Deployment Mode + +### Local Deployment (CLI/Desktop) + +**Primary Threats:** +- Local secret theft from OS keyring +- Container escape via Docker socket +- Electron vulnerabilities in Desktop UI +- Process memory extraction + +**Key Mitigations:** +- Enable OS keyring encryption +- Use rootless container runtime +- Keep Electron framework updated +- Implement code signing for binaries + +**Documentation**: See [Threat Model ยง4.1, ยง4.2, ยง4.6](./threat-model.md) + +### Kubernetes Deployment (Operator) + +**Primary Threats:** +- RBAC misconfiguration allowing secret access +- CRD injection to deploy malicious workloads +- etcd direct access +- Operator privilege escalation + +**Key Mitigations:** +- Implement least-privilege RBAC +- Enable admission webhooks +- Encrypt etcd at rest +- Use namespace isolation + +**Documentation**: See [Threat Model ยง4.3, ยง4.4, ยง4.7](./threat-model.md) + +### Remote MCP Servers + +**Primary Threats:** +- OAuth/OIDC flow compromise +- PKCE bypass leading to session hijacking +- Man-in-the-middle attacks +- Token theft and replay + +**Key Mitigations:** +- Enforce PKCE mandatory +- Implement certificate pinning +- Use short-lived tokens +- Validate issuer and audience + +**Documentation**: See [Threat Model ยง4.10](./threat-model.md), [Remote MCP Authentication](../remote-mcp-authentication.md) + +## ๐Ÿ” Security Review Checklist + +Use this checklist when reviewing pull requests or new features: + +### Authentication & Authorization +- [ ] JWT tokens use strong algorithms (RS256/ES256, not HS256) +- [ ] OAuth flows enforce PKCE +- [ ] Cedar policies follow least-privilege principle +- [ ] User inputs are validated before authentication checks +- [ ] Token expiry times are reasonable (access: 15m, refresh: 7d) + +### Secrets Management +- [ ] Secrets never hardcoded in code or configs +- [ ] Secrets referenced by name, not embedded +- [ ] Secrets redacted in all logs +- [ ] File permissions 0600 on secret storage +- [ ] K8s secrets use SecretKeyRef, not direct values + +### Container Security +- [ ] Containers run as non-root user +- [ ] All capabilities dropped, only required ones added +- [ ] No privileged containers allowed +- [ ] Resource limits (CPU, memory, PID) specified +- [ ] Network isolation enabled for untrusted workloads +- [ ] Volume mounts are read-only where possible + +### Network Security +- [ ] All external connections use HTTPS/TLS +- [ ] Certificate validation enabled (no InsecureSkipVerify) +- [ ] Egress proxy enforces allow-list for isolated workloads +- [ ] No sensitive data in URLs or query parameters +- [ ] Rate limiting implemented on public endpoints + +### Input Validation +- [ ] All user inputs validated against allow-list +- [ ] Path traversal checks for file operations +- [ ] Command injection prevention (no shell=true) +- [ ] JSON/YAML parsing uses safe libraries +- [ ] Maximum size limits on inputs + +### Kubernetes +- [ ] RBAC follows least-privilege (no cluster-admin) +- [ ] Admission webhooks validate CRDs +- [ ] Pod Security Standards enforced +- [ ] Network policies restrict pod-to-pod traffic +- [ ] Secrets mounted as volumes, not environment variables + +### Audit & Monitoring +- [ ] Security-relevant events logged (auth, authz, secret access) +- [ ] Logs include correlation IDs for tracing +- [ ] No sensitive data in logs (credentials, PII, tokens) +- [ ] Distributed tracing enabled (OpenTelemetry) +- [ ] Alerts configured for security events + +## ๐Ÿ“Š Risk Assessment Matrix + +| Likelihood โ†’
Impact โ†“ | Low | Medium | High | +|---------------------------|-----|--------|------| +| **Critical** | Medium Risk | High Risk | **Critical Risk** | +| **High** | Low Risk | Medium Risk | High Risk | +| **Medium** | Low Risk | Low Risk | Medium Risk | +| **Low** | Acceptable | Low Risk | Low Risk | + +### Risk Categories +- **Critical Risk**: Immediate action required, security incident likely +- **High Risk**: Address within current sprint, significant threat +- **Medium Risk**: Address within quarter, moderate threat +- **Low Risk**: Address as time permits, minimal threat +- **Acceptable**: No action needed, acceptable risk level + +## ๐Ÿ” Security Best Practices + +### For Developers + +1. **Never commit secrets** to version control + - Use `.gitignore` for config files with secrets + - Scan commits with tools like `git-secrets` or `trufflehog` + +2. **Validate all inputs** at the earliest possible point + - Reject invalid inputs, don't try to sanitize + - Use allow-lists, not deny-lists + +3. **Fail securely** when errors occur + - Default to deny access on error + - Log security-relevant errors + - Don't expose internal details in error messages + +4. **Use security linters** + - `gosec` for Go code + - `bandit` for Python code + - `eslint-plugin-security` for JavaScript + +5. **Write security tests** + - Test authentication bypass scenarios + - Test authorization with different user roles + - Test input validation with fuzzing + +### For Operators + +1. **Keep systems patched** + - Enable Dependabot/Renovate for dependencies + - Subscribe to security mailing lists + - Test patches in staging before production + +2. **Monitor security events** + - Set up alerts for failed authentication + - Monitor for unusual secret access patterns + - Track container escape attempts + +3. **Practice principle of least privilege** + - Grant minimum required RBAC permissions + - Use namespace isolation + - Regular access reviews + +4. **Backup and disaster recovery** + - Regular backups of secrets and configs + - Test restore procedures + - Document incident response plan + +5. **Security training** + - Regular security awareness training + - Threat modeling workshops + - Tabletop exercises for incident response + +## ๐Ÿšจ Reporting Security Vulnerabilities + +**DO NOT** open public GitHub issues for security vulnerabilities. + +Instead, follow our [Security Policy](../../SECURITY.md): + +1. Email security@stacklok.com with details +2. Include proof-of-concept if available +3. Wait for response before public disclosure +4. Coordinate disclosure timeline with security team + +We typically respond within 48 hours and aim to patch critical issues within 7 days. + +## ๐Ÿ“– Related Documentation + +### ToolHive Architecture +- [Architecture Overview](../arch/00-overview.md) +- [Deployment Modes](../arch/01-deployment-modes.md) +- [Secrets Management](../arch/04-secrets-management.md) +- [RunConfig and Permissions](../arch/05-runconfig-and-permissions.md) + +### Security Features +- [Authorization Framework](../authz.md) - Cedar policies +- [Remote MCP Authentication](../remote-mcp-authentication.md) - OAuth/OIDC +- [Middleware](../middleware.md) - Auth/Authz/Audit chain +- [Runtime Implementation Guide](../runtime-implementation-guide.md) - Security mapping + +### Operational Security +- [Kubernetes Integration](../kubernetes-integration.md) +- [Operator Documentation](../../cmd/thv-operator/README.md) +- [Observability](../observability.md) - Logging and monitoring + +## ๐Ÿ”„ Maintenance + +### Review Schedule + +| Activity | Frequency | Owner | Next Due | +|----------|-----------|-------|----------| +| Threat model review | Quarterly | Security Team | 2026-02-19 | +| Attack tree update | Quarterly | Security Team | 2026-02-19 | +| Penetration testing | Annually | External Auditor | TBD | +| Security training | Bi-annually | All Teams | TBD | +| Incident response drill | Quarterly | DevOps + Security | TBD | + +### Change Management + +When to update these documents: +- โœ… New features that handle secrets or authentication +- โœ… Changes to RBAC or permission models +- โœ… New deployment modes or components +- โœ… After security incidents or near-misses +- โœ… New threat intelligence or attack patterns +- โŒ Minor bug fixes without security impact +- โŒ Documentation-only changes +- โŒ Performance optimizations + +### Version History + +| Version | Date | Changes | Author | +|---------|------|---------|--------| +| 1.0 | 2025-11-19 | Initial release with attack tree and threat model | Security Team | + +## ๐Ÿค Contributing + +Security improvements are always welcome! When contributing: + +1. **For new features**: Update threat model with STRIDE analysis +2. **For security fixes**: Reference threat model sections addressed +3. **For architectural changes**: Update attack tree with new vectors +4. **For incident learnings**: Document in threat model and attack tree + +See [CONTRIBUTING.md](../../CONTRIBUTING.md) for general contribution guidelines. + +## ๐Ÿ“ License + +These security documents are part of the ToolHive project and are licensed under [Apache 2.0](../../LICENSE). + +--- + +**Questions or concerns?** Contact security@stacklok.com or open a discussion in our [Discord](https://discord.gg/stacklok). + diff --git a/docs/security/SUMMARY.md b/docs/security/SUMMARY.md new file mode 100644 index 0000000000..135b95f002 --- /dev/null +++ b/docs/security/SUMMARY.md @@ -0,0 +1,248 @@ +# Security Documentation Summary + +## What Has Been Created + +A comprehensive security documentation suite for ToolHive has been created in `/docs/security/` with three main documents: + +### 1. Attack Tree (`attack-tree.md`) +- **Purpose**: Visual representation of attack vectors and paths +- **Format**: Mermaid diagram with detailed attack chains +- **Key Features**: + - 150+ attack scenarios across all deployment modes + - Risk classifications (Critical/High/Medium/Low) + - Cost estimates (attacker effort, impact, prerequisites) + - Threat actor profiles (Script Kiddie โ†’ Nation-State) + - 5 detailed attack chains with mitigations + - Actionable defense strategies + +### 2. Threat Model (`threat-model.md`) +- **Purpose**: STRIDE-based security analysis +- **Methodology**: Spoofing, Tampering, Repudiation, Information Disclosure, DoS, Privilege Escalation +- **Coverage**: 11 major components analyzed + - CLI Binary (`thv`) + - Desktop UI (ToolHive Studio) + - Kubernetes Operator + - Proxy Runner + - MCP Server Containers + - Secrets Management (Local & K8s) + - Middleware Chain + - Registry System + - OAuth/OIDC + - Network Isolation +- **Key Features**: + - Data flow diagrams for each deployment mode + - Trust boundary mapping + - Critical asset inventory with priorities + - Top 10 critical threats (P0) + - 80+ specific threats with mitigations + - Security control recommendations + - Incident response plan + +### 3. Index & Guide (`README.md`) +- **Purpose**: Navigation and quick reference +- **Key Features**: + - Quick reference tables for critical assets + - Top 5 attack vectors to mitigate first + - Security by deployment mode + - Security review checklist + - Risk assessment matrix + - Best practices for developers and operators + - Maintenance schedule + +## Coverage by Architecture Component + +### โœ… Local Deployment +- CLI binary security (command injection, path traversal, privilege escalation) +- Desktop UI threats (Electron vulnerabilities, XSS, IPC abuse) +- Container runtime abuse (Docker socket, container escape) +- Secrets management (keyring, encrypted file storage) + +### โœ… Kubernetes Deployment +- Operator security (CRD injection, RBAC, admission webhooks) +- Proxy runner threats (middleware bypass, K8s API abuse) +- etcd and secrets management +- Pod security standards +- Network policies + +### โœ… Cross-Component +- MCP server container security (image verification, permission profiles) +- Middleware chain (JWT, Cedar authorization) +- OAuth/OIDC flows (PKCE, token handling) +- Network isolation (egress proxy, DNS) +- Registry system (supply chain security) + +## Cost Estimates Provided + +### Attack Cost Levels +- **Low**: Hours to days (script kiddie capability) +- **Medium**: Days to weeks (specialized knowledge) +- **High**: Weeks to months (advanced expertise) +- **Very High**: Months+ (deep expertise/insider access) + +### Impact Levels +- **Medium**: Single workload/user affected +- **High**: Multiple workloads, partial compromise +- **Critical**: Full system compromise, complete data access + +### Target Assets by Cost +Assigned costs to 60+ attack scenarios covering: +- Secret theft (all methods) +- Container escapes +- OAuth compromises +- Supply chain attacks +- Middleware bypasses +- Network isolation bypasses +- Operator attacks +- Desktop UI exploits + +## Industry Best Practices Applied + +### Standards Referenced +- โœ… **STRIDE** methodology (Microsoft) +- โœ… **MITRE ATT&CK** Container Matrix +- โœ… **NIST SP 800-190** Container Security +- โœ… **CIS Kubernetes Benchmark** +- โœ… **OWASP** Container Security +- โœ… **CNCF** Security SIG recommendations + +### Security Patterns Implemented +- โœ… Defense in depth +- โœ… Least privilege principle +- โœ… Zero trust architecture +- โœ… Secure by default +- โœ… Fail securely +- โœ… Complete mediation +- โœ… Separation of duties + +## Actionable Outputs + +### For Security Teams +1. **Immediate Actions**: Top 10 critical threats (P0) with clear remediation steps +2. **Quarterly Reviews**: Predefined schedule and review criteria +3. **Incident Response**: Detection, containment, eradication, recovery procedures +4. **Testing Strategy**: Unit, integration, penetration testing recommendations + +### For Developers +1. **Security Review Checklist**: 40+ items for PR reviews +2. **Best Practices**: Input validation, secrets handling, container security +3. **Security Testing**: Specific test scenarios for each component +4. **Code Examples**: References to existing ToolHive security implementations + +### For Architects +1. **Trust Boundaries**: Clear demarcation between security zones +2. **Data Flow Diagrams**: Visual security analysis for each deployment mode +3. **Threat Actor Profiles**: Understanding adversary capabilities and motivations +4. **Compliance Mapping**: GDPR, SOC 2, HIPAA, PCI DSS considerations + +### For Operators +1. **Deployment-Specific Guidance**: Local CLI vs. Kubernetes security differences +2. **Monitoring & Alerting**: Security event correlation and detection +3. **Hardening Guides**: Configuration recommendations by component +4. **Backup & Recovery**: Disaster recovery procedures + +## Key Differentiators + +### Context-Aware +- Considers ToolHive's unique architecture (CLI, UI, Operator, Remote MCP) +- Covers protocol-specific concerns (stdio, SSE, streamable-http) +- Addresses both Docker and Kubernetes runtimes + +### Comprehensive +- 11 components analyzed with STRIDE +- 150+ attack scenarios mapped +- 80+ specific threats identified +- 60+ cost estimates provided + +### Actionable +- Every threat has a mitigation strategy +- Clear priority levels (P0, P1, P2, P3) +- Implementation status tracked (โœ… Done, โš ๏ธ Partial, โŒ Missing) +- Specific code references where mitigations exist + +### Maintainable +- Quarterly review schedule +- Change management guidelines +- Version history tracking +- Clear ownership and responsibilities + +## Integration with Existing Docs + +These security documents complement existing ToolHive documentation: + +### Cross-References Added +- Architecture docs (`docs/arch/`) +- Authorization framework (`docs/authz.md`) +- Remote MCP authentication (`docs/remote-mcp-authentication.md`) +- Secrets management (`docs/arch/04-secrets-management.md`) +- Middleware (`docs/middleware.md`) +- Operator documentation (`cmd/thv-operator/README.md`) + +### Links to Implementation +- Code references to security implementations +- Specific file paths for mitigations +- Configuration examples from existing docs + +## How to Use These Documents + +### For New Features +1. **Design Phase**: Review threat model for component being modified +2. **Implementation**: Follow security review checklist +3. **Testing**: Use attack scenarios for security testing +4. **Documentation**: Update threat model if new attack surface added + +### For Security Reviews +1. **Quarterly**: Full STRIDE analysis review +2. **Pre-Deployment**: Attack tree walkthrough +3. **Post-Incident**: Update based on lessons learned +4. **Architecture Changes**: Immediate threat model update + +### For Compliance +1. **Audit Prep**: Use threat model as evidence of security analysis +2. **Risk Assessment**: Reference attack cost estimates +3. **Control Mapping**: Link mitigations to compliance requirements +4. **Documentation**: Provide to auditors as security posture evidence + +## Next Steps + +### Immediate (P0) +1. Review and validate all P0 (critical) threats +2. Implement missing mitigations for critical threats +3. Set up security monitoring for high-risk scenarios +4. Schedule first quarterly review + +### Short-Term (P1 - Next Quarter) +1. Add admission webhooks for Kubernetes operator +2. Implement JWT signing key rotation +3. Add image signature verification +4. Enable RBAC auditing + +### Medium-Term (P2 - Next 6 Months) +1. Integrate external secrets operator +2. Implement SBOM generation +3. Add runtime security monitoring (Falco) +4. Deploy centralized SIEM + +### Long-Term (Ongoing) +1. Maintain quarterly review schedule +2. Update after architectural changes +3. Conduct annual penetration testing +4. Track threat landscape evolution + +## Questions or Feedback + +These documents are living artifacts and should evolve with: +- New threat intelligence +- Architectural changes +- Security incidents +- Regulatory requirements +- Technology updates + +**Feedback**: Contact security@stacklok.com or discuss in [Discord](https://discord.gg/stacklok) + +--- + +**Created**: 2025-11-19 +**Version**: 1.0 +**Authors**: Security Team +**Status**: Ready for Review + diff --git a/docs/security/attack-tree.md b/docs/security/attack-tree.md new file mode 100644 index 0000000000..f8fd15582b --- /dev/null +++ b/docs/security/attack-tree.md @@ -0,0 +1,607 @@ +# ToolHive Attack Tree + +This attack tree models the potential attack vectors against the ToolHive platform across its different deployment modes and components. It serves as a structured approach to understanding security threats and implementing appropriate countermeasures. + +## Root Goal: Compromise ToolHive Platform + +### High-Level Attack Vectors + +This overview shows the main attack categories. Click through to detailed sections below for specific attack paths. + +```mermaid +graph LR + ROOT[Compromise ToolHive Platform] --> DEPLOY{OR: Attack Vector by Deployment Mode} + + DEPLOY --> LOCAL[Attack Local Deployment] + DEPLOY --> K8S[Attack Kubernetes Deployment] + DEPLOY --> REMOTE[Attack Remote MCP Servers] + DEPLOY --> SUPPLY[Supply Chain Attack] + DEPLOY --> CROSS[Cross-Component Attacks] + + LOCAL --> LOCAL_SUB["See: Local Deployment Detail"] + K8S --> K8S_SUB["See: Kubernetes Deployment Detail"] + REMOTE --> REMOTE_SUB["See: Remote MCP Detail"] + SUPPLY --> SUPPLY_SUB["See: Supply Chain Detail"] + CROSS --> CROSS_SUB["See: Cross-Component Detail"] + + classDef highRisk fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px + classDef mediumRisk fill:#ffd93d,stroke:#f59f00,stroke-width:2px + classDef detailLink fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,stroke-dasharray: 5 5 + + class LOCAL,K8S,SUPPLY highRisk + class REMOTE,CROSS mediumRisk + class LOCAL_SUB,K8S_SUB,REMOTE_SUB,SUPPLY_SUB,CROSS_SUB detailLink +``` + +--- + +## Detailed Attack Trees by Category + +### 1. Local Deployment Attacks + +Attacks targeting CLI and Desktop UI deployments running on user workstations. + +**ToolHive-Specific Elements**: RunConfig manipulation, MCP proxy abuse, permission profile bypass, ToolHive API exploitation + +**Generic Infrastructure Elements**: Container runtime vulnerabilities, OS-level secret theft (apply to any containerized app) + +```mermaid +graph LR + LOCAL[Attack Local Deployment] --> LOCAL_OR{OR: Local Attack Vectors} + LOCAL_OR --> CLI[Attack thv CLI] + LOCAL_OR --> RUNCONFIG[Manipulate RunConfig] + LOCAL_OR --> SECRETS_LOCAL[Steal Local Secrets] + LOCAL_OR --> DESKTOP[Attack ToolHive Studio UI] + LOCAL_OR --> CONTAINER[Container Runtime Generic] + + CLI --> CLI_VULN_OR{OR: thv CLI Exploitation} + CLI_VULN_OR --> CLI_INJECT[Command Injection via --args] + CLI_VULN_OR --> CLI_PATH[Path Traversal in --from-config] + CLI_VULN_OR --> CLI_SECRET[Secret Injection via --secret] + CLI_VULN_OR --> CLI_RUNTIME[Abuse Runtime Socket Access] + + RUNCONFIG --> RC_OR{OR: RunConfig Attacks} + RC_OR --> RC_TAMPER[Modify Exported RunConfig] + RC_OR --> RC_PRIVPROFILE[Disable Permission Profile] + RC_OR --> RC_NETWORK[Disable Network Isolation] + RC_OR --> RC_VOLUME[Add Malicious Volume Mounts] + + CONTAINER --> CONTAINER_OR{OR: Container Runtime} + CONTAINER_OR --> DOCKER_SOCKET[Docker Socket Abuse - Generic] + CONTAINER_OR --> DOCKER_API[Runtime API Exploit - Generic] + + SECRETS_LOCAL --> SECRETS_LOCAL_OR{OR: Local Secret Theft} + SECRETS_LOCAL_OR --> KEYRING[Extract from OS Keyring] + SECRETS_LOCAL_OR --> SECRET_FILE[Read Encrypted Secret File] + SECRETS_LOCAL_OR --> ENV_VAR[Sniff Environment Variables] + SECRETS_LOCAL_OR --> MEMORY[Extract from Process Memory] + + DESKTOP --> DESKTOP_OR{OR: ToolHive Studio Attacks} + DESKTOP_OR --> STUDIO_IPC[Abuse thv serve API] + DESKTOP_OR --> STUDIO_RENDERER[XSS in Server List/Logs] + DESKTOP_OR --> ELECTRON_VULN[Electron CVE - Generic] + DESKTOP_OR --> UPDATE_HIJACK[Update Hijack - Generic] + + classDef highRisk fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px + classDef mediumRisk fill:#ffd93d,stroke:#f59f00,stroke-width:2px + classDef lowRisk fill:#a8dadc,stroke:#1864ab,stroke-width:1px + classDef toolhive fill:#e1f5fe,stroke:#01579b,stroke-width:2px + classDef generic fill:#f5f5f5,stroke:#616161,stroke-width:1px + + class SECRETS_LOCAL,CLI_INJECT,RC_PRIVPROFILE,CLI_RUNTIME highRisk + class RUNCONFIG,DESKTOP,STUDIO_IPC mediumRisk + class ENV_VAR,MEMORY,ELECTRON_VULN lowRisk +``` + +**Related Documentation**: + +- [Architecture: Deployment Modes](../arch/01-deployment-modes.md) +- [Secrets Management Architecture](../arch/04-secrets-management.md) +- [RunConfig and Permissions](../arch/05-runconfig-and-permissions.md) + +### 2. Kubernetes Deployment Attacks + +Attacks targeting Kubernetes operator deployments in cluster environments. + +**ToolHive-Specific Elements**: MCPServer CRD manipulation, thv-operator exploitation, thv-proxyrunner abuse, ToolHive RBAC + +**Generic Infrastructure Elements**: etcd access, generic RBAC misconfig, pod security (apply to any K8s operator) + +```mermaid +graph LR + K8S[Attack Kubernetes Deployment] --> K8S_OR{OR: K8s Attack Vectors} + K8S_OR --> OPERATOR[Attack thv-operator] + K8S_OR --> PROXY_RUNNER[Attack thv-proxyrunner] + K8S_OR --> CRD[Manipulate MCPServer CRDs] + K8S_OR --> K8S_SECRETS[K8s Secrets - Generic] + K8S_OR --> RBAC[RBAC Misconfig - Generic] + + OPERATOR --> OPERATOR_OR{OR: thv-operator Exploitation} + OPERATOR_OR --> OP_CRD_INJECT[Malicious MCPServer Spec] + OPERATOR_OR --> OP_REGISTRY[Poison MCPRegistry CRD] + OPERATOR_OR --> OP_RECONCILE[Reconciliation Logic Flaw] + OPERATOR_OR --> OP_WEBHOOK[Admission Webhook Bypass - Generic] + + CRD --> CRD_OR{OR: MCPServer CRD Attacks} + CRD_OR --> CRD_PRIVILEGED[Set Privileged: true] + CRD_OR --> CRD_VOLUME[Mount Host Filesystem] + CRD_OR --> CRD_SECRET[Reference Wrong Secrets] + CRD_OR --> CRD_IMAGE[Use Backdoored Image] + + PROXY_RUNNER --> PROXY_OR{OR: thv-proxyrunner Attacks} + PROXY_OR --> PROXY_MIDDLEWARE[Bypass Middleware Chain] + PROXY_OR --> PROXY_STATEFUL[Create Malicious StatefulSet] + PROXY_OR --> PROXY_K8S_API[Abuse K8s API Permissions] + + K8S_SECRETS --> K8S_SECRETS_OR{OR: K8s Secret Theft} + K8S_SECRETS_OR --> ETCD_ACCESS[Direct etcd - Generic] + K8S_SECRETS_OR --> RBAC_ABUSE[RBAC Abuse - Generic] + K8S_SECRETS_OR --> POD_MOUNT[Access MCP Server Secrets] + + classDef highRisk fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px + classDef mediumRisk fill:#ffd93d,stroke:#f59f00,stroke-width:2px + classDef lowRisk fill:#a8dadc,stroke:#1864ab,stroke-width:1px + + class CRD_PRIVILEGED,CRD_VOLUME,PROXY_MIDDLEWARE,OP_CRD_INJECT highRisk + class CRD,PROXY_RUNNER,OP_REGISTRY,PROXY_STATEFUL mediumRisk + class OP_RECONCILE,RBAC_ABUSE lowRisk +``` + +**Related Documentation**: + +- [Kubernetes Operator README](../../cmd/thv-operator/README.md) +- [Operator Architecture](../arch/09-operator-architecture.md) +- [MCPServer CRD API Reference](../operator/crd-api.md) + +### 3. Remote MCP Server Attacks + +Attacks targeting ToolHive's OAuth/OIDC authentication flows and remote MCP server connections. + +**ToolHive-Specific Elements**: RFC 9728 discovery exploitation, dynamic registration abuse, resource parameter manipulation + +**Generic Infrastructure Elements**: Standard OAuth vulnerabilities (apply to any OAuth client) + +```mermaid +graph LR + REMOTE[Attack Remote MCP Servers] --> REMOTE_OR{OR: Remote MCP Attack Vectors} + REMOTE_OR --> OAUTH[Attack ToolHive OAuth Flow] + REMOTE_OR --> DISCOVERY[Exploit RFC 9728 Discovery] + REMOTE_OR --> DYNAMIC_REG[Abuse Dynamic Registration] + REMOTE_OR --> TOKEN_THEFT[Token Theft - Generic] + + OAUTH --> OAUTH_OR{OR: ToolHive OAuth Exploitation} + OAUTH_OR --> PKCE_BYPASS[Bypass PKCE Enforcement] + OAUTH_OR --> REDIRECT_HIJACK[localhost Callback Hijack] + OAUTH_OR --> RESOURCE_PARAM[Resource Parameter Manipulation] + + DISCOVERY --> DISC_OR{OR: Discovery Attacks} + DISC_OR --> WELLKNOWN_SPOOF[Spoof .well-known Endpoint] + DISC_OR --> METADATA_POISON[Poison Resource Metadata] + DISC_OR --> ISSUER_SPOOF[Fake Authorization Server] + + DYNAMIC_REG --> DYN_OR{OR: Dynamic Registration} + DYN_OR --> REG_FLOOD[Register Many Clients] + DYN_OR --> REG_ABUSE[Malicious Redirect URIs] + + TOKEN_THEFT --> TOKEN_OR{OR: Token Theft Methods} + TOKEN_OR --> TOKEN_MEMORY[Extract from thv Memory] + TOKEN_OR --> TOKEN_LEAK[Token in Logs/Errors] + TOKEN_OR --> TOKEN_PHISH[Phishing - Generic] + + classDef highRisk fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px + classDef mediumRisk fill:#ffd93d,stroke:#f59f00,stroke-width:2px + classDef lowRisk fill:#a8dadc,stroke:#1864ab,stroke-width:1px + + class PKCE_BYPASS,WELLKNOWN_SPOOF,RESOURCE_PARAM,METADATA_POISON highRisk + class DISCOVERY,DYNAMIC_REG,ISSUER_SPOOF mediumRisk + class TOKEN_LEAK,TOKEN_PHISH lowRisk +``` + +**Related Documentation**: + +- [Remote MCP Authentication](../remote-mcp-authentication.md) +- [Authorization Framework](../authz.md) + +### 4. Supply Chain Attacks + +Attacks targeting ToolHive's software supply chain, from MCP registries to build pipelines. + +**ToolHive-Specific Elements**: MCP registry manipulation, MCPRegistry CRD poisoning, protocol builds (uvx://, npx://, go://) + +**Generic Infrastructure Elements**: Standard supply chain attacks (apply to any software) + +```mermaid +graph LR + SUPPLY[Supply Chain Attack] --> SUPPLY_OR{OR: Supply Chain Vectors} + SUPPLY_OR --> TH_REGISTRY[Poison ToolHive Registry] + SUPPLY_OR --> MCP_IMAGE[Backdoored MCP Server Image] + SUPPLY_OR --> PROTOCOL_BUILD[Malicious Protocol Build] + SUPPLY_OR --> DEPENDENCY[Malicious Dependency - Generic] + SUPPLY_OR --> BUILD[Build Pipeline - Generic] + + TH_REGISTRY --> REGISTRY_OR{OR: ToolHive Registry Attacks} + REGISTRY_OR --> REG_JSON[Modify registry.json] + REGISTRY_OR --> REG_GIT[Compromise Git Registry Source] + REGISTRY_OR --> REG_CONFIGMAP[Modify MCPRegistry ConfigMap] + REGISTRY_OR --> REG_MITM[MITM Registry Fetch] + + MCP_IMAGE --> IMAGE_OR{OR: MCP Image Attacks} + IMAGE_OR --> IMAGE_TYPO[Typosquat MCP Server Name] + IMAGE_OR --> IMAGE_TROJAN[Trojanized Popular MCP Server] + IMAGE_OR --> IMAGE_UPDATE[Compromise Image in Registry] + + PROTOCOL_BUILD --> PROTO_OR{OR: Protocol Build Attacks} + PROTO_OR --> UVX_POISON[Poison uvx:// Package] + PROTO_OR --> NPX_POISON[Poison npx:// Package] + PROTO_OR --> GO_POISON[Malicious go:// Module] + + classDef highRisk fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px + classDef mediumRisk fill:#ffd93d,stroke:#f59f00,stroke-width:2px + classDef lowRisk fill:#a8dadc,stroke:#1864ab,stroke-width:1px + + class REG_JSON,IMAGE_TROJAN,UVX_POISON,NPX_POISON highRisk + class TH_REGISTRY,MCP_IMAGE,PROTOCOL_BUILD mediumRisk + class REG_MITM,DEPENDENCY lowRisk +``` + +**Related Documentation**: + +- [Registry System Architecture](../arch/06-registry-system.md) +- [Registry Documentation](../registry/) +- [MCPRegistry CRD](../../cmd/thv-operator/REGISTRY.md) + +### 5. Cross-Component Attacks + +Attacks that span multiple components, including ToolHive's middleware, MCP tool abuse, and network isolation bypass. + +**ToolHive-Specific Elements**: Cedar policy exploitation, MCP tool permission abuse, ToolHive egress proxy bypass + +**Generic Infrastructure Elements**: Standard auth bypass, generic network attacks + +#### 5.1 Middleware Chain Attacks + +```mermaid +graph LR + MIDDLEWARE[Attack ToolHive Middleware] --> MW_OR{OR: Middleware Attacks} + MW_OR --> AUTH_BYPASS[Bypass JWT Auth] + MW_OR --> AUTHZ_BYPASS[Bypass Cedar Authorization] + MW_OR --> AUDIT_TAMPER[Tamper Audit Logs] + MW_OR --> MW_ORDER[Exploit Middleware Order] + + AUTH_BYPASS --> AUTH_OR{OR: JWT Auth Bypass} + AUTH_OR --> JWT_FORGE[Forge JWT Token] + AUTH_OR --> JWT_WEAK[Exploit Weak JWT Secret] + AUTH_OR --> JWT_SKIP[Skip JWT Middleware] + + AUTHZ_BYPASS --> AUTHZ_OR{OR: Cedar Authz Bypass} + AUTHZ_OR --> CEDAR_POLICY[Exploit Cedar Policy Logic] + AUTHZ_OR --> CEDAR_CONTEXT[Cedar Context Injection] + AUTHZ_OR --> IDOR_MCP[IDOR on MCP Tools/Resources] + AUTHZ_OR --> TOOL_FILTER[Bypass Tool Filter] + + classDef highRisk fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px + classDef mediumRisk fill:#ffd93d,stroke:#f59f00,stroke-width:2px + classDef lowRisk fill:#a8dadc,stroke:#1864ab,stroke-width:1px + + class CEDAR_POLICY,CEDAR_CONTEXT,JWT_SKIP,IDOR_MCP highRisk + class AUTH_BYPASS,AUTHZ_BYPASS,TOOL_FILTER mediumRisk + class AUDIT_TAMPER,MW_ORDER lowRisk +``` + +**Related Documentation**: + +- [Middleware Architecture](../middleware.md) +- [Authorization Framework (Cedar)](../authz.md) + +#### 5.2 Data Exfiltration via MCP Tools + +```mermaid +graph LR + EXFIL[Data Exfiltration via MCP] + + EXFIL --> EXFIL_OR{OR: Exfiltration Methods} + EXFIL_OR --> MCP_ABUSE[Abuse MCP Tool Permissions] + EXFIL_OR --> VOLUME_ACCESS[Read Mounted Volumes] + EXFIL_OR --> NETWORK_EXFIL[Bypass Network Isolation] + EXFIL_OR --> LOGS_EXFIL[Extract Data via Logs] + + MCP_ABUSE --> MCP_OR{OR: MCP Tool Abuse} + MCP_OR --> TOOL_FETCH[Fetch MCP Server] + MCP_OR --> TOOL_FS[Filesystem MCP Server] + MCP_OR --> TOOL_EXEC[Command Exec MCP Server] + MCP_OR --> TOOL_CUSTOM[Overprivileged Custom Tool] + + classDef highRisk fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px + classDef mediumRisk fill:#ffd93d,stroke:#f59f00,stroke-width:2px + classDef lowRisk fill:#a8dadc,stroke:#1864ab,stroke-width:1px + + class MCP_ABUSE,TOOL_EXEC,TOOL_CUSTOM highRisk + class VOLUME_ACCESS,NETWORK_EXFIL mediumRisk + class LOGS_EXFIL lowRisk +``` + +**Related Documentation**: + +- [RunConfig and Permissions](../arch/05-runconfig-and-permissions.md) + +#### 5.3 ToolHive Network Isolation Bypass + +```mermaid +graph LR + NET_BYPASS[Bypass ToolHive Network Isolation] + + NET_BYPASS --> NET_OR{OR: Isolation Bypass} + NET_OR --> PROXY_BYPASS[Bypass ToolHive Egress Proxy] + NET_OR --> DNS_BYPASS[Bypass ToolHive DNS] + NET_OR --> SQUID_VULN[Exploit Squid in Proxy] + NET_OR --> NO_PROXY[Set NO_PROXY Variable] + NET_OR --> PROTOCOL_TUNNEL[Protocol Tunneling - Generic] + + PROXY_BYPASS --> PB_OR{OR: Proxy Bypass Methods} + PB_OR --> DIRECT_CONNECT[Hardcoded IP Address] + PB_OR --> NONHTTP[Non-HTTP/HTTPS Protocol] + PB_OR --> ACL_BYPASS[Exploit ACL Misconfiguration] + + classDef highRisk fill:#ff6b6b,stroke:#c92a2a,stroke-width:3px + classDef mediumRisk fill:#ffd93d,stroke:#f59f00,stroke-width:2px + classDef lowRisk fill:#a8dadc,stroke:#1864ab,stroke-width:1px + + class PROXY_BYPASS,NO_PROXY,SQUID_VULN highRisk + class DNS_BYPASS,ACL_BYPASS mediumRisk + class PROTOCOL_TUNNEL lowRisk +``` + +**Related Documentation**: + +- [Runtime Implementation Guide (Network Isolation)](../runtime-implementation-guide.md) + +--- + +## Legend + +### Node Types + +- **Root Node**: Main objective of attack (Compromise ToolHive Platform) +- **{OR}**: Any one child path is sufficient to achieve parent goal +- **{AND}**: All child paths must succeed to achieve parent goal +- **(Leaf Nodes)**: Specific attack techniques or actions + +### Attack Specificity + +- **ToolHive-Specific**: Attacks that exploit ToolHive's unique features, architecture, or implementation + - Examples: MCPServer CRD manipulation, Cedar policy bypass, RunConfig tampering, RFC 9728 discovery exploitation, protocol builds (uvx://, npx://, go://) +- **Generic Infrastructure**: Standard attacks applicable to any system using similar technology (labeled with "- Generic" suffix in diagrams) + - Examples: etcd access (any K8s app), Docker socket abuse (any container platform), standard OAuth phishing + +### Risk Classification + +- ๐Ÿ”ด **High Risk (Red)**: Critical impact, leads to full system compromise or secret exposure +- ๐ŸŸก **Medium Risk (Yellow)**: Significant impact, may lead to partial compromise or privilege escalation +- ๐Ÿ”ต **Low Risk (Blue)**: Limited impact, requires additional exploitation steps + +## Attack Cost Estimates (ToolHive-Specific) + +The following table provides estimated costs (attacker effort) and potential impact for key **ToolHive-specific** attack paths. Generic infrastructure attacks (e.g., etcd access, container escape) are excluded. + +| Attack Path | Cost | Impact | Target Asset | Prerequisites | +|-------------|------|--------|--------------|---------------| +| **RunConfig Manipulation** | | | | | +| Modify Exported RunConfig | Low | High | Workload configuration | File system access to exported config | +| Disable Permission Profile | Low | Critical | MCP server restrictions | Access to RunConfig before `thv run` | +| Add Malicious Volume Mounts | Low | Critical | Host file system | Ability to modify RunConfig | +| Disable Network Isolation | Low | High | Network restrictions | Access to RunConfig | +| **ToolHive CLI Exploitation** | | | | | +| Command Injection via --args | Medium | Critical | Code execution in container | Craft malicious CLI arguments | +| Path Traversal in --from-config | Medium | High | Read arbitrary files | Control config file path | +| Secret Injection via --secret | Low | Medium | Inject fake secrets | Craft malicious secret references | +| **MCPServer CRD Attacks** | | | | | +| Set Privileged: true in CRD | Low | Critical | Full node compromise | K8s API write for MCPServer CRD | +| Mount Host Filesystem via CRD | Low | Critical | Host data access | K8s API write for MCPServer CRD | +| Reference Wrong Secrets | Low | Medium | Cross-namespace secret access | K8s API write + RBAC misconfig | +| Use Backdoored MCP Image | Medium | Critical | Container compromise | Control image field in CRD | +| **thv-operator Exploitation** | | | | | +| Malicious MCPServer Spec Injection | Medium | Critical | Deploy malicious workload | K8s API write for MCPServer | +| Poison MCPRegistry CRD | Medium | High | Distribute malware | K8s API write for MCPRegistry | +| Reconciliation Logic Flaw | Very High | Medium | Bypass validation | Find operator bug | +| **thv-proxyrunner Attacks** | | | | | +| Bypass Middleware Chain | High | Critical | Skip auth/authz/audit | Exploit proxy logic flaw | +| Create Malicious StatefulSet | Medium | High | Deploy backdoored MCP server | Compromise proxy runner pod | +| Abuse K8s API Permissions | Medium | High | Cluster-wide access | Exploit proxy RBAC permissions | +| **Cedar Authorization Bypass** | | | | | +| Exploit Cedar Policy Logic | Medium | High | Access unauthorized tools | Find policy logic flaw | +| Cedar Context Injection | High | Critical | Forge authorization context | Inject claims/arguments | +| Bypass Tool Filter | Low | Medium | Access filtered tools | Exploit filter logic | +| IDOR on MCP Tools/Resources | Low | Medium | Access other users' MCP tools | Predictable tool IDs | +| **ToolHive OAuth/OIDC Attacks** | | | | | +| Bypass PKCE Enforcement | High | Critical | Session hijacking | Find PKCE validation bug | +| localhost Callback Hijack | Medium | High | Steal authorization code | Local network access | +| Resource Parameter Manipulation | Medium | Medium | Access wrong resources | Manipulate RFC 8707 parameter | +| Spoof .well-known Endpoint | High | Critical | Fake auth server | MITM or DNS control | +| Poison Resource Metadata | High | Critical | Redirect to malicious issuer | MITM RFC 9728 discovery | +| **ToolHive Registry Attacks** | | | | | +| Modify registry.json | Low | Critical | Distribute malware | File system or git access | +| Poison MCPRegistry ConfigMap | Low | Critical | K8s cluster-wide malware | K8s ConfigMap write access | +| Typosquat MCP Server Name | Medium | High | Trick users to install | Register similar name | +| Trojanize Popular MCP Server | High | Critical | Widespread compromise | Compromise popular image | +| **Protocol Build Attacks** | | | | | +| Poison uvx:// Package | Medium | Critical | Python package compromise | PyPI access or MITM | +| Poison npx:// Package | Medium | Critical | npm package compromise | npm registry access | +| Malicious go:// Module | Medium | High | Go module compromise | Control go module | +| **ToolHive Network Isolation** | | | | | +| Bypass ToolHive Egress Proxy | Medium | High | Unrestricted network | Non-HTTP protocol or direct IP | +| Set NO_PROXY Variable | Low | High | Disable proxy | Environment variable injection | +| Bypass ToolHive DNS | Medium | High | DNS resolution bypass | Hardcoded IPs in MCP server | +| Exploit Squid in ToolHive Proxy | High | Critical | Proxy compromise | Unpatched Squid CVE | +| **ToolHive Studio (Desktop UI)** | | | | | +| Abuse thv serve API | Medium | High | Control all local workloads | Access to API server port | +| XSS in Server List/Logs | Low | Medium | Client-side code execution | Inject HTML in server names/logs | + +### Cost Levels + +- **Low**: Hours to days, script kiddie capability +- **Medium**: Days to weeks, requires specialized knowledge +- **High**: Weeks to months, requires advanced expertise +- **Very High**: Months+, requires deep expertise and/or insider access + +### Impact Levels + +- **Medium**: Limited scope, affects single workload/user +- **High**: Affects multiple workloads/users, partial system compromise +- **Critical**: Full system compromise, complete data access, persistent control + +## Key Attack Chains (ToolHive-Specific) + +### Chain 1: RunConfig Tampering to Host Compromise + +**ToolHive-Specific**: Exploits RunConfig portability and permission profiles + +1. User exports MCP server config: `thv export server1 config.json` +2. Attacker modifies RunConfig to disable permission profile +3. Attacker adds volume mount: `"volumes": ["/:/host:rw"]` +4. User imports and runs: `thv run --from-config config.json` +5. MCP server has full host filesystem access + +**Mitigations**: + +- Validate RunConfig signatures before import +- Warn users when importing configs with privileged settings +- Implement RunConfig schema validation with security checks + +### Chain 2: MCPRegistry Poisoning to Cluster Compromise + +**ToolHive-Specific**: Exploits MCPRegistry CRD and auto-sync + +1. Attacker gains write access to MCPRegistry ConfigMap or Git source +2. Modifies registry.json to point popular MCP server to backdoored image +3. MCPRegistry controller syncs poisoned data +4. Users run infected server: `thv run popular-mcp-server` +5. Malicious container deployed across cluster with normal permissions +6. Backdoor exfiltrates data or escalates privileges + +**Mitigations**: + +- Implement registry signing with Sigstore/Cosign +- ConfigMap write access tightly controlled (RBAC) +- Image scanning before deployment +- Git commit signing required for registry sources + +### Chain 3: Cedar Policy Bypass to Unauthorized MCP Access + +**ToolHive-Specific**: Exploits Cedar context injection + +1. Attacker analyzes Cedar policies for authorization logic +2. Finds policy: `permit when { context.claim_role == "admin" }` +3. Crafts MCP request with injected context/claims +4. Exploits middleware ordering to skip JWT validation +5. Bypasses Cedar authorization checks +6. Accesses restricted MCP tools without valid auth + +**Mitigations**: + +- Validate all context sources in Cedar policies +- Immutable middleware chain ordering +- Never trust client-provided context without signature +- Policy testing framework for edge cases + +### Chain 4: RFC 9728 Discovery Exploitation to MITM + +**ToolHive-Specific**: Exploits ToolHive's RFC 9728 well-known URI discovery + +1. User attempts to connect to remote MCP: `thv run https://mcp.example.com` +2. Attacker performs MITM on network +3. Intercepts `GET /.well-known/oauth-protected-resource` +4. Returns malicious metadata pointing to attacker's auth server +5. ToolHive performs OAuth flow with attacker's server +6. Attacker captures user credentials and tokens + +**Mitigations**: + +- Certificate pinning for well-known endpoints +- Require DNSSEC validation +- Warn users about untrusted OAuth issuers +- Manual issuer override: `--remote-auth-issuer` flag + +### Chain 5: Protocol Build Supply Chain Attack + +**ToolHive-Specific**: Exploits uvx://npx://go:// protocol builds + +1. Attacker typosquats popular MCP server package +2. Uploads to PyPI: `uvx://mcp-servr` (note typo) +3. User runs: `thv run uvx://mcp-servr` (typo in command) +4. ToolHive builds container from malicious package +5. Malicious code executes during build or runtime +6. Backdoor establishes persistence and exfiltrates data + +**Mitigations**: + +- Package name validation and typosquat detection +- Sandbox protocol builds in separate environment +- Display package source prominently before build +- Warn on first-time package usage + +### Chain 6: thv-proxyrunner to Cluster Escalation + +**ToolHive-Specific**: Exploits thv-proxyrunner K8s API permissions + +1. Attacker compromises thv-proxyrunner pod +2. Abuses K8s API permissions to create StatefulSets +3. Creates malicious StatefulSet in different namespace +4. StatefulSet mounts K8s service account with elevated permissions +5. Uses elevated permissions to modify other MCPServer CRDs +6. Deploys backdoored MCP servers cluster-wide + +**Mitigations**: + +- Namespace-scoped RBAC for thv-proxyrunner +- Admission webhooks validate all StatefulSets +- Network policies isolate proxy-runner pods +- Audit all StatefulSet creations by operator components + +## Threat Actor Profiles + +### Script Kiddie (Low Sophistication) + +- **Capabilities**: Uses public exploits, basic tools +- **Targets**: Publicly exposed instances, default configurations +- **Effective Against**: Environment variable sniffing, IDOR, basic XSS +- **Mitigation Priority**: Secure defaults, input validation, basic hardening + +### Malicious Insider (Medium Sophistication) + +- **Capabilities**: Internal knowledge, legitimate access +- **Targets**: Secrets, data exfiltration, privilege escalation +- **Effective Against**: Secret theft, RBAC abuse, audit tampering +- **Mitigation Priority**: Least privilege, audit logging, separation of duties + +### Advanced Persistent Threat (High Sophistication) + +- **Capabilities**: Custom exploits, social engineering, supply chain attacks +- **Targets**: Long-term persistence, data exfiltration, infrastructure control +- **Effective Against**: All vectors, especially supply chain and 0-days +- **Mitigation Priority**: Defense in depth, monitoring, incident response + +### Nation-State Actor (Very High Sophistication) + +- **Capabilities**: 0-day exploits, hardware attacks, insider recruitment +- **Targets**: Critical infrastructure, intellectual property, strategic data +- **Effective Against**: All vectors including hardware/firmware +- **Mitigation Priority**: Assume breach, air gaps, hardware security modules + +## References + +- [MITRE ATT&CK Container Matrix](https://attack.mitre.org/matrices/enterprise/containers/) +- [NIST Container Security Guide](https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-190.pdf) +- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/) +- [OWASP Container Security](https://owasp.org/www-community/vulnerabilities/Container_Security) + +## Maintenance + +This attack tree should be reviewed and updated: + +- Quarterly by the security team +- After any significant architectural changes +- Following security incidents or near-misses +- When new threat intelligence emerges + +**Last Updated**: 2025-11-19 +**Next Review**: 2026-02-19 diff --git a/docs/security/threat-model.md b/docs/security/threat-model.md new file mode 100644 index 0000000000..66fbfbf758 --- /dev/null +++ b/docs/security/threat-model.md @@ -0,0 +1,719 @@ +# ToolHive Threat Model (V1.0) + +## 1. Executive Summary + +This threat model analyzes the security posture of the ToolHive platform across its three deployment modes (Local CLI, Local UI, Kubernetes). We use the STRIDE methodology (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) applied to each component in the system. + +### Scope +- **Local Deployment**: CLI (`thv` binary), Desktop UI (ToolHive Studio), Container Runtime +- **Kubernetes Deployment**: Operator (`thv-operator`), Proxy Runner (`thv-proxyrunner`), CRDs +- **Common Components**: MCP Servers, Proxy Layer, Middleware, Registry System, Secrets Management +- **External Integrations**: OAuth/OIDC providers, Remote MCP servers, Container registries + +### Methodology +STRIDE applied to data flow diagrams (DFDs) with focus on trust boundaries and critical assets. + +### Key Findings Summary + +| Component | Critical Threats | Priority | +|-----------|-----------------|----------| +| Secrets Management | Information Disclosure via keyring/file access | P0 | +| Container Runtime | Elevation of Privilege via socket access | P0 | +| Kubernetes Secrets | Information Disclosure via etcd/RBAC | P0 | +| OAuth Flow | Spoofing via issuer/PKCE bypass | P1 | +| Middleware | Tampering via JWT forgery | P1 | +| Registry System | Tampering via poisoned data | P1 | +| Desktop UI | Elevation of Privilege via Electron exploits | P2 | +| Network Isolation | Denial of Service via proxy bypass | P2 | + +## 2. Assets & Trust Boundaries + +### 2.1 Critical Assets + +| Asset | Sensitivity | Owner | Storage Location | Threat Priority | +|-------|-------------|-------|------------------|-----------------| +| **Secrets (API Keys, Tokens)** | Critical | User/Admin | OS Keyring, Encrypted file, K8s Secrets | P0 | +| **Container Runtime Socket** | Critical | System | `/var/run/docker.sock` | P0 | +| **OAuth Access Tokens** | High | User | Memory, optional cache | P0 | +| **OAuth Client Secrets** | High | Admin | K8s Secrets, Config files | P1 | +| **JWT Signing Keys** | High | Admin | Config files, K8s Secrets | P1 | +| **Cedar Authorization Policies** | High | Admin | Config files, ConfigMaps | P1 | +| **MCP Server Data** | Medium-High | User | Container volumes, K8s PVs | P2 | +| **Container Images** | Medium | Developer | Container registries | P2 | +| **Registry Metadata** | Medium | Admin | Git, ConfigMaps, API | P2 | +| **Audit Logs** | Medium | Security | Filesystem, SIEM | P2 | +| **User PII (if any)** | Medium | User | MCP server storage | P3 | + +### 2.2 Trust Boundaries + +```mermaid +graph TB + subgraph Untrusted["๐Ÿ”“ UNTRUSTED"] + User[End User] + Internet[Internet / External Services] + Registry[Public Container Registry] + end + + subgraph DMZ["โš ๏ธ DMZ / SEMI-TRUSTED"] + Desktop[Desktop UI
ToolHive Studio] + RemoteMCP[Remote MCP Server] + OAuth[OAuth/OIDC Provider] + end + + subgraph Trusted["๐Ÿ”’ TRUSTED"] + CLI[CLI Binary
thv] + APIServer[API Server
thv serve] + Operator[K8s Operator
thv-operator] + ProxyRunner[Proxy Runner
thv-proxyrunner] + Middleware[Middleware Chain] + ProxyLayer[Proxy Layer] + end + + subgraph HighlyTrusted["๐Ÿ” HIGHLY TRUSTED"] + ContainerRuntime[Container Runtime
Docker/Podman] + K8sAPI[Kubernetes API Server] + SecretsManager[Secrets Manager] + Keyring[OS Keyring] + etcd[etcd Cluster] + end + + subgraph Isolated["๐Ÿ”’ ISOLATED"] + MCPContainer[MCP Server Container] + EgressProxy[Egress Proxy] + DNSServer[DNS Server] + end + + User -->|Commands| CLI + User -->|UI Interaction| Desktop + Desktop -->|API Calls| APIServer + CLI -->|Container API| ContainerRuntime + APIServer -->|Container API| ContainerRuntime + + Operator -->|K8s API| K8sAPI + ProxyRunner -->|K8s API| K8sAPI + K8sAPI -->|State| etcd + + ContainerRuntime -->|Deploy| MCPContainer + K8sAPI -->|Deploy| MCPContainer + + CLI -->|Secret Access| SecretsManager + SecretsManager -->|Store| Keyring + Operator -->|Secret Access| K8sAPI + + ProxyLayer -->|HTTP/SSE| RemoteMCP + RemoteMCP -->|OAuth| OAuth + + MCPContainer -->|Egress Traffic| EgressProxy + MCPContainer -->|DNS Query| DNSServer + + ProxyLayer -->|Apply| Middleware + + Registry -->|Pull Images| ContainerRuntime + Registry -->|Pull Images| K8sAPI + Internet -->|Data Access| MCPContainer +``` + +### Trust Boundary Definitions + +1. **Untrusted โ†’ DMZ**: Input validation, authentication required +2. **DMZ โ†’ Trusted**: Strong authentication, authorization checks +3. **Trusted โ†’ Highly Trusted**: Authenticated API calls, credential validation +4. **Isolated**: Network and process isolation, permission profiles enforced + +## 3. Data Flow Diagrams (DFDs) + +### 3.1 Local CLI Mode Data Flow + +```mermaid +sequenceDiagram + participant User + participant CLI as thv CLI + participant Secrets as Secrets Manager + participant Runtime as Container Runtime + participant Proxy as Proxy Process + participant Container as MCP Container + + User->>CLI: thv run server-name --secret api-key + + Note over CLI,Secrets: Trust Boundary: Trusted โ†’ Highly Trusted + CLI->>Secrets: GetSecret("api-key") + Secrets-->>CLI: + + Note over CLI,Runtime: Trust Boundary: Trusted โ†’ Highly Trusted + CLI->>Runtime: CreateContainer(image, env, mounts) + Runtime-->>CLI: Container ID + + CLI->>Proxy: Fork detached process + + Note over Proxy,Container: Trust Boundary: Trusted โ†’ Isolated + Proxy->>Container: Attach stdio / HTTP proxy + Container-->>Proxy: MCP responses + + Proxy->>Proxy: Apply Middleware (Auth, Authz, Audit) + + Note over User,Proxy: Trust Boundary: Untrusted โ†’ Trusted + User->>Proxy: MCP Client Request + Proxy-->>User: MCP Response +``` + +### 3.2 Kubernetes Mode Data Flow + +```mermaid +sequenceDiagram + participant User + participant K8sAPI as Kubernetes API + participant Operator as thv-operator + participant ProxyPod as Proxy Runner Pod + participant MCPPod as MCP Server Pod + participant Secret as K8s Secret + + User->>K8sAPI: kubectl apply -f mcpserver.yaml + + Note over K8sAPI,Operator: Trust Boundary: Highly Trusted + Operator->>K8sAPI: Watch MCPServer CRD + K8sAPI-->>Operator: CRD Event + + Operator->>K8sAPI: Create Deployment (proxy-runner) + Operator->>K8sAPI: Create Service + Operator->>K8sAPI: Create ConfigMap (RunConfig) + + Note over ProxyPod,K8sAPI: Trust Boundary: Trusted โ†’ Highly Trusted + ProxyPod->>K8sAPI: Get ConfigMap + ProxyPod->>Secret: Mount Secret as EnvVar + + ProxyPod->>K8sAPI: Create StatefulSet (MCP server) + K8sAPI-->>ProxyPod: StatefulSet Created + + Note over ProxyPod,MCPPod: Trust Boundary: Trusted โ†’ Isolated + ProxyPod->>MCPPod: HTTP Proxy / stdio attach + MCPPod-->>ProxyPod: MCP Response + + ProxyPod->>ProxyPod: Apply Middleware Chain + + Note over User,ProxyPod: Trust Boundary: Untrusted โ†’ Trusted + User->>ProxyPod: MCP Client Request (via Service) + ProxyPod-->>User: MCP Response +``` + +### 3.3 Remote MCP Server Authentication Flow + +```mermaid +sequenceDiagram + participant CLI as thv CLI + participant RemoteMCP as Remote MCP Server + participant AuthServer as OAuth/OIDC Provider + participant Browser as User Browser + + CLI->>RemoteMCP: GET /endpoint + + Note over RemoteMCP,CLI: Trust Boundary: DMZ โ†’ Trusted + RemoteMCP-->>CLI: 401 + WWW-Authenticate header + + CLI->>RemoteMCP: GET /.well-known/oauth-protected-resource + RemoteMCP-->>CLI: Resource metadata + issuer + + Note over CLI,AuthServer: Trust Boundary: Trusted โ†’ DMZ + CLI->>AuthServer: GET /.well-known/openid-configuration + AuthServer-->>CLI: OIDC configuration + + alt Dynamic Registration + CLI->>AuthServer: POST /register (with client metadata) + AuthServer-->>CLI: client_id, client_secret + end + + CLI->>Browser: Open authorization URL (with PKCE challenge) + + Note over Browser,AuthServer: Trust Boundary: Untrusted โ†’ DMZ + Browser->>AuthServer: User authenticates + AuthServer-->>Browser: Redirect with authorization code + + Browser->>CLI: http://localhost:8765/callback?code=... + + CLI->>AuthServer: POST /token (code, PKCE verifier) + AuthServer-->>CLI: access_token, refresh_token + + CLI->>RemoteMCP: Request with Authorization: Bearer + RemoteMCP-->>CLI: MCP Response +``` + +## 4. STRIDE Analysis by Component + +### 4.1 CLI Binary (`thv`) + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Malicious binary masquerading as `thv` | Complete system compromise | Code signing, binary verification, secure distribution | P0 | +| **Spoofing** | Path injection to execute different binary | Code execution with user privileges | Absolute paths, binary hash verification | P1 | +| **Tampering** | Command injection via unsanitized flags | Arbitrary command execution | Input validation, parameterized commands | P0 | +| **Tampering** | Path traversal in `--from-config` flag | Read/write arbitrary files | Path canonicalization, whitelist validation | P0 | +| **Repudiation** | No audit trail of CLI commands | Cannot trace malicious actions | Command history logging, audit middleware | P2 | +| **Information Disclosure** | Secrets logged to stdout/files | Credential exposure | Redact secrets in logs, secure log permissions | P1 | +| **Information Disclosure** | Secrets in process memory | Memory dump reveals credentials | Memory encryption, secure allocators | P2 | +| **Denial of Service** | Resource exhaustion via unlimited containers | System unavailability | Resource limits, container quotas | P2 | +| **Elevation of Privilege** | Abuse of Docker socket mount | Full host compromise | Rootless containers, socket authentication | P0 | +| **Elevation of Privilege** | SUID/SGID misconfiguration on binary | Privilege escalation | Proper permissions (755), no SUID bit | P1 | + +**Key Mitigations Implemented:** +- โœ… Input validation on all CLI flags (`pkg/runner/config.go`) +- โœ… Path traversal protection (`pkg/permissions/profile.go`) +- โœ… Secret redaction in logs (Sentry integration) +- โš ๏ธ Partial: Audit logging (via middleware, not CLI-level) +- โŒ Missing: Binary signing for releases + +### 4.2 Desktop UI (ToolHive Studio) + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Fake update package | Malware installation | Code signing, HTTPS update channel | P0 | +| **Spoofing** | Phishing UI mimicking ToolHive | Credential theft | User education, verified downloads | P2 | +| **Tampering** | XSS in renderer process | Execute arbitrary JavaScript | CSP, input sanitization, context isolation | P1 | +| **Tampering** | IPC message injection | Bypass security checks | IPC validation, type checking | P1 | +| **Repudiation** | No UI action audit trail | Cannot trace user actions | Event logging to audit system | P2 | +| **Information Disclosure** | Secrets visible in renderer DevTools | Credential exposure | Hide secrets from renderer, main process only | P1 | +| **Information Disclosure** | Sensitive data in Electron logs | Log-based credential leakage | Disable DevTools in production, log redaction | P1 | +| **Denial of Service** | Memory leak in renderer | Application crash | Memory management, automatic restart | P3 | +| **Elevation of Privilege** | Electron vulnerability (e.g., CVE) | System-level code execution | Regular Electron updates, security patches | P0 | +| **Elevation of Privilege** | Node integration enabled in renderer | Full Node.js API access | Disable Node integration, use contextBridge | P0 | + +**Key Mitigations Implemented:** +- โœ… Context isolation enabled (`toolhive-studio/main/src/preload.ts`) +- โœ… Node integration disabled in renderer +- โœ… Auto-update mechanism with verification +- โš ๏ธ Partial: CSP headers configured +- โŒ Missing: Comprehensive IPC input validation + +### 4.3 Kubernetes Operator (`thv-operator`) + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Fake operator pod in cluster | Deploy malicious workloads | Namespace restrictions, pod identity | P1 | +| **Spoofing** | Compromised ServiceAccount token | Impersonate operator | Token rotation, short TTLs, bound tokens | P0 | +| **Tampering** | Malicious CRD injection | Deploy backdoored MCP servers | Admission webhooks, OPA/Kyverno policies | P0 | +| **Tampering** | Modify existing MCPServer CRDs | Workload manipulation | RBAC, audit logging, mutation detection | P1 | +| **Repudiation** | Operator actions not logged | Cannot trace malicious changes | Kubernetes audit logs, OpenTelemetry | P1 | +| **Information Disclosure** | Secrets exposed in CRD status | Credential leakage | Never put secrets in status, use SecretRef | P0 | +| **Information Disclosure** | Operator logs contain secrets | Log-based compromise | Redact secrets, structured logging | P1 | +| **Denial of Service** | Create infinite MCPServers | Resource exhaustion | ResourceQuotas, admission webhooks | P2 | +| **Denial of Service** | Crash operator via malformed CRD | Service unavailability | Input validation, error handling, panic recovery | P2 | +| **Elevation of Privilege** | Operator with excessive RBAC | Cluster-wide compromise | Least-privilege RBAC, namespace-scoped | P0 | +| **Elevation of Privilege** | Exploit reconciliation race condition | Bypass admission policies | Idempotent reconciliation, locking | P2 | + +**Key Mitigations Implemented:** +- โœ… Least-privilege RBAC (`deploy/charts/toolhive-operator/templates/rbac.yaml`) +- โœ… Secrets referenced via SecretKeyRef, never in CRD values +- โœ… Input validation on CRD specs +- โœ… Panic recovery in controllers +- โš ๏ธ Partial: Admission webhooks (planned) +- โŒ Missing: Comprehensive reconciliation lock + +### 4.4 Proxy Runner (`thv-proxyrunner`) + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Fake proxy pod | MITM MCP traffic | Pod identity verification, mTLS | P1 | +| **Tampering** | Modify MCP requests in transit | Data manipulation | TLS, message signing, integrity checks | P1 | +| **Tampering** | Bypass middleware chain | Skip auth/authz/audit | Middleware validation, immutable order | P0 | +| **Repudiation** | No audit of proxied requests | Cannot trace malicious requests | Audit middleware, structured logs | P1 | +| **Information Disclosure** | MCP traffic logged unencrypted | Sensitive data exposure | Encrypt logs, redact payloads | P2 | +| **Information Disclosure** | Secrets in proxy environment | Pod inspection reveals credentials | Use K8s secrets with projectedVolume, not env | P1 | +| **Denial of Service** | Proxy crash/restart | Service unavailability | Health checks, auto-restart, multiple replicas | P2 | +| **Denial of Service** | Slowloris attack on proxy | Resource exhaustion | Connection limits, timeouts, rate limiting | P2 | +| **Elevation of Privilege** | Proxy creates privileged StatefulSet | Deploy privileged MCP server | Pod Security Standards, admission control | P0 | +| **Elevation of Privilege** | Abuse K8s API permissions | Create resources outside namespace | Namespace-scoped RBAC, validation | P1 | + +**Key Mitigations Implemented:** +- โœ… Middleware chain enforcement (`pkg/middleware/`) +- โœ… Health checks and auto-restart +- โœ… Namespace-scoped permissions +- โš ๏ธ Partial: Rate limiting (application-level only) +- โŒ Missing: mTLS between proxy and MCP server +- โŒ Missing: Request signing + +### 4.5 MCP Server Containers + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Typosquatted container image | Deploy malicious server | Image verification, registry allow-list | P0 | +| **Spoofing** | Backdoored base image | Hidden malware in server | Image scanning, trusted registries | P1 | +| **Tampering** | Container escape to host | Full host compromise | Rootless containers, seccomp, AppArmor | P0 | +| **Tampering** | Modify host filesystem via volume | Data tampering | Read-only mounts, permission profiles | P1 | +| **Repudiation** | No container audit logs | Cannot trace malicious actions | Container stdout to audit system | P2 | +| **Information Disclosure** | Data exfiltration via network | Sensitive data leakage | Network isolation, egress proxy | P1 | +| **Information Disclosure** | Secrets in container image layers | Image-based credential exposure | Runtime secret injection, not build-time | P0 | +| **Denial of Service** | Resource exhaustion (CPU/memory) | Service unavailability | Resource limits, OOM killer | P2 | +| **Denial of Service** | Fork bomb | Host unavailability | PID limits, cgroup constraints | P2 | +| **Elevation of Privilege** | Privileged container | Full host access | Pod Security Standards, never privileged | P0 | +| **Elevation of Privilege** | CAP_SYS_ADMIN capability | Kernel-level access | Drop all capabilities by default | P0 | + +**Key Mitigations Implemented:** +- โœ… Permission profiles with capability dropping (`pkg/permissions/profile.go`) +- โœ… Network isolation with egress proxy (`pkg/networking/`) +- โœ… Resource limits in RunConfig +- โœ… Read-only root filesystem option +- โœ… Runtime secret injection via environment +- โš ๏ธ Partial: Image scanning (user responsibility) +- โŒ Missing: Mandatory image signature verification + +### 4.6 Secrets Management + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Fake keyring daemon | Intercept secret reads | OS-level keyring authentication | P1 | +| **Tampering** | Modify encrypted secrets file | Inject malicious secrets | File integrity checks, HMAC | P1 | +| **Tampering** | Downgrade encryption algorithm | Weaker security | Version checking, minimum standards | P2 | +| **Repudiation** | No audit of secret access | Cannot trace secret misuse | Audit logging on read/write/delete | P2 | +| **Information Disclosure** | Keyring password in memory | Memory dump reveals master key | Secure memory allocation, zeroing | P1 | +| **Information Disclosure** | Secrets file readable by all users | Unauthorized secret access | File permissions 0600, user-owned | P0 | +| **Information Disclosure** | Secrets in environment variables | Process inspection reveals secrets | Use file-based or memory-based injection | P1 | +| **Information Disclosure** | 1Password token in config | Token compromise | Secure token storage, rotation | P1 | +| **Denial of Service** | Corrupt secrets database | Cannot retrieve secrets | Backups, corruption detection | P2 | +| **Elevation of Privilege** | Access secrets of other users | Cross-user secret access | User-scoped storage, OS isolation | P1 | + +**Key Mitigations Implemented:** +- โœ… AES-256-GCM encryption for local secrets (`pkg/secrets/aes/`) +- โœ… OS keyring integration (`pkg/secrets/keyring/`) +- โœ… File permissions 0600 on secret storage +- โœ… Secret redaction in logs +- โš ๏ธ Partial: Secret access auditing (logs only) +- โŒ Missing: HMAC for file integrity +- โŒ Missing: Secret rotation automation + +### 4.7 Kubernetes Secrets + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Fake K8s API server | Intercept secret writes | Certificate validation, kubeconfig auth | P0 | +| **Tampering** | Modify secrets via K8s API | Inject malicious credentials | RBAC, admission webhooks | P0 | +| **Repudiation** | Secret modifications not logged | Cannot trace changes | Kubernetes audit logs, AlertManager | P1 | +| **Information Disclosure** | Direct etcd access | Read all secrets unencrypted | etcd encryption at rest, network isolation | P0 | +| **Information Disclosure** | RBAC misconfiguration | Unauthorized secret listing | Least-privilege, regular RBAC audits | P0 | +| **Information Disclosure** | Secrets in pod logs | Log aggregation exposes secrets | Never log secret values | P1 | +| **Information Disclosure** | Secrets in git (manifests) | Version control exposure | Use external-secrets or sealed-secrets | P0 | +| **Denial of Service** | Delete all secrets | Service outage | RBAC restrictions, backups, Velero | P1 | +| **Elevation of Privilege** | ServiceAccount with get/list secrets | Privilege escalation | Namespace isolation, bound tokens | P0 | + +**Key Mitigations Implemented:** +- โœ… RBAC with least-privilege (`deploy/charts/toolhive-operator/templates/rbac.yaml`) +- โœ… SecretRef pattern (never secrets in CRD values) +- โœ… Secrets mounted as volumes, not environment variables (where possible) +- โš ๏ธ Partial: etcd encryption (cluster admin responsibility) +- โŒ Missing: Automatic secret rotation +- โŒ Missing: External secrets operator integration + +### 4.8 Middleware Chain (Auth/Authz/Audit) + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Forge JWT with weak secret | Impersonate any user | Strong signing keys (RS256/ES256), key rotation | P0 | +| **Spoofing** | JWT algorithm confusion (none) | Bypass signature verification | Whitelist allowed algorithms, reject "none" | P0 | +| **Tampering** | Modify Cedar policies at runtime | Bypass authorization | Immutable policies, version control | P1 | +| **Tampering** | Inject claims into JWT | Gain elevated permissions | Validate JWT issuer, audience, signature | P0 | +| **Repudiation** | Audit logs tampered/deleted | Hide malicious activity | Append-only logs, SIEM integration | P1 | +| **Repudiation** | No correlation ID across requests | Cannot trace request chain | Distributed tracing (OpenTelemetry) | P2 | +| **Information Disclosure** | JWT contains sensitive claims | PII exposure | Minimize JWT payload, use opaque tokens | P2 | +| **Information Disclosure** | Cedar policies leaked | Authorization logic disclosure | Protect policy files, access control | P2 | +| **Denial of Service** | CPU-expensive JWT validation | Service slowdown | Caching, rate limiting | P2 | +| **Elevation of Privilege** | Cedar policy bypass via context injection | Unauthorized access | Validate context sources, sanitize inputs | P1 | +| **Elevation of Privilege** | IDOR on MCP resources | Access other users' tools | Implement resource ownership checks | P1 | + +**Key Mitigations Implemented:** +- โœ… JWT validation with issuer/audience checks (`pkg/auth/token.go`) +- โœ… Cedar policy evaluation (`pkg/authorization/cedar/`) +- โœ… Audit middleware (`pkg/middleware/audit/`) +- โœ… OpenTelemetry tracing integration +- โš ๏ธ Partial: JWT signing key rotation (manual) +- โŒ Missing: IDOR protection framework +- โŒ Missing: Cedar policy testing framework + +### 4.9 Registry System + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | MITM registry fetch | Serve malicious registry data | HTTPS only, certificate pinning | P0 | +| **Spoofing** | Typosquatted registry URL | Use malicious registry | URL validation, trusted registry list | P1 | +| **Tampering** | Git repository compromise | Poisoned registry data | Git commit signatures, branch protection | P1 | +| **Tampering** | ConfigMap modification | Inject malicious servers | RBAC, admission webhooks | P1 | +| **Repudiation** | Registry updates not logged | Cannot trace malicious additions | Git history, K8s audit logs | P2 | +| **Information Disclosure** | Registry API key exposed | Unauthorized registry access | Secure API key storage, rotation | P1 | +| **Denial of Service** | Malformed registry JSON | Parser crash | Schema validation, error handling | P2 | +| **Denial of Service** | Registry sync loop | Operator CPU exhaustion | Sync interval limits, backoff | P2 | +| **Elevation of Privilege** | Registry metadata triggers code execution | RCE via JSON parsing | Safe parsing, input validation | P1 | + +**Key Mitigations Implemented:** +- โœ… HTTPS-only registry fetches +- โœ… JSON schema validation (`pkg/registry/`) +- โœ… Git source support with commit history +- โš ๏ธ Partial: Registry integrity checks (checksums) +- โŒ Missing: Certificate pinning +- โŒ Missing: Registry signing/verification + +### 4.10 OAuth/OIDC Remote Authentication + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Fake authorization server | Steal authorization codes | Issuer validation, well-known endpoint checks | P0 | +| **Spoofing** | Fake token endpoint | Steal client credentials | Certificate validation, HTTPS enforcement | P0 | +| **Tampering** | Authorization code hijacking | Session takeover | PKCE mandatory, short code expiry | P0 | +| **Tampering** | Token substitution | Impersonate different user | Audience validation, token binding | P1 | +| **Repudiation** | OAuth flow not logged | Cannot trace compromised sessions | Audit logging of auth events | P2 | +| **Information Disclosure** | Access token in URL | Token leakage via referrer/logs | POST-based token exchange, never GET | P0 | +| **Information Disclosure** | Refresh token stolen | Long-term access | Secure storage, rotation, revocation | P1 | +| **Denial of Service** | OAuth callback flood | Service unavailability | Rate limiting on callback endpoint | P2 | +| **Elevation of Privilege** | PKCE bypass via downgrade | Code interception attack | Enforce PKCE, reject flows without it | P0 | +| **Elevation of Privilege** | Redirect URI validation bypass | Open redirect to attacker | Strict redirect URI matching | P0 | + +**Key Mitigations Implemented:** +- โœ… PKCE mandatory by default (`pkg/auth/oauth/`) +- โœ… HTTPS enforcement (localhost exception) +- โœ… Issuer validation via OIDC discovery +- โœ… Audience validation in tokens +- โœ… RFC 9728 well-known URI discovery +- โš ๏ธ Partial: Token storage security (memory only) +- โŒ Missing: Token binding (RFC 8705) +- โŒ Missing: Refresh token rotation enforcement + +### 4.11 Network Isolation (Egress Proxy) + +| Threat Category | Threat Description | Impact | Mitigation | Priority | +|-----------------|-------------------|--------|------------|----------| +| **Spoofing** | Fake egress proxy | Intercept all outbound traffic | Proxy authentication, certificate validation | P1 | +| **Tampering** | Bypass proxy via direct routing | Unrestricted network access | Network policies, iptables rules | P0 | +| **Tampering** | DNS poisoning | Redirect to malicious servers | DNSSEC, trusted DNS server | P1 | +| **Repudiation** | No proxy access logs | Cannot trace exfiltration | Squid access logs, structured logging | P2 | +| **Information Disclosure** | Proxy logs contain sensitive data | Log-based credential leakage | Redact request bodies, header filtering | P2 | +| **Denial of Service** | Proxy crash | No network access | Health checks, proxy redundancy | P2 | +| **Denial of Service** | Connection exhaustion | Proxy unavailability | Connection limits, timeouts | P2 | +| **Elevation of Privilege** | Exploit Squid vulnerability | Proxy compromise | Regular updates, vulnerability scanning | P1 | +| **Elevation of Privilege** | ACL bypass via protocol tunneling | Escape network restrictions | Protocol inspection, deep packet inspection | P1 | + +**Key Mitigations Implemented:** +- โœ… Squid-based egress proxy (`pkg/container/docker/squid.go`) +- โœ… ACL-based host/port filtering +- โœ… DNS server for isolation mode +- โš ๏ธ Partial: Access logging (basic only) +- โŒ Missing: Protocol inspection (HTTP/HTTPS only) +- โŒ Missing: Proxy authentication +- โŒ Missing: DNSSEC validation + +## 5. Summary of Critical Threats + +### Top 10 Critical Threats (P0) + +| # | Threat | Component | Category | Impact | +|---|--------|-----------|----------|--------| +| 1 | **Secrets file world-readable** | Secrets Management | Information Disclosure | All API keys/tokens exposed | +| 2 | **Docker socket exposed without auth** | Container Runtime | Elevation of Privilege | Full host compromise | +| 3 | **etcd direct access without encryption** | K8s Secrets | Information Disclosure | All cluster secrets exposed | +| 4 | **JWT weak signing key** | Middleware | Spoofing | Forge any user identity | +| 5 | **PKCE bypass in OAuth** | Remote Auth | Elevation of Privilege | Session hijacking | +| 6 | **Privileged container allowed** | MCP Container | Elevation of Privilege | Kernel-level access | +| 7 | **CRD injection without admission control** | Operator | Tampering | Deploy malicious workloads | +| 8 | **RBAC allows secret listing** | K8s Secrets | Information Disclosure | Cross-namespace secret access | +| 9 | **Container image not verified** | MCP Container | Spoofing | Execute backdoored code | +| 10 | **Bypass network isolation** | Network Isolation | Tampering | Unrestricted data exfiltration | + +## 6. Recommended Security Controls + +### 6.1 Authentication & Authorization + +#### Implemented โœ… +- JWT-based authentication with issuer/audience validation +- Cedar policy engine for fine-grained authorization +- OAuth 2.0/OIDC with PKCE for remote servers +- Dynamic client registration (RFC 7591) +- RFC 9728 protected resource metadata discovery + +#### Recommended Additions โš ๏ธ +- Implement token binding (RFC 8705) for OAuth flows +- Add mutual TLS (mTLS) between proxy and MCP servers +- Enforce refresh token rotation +- Implement IDOR protection framework with resource ownership +- Add JWT signing key rotation automation + +### 6.2 Secrets Management + +#### Implemented โœ… +- AES-256-GCM encryption for local secrets +- OS keyring integration (keyctl, Keychain, DPAPI) +- K8s secrets with SecretKeyRef pattern +- Secret redaction in logs +- Runtime secret injection (not build-time) + +#### Recommended Additions โš ๏ธ +- Add HMAC for encrypted file integrity +- Implement automatic secret rotation +- Integrate with HashiCorp Vault / External Secrets Operator +- Add secrets access auditing +- Implement secret versioning and rollback + +### 6.3 Container Security + +#### Implemented โœ… +- Permission profiles with capability dropping +- Network isolation with egress proxy +- Resource limits (CPU, memory, PIDs) +- Read-only root filesystem option +- Rootless container support + +#### Recommended Additions โš ๏ธ +- Mandatory image signature verification +- Implement image scanning integration (Trivy, Grype) +- Add Pod Security Standards enforcement +- Implement seccomp/AppArmor profiles +- Add runtime security monitoring (Falco) + +### 6.4 Network Security + +#### Implemented โœ… +- Network isolation mode with egress proxy +- ACL-based host/port filtering +- HTTPS enforcement for external connections +- Certificate validation + +#### Recommended Additions โš ๏ธ +- Add mTLS between components +- Implement DNSSEC validation +- Add protocol inspection (DPI) for egress +- Implement rate limiting at multiple layers +- Add intrusion detection system (IDS) + +### 6.5 Kubernetes Security + +#### Implemented โœ… +- Least-privilege RBAC +- Namespace isolation +- SecretKeyRef pattern (no secrets in CRDs) +- Kubernetes audit logging +- Resource quotas + +#### Recommended Additions โš ๏ธ +- Implement admission webhooks for validation +- Add OPA/Kyverno policies +- Enable etcd encryption at rest +- Implement network policies +- Add Pod Security Standards + +### 6.6 Audit & Monitoring + +#### Implemented โœ… +- Audit middleware in proxy chain +- OpenTelemetry distributed tracing +- Structured logging +- Kubernetes audit logs + +#### Recommended Additions โš ๏ธ +- Centralized SIEM integration +- Real-time alerting on security events +- Correlation ID across all components +- Security event correlation and analysis +- Compliance reporting automation + +### 6.7 Supply Chain Security + +#### Implemented โœ… +- HTTPS-only registry fetches +- JSON schema validation +- Git source support with history + +#### Recommended Additions โš ๏ธ +- Implement SBOM generation and verification +- Add dependency scanning (Dependabot, Renovate) +- Registry signing and verification (Sigstore/Cosign) +- Reproducible builds +- Build provenance (SLSA) + +## 7. Security Testing Recommendations + +### 7.1 Unit Testing +- Test input validation on all user inputs +- Test JWT parsing edge cases (algorithm confusion, expired tokens) +- Test Cedar policy evaluation (boundary conditions, nested policies) +- Test secret encryption/decryption roundtrip +- Test path traversal prevention + +### 7.2 Integration Testing +- Test OAuth flow with malicious redirect URIs +- Test network isolation bypass attempts +- Test RBAC enforcement across namespaces +- Test container escape scenarios +- Test middleware chain ordering and bypass attempts + +### 7.3 Penetration Testing +- External penetration test of Kubernetes deployments +- Red team exercise simulating APT attack +- Container escape testing +- Secrets extraction attempts +- Supply chain attack simulation + +### 7.4 Security Scanning +- Static analysis (gosec, Semgrep) +- Dependency scanning (Snyk, Trivy) +- Container image scanning +- Kubernetes manifest scanning (Checkov, Kubesec) +- Dynamic analysis (DAST) on API endpoints + +## 8. Incident Response Plan + +### 8.1 Detection +- Monitor audit logs for anomalous activity +- Alert on unauthorized secret access +- Detect container escape attempts +- Track failed authentication attempts +- Monitor for unusual network patterns + +### 8.2 Containment +- Revoke compromised tokens immediately +- Isolate affected workloads (network policies) +- Rotate secrets across all systems +- Disable compromised user accounts +- Backup current state for forensics + +### 8.3 Eradication +- Remove malicious workloads +- Patch exploited vulnerabilities +- Reset credentials and secrets +- Remove backdoors and persistence mechanisms +- Rebuild compromised infrastructure + +### 8.4 Recovery +- Restore from clean backups +- Gradually restore services with monitoring +- Re-enable user accounts after verification +- Document lessons learned +- Update threat model and mitigations + +## 9. Compliance Considerations + +### 9.1 Regulatory Requirements +- **GDPR**: Ensure PII protection, right to deletion, breach notification +- **SOC 2**: Audit logging, access controls, change management +- **HIPAA**: Encryption at rest/transit, audit trails, access logs +- **PCI DSS**: Network segmentation, encryption, access control + +### 9.2 Industry Standards +- **CIS Kubernetes Benchmark**: Apply hardening guidelines +- **NIST SP 800-190**: Container security best practices +- **OWASP Container Security**: Follow OWASP guidelines +- **CNCF Security SIG**: Adopt CNCF security recommendations + +## 10. Maintenance and Review + +This threat model should be: +- **Reviewed quarterly** by security team and architects +- **Updated** after significant architectural changes +- **Revised** following security incidents or near-misses +- **Enhanced** when new components are added +- **Validated** through regular penetration testing + +### Review Schedule +- **Q1 2026**: Full STRIDE analysis review +- **Q2 2026**: Mitigation effectiveness assessment +- **Q3 2026**: Threat landscape update +- **Q4 2026**: Annual security posture evaluation + +**Document Version**: 1.0 +**Last Updated**: 2025-11-19 +**Next Review**: 2026-02-19 +**Owner**: Security Team +**Reviewers**: Architecture Team, DevOps Team, Product Security +