Skip to content

feat: HashiCorp Vault as a read-only external credential store#256

Open
preetbhinder wants to merge 2 commits into
Infisical:mainfrom
trmlabs:hashicorp-pr
Open

feat: HashiCorp Vault as a read-only external credential store#256
preetbhinder wants to merge 2 commits into
Infisical:mainfrom
trmlabs:hashicorp-pr

Conversation

@preetbhinder

Copy link
Copy Markdown

Summary

Adds HashiCorp Vault as a third read-only external credential store for Agent Vault, alongside the built-in store and Infisical. A vault's credentials can now be sourced from a HashiCorp Vault KV mount, synced read-only and brokered to agents like any other credential.

It is a faithful mirror of the existing internal/infisical integration.

Functional overview (how an operator uses it)

  1. Configure the server: VAULT_ADDR plus an auth method — VAULT_TOKEN, or VAULT_ROLE_ID + VAULT_SECRET_ID for AppRole. (Standard VAULT_NAMESPACE / VAULT_CACERT / VAULT_SKIP_VERIFY are honored.)
  2. Create a HashiCorp-backed vault, via CLI:
    agent-vault vault create my-app \
      --credential-store=hashicorp \
      --hashicorp-mount=secret \
      --hashicorp-path=agent-vault/demo \
      --hashicorp-kv-version=2
    
    …or via the web New vault form (HashiCorp Vault option + mount/path/KV-version fields).
  3. The KV item's key/value pairs become the vault's credentials, polled read-only at the per-vault cadence and served through the broker. Rotate a secret in HashiCorp Vault → it propagates on the next sync (or Manual sync). Local mutations are rejected; HashiCorp Vault is the source of truth.

Technical overview

  • internal/hashicorp (new) — mirrors internal/infisical: auth-method detection (token / AppRole), a vault/api client (KV v1 + v2), and a per-vault Syncer that fetches, encrypts with the DEK, and replaces the cached credentials.
  • store — migration 050 widens the vault_credential_stores.kind CHECK to allow 'hashicorp'; adds the CredentialStoreHashicorp constant.
  • server — a second sync worker; the shutdown drain now waits on all external-store syncers before wiping the DEK.
  • handlerscreateExternalVault and the manual-sync handler dispatch by kind; upstream topology (mount/path, sync errors, invalid-key names) is redacted from non-admin/owner viewers.
  • CLI--credential-store=hashicorp + --hashicorp-mount/-path/-kv-version.
  • web — HashiCorp option + fields in the create-vault form.

Dependency note — github.com/hashicorp/vault/api

This introduces vault/api v1.23.0 (+ ~14 transitive deps). The code uses a thin slice (DefaultConfig, NewClient, SetToken/SetNamespace, Logical().WriteWithContext for AppRole login, KVv1/KVv2.Get), so a ~150-line net/http client is technically feasible. We chose the official SDK because it: correctly handles the standard VAULT_* env surface operators expect to "just work" (VAULT_CACERT/VAULT_SKIP_VERIFY/VAULT_NAMESPACE); unwraps the KV v2 data/metadata envelope (a footgun to reimplement); and provides retry/backoff + rootcert handling. This is consistent with the precedent (the Infisical store already depends on infisical/go-sdk). go.mod/go.sum are go mod tidy-clean. Happy to swap to a thin client as a follow-up if maintainers prefer.

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Documentation
  • CI / build

Test plan

  • Existing tests pass (make test — full go test ./... green, incl. internal/mitm e2e)
  • Added/updated tests for new behavior — unit (auth detection + AppRole login, config parse/validate/defaults, stringValue coercion, Syncer refresh/invalid-key/kind-filter), a hermetic integration test driving the real vault/api client against an httptest stub Vault (KV v1, KV v2, not-found — no Docker, runs in make test), and an end-to-end test injecting a synced credential through the MITM proxy (internal/mitm/forward_hashicorp_test.go).
  • Manual testing — ran a live HashiCorp Vault (Docker), seeded secrets, created the vault via both CLI and the web form, confirmed the synced values match upstream, exercised injection through the proxy, and verified rotation propagates after sync. (Happy to record a screen walkthrough if useful.)

Security checklist

  • No secrets or credentials in code — tokens/RoleID/SecretID read from env only; .env.example uses empty placeholders; secret values are never logged or returned (only key counts / scrubbed messages).
  • No new unauthenticated endpoints — reuses existing routes; external-store create is instance-owner-gated; the available-stores list is owner-gated.
  • Input validation on new API surfaces — VaultConfig.Validate() (mount/path/kv_version); poll-interval floor enforced server-side and via DB CHECK; CLI + web pre-validate.
  • Checked for OWASP top 10 — AppRole login sends structured params to a fixed path (no injection); upstream error bodies (which can embed VAULT_ADDR) are scrubbed to logs; VAULT_SKIP_VERIFY downgrade is surfaced as an auditable warning; synced values encrypted at rest with the DEK before persistence; UPPER_SNAKE_CASE key validation enforced on every fetched key.

Functional overview
-------------------
Operators can back a vault's credentials with a HashiCorp Vault KV mount
instead of the local store. Set VAULT_ADDR plus an auth method (VAULT_TOKEN,
or VAULT_ROLE_ID + VAULT_SECRET_ID for AppRole) on the server, then create a
vault with `--credential-store=hashicorp --hashicorp-mount=<mount>
--hashicorp-path=<path> [--hashicorp-kv-version=1|2]`, or via the web
create-vault form. The KV item's key/value pairs become the vault's
credentials, polled read-only at a per-vault cadence and served through the
broker like any other credential. Local mutations are rejected; HashiCorp
Vault is the source of truth. Mirrors the existing Infisical store.

Technical overview
------------------
- internal/hashicorp: new package mirroring internal/infisical — auth-method
  detection (token / AppRole), a vault/api client (KV v1 + v2), and a
  per-vault Syncer that fetches, encrypts with the DEK, and replaces the
  cached credentials. Token auth is validated at startup via lookup-self so a
  bad token fails fast; VAULT_APPROLE_MOUNT overrides the AppRole mount path.
- store: migration widens the vault_credential_stores.kind CHECK to allow
  'hashicorp'; adds the CredentialStoreHashicorp constant.
- server: a second sync worker; the shutdown drain waits on all external-store
  syncers before wiping the DEK.
- handlers: createExternalVault and the manual-sync handler dispatch by kind;
  upstream topology is redacted from non-admin/owner viewers.
- CLI: --credential-store=hashicorp + --hashicorp-* flags.
- web: HashiCorp option + mount/path/KV-version fields in the create form.

Tests: unit (auth/config/sync/stringValue), a hermetic integration test
driving the real vault/api client against a stubbed Vault (KV v1/v2/404), and
an end-to-end test injecting a synced credential through the MITM proxy. Docs,
.env.example, and the CLI/env references updated.
@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds HashiCorp Vault as a third read-only external credential store, mirroring the existing Infisical integration. It introduces internal/hashicorp (auth detection, vault/api client, periodic syncer), a SQLite migration to widen the kind CHECK constraint, server wiring for a second sync worker with correct DEK-drain ordering on shutdown, handler dispatch by store kind, CLI flags, and web form fields.

  • internal/hashicorp: New package with AppRole/Token auth detection, FetchSecrets for KV v1+v2, and a Syncer that mirrors the Infisical syncer's tick/in-flight/backoff pattern. Upstream error bodies are scrubbed from API responses; invalid-key names are gated to admin/owner callers.
  • Server/store: Migration 050 rebuilds vault_credential_stores with 'hashicorp' added to the kind CHECK; Start() correctly drains both external-store syncers before wiping the DEK; createHashicorpVault reuses the same owner-gated, probe-then-commit flow as createInfisicalVault.
  • CLI/docs: New --credential-store=hashicorp flags with matching validation and documentation.

Confidence Score: 4/5

Safe to merge with minor follow-up recommended on path validation and the login-goroutine context.

The core feature — auth-method detection, KV fetch, encrypted snapshot storage, syncer lifecycle, DEK-drain ordering, and owner-only gating — is implemented correctly and mirrors the proven Infisical path. The two areas worth attention before the next iteration are: the mount/secret_path fields accept ../ and Vault system-path prefixes without rejection (letting an owner probe beyond the intended KV namespace), and the login goroutine in attachHashicorpIfConfigured runs against context.Background() so it outlives its 10 s deadline. Neither causes a breach or data corruption in the normal path, but both are worth tightening.

internal/hashicorp/config.go (path validation gap) and cmd/server.go (login goroutine context).

Security Review

  • Path-level access not scoped to KV namespace (internal/hashicorp/config.go): VaultConfig.Validate() does not reject ../ traversal sequences or Vault system prefixes (e.g. sys/) in mount or secret_path. An instance owner can aim the broker at arbitrary paths on the configured Vault server. The host is fixed by the operator-set VAULT_ADDR, so this is not a host-level SSRF, but path-scope is broader than the intended KV read surface.
  • No new unauthenticated endpoints; external-store creation is correctly gated to the instance owner. Upstream error bodies are scrubbed before API responses. VAULT_TOKEN/VAULT_ROLE_ID/VAULT_SECRET_ID are read from env only and never logged or returned.

Important Files Changed

Filename Overview
internal/hashicorp/auth.go New file: auth-method detection (AppRole vs Token). Logic is correct; DetectAuthMethod has a vestigial error return that is always nil.
internal/hashicorp/client.go New file: vault/api client wrapper and FetchSecrets. Token lookup-self probe on startup is good. stringValue coercion is thorough.
internal/hashicorp/config.go New file: VaultConfig parsing and validation. Validate() lacks path traversal checks on mount/secret_path; mount trimming is asymmetric with secret_path.
internal/hashicorp/sync.go New file: periodic and manual syncer. In-flight guard, backoff-via-bump, DEK-drain on shutdown all correctly implemented. ErrInvalidKey surfacing to admin/owner only is well-gated.
cmd/server.go Adds attachHashicorpIfConfigured. Login goroutine receives context.Background() so it runs past the 10s deadline; a context-with-timeout should be passed instead.
internal/server/handle_vaults.go Adds createHashicorpVault and dispatches sync/list by kind. Upstream error scrubbing, config/error redaction for non-admin viewers, and owner gate all look correct.
internal/server/server.go Adds hashicorp client/syncer fields and startup wiring. DEK-wipe drain correctly waits on both syncers before zeroing encKey.
internal/store/migrations/050_hashicorp_credential_store.sql SQLite table-rebuild migration to widen the kind CHECK constraint. Correct pattern for SQLite; foreign keys turned off and back on; existing rows preserved.
cmd/vaults.go Adds hashicorp case to vaultCreateCmd and printCredentialStore. Validation matches server-side constraints.

Comments Outside Diff (1)

  1. cmd/server.go, line 78-92 (link)

    P2 Login goroutine not bounded by the 10s deadline

    hashicorp.NewClient receives context.Background(), so the WriteWithContext/LookupSelfWithContext calls inside login() are never cancelled when the time.After(10 * time.Second) case fires. If the Vault server is reachable but slow, the goroutine can block for the full TCP keep-alive or OS connection timeout (up to several minutes), silently holding the vault/api HTTP connection pool open. Because the channel is buffered the goroutine eventually completes and its result is discarded — but the delay and wasted resources are avoidable. Passing a context.WithTimeout(context.Background(), 10*time.Second) into the goroutine instead would cancel the underlying RPC when the outer deadline fires.

Reviews (1): Last reviewed commit: "feat: HashiCorp Vault as a read-only ext..." | Re-trigger Greptile

Comment on lines +31 to +42
func (c VaultConfig) Validate() error {
if strings.TrimSpace(c.Mount) == "" {
return fmt.Errorf("mount is required (e.g. \"secret\")")
}
if strings.TrimSpace(c.SecretPath) == "" {
return fmt.Errorf("secret_path is required")
}
if c.KVVersion != 1 && c.KVVersion != 2 {
return fmt.Errorf("kv_version must be 1 or 2")
}
return nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 security Path traversal not guarded on mount or secret_path

Validate() only checks that the two fields are non-empty. Neither field is checked for ../ sequences, percent-encoded variants, or null bytes. The vault/api SDK forwards them verbatim when constructing /v1/{mount}/data/{path} (KV v2) or /v1/{mount}/{path} (KV v1). An instance owner could set mount = "secret" and secret_path = "../../sys/policy/default" and cause the broker to fetch an unintended Vault path (the Vault server's path normalization might resolve this, but that relies on server-side behaviour rather than the client enforcing scope). Since the VAULT_ADDR is fixed by the operator the host cannot be redirected, but the path within the server is fully owner-controlled. A simple check that neither field contains .. or starts with /sys (or a strict allow-list of path characters) would pin the access scope to KV data.

Context Used: Flag SSRF risks (source)

Comment on lines +70 to +73
}
c.Mount = strings.TrimSpace(c.Mount)
c.SecretPath = strings.TrimPrefix(strings.TrimSpace(c.SecretPath), "/")
return c.withDefaults(), nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 mount trimming is asymmetric with secret_path

ParseConfigJSON strips a leading / from secret_path but applies only TrimSpace (no slash stripping) to mount. A mount value supplied as "secret/" would reach KVv2("secret/") and produce a double-slash in the constructed URL (/v1/secret//data/…). Whether the Vault server accepts this depends on its router, but it is inconsistent. Applying strings.Trim(c.Mount, "/") (matching the treatment of VAULT_APPROLE_MOUNT in auth.go) would make the behaviour predictable.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +51 to +74
func DetectAuthMethod(getenv func(string) string, logger *slog.Logger) (AuthMethod, error) {
var matches []AuthMethod
for _, probe := range authProbes {
complete := true
for _, key := range probe.required {
if getenv(key) == "" {
complete = false
break
}
}
if complete {
matches = append(matches, probe.method)
}
}
if len(matches) == 0 {
return "", nil
}
if len(matches) > 1 && logger != nil {
logger.Warn("multiple HashiCorp Vault auth methods configured; using highest-priority",
slog.String("using", string(matches[0])),
slog.Any("ignoring", matches[1:]))
}
return matches[0], nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 DetectAuthMethod always returns a nil error

The function signature is (AuthMethod, error) but every code path returns nil for the error — there is no case that produces a non-nil error. The calling code in NewClient checks the returned method for "" and returns ErrNoAuthMethod itself. The vestigial error return makes callers pattern-match if method, err := …; err != nil unnecessarily. Either remove the error return and let NewClient own the "no method" sentinel, or have DetectAuthMethod return ErrNoAuthMethod directly when len(matches) == 0. As written the inconsistency can confuse future maintainers who expect errors from this function.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@preetbhinder

Copy link
Copy Markdown
Author

@saifsmailbox98 Would you mind taking a look?

@saifsmailbox98

Copy link
Copy Markdown
Collaborator

@preetbhinder i'll check it in a few days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants