feat: HashiCorp Vault as a read-only external credential store#256
feat: HashiCorp Vault as a read-only external credential store#256preetbhinder wants to merge 2 commits into
Conversation
Functional overview ------------------- Operators can back a vault's credentials with a HashiCorp Vault KV mount instead of the local store. Set VAULT_ADDR plus an auth method (VAULT_TOKEN, or VAULT_ROLE_ID + VAULT_SECRET_ID for AppRole) on the server, then create a vault with `--credential-store=hashicorp --hashicorp-mount=<mount> --hashicorp-path=<path> [--hashicorp-kv-version=1|2]`, or via the web create-vault form. The KV item's key/value pairs become the vault's credentials, polled read-only at a per-vault cadence and served through the broker like any other credential. Local mutations are rejected; HashiCorp Vault is the source of truth. Mirrors the existing Infisical store. Technical overview ------------------ - internal/hashicorp: new package mirroring internal/infisical — auth-method detection (token / AppRole), a vault/api client (KV v1 + v2), and a per-vault Syncer that fetches, encrypts with the DEK, and replaces the cached credentials. Token auth is validated at startup via lookup-self so a bad token fails fast; VAULT_APPROLE_MOUNT overrides the AppRole mount path. - store: migration widens the vault_credential_stores.kind CHECK to allow 'hashicorp'; adds the CredentialStoreHashicorp constant. - server: a second sync worker; the shutdown drain waits on all external-store syncers before wiping the DEK. - handlers: createExternalVault and the manual-sync handler dispatch by kind; upstream topology is redacted from non-admin/owner viewers. - CLI: --credential-store=hashicorp + --hashicorp-* flags. - web: HashiCorp option + mount/path/KV-version fields in the create form. Tests: unit (auth/config/sync/stringValue), a hermetic integration test driving the real vault/api client against a stubbed Vault (KV v1/v2/404), and an end-to-end test injecting a synced credential through the MITM proxy. Docs, .env.example, and the CLI/env references updated.
|
| Filename | Overview |
|---|---|
| internal/hashicorp/auth.go | New file: auth-method detection (AppRole vs Token). Logic is correct; DetectAuthMethod has a vestigial error return that is always nil. |
| internal/hashicorp/client.go | New file: vault/api client wrapper and FetchSecrets. Token lookup-self probe on startup is good. stringValue coercion is thorough. |
| internal/hashicorp/config.go | New file: VaultConfig parsing and validation. Validate() lacks path traversal checks on mount/secret_path; mount trimming is asymmetric with secret_path. |
| internal/hashicorp/sync.go | New file: periodic and manual syncer. In-flight guard, backoff-via-bump, DEK-drain on shutdown all correctly implemented. ErrInvalidKey surfacing to admin/owner only is well-gated. |
| cmd/server.go | Adds attachHashicorpIfConfigured. Login goroutine receives context.Background() so it runs past the 10s deadline; a context-with-timeout should be passed instead. |
| internal/server/handle_vaults.go | Adds createHashicorpVault and dispatches sync/list by kind. Upstream error scrubbing, config/error redaction for non-admin viewers, and owner gate all look correct. |
| internal/server/server.go | Adds hashicorp client/syncer fields and startup wiring. DEK-wipe drain correctly waits on both syncers before zeroing encKey. |
| internal/store/migrations/050_hashicorp_credential_store.sql | SQLite table-rebuild migration to widen the kind CHECK constraint. Correct pattern for SQLite; foreign keys turned off and back on; existing rows preserved. |
| cmd/vaults.go | Adds hashicorp case to vaultCreateCmd and printCredentialStore. Validation matches server-side constraints. |
Comments Outside Diff (1)
-
cmd/server.go, line 78-92 (link)Login goroutine not bounded by the 10s deadline
hashicorp.NewClientreceivescontext.Background(), so theWriteWithContext/LookupSelfWithContextcalls insidelogin()are never cancelled when thetime.After(10 * time.Second)case fires. If the Vault server is reachable but slow, the goroutine can block for the full TCP keep-alive or OS connection timeout (up to several minutes), silently holding the vault/api HTTP connection pool open. Because the channel is buffered the goroutine eventually completes and its result is discarded — but the delay and wasted resources are avoidable. Passing acontext.WithTimeout(context.Background(), 10*time.Second)into the goroutine instead would cancel the underlying RPC when the outer deadline fires.
Reviews (1): Last reviewed commit: "feat: HashiCorp Vault as a read-only ext..." | Re-trigger Greptile
| func (c VaultConfig) Validate() error { | ||
| if strings.TrimSpace(c.Mount) == "" { | ||
| return fmt.Errorf("mount is required (e.g. \"secret\")") | ||
| } | ||
| if strings.TrimSpace(c.SecretPath) == "" { | ||
| return fmt.Errorf("secret_path is required") | ||
| } | ||
| if c.KVVersion != 1 && c.KVVersion != 2 { | ||
| return fmt.Errorf("kv_version must be 1 or 2") | ||
| } | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Path traversal not guarded on
mount or secret_path
Validate() only checks that the two fields are non-empty. Neither field is checked for ../ sequences, percent-encoded variants, or null bytes. The vault/api SDK forwards them verbatim when constructing /v1/{mount}/data/{path} (KV v2) or /v1/{mount}/{path} (KV v1). An instance owner could set mount = "secret" and secret_path = "../../sys/policy/default" and cause the broker to fetch an unintended Vault path (the Vault server's path normalization might resolve this, but that relies on server-side behaviour rather than the client enforcing scope). Since the VAULT_ADDR is fixed by the operator the host cannot be redirected, but the path within the server is fully owner-controlled. A simple check that neither field contains .. or starts with /sys (or a strict allow-list of path characters) would pin the access scope to KV data.
Context Used: Flag SSRF risks (source)
| } | ||
| c.Mount = strings.TrimSpace(c.Mount) | ||
| c.SecretPath = strings.TrimPrefix(strings.TrimSpace(c.SecretPath), "/") | ||
| return c.withDefaults(), nil |
There was a problem hiding this comment.
mount trimming is asymmetric with secret_path
ParseConfigJSON strips a leading / from secret_path but applies only TrimSpace (no slash stripping) to mount. A mount value supplied as "secret/" would reach KVv2("secret/") and produce a double-slash in the constructed URL (/v1/secret//data/…). Whether the Vault server accepts this depends on its router, but it is inconsistent. Applying strings.Trim(c.Mount, "/") (matching the treatment of VAULT_APPROLE_MOUNT in auth.go) would make the behaviour predictable.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| func DetectAuthMethod(getenv func(string) string, logger *slog.Logger) (AuthMethod, error) { | ||
| var matches []AuthMethod | ||
| for _, probe := range authProbes { | ||
| complete := true | ||
| for _, key := range probe.required { | ||
| if getenv(key) == "" { | ||
| complete = false | ||
| break | ||
| } | ||
| } | ||
| if complete { | ||
| matches = append(matches, probe.method) | ||
| } | ||
| } | ||
| if len(matches) == 0 { | ||
| return "", nil | ||
| } | ||
| if len(matches) > 1 && logger != nil { | ||
| logger.Warn("multiple HashiCorp Vault auth methods configured; using highest-priority", | ||
| slog.String("using", string(matches[0])), | ||
| slog.Any("ignoring", matches[1:])) | ||
| } | ||
| return matches[0], nil | ||
| } |
There was a problem hiding this comment.
DetectAuthMethod always returns a nil error
The function signature is (AuthMethod, error) but every code path returns nil for the error — there is no case that produces a non-nil error. The calling code in NewClient checks the returned method for "" and returns ErrNoAuthMethod itself. The vestigial error return makes callers pattern-match if method, err := …; err != nil unnecessarily. Either remove the error return and let NewClient own the "no method" sentinel, or have DetectAuthMethod return ErrNoAuthMethod directly when len(matches) == 0. As written the inconsistency can confuse future maintainers who expect errors from this function.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
@saifsmailbox98 Would you mind taking a look? |
|
@preetbhinder i'll check it in a few days |
Summary
Adds HashiCorp Vault as a third read-only external credential store for Agent Vault, alongside the built-in store and Infisical. A vault's credentials can now be sourced from a HashiCorp Vault KV mount, synced read-only and brokered to agents like any other credential.
It is a faithful mirror of the existing
internal/infisicalintegration.Functional overview (how an operator uses it)
VAULT_ADDRplus an auth method —VAULT_TOKEN, orVAULT_ROLE_ID+VAULT_SECRET_IDfor AppRole. (StandardVAULT_NAMESPACE/VAULT_CACERT/VAULT_SKIP_VERIFYare honored.)Technical overview
internal/hashicorp(new) — mirrorsinternal/infisical: auth-method detection (token / AppRole), avault/apiclient (KV v1 + v2), and a per-vaultSyncerthat fetches, encrypts with the DEK, and replaces the cached credentials.050widens thevault_credential_stores.kindCHECK to allow'hashicorp'; adds theCredentialStoreHashicorpconstant.createExternalVaultand the manual-sync handler dispatch by kind; upstream topology (mount/path, sync errors, invalid-key names) is redacted from non-admin/owner viewers.--credential-store=hashicorp+--hashicorp-mount/-path/-kv-version.Dependency note —
github.com/hashicorp/vault/apiThis introduces
vault/api v1.23.0(+ ~14 transitive deps). The code uses a thin slice (DefaultConfig,NewClient,SetToken/SetNamespace,Logical().WriteWithContextfor AppRole login,KVv1/KVv2.Get), so a ~150-linenet/httpclient is technically feasible. We chose the official SDK because it: correctly handles the standardVAULT_*env surface operators expect to "just work" (VAULT_CACERT/VAULT_SKIP_VERIFY/VAULT_NAMESPACE); unwraps the KV v2data/metadataenvelope (a footgun to reimplement); and provides retry/backoff + rootcert handling. This is consistent with the precedent (the Infisical store already depends oninfisical/go-sdk).go.mod/go.sumarego mod tidy-clean. Happy to swap to a thin client as a follow-up if maintainers prefer.Type of change
Test plan
make test— fullgo test ./...green, incl.internal/mitme2e)stringValuecoercion,Syncerrefresh/invalid-key/kind-filter), a hermetic integration test driving the realvault/apiclient against anhttpteststub Vault (KV v1, KV v2, not-found — no Docker, runs inmake test), and an end-to-end test injecting a synced credential through the MITM proxy (internal/mitm/forward_hashicorp_test.go).Security checklist
.env.exampleuses empty placeholders; secret values are never logged or returned (only key counts / scrubbed messages).VaultConfig.Validate()(mount/path/kv_version); poll-interval floor enforced server-side and via DB CHECK; CLI + web pre-validate.VAULT_ADDR) are scrubbed to logs;VAULT_SKIP_VERIFYdowngrade is surfaced as an auditable warning; synced values encrypted at rest with the DEK before persistence; UPPER_SNAKE_CASE key validation enforced on every fetched key.