feat(splunk_hec source): support second-stage framing and decoding #25312
thomasqueirozb wants to merge 37 commits into master
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 530acc0c9f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…one endpoint has a decoder
…from decode_payload
…secrets_template
Allows sources to forward per-request secrets (e.g. authentication tokens) into the Deserializer pipeline so user-authored programs like VRL decoders can read them via get_secret!() during decoding.
VrlDeserializer overrides parse_with_secrets to inject the secrets into the synthetic event before the VRL program executes, making them visible as %vector.secrets.* at runtime.
All other Deserializer implementations use the default implementation, which merges the template onto each emitted event after parsing, with the codec's own values taking priority.
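The default merge described in this commit message can be modeled in a few lines. This is only an illustrative Python sketch, not the Rust implementation from the PR: `merge_secrets_template` and the `api_key` entry are hypothetical names, and secrets are assumed to behave like a flat string-to-string map.

```python
from typing import Dict, List


def merge_secrets_template(
    template: Dict[str, str], events: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Merge source-provided secrets into each decoded event's secrets.

    Hypothetical model of the default behavior: values the codec already
    set take priority over values from the template.
    """
    merged = []
    for secrets in events:
        combined = dict(template)  # start from the source-provided template
        combined.update(secrets)   # codec-produced values win on collision
        merged.append(combined)
    return merged


# Two decoded events: the first already carries a secret set by the codec.
template = {"splunk_hec_token": "token-a", "api_key": "from-source"}
events = [{"api_key": "from-codec"}, {}]
result = merge_secrets_template(template, events)
print(result)
```

The first event keeps its own `api_key` and gains `splunk_hec_token`; the second event receives the template unchanged.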
Adds VrlDeserializerOptions.inject_metadata. When true, the source can call Decoder::with_metadata_template to pre-populate the synthetic event before VRL executes, making source context readable via % paths (e.g. %exec.host, %exec.command). VRL-produced values always win over injected values on collision. The exec source is the first consumer: it injects hostname and command into the decoder template at build time. Zero overhead when inject_metadata is false.
…ig::inject_metadata_enabled
Move the inject_metadata check to DeserializerConfig::inject_metadata_enabled() so sources don't need VRL-specific knowledge. In handle_event, use try_insert for the vector-namespace metadata paths (host, command) when inject_metadata is enabled, so any value the VRL program wrote to %exec.host or %exec.command survives post-decode enrichment. No behavior change for Legacy namespace or when inject_metadata is false.
… always applies; revert exec source wiring
The inject_metadata: bool config option on VrlDeserializerOptions is removed; sources decide whether to inject by calling with_metadata_template, no user flag needed. DeserializerConfig::inject_metadata_enabled replaced by is_vrl(). Exec source wiring reverted; splunk_hec will be the first consumer.
026f0df to 4777a95
…fore execution
When the second-stage decoder is VRL, build an EventMetadata template from the
current HEC envelope context (host, source, sourcetype, index, channel, token) and
attach it via Decoder::with_metadata_template. VRL programs running as the decoder
can now read %splunk_hec.host, %splunk_hec.channel, get_secret!("splunk_hec_token"),
etc. before the program executes. Post-decode enrichment still applies with
InsertIfEmpty semantics. No-op for non-VRL decoders.
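As a rough illustration of the template this commit message describes, the Python sketch below builds the values a VRL decoder would see from one HEC envelope. It is a model under stated assumptions, not the PR's Rust code: `build_decoder_template` is a hypothetical helper, the metadata is assumed to live under a `splunk_hec` key (mirroring the `%splunk_hec.*` paths), and the token is assumed to be exposed only as the `splunk_hec_token` secret.

```python
from typing import Any, Dict, Optional, Tuple

# Envelope fields named in the commit message.
ENVELOPE_KEYS = ("host", "source", "sourcetype", "index")


def build_decoder_template(
    envelope: Dict[str, Any], channel: Optional[str], token: Optional[str]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Return (metadata, secrets) to pre-populate before the decoder runs.

    Hypothetical model: only fields present in the request are included.
    """
    fields = {
        key: envelope[key] for key in ENVELOPE_KEYS if envelope.get(key) is not None
    }
    if channel is not None:
        fields["channel"] = channel
    secrets = {"splunk_hec_token": token} if token is not None else {}
    return {"splunk_hec": fields}, secrets


envelope = {"event": '{"foo":"bar"}', "host": "client-host"}
metadata, secrets = build_decoder_template(envelope, "chan-1", "token-a")
print(metadata)
print(secrets)
```

Fields absent from the envelope are simply left out of the template, so the decoder and the post-decode enrichment remain free to fill them in.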
…ace metadata in decoder path
After the VRL decoder runs, post-decode metadata overlay (channel, fields, and DefaultExtractor calls for host/source/sourcetype/index) now uses try-insert semantics for the Vector-namespace metadata paths, matching the existing InsertIfEmpty behavior for Legacy-namespace event fields. This ensures decoder-produced %splunk_hec.* values survive post-decode enrichment in both namespaces.
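The try-insert overlay can be pictured with a small Python model. This is a sketch of the semantics only: the `try_insert` and `overlay_envelope` functions below are hypothetical stand-ins operating on nested dictionaries, not Vector's own API, and they assume every intermediate path segment is a dictionary.

```python
from typing import Any, Dict, List, Tuple


def try_insert(target: Dict[str, Any], path: Tuple[str, ...], value: Any) -> bool:
    """Insert value at a nested path only if nothing is stored there yet.

    Returns True when the value was written, False when an existing value
    was kept. Assumes intermediate nodes are dictionaries.
    """
    node = target
    for key in path[:-1]:
        node = node.setdefault(key, {})
    leaf = path[-1]
    if leaf in node:
        return False
    node[leaf] = value
    return True


def overlay_envelope(metadata: Dict[str, Any], envelope: Dict[str, Any]) -> List[str]:
    """Overlay envelope fields after decoding; return the keys the decoder kept."""
    kept = []
    for key in ("host", "source", "sourcetype", "index"):
        if key in envelope and not try_insert(metadata, ("splunk_hec", key), envelope[key]):
            kept.append(key)
    return kept


# The decoder already wrote a host; the envelope carries host and source.
decoded_metadata = {"splunk_hec": {"host": "set-by-decoder"}}
kept = overlay_envelope(decoded_metadata, {"host": "client-host", "source": "curl"})
print(decoded_metadata)
print(kept)
```

In this model the decoder's `host` survives and only the missing `source` is filled from the envelope, which is the behavior the commit message attributes to both namespaces.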
…h to preserve top-level envelope precedence
Codex Code Review Findings
1. Vector namespace decoded-path metadata writes to wrong keys
The schema declares
Downstream consumers reading the documented metadata paths will miss values whenever a decoder is configured. Use bare
2.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea6e39173b
drichards-87 left a comment
Left some feedback from Docs and approved the PR.
Note: It looks like the descriptions are repeated twice in the file? I only added suggestions to the first instance of a description.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5987d6f6c9
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 27d87ead8c
Thanks for the review @drichards-87. Most of the changed lines in the cue files actually relate to decoders, which unfortunately get copied and pasted into every component that adds them. I will apply your suggestions in another PR, since those changes are not in scope here.
Summary
Add optional framing and decoding configuration to the splunk_hec source. When set, the inner payload is decoded after the HEC envelope is parsed, with envelope metadata layered on top so decoder-produced fields win on conflict. Both endpoints supported; legacy behavior preserved when unset.
Vector configuration
JSON second-stage decode on the /event endpoint:
Per-token routing with a VRL decoder and store_hec_token:
Running the above config and:
Produced correctly tagged output:
{"environment":"production","message":"{\"message\":\"hello from team-a\",\"level\":\"info\"}","source_type":"splunk_hec","team":"team-a","timestamp":"2026-04-29T19:02:35.378032Z"}
{"environment":"staging","message":"{\"message\":\"hello from team-b\",\"level\":\"warn\"}","source_type":"splunk_hec","team":"team-b","timestamp":"2026-04-29T19:02:37.838911Z"}
How did you test this PR?
- host/channel/index/source/sourcetype, fallback timestamp on /event and /raw, decoder errors return HTTP 200, partial-decode requests do not return an ackId, InvalidEventNumber reports envelope index (not fan-out event index), and schema definition includes the codec's root.
- make fmt, make check-clippy, make check-generated-docs pass.
- curl ... -d '{"event":"{\"foo\":\"bar\"}","host":"client-host"}'.
- store_hec_token: true and a VRL decoder that branches on get_secret!("splunk_hec_token") to tag events by team, confirmed both tokens produce correctly tagged output (see config above).
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changelog label to this PR.
References
NA