
feat(splunk_hec source): support second-stage framing and decoding#25312

Queued
thomasqueirozb wants to merge 37 commits into master from splunk-hec-second-stage-decoder-framing


@thomasqueirozb
Contributor

@thomasqueirozb thomasqueirozb commented Apr 27, 2026

Summary

Add optional framing and decoding configuration to the splunk_hec source. When set, the inner payload is decoded after the HEC envelope is parsed, and envelope metadata is then applied without overwriting existing values, so decoder-produced fields win on conflict. Both the /event and /raw endpoints are supported; legacy behavior is preserved when these options are unset.
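The precedence rule can be sketched as follows, assuming a decoded event is a flat map of field names to strings (Vector's real event types are richer). `Fields` and `overlay_envelope` are hypothetical names for illustration: envelope values are written only where the decoder left a key absent, so decoder-produced fields win.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a decoded event: field name -> value.
type Fields = BTreeMap<String, String>;

/// Apply HEC envelope metadata to a decoder-produced event without
/// overwriting anything the decoder already set.
fn overlay_envelope(decoded: &mut Fields, envelope: &Fields) {
    for (key, value) in envelope {
        // `entry().or_insert_with()` writes only when the key is absent,
        // i.e. insert-if-empty semantics.
        decoded
            .entry(key.clone())
            .or_insert_with(|| value.clone());
    }
}

fn main() {
    let mut decoded = Fields::from([
        ("message".to_string(), "hello".to_string()),
        ("host".to_string(), "decoder-host".to_string()),
    ]);
    let envelope = Fields::from([
        ("host".to_string(), "client-host".to_string()),
        ("index".to_string(), "main".to_string()),
    ]);

    overlay_envelope(&mut decoded, &envelope);

    // The decoder's `host` survives; the envelope only fills in `index`.
    println!("{decoded:?}");
}
```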

Vector configuration

JSON second-stage decode on the /event endpoint:

sources:
  hec:
    type: splunk_hec
    address: 0.0.0.0:8088
    valid_tokens: ["test-token"]
    event:
      decoding:
        codec: json

Per-token routing with a VRL decoder and store_hec_token:

sources:
  hec_in:
    type: splunk_hec
    address: 0.0.0.0:8088
    valid_tokens:
      - "token-team-a"
      - "token-team-b"
    store_hec_token: true
    event:
      decoding:
        codec: vrl
        vrl:
          source: |
            token = get_secret!("splunk_hec_token")

            if token == "token-team-a" {
              .team = "team-a"
              .environment = "production"
            } else if token == "token-team-b" {
              .team = "team-b"
              .environment = "staging"
            } else {
              abort
            }

sinks:
  out:
    type: console
    inputs: ["hec_in"]
    encoding:
      codec: json

Run the above config and send these requests:

curl -s -X POST http://localhost:8088/services/collector/event \
  -H "Authorization: Splunk token-team-a" \
  -H "Content-Type: application/json" \
  -d '{"event": {"message": "hello from team-a", "level": "info"}}'

curl -s -X POST http://localhost:8088/services/collector/event \
  -H "Authorization: Splunk token-team-b" \
  -H "Content-Type: application/json" \
  -d '{"event": {"message": "hello from team-b", "level": "warn"}}'

The output is correctly tagged:

{"environment":"production","message":"{\"message\":\"hello from team-a\",\"level\":\"info\"}","source_type":"splunk_hec","team":"team-a","timestamp":"2026-04-29T19:02:35.378032Z"}
{"environment":"staging","message":"{\"message\":\"hello from team-b\",\"level\":\"warn\"}","source_type":"splunk_hec","team":"team-b","timestamp":"2026-04-29T19:02:37.838911Z"}

How did you test this PR?

  • 12 new unit tests covering: string/object/array event decoding; decoder-wins precedence for host/channel/index/source/sourcetype; fallback timestamps on /event and /raw; decoder errors returning HTTP 200; partial-decode requests not returning an ackId; InvalidEventNumber reporting the envelope index (not the fan-out event index); and the schema definition including the codec's root.
  • Full splunk_hec test suite (64 tests) green.
  • make fmt, make check-clippy, make check-generated-docs pass.
  • Manual smoke test with JSON codec and curl ... -d '{"event":"{\"foo\":\"bar\"}","host":"client-host"}'.
  • Manual smoke test with store_hec_token: true and a VRL decoder that branches on get_secret!("splunk_hec_token") to tag events by team; confirmed that both tokens produce correctly tagged output (see config above).
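A minimal model of two of the tested behaviors, assuming each request carries a sequence of HEC envelopes and each envelope's payload decodes into zero or more events. `DecodeResult`, `RequestOutcome`, and `process` are hypothetical, not the source's actual types: a failure is reported by its envelope position rather than by the index of the fanned-out event, and no ackId is issued if any envelope failed.

```rust
/// Hypothetical result of decoding one envelope's payload: the events it
/// fanned out into, or a decode error.
type DecodeResult = Result<Vec<String>, String>;

/// Hypothetical per-request summary.
#[derive(Debug, PartialEq)]
struct RequestOutcome {
    events: Vec<String>,
    /// Position of the first failing envelope in the request body.
    first_failed_envelope: Option<usize>,
    /// Issued only when every envelope decoded cleanly.
    ack_id: Option<u64>,
}

fn process(envelopes: &[DecodeResult], next_ack_id: u64) -> RequestOutcome {
    let mut events = Vec::new();
    let mut first_failed_envelope = None;

    for (envelope_index, result) in envelopes.iter().enumerate() {
        match result {
            // One envelope may fan out into several events.
            Ok(decoded) => events.extend(decoded.iter().cloned()),
            // Record the envelope's index, not `events.len()`, which would be
            // the fan-out event index.
            Err(_) => {
                if first_failed_envelope.is_none() {
                    first_failed_envelope = Some(envelope_index);
                }
            }
        }
    }

    // Withhold the ackId when the request was only partially decoded.
    let ack_id = if first_failed_envelope.is_none() {
        Some(next_ack_id)
    } else {
        None
    };

    RequestOutcome { events, first_failed_envelope, ack_id }
}

fn main() {
    let envelopes: Vec<DecodeResult> = vec![
        // Envelope 0 fans out into three events.
        Ok(vec!["a1".to_string(), "a2".to_string(), "a3".to_string()]),
        // Envelope 1 fails; its fan-out position would have been 3.
        Err("invalid payload".to_string()),
    ];
    let outcome = process(&envelopes, 7);
    println!("{outcome:?}");
}
```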

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

N/A

@github-actions github-actions Bot added domain: sources Anything related to the Vector's sources domain: external docs Anything related to Vector's external, public documentation work in progress labels Apr 27, 2026
Comment thread changelog.d/splunk_hec_source_codec.enhancement.md Fixed
@thomasqueirozb
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 530acc0c9f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/sources/splunk_hec/mod.rs Outdated
Comment thread src/sources/splunk_hec/mod.rs Outdated
…secrets_template

Allows sources to forward per-request secrets (e.g. authentication tokens) into
the Deserializer pipeline so user-authored programs like VRL decoders can read
them via get_secret!() during decoding.

VrlDeserializer overrides parse_with_secrets to inject the secrets into the
synthetic event before the VRL program executes, making them visible as
%vector.secrets.* at runtime. All other Deserializer implementations use the
default implementation, which merges the template onto each emitted event after
parsing, with the codec's own values taking priority.
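The split between the default method and the VRL override can be sketched with a simplified trait. This is an assumption-laden model, not Vector's `Deserializer` trait: events are flat string maps with a separate secrets map, and `Lines` and `TokenRouter` are hypothetical codecs. The default merges secrets after parsing without overwriting codec-set values, while the override injects them before its logic runs so that logic can branch on them.

```rust
use std::collections::BTreeMap;

type Map = BTreeMap<String, String>;

/// Hypothetical event: fields plus per-event secrets.
#[derive(Debug, Default, Clone)]
struct Event {
    fields: Map,
    secrets: Map,
}

/// Hypothetical, simplified trait modelled on the commit description.
trait Deserializer {
    fn parse(&self, input: &str) -> Vec<Event>;

    /// Default: parse first, then merge the secrets template onto every
    /// emitted event. Secrets the codec already set are kept.
    fn parse_with_secrets(&self, input: &str, secrets: &Map) -> Vec<Event> {
        let mut events = self.parse(input);
        for event in &mut events {
            for (key, value) in secrets {
                event.secrets.entry(key.clone()).or_insert_with(|| value.clone());
            }
        }
        events
    }
}

/// A codec that emits one event per line and relies on the default method.
struct Lines;

impl Deserializer for Lines {
    fn parse(&self, input: &str) -> Vec<Event> {
        input
            .lines()
            .map(|line| Event {
                fields: Map::from([("message".to_string(), line.to_string())]),
                ..Event::default()
            })
            .collect()
    }
}

/// Stand-in for a program-driven decoder (like VRL) that overrides the
/// method so secrets are injected before its logic runs.
struct TokenRouter;

impl Deserializer for TokenRouter {
    fn parse(&self, input: &str) -> Vec<Event> {
        self.parse_with_secrets(input, &Map::new())
    }

    fn parse_with_secrets(&self, input: &str, secrets: &Map) -> Vec<Event> {
        // Inject first, so the "program" below can read the token.
        let mut event = Event { secrets: secrets.clone(), ..Event::default() };
        let team = match event.secrets.get("splunk_hec_token").map(String::as_str) {
            Some("token-team-a") => "team-a",
            Some("token-team-b") => "team-b",
            _ => "unknown",
        };
        event.fields.insert("message".to_string(), input.to_string());
        event.fields.insert("team".to_string(), team.to_string());
        vec![event]
    }
}

fn main() {
    let secrets = Map::from([("splunk_hec_token".to_string(), "token-team-a".to_string())]);
    println!("{:?}", Lines.parse_with_secrets("one\ntwo", &secrets));
    println!("{:?}", TokenRouter.parse_with_secrets("hello", &secrets));
}
```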
Comment thread src/sources/splunk_hec/mod.rs
Adds VrlDeserializerOptions.inject_metadata. When true, the source can call
Decoder::with_metadata_template to pre-populate the synthetic event before VRL
executes, making source context readable via % paths (e.g. %exec.host,
%exec.command). VRL-produced values always win over injected values on collision.

The exec source is the first consumer: it injects hostname and command into the
decoder template at build time. Zero overhead when inject_metadata is false.
…ig::inject_metadata_enabled

Move the inject_metadata check to DeserializerConfig::inject_metadata_enabled() so
sources don't need VRL-specific knowledge. In handle_event, use try_insert for the
vector-namespace metadata paths (host, command) when inject_metadata is enabled, so
any value the VRL program wrote to %exec.host or %exec.command survives
post-decode enrichment. No behavior change for Legacy namespace or when
inject_metadata is false.
@github-actions github-actions Bot added the domain: core Anything related to core crates i.e. vector-core, core-common, etc label Apr 29, 2026
… always applies; revert exec source wiring

The inject_metadata: bool config option on VrlDeserializerOptions is removed —
sources decide whether to inject by calling with_metadata_template, no user flag
needed. DeserializerConfig::inject_metadata_enabled replaced by is_vrl(). Exec
source wiring reverted; splunk_hec will be the first consumer.
@thomasqueirozb thomasqueirozb force-pushed the splunk-hec-second-stage-decoder-framing branch from 026f0df to 4777a95 Compare April 29, 2026 17:39
…fore execution

When the second-stage decoder is VRL, build an EventMetadata template from the
current HEC envelope context (host, source, sourcetype, index, channel, token) and
attach it via Decoder::with_metadata_template. VRL programs running as the decoder
can now read %splunk_hec.host, %splunk_hec.channel, get_secret!("splunk_hec_token"),
etc. before the program executes. Post-decode enrichment still applies with
InsertIfEmpty semantics. No-op for non-VRL decoders.
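A rough sketch of the template construction, assuming the envelope context is a handful of optional strings and metadata paths are flattened into string keys. `Envelope`, `Codec`, `Template`, and `build_template` are hypothetical; only the `is_vrl()` check and the `splunk_hec.*` / `splunk_hec_token` names come from the commit messages. Non-VRL codecs get no template.

```rust
use std::collections::BTreeMap;

type Map = BTreeMap<String, String>;

/// Hypothetical subset of the HEC envelope context for one request.
struct Envelope<'a> {
    host: Option<&'a str>,
    source: Option<&'a str>,
    sourcetype: Option<&'a str>,
    index: Option<&'a str>,
    channel: Option<&'a str>,
}

/// Hypothetical second-stage codec selector.
enum Codec {
    Json,
    Vrl,
}

impl Codec {
    fn is_vrl(&self) -> bool {
        matches!(self, Codec::Vrl)
    }
}

/// Hypothetical template: flattened metadata paths plus secrets.
#[derive(Debug, Default, PartialEq)]
struct Template {
    metadata: Map,
    secrets: Map,
}

/// Build the pre-execution template, or `None` for non-VRL codecs.
fn build_template(codec: &Codec, env: &Envelope, token: Option<&str>) -> Option<Template> {
    if !codec.is_vrl() {
        return None;
    }
    let mut template = Template::default();
    let fields = [
        ("host", env.host),
        ("source", env.source),
        ("sourcetype", env.sourcetype),
        ("index", env.index),
        ("channel", env.channel),
    ];
    for (name, value) in fields {
        // In this sketch, only envelope fields that are present are exposed.
        if let Some(value) = value {
            template.metadata.insert(format!("splunk_hec.{name}"), value.to_string());
        }
    }
    if let Some(token) = token {
        // Intended to be readable via get_secret!("splunk_hec_token").
        template.secrets.insert("splunk_hec_token".to_string(), token.to_string());
    }
    Some(template)
}

fn main() {
    let env = Envelope {
        host: Some("client-host"),
        source: None,
        sourcetype: Some("access_combined"),
        index: None,
        channel: Some("my-channel"),
    };
    assert!(build_template(&Codec::Json, &env, Some("token-team-a")).is_none());
    println!("{:?}", build_template(&Codec::Vrl, &env, Some("token-team-a")));
}
```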
…ace metadata in decoder path

After the VRL decoder runs, post-decode metadata overlay (channel, fields, and
DefaultExtractor calls for host/source/sourcetype/index) now uses try-insert
semantics for the Vector-namespace metadata paths, matching the existing
InsertIfEmpty behavior for Legacy-namespace event fields. This ensures
decoder-produced %splunk_hec.* values survive post-decode enrichment in both
namespaces.
Comment thread changelog.d/splunk_hec_source_codec.enhancement.md Outdated
Comment thread src/sources/splunk_hec/mod.rs
Comment thread src/sources/splunk_hec/mod.rs Outdated
@pront
Member

pront commented May 1, 2026

Codex Code Review Findings

1. Vector namespace decoded-path metadata writes to wrong keys

The schema declares %splunk_hec.channel, %splunk_hec.index, %splunk_hec.source, %splunk_hec.sourcetype (src/sources/splunk_hec/mod.rs:3911-3931), and the no-decoder paths correctly write to those (src/sources/splunk_hec/mod.rs:1196, :1760). But the decoder-mode paths use the CHANNEL / INDEX / SOURCE / SOURCETYPE constants, which are "splunk_channel", "splunk_index", etc.:

  • src/sources/splunk_hec/mod.rs:1294 writes %splunk_hec.splunk_channel
  • src/sources/splunk_hec/mod.rs:1812 writes %splunk_hec.splunk_channel
  • src/sources/splunk_hec/mod.rs:1672 concatenates metadata_key derived from those legacy paths, producing %splunk_hec.splunk_index / splunk_source / splunk_sourcetype

Downstream consumers reading the documented metadata paths will miss values whenever a decoder is configured. Use bare "channel", "index", "source", "sourcetype" for Vector-namespace metadata; reserve the splunk_*-prefixed constants for legacy event fields.
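The naming rule the finding asks for can be expressed as a small mapping. `Namespace`, `HecField`, and `target_path` are hypothetical names, and paths are rendered as plain strings rather than Vector's path types: Vector-namespace metadata uses the bare field name under `%splunk_hec`, and only the legacy event fields carry the `splunk_` prefix.

```rust
/// Hypothetical log-namespace selector.
#[derive(Clone, Copy)]
enum Namespace {
    Legacy,
    Vector,
}

/// The four HEC envelope fields named in the finding.
#[derive(Clone, Copy)]
enum HecField {
    Channel,
    Index,
    Source,
    Sourcetype,
}

impl HecField {
    /// Bare field name, used for Vector-namespace metadata.
    fn bare(self) -> &'static str {
        match self {
            HecField::Channel => "channel",
            HecField::Index => "index",
            HecField::Source => "source",
            HecField::Sourcetype => "sourcetype",
        }
    }
}

/// Target path for a field, rendered as a string for illustration.
fn target_path(namespace: Namespace, field: HecField) -> String {
    match namespace {
        // Legacy namespace: `splunk_`-prefixed event field.
        Namespace::Legacy => format!("splunk_{}", field.bare()),
        // Vector namespace: bare name under the `%splunk_hec` prefix.
        // Reusing the legacy name here is what yields `%splunk_hec.splunk_channel`.
        Namespace::Vector => format!("%splunk_hec.{}", field.bare()),
    }
}

fn main() {
    for field in [HecField::Channel, HecField::Index, HecField::Source, HecField::Sourcetype] {
        println!(
            "{} -> {}",
            target_path(Namespace::Legacy, field),
            target_path(Namespace::Vector, field)
        );
    }
}
```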

2. VrlDeserializer::with_metadata_template overwrites entire metadata

At lib/codecs/src/decoding/format/vrl.rs:138:

*event.metadata_mut() = template.clone();

This replaces the freshly-created EventMetadata (including its newly-generated source_event_id) with a clone of the template. Every event decoded with the same template will share the template's source_event_id. Fix by copying only the template's metadata value tree and secrets into the event's existing metadata, preserving the per-event source_event_id.
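The suggested fix can be illustrated with a simplified metadata type. This assumes a counter in place of the real generated `source_event_id` and flat maps in place of the metadata value tree; `EventMetadata` here is a hypothetical stand-in, as are `overwrite_with` and `apply_template`. Copying only the value tree and secrets keeps each event's own ID, which is what the two-frame test proposed in finding #3 would check.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};

type Map = BTreeMap<String, String>;

/// Counter standing in for a generated per-event ID.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

/// Hypothetical, simplified event metadata.
#[derive(Debug, Clone)]
struct EventMetadata {
    source_event_id: u64,
    value: Map,
    secrets: Map,
}

impl EventMetadata {
    /// A fresh metadata value gets its own ID.
    fn new() -> Self {
        Self {
            source_event_id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
            value: Map::new(),
            secrets: Map::new(),
        }
    }

    /// The pattern flagged in the finding: the whole struct, ID included,
    /// is replaced by the template's copy.
    fn overwrite_with(&mut self, template: &EventMetadata) {
        *self = template.clone();
    }

    /// The suggested fix: copy only the value tree and secrets into the
    /// fresh metadata, keeping this event's own `source_event_id`.
    fn apply_template(&mut self, template: &EventMetadata) {
        self.value = template.value.clone();
        self.secrets = template.secrets.clone();
    }
}

fn main() {
    let mut template = EventMetadata::new();
    template.value.insert("splunk_hec.host".to_string(), "client-host".to_string());

    // Two frames decoded with the same template keep distinct IDs.
    let mut first = EventMetadata::new();
    first.apply_template(&template);
    let mut second = EventMetadata::new();
    second.apply_template(&template);
    assert_ne!(first.source_event_id, second.source_event_id);

    // With the flagged pattern, the event inherits the template's ID.
    let mut shared = EventMetadata::new();
    shared.overwrite_with(&template);
    assert_eq!(shared.source_event_id, template.source_event_id);
}
```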

3. Test coverage gaps

No runtime test asserts decoded Vector-namespace metadata paths. Recommend adding:

  • A test with log_namespace: Some(true) and a decoder enabled, asserting the metadata lands at %splunk_hec.channel / index / source / sourcetype (covers finding #1).
  • A test that decodes two frames through one VrlDeserializer + template, asserting the resulting source_event_ids differ (covers finding #2).


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea6e39173b

Comment thread src/sources/splunk_hec/mod.rs Outdated
@drichards-87 drichards-87 self-assigned this May 4, 2026
Contributor

@drichards-87 drichards-87 left a comment

Left some feedback from Docs and approved the PR.

Note: It looks like the descriptions are repeated twice in the file? I only added suggestions to the first instance of a description.

Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue Outdated
Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue Outdated
Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue Outdated
Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue Outdated
Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue
Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue
Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue
Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue
Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue
Comment thread website/cue/reference/components/sources/generated/splunk_hec.cue
@drichards-87 drichards-87 removed their assignment May 4, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5987d6f6c9

Comment thread src/sources/splunk_hec/mod.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27d87ead8c

Comment thread src/sources/splunk_hec/mod.rs Outdated
@thomasqueirozb
Contributor Author

Thanks for the review @drichards-87. Most of the changed lines in the cue files relate to decoders, whose descriptions unfortunately get copied into every component that adds them. I will apply your suggestions in a separate PR, since those changes are out of scope here.

Comment thread lib/codecs/src/decoding/decoder.rs Outdated
Comment thread lib/codecs/src/decoding/mod.rs
Comment thread lib/codecs/src/decoding/mod.rs Outdated
Comment thread src/sources/splunk_hec/mod.rs
Comment thread src/sources/splunk_hec/mod.rs Outdated
Comment thread src/sources/splunk_hec/mod.rs
Comment thread src/sources/splunk_hec/mod.rs
Comment thread src/sources/splunk_hec/mod.rs
Comment thread src/sources/splunk_hec/mod.rs
Comment thread src/sources/splunk_hec/mod.rs
@thomasqueirozb thomasqueirozb added this pull request to the merge queue May 6, 2026
Any commits made after this event will not be merged.
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 6, 2026
@thomasqueirozb thomasqueirozb added this pull request to the merge queue May 7, 2026
Any commits made after this event will not be merged.

Labels

domain: core Anything related to core crates i.e. vector-core, core-common, etc domain: external docs Anything related to Vector's external, public documentation domain: sources Anything related to the Vector's sources

5 participants