feat(splunk_hec source): support second-stage framing and decoding by thomasqueirozb · Pull Request #25312 · vectordotdev/vector

thomasqueirozb · 2026-04-27T20:15:56Z

Summary

Add optional framing and decoding configuration to the splunk_hec source. When set, the inner payload is decoded after the HEC envelope is parsed. Envelope metadata is then applied with insert-if-empty semantics, so decoder-produced fields win on conflict. Both the /event and /raw endpoints are supported; legacy behavior is preserved when the options are unset.
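To make the precedence rule concrete, here is a minimal sketch of an insert-if-empty overlay. It is not the code from this PR: `overlay_envelope` is a hypothetical helper, and a `BTreeMap<String, String>` stands in for Vector's real event type.

```rust
use std::collections::BTreeMap;

/// Overlay HEC envelope metadata onto an already-decoded event.
/// Hypothetical helper: a plain map stands in for Vector's event type.
pub fn overlay_envelope(
    mut decoded: BTreeMap<String, String>,
    envelope: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    for (key, value) in envelope {
        // `or_insert_with` writes only when the key is absent,
        // so any field the decoder produced wins on conflict.
        decoded.entry(key.clone()).or_insert_with(|| value.clone());
    }
    decoded
}
```

With a decoded event that already carries `host` and an envelope that supplies both `host` and `source`, the result keeps the decoder's `host` and gains the envelope's `source`.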

Vector configuration

JSON second-stage decode on the /event endpoint:

sources:
  hec:
    type: splunk_hec
    address: 0.0.0.0:8088
    valid_tokens: ["test-token"]
    event:
      decoding:
        codec: json

Per-token routing with a VRL decoder and store_hec_token:

sources:
  hec_in:
    type: splunk_hec
    address: 0.0.0.0:8088
    valid_tokens:
      - "token-team-a"
      - "token-team-b"
    store_hec_token: true
    event:
      decoding:
        codec: vrl
        vrl:
          source: |
            token = get_secret!("splunk_hec_token")

            if token == "token-team-a" {
              .team = "team-a"
              .environment = "production"
            } else if token == "token-team-b" {
              .team = "team-b"
              .environment = "staging"
            } else {
              abort
            }

sinks:
  out:
    type: console
    inputs: ["hec_in"]
    encoding:
      codec: json

Running the above config and sending the following requests:

curl -s -X POST http://localhost:8088/services/collector/event \
  -H "Authorization: Splunk token-team-a" \
  -H "Content-Type: application/json" \
  -d '{"event": {"message": "hello from team-a", "level": "info"}}'

curl -s -X POST http://localhost:8088/services/collector/event \
  -H "Authorization: Splunk token-team-b" \
  -H "Content-Type: application/json" \
  -d '{"event": {"message": "hello from team-b", "level": "warn"}}'

Produced correctly tagged output:

{"environment":"production","message":"{\"message\":\"hello from team-a\",\"level\":\"info\"}","source_type":"splunk_hec","team":"team-a","timestamp":"2026-04-29T19:02:35.378032Z"}
{"environment":"staging","message":"{\"message\":\"hello from team-b\",\"level\":\"warn\"}","source_type":"splunk_hec","team":"team-b","timestamp":"2026-04-29T19:02:37.838911Z"}

How did you test this PR?

12 new unit tests covering: string/object/array event decoding, decoder-wins precedence for host/channel/index/source/sourcetype, fallback timestamp on /event and /raw, decoder errors return HTTP 200, partial-decode requests do not return an ackId, InvalidEventNumber reports envelope index (not fan-out event index), and schema definition includes the codec's root.
Full splunk_hec test suite (64 tests) green.
make fmt, make check-clippy, make check-generated-docs pass.
Manual smoke test with JSON codec and curl ... -d '{"event":"{\"foo\":\"bar\"}","host":"client-host"}'.
Manual smoke test with store_hec_token: true and a VRL decoder that branches on get_secret!("splunk_hec_token") to tag events by team, confirmed both tokens produce correctly tagged output (see config above).
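The distinction between the envelope index and the fan-out event index in the InvalidEventNumber test can be illustrated with a small sketch. Everything here is an assumption made for illustration: `decode_all`, the local `DecodeError` enum, the one-event-per-line fan-out, and treating an empty payload as the invalid case are not taken from the PR.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// Carries the index of the offending envelope in the request.
    InvalidEventNumber(usize),
}

/// Decode a request made of several envelopes. Each envelope may fan out
/// into several events (here, hypothetically, one event per line).
pub fn decode_all(envelopes: &[&str]) -> Result<Vec<String>, DecodeError> {
    let mut events = Vec::new();
    for (envelope_index, payload) in envelopes.iter().enumerate() {
        if payload.is_empty() {
            // Report the position of the envelope, not `events.len()`,
            // which would be the fan-out event index.
            return Err(DecodeError::InvalidEventNumber(envelope_index));
        }
        events.extend(payload.lines().map(str::to_owned));
    }
    Ok(events)
}
```

If the first envelope fans out into three events and the second envelope is invalid, the error carries 1 (the envelope position) rather than 3 (the number of events emitted so far).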

Change Type

Is this a breaking change?

Yes
No

Does this PR include user facing changes?

Yes. Please add a changelog fragment based on our guidelines.
No. A maintainer will apply the no-changelog label to this PR.

References

NA

thomasqueirozb · 2026-04-28T14:58:36Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 530acc0c9f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…secrets_template Allows sources to forward per-request secrets (e.g. authentication tokens) into the Deserializer pipeline so user-authored programs like VRL decoders can read them via get_secret!() during decoding. VrlDeserializer overrides parse_with_secrets to inject the secrets into the synthetic event before the VRL program executes, making them visible as %vector.secrets.* at runtime. All other Deserializer implementations use the default implementation which merges the template onto each emitted event after parsing, with the codec's own values taking priority.
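The two behaviors described in this commit message can be sketched as a trait with a default method and one override. This is a simplified model, not Vector's actual API: the `Event`, `Secrets`, `Lines`, and `Program` types are hypothetical stand-ins, and a Rust closure plays the role of the VRL program.

```rust
use std::collections::BTreeMap;

pub type Secrets = BTreeMap<String, String>;

#[derive(Debug, Default, Clone, PartialEq)]
pub struct Event {
    pub fields: BTreeMap<String, String>,
    pub secrets: Secrets,
}

pub trait Deserializer {
    fn parse(&self, bytes: &[u8]) -> Vec<Event>;

    /// Default: parse first, then merge the template onto each emitted
    /// event, keeping any value the codec set itself.
    fn parse_with_secrets(&self, bytes: &[u8], template: &Secrets) -> Vec<Event> {
        let mut events = self.parse(bytes);
        for event in &mut events {
            for (key, value) in template {
                event.secrets.entry(key.clone()).or_insert_with(|| value.clone());
            }
        }
        events
    }
}

/// A codec that relies on the default merge-after-parse behavior.
pub struct Lines;

impl Deserializer for Lines {
    fn parse(&self, bytes: &[u8]) -> Vec<Event> {
        String::from_utf8_lossy(bytes)
            .lines()
            .map(|line| {
                let mut event = Event::default();
                event.fields.insert("message".into(), line.into());
                event
            })
            .collect()
    }
}

/// Stand-in for a program-driven decoder: the closure plays the user program.
pub struct Program<F: Fn(&mut Event)>(pub F);

impl<F: Fn(&mut Event)> Deserializer for Program<F> {
    fn parse(&self, bytes: &[u8]) -> Vec<Event> {
        self.parse_with_secrets(bytes, &Secrets::new())
    }

    /// Override: inject secrets into the synthetic event before the
    /// program runs, so the program itself can read them.
    fn parse_with_secrets(&self, bytes: &[u8], template: &Secrets) -> Vec<Event> {
        let mut event = Event::default();
        event
            .fields
            .insert("message".into(), String::from_utf8_lossy(bytes).into_owned());
        event.secrets = template.clone();
        (self.0)(&mut event);
        vec![event]
    }
}
```

The design point is ordering: with the default method the program would never see the secrets, because they are merged only after parsing, which is why the program-driven decoder needs its own override.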

Adds VrlDeserializerOptions.inject_metadata. When true, the source can call Decoder::with_metadata_template to pre-populate the synthetic event before VRL executes, making source context readable via % paths (e.g. %exec.host, %exec.command). VRL-produced values always win over injected values on collision. The exec source is the first consumer: it injects hostname and command into the decoder template at build time. Zero overhead when inject_metadata is false.

…ig::inject_metadata_enabled Move the inject_metadata check to DeserializerConfig::inject_metadata_enabled() so sources don't need VRL-specific knowledge. In handle_event, use try_insert for the vector-namespace metadata paths (host, command) when inject_metadata is enabled, so any value the VRL program wrote to %exec.host or %exec.command survives post-decode enrichment. No behavior change for Legacy namespace or when inject_metadata is false.

… always applies; revert exec source wiring The inject_metadata: bool config option on VrlDeserializerOptions is removed — sources decide whether to inject by calling with_metadata_template, no user flag needed. DeserializerConfig::inject_metadata_enabled replaced by is_vrl(). Exec source wiring reverted; splunk_hec will be the first consumer.

…fore execution When the second-stage decoder is VRL, build an EventMetadata template from the current HEC envelope context (host, source, sourcetype, index, channel, token) and attach it via Decoder::with_metadata_template. VRL programs running as the decoder can now read %splunk_hec.host, %splunk_hec.channel, get_secret!("splunk_hec_token"), etc. before the program executes. Post-decode enrichment still applies with InsertIfEmpty semantics. No-op for non-VRL decoders.

…ace metadata in decoder path After the VRL decoder runs, post-decode metadata overlay (channel, fields, and DefaultExtractor calls for host/source/sourcetype/index) now uses try-insert semantics for the Vector-namespace metadata paths, matching the existing InsertIfEmpty behavior for Legacy-namespace event fields. This ensures decoder-produced %splunk_hec.* values survive post-decode enrichment in both namespaces.

pront · 2026-05-01T19:15:17Z

Codex Code Review Findings

1. Vector namespace decoded-path metadata writes to wrong keys

The schema declares %splunk_hec.channel, %splunk_hec.index, %splunk_hec.source, %splunk_hec.sourcetype (src/sources/splunk_hec/mod.rs:3911-3931), and the no-decoder paths correctly write to those (src/sources/splunk_hec/mod.rs:1196, :1760). But the decoder-mode paths use the CHANNEL / INDEX / SOURCE / SOURCETYPE constants, which are "splunk_channel", "splunk_index", etc.:

src/sources/splunk_hec/mod.rs:1294 writes %splunk_hec.splunk_channel
src/sources/splunk_hec/mod.rs:1812 writes %splunk_hec.splunk_channel
src/sources/splunk_hec/mod.rs:1672 concatenates metadata_key derived from those legacy paths, producing %splunk_hec.splunk_index / splunk_source / splunk_sourcetype

Downstream consumers reading the documented metadata paths will miss values whenever a decoder is configured. Use bare "channel", "index", "source", "sourcetype" for Vector-namespace metadata; reserve the splunk_*-prefixed constants for legacy event fields.
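One way to picture the suggested fix is to keep two separate key sets and choose between them by namespace. The sketch below is illustrative only: `Field`, `LogNamespace`, and `target_path` are local stand-ins defined here, not the types or helpers in mod.rs, and only the key strings come from the review.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LogNamespace {
    Legacy,
    Vector,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Field {
    Channel,
    Index,
    Source,
    Sourcetype,
}

impl Field {
    /// Prefixed names, used for legacy event fields.
    pub fn legacy_key(self) -> &'static str {
        match self {
            Field::Channel => "splunk_channel",
            Field::Index => "splunk_index",
            Field::Source => "splunk_source",
            Field::Sourcetype => "splunk_sourcetype",
        }
    }

    /// Bare names, used under %splunk_hec in the Vector namespace.
    pub fn metadata_key(self) -> &'static str {
        match self {
            Field::Channel => "channel",
            Field::Index => "index",
            Field::Source => "source",
            Field::Sourcetype => "sourcetype",
        }
    }
}

/// Pick the write target for a field. The bug described above amounts to
/// using `legacy_key` in the Vector branch.
pub fn target_path(namespace: LogNamespace, field: Field) -> String {
    match namespace {
        LogNamespace::Legacy => field.legacy_key().to_string(),
        LogNamespace::Vector => format!("%splunk_hec.{}", field.metadata_key()),
    }
}
```

Keeping the two key sets as distinct methods makes it harder to build a Vector-namespace path from a legacy constant by accident.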

2. `VrlDeserializer::with_metadata_template` overwrites entire metadata

At lib/codecs/src/decoding/format/vrl.rs:138:

*event.metadata_mut() = template.clone();

This replaces the freshly-created EventMetadata (including its newly-generated source_event_id) with a clone of the template. Every event decoded with the same template will share the template's source_event_id. Fix by copying only the template's metadata value tree and secrets into the event's existing metadata, preserving the per-event source_event_id.
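The difference between the two approaches can be shown with a toy model. This is a sketch under assumptions: the `EventMetadata` struct below is a simplified stand-in with a counter-based `source_event_id`, and both `apply_template_*` functions are hypothetical names, not the code in vrl.rs.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in for a real unique-id generator.
static NEXT_ID: AtomicU64 = AtomicU64::new(1);

#[derive(Debug, Clone)]
pub struct EventMetadata {
    pub source_event_id: u64,
    pub value: BTreeMap<String, String>,
    pub secrets: BTreeMap<String, String>,
}

impl EventMetadata {
    /// Each freshly created metadata gets its own id.
    pub fn new() -> Self {
        Self {
            source_event_id: NEXT_ID.fetch_add(1, Ordering::Relaxed),
            value: BTreeMap::new(),
            secrets: BTreeMap::new(),
        }
    }
}

impl Default for EventMetadata {
    fn default() -> Self {
        Self::new()
    }
}

/// Problematic: replacing the whole struct also copies the template's id.
pub fn apply_template_overwrite(event: &mut EventMetadata, template: &EventMetadata) {
    *event = template.clone();
}

/// Suggested: copy only the value tree and secrets, keep the per-event id.
pub fn apply_template_preserving_id(event: &mut EventMetadata, template: &EventMetadata) {
    event.value = template.value.clone();
    event.secrets = template.secrets.clone();
}
```

Applying the overwriting variant to two fresh events leaves both with the template's id, while the preserving variant keeps the two ids distinct.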

3. Test coverage gaps

No runtime test asserts decoded Vector-namespace metadata paths. Recommend adding:

A test with log_namespace: Some(true) and a decoder enabled, asserting the metadata lands at %splunk_hec.channel / index / source / sourcetype (covers finding #1).
A test that decodes two frames through one VrlDeserializer + template, asserting the resulting source_event_ids differ (covers finding #2).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea6e39173b

drichards-87

Left some feedback from Docs and approved the PR.

Note: It looks like the descriptions are repeated twice in the file? I only added suggestions to the first instance of a description.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5987d6f6c9

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27d87ead8c

thomasqueirozb · 2026-05-04T21:43:37Z

Thanks for the review @drichards-87. Most of the changed lines in the cue files actually concern the decoders, whose descriptions unfortunately get copied into every component that adds them. I will apply your suggestions in another PR since those changes are not in scope here.

feat(splunk_hec source): support second-stage framing and decoding

805e0d4

github-actions Bot added the domain: sources, domain: external docs, and work in progress labels Apr 27, 2026

github-advanced-security AI found potential problems Apr 27, 2026

Comment thread changelog.d/splunk_hec_source_codec.enhancement.md Fixed

thomasqueirozb added 3 commits April 27, 2026 17:42

preserve original send_event/ack ordering when no decoder is set

4034cd0

Add authors to changelog

ad824b1

Format

530acc0

chatgpt-codex-connector Bot reviewed Apr 28, 2026

Comment thread src/sources/splunk_hec/mod.rs Outdated

20agbekodo reviewed Apr 28, 2026

Comment thread src/sources/splunk_hec/mod.rs Outdated

thomasqueirozb added 4 commits April 28, 2026 15:17

refactor(splunk_hec source): split codec config into per-endpoint eve…

c6e4974

…nt/raw blocks

fix(splunk_hec source): include legacy log shape in schema when only …

e86b093

…one endpoint has a decoder

address review: schema InsertIfEmpty parity + apply splunk_hec_token …

5454d2d

…from decode_payload

thomasqueirozb mentioned this pull request Apr 28, 2026

feat(codecs): Add inject_metadata to VRL decoder + exec source support #25322

Closed

9 tasks

20agbekodo reviewed Apr 29, 2026

Comment thread src/sources/splunk_hec/mod.rs

thomasqueirozb added 6 commits April 29, 2026 10:44

refactor: move metadata_template to VrlDeserializer only; no Decoder/…

a12389d

…trait changes

fmt

3f04870

Add authors line to changelog

bba7ae5

merge: bring in VRL decoder inject_metadata mechanism from decoder-me…

bdc4914

…tadata-template

github-actions Bot added the domain: core label Apr 29, 2026

thomasqueirozb force-pushed the splunk-hec-second-stage-decoder-framing branch from 026f0df to 4777a95 Compare April 29, 2026 17:39

thomasqueirozb added 4 commits April 29, 2026 13:54

fmt

55dda8d

refactor: replace get-then-insert with log.try_insert for metadata In…

113fc11

…sertIfEmpty

pront reviewed May 1, 2026

Comment thread changelog.d/splunk_hec_source_codec.enhancement.md Outdated

fix(splunk_hec source): apply extractors before fields in decoder pat…

e1f25c9

…h to preserve top-level envelope precedence

pront approved these changes May 1, 2026

Comment thread src/sources/splunk_hec/mod.rs

Comment thread src/sources/splunk_hec/mod.rs Outdated

github-actions Bot removed the work in progress label May 1, 2026

thomasqueirozb added 2 commits May 1, 2026 16:47

Avoid roundtrip json parsing during decoding

095a52b

Fix stale documentation

ea6e391

chatgpt-codex-connector Bot reviewed May 1, 2026

Comment thread src/sources/splunk_hec/mod.rs Outdated

drichards-87 self-assigned this May 4, 2026

drichards-87 approved these changes May 4, 2026

drichards-87 removed their assignment May 4, 2026

thomasqueirozb added 3 commits May 4, 2026 16:23

Fix clippy by building DecodePayloadContext

657a791

refactor(splunk_hec source): extract validate_event_field and use raw…

17bfef1

… JSON bytes in decoder path

Update documentation

5987d6f

chatgpt-codex-connector Bot reviewed May 4, 2026

Comment thread src/sources/splunk_hec/mod.rs

Address docs review

27d87ea

chatgpt-codex-connector Bot reviewed May 4, 2026

Comment thread src/sources/splunk_hec/mod.rs Outdated

fix(splunk_hec source): pass string events as raw bytes to decoder

7bd36cb

pront reviewed May 5, 2026

thomasqueirozb added 3 commits May 5, 2026 11:45

refactor(splunk_hec source): extract register_ack helper to reduce in…

50460ce

…dentation

EndpointCodecConfig -> CodecConfig

9c4e21f

simplify docs

8cd9c59

thomasqueirozb mentioned this pull request May 6, 2026

docs(codecs): fix wording in decoder and framing doc strings #25382

Open

9 tasks

thomasqueirozb enabled auto-merge May 6, 2026 19:10

Remove VrlDeserializer doc comment from with_metadata_template

0bcb9f1

pront approved these changes May 6, 2026

thomasqueirozb added this pull request to the merge queue May 6, 2026

Any commits made after this event will not be merged.

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 6, 2026

thomasqueirozb added this pull request to the merge queue May 7, 2026

Any commits made after this event will not be merged.

5 participants
