Skip to content

[9.4](backport #50417) filebeat/input/filestream: fix data races in test mocks#50436

Open
mergify[bot] wants to merge 1 commit into9.4from
mergify/bp/9.4/pr-50417
Open

[9.4](backport #50417) filebeat/input/filestream: fix data races in test mocks#50436
mergify[bot] wants to merge 1 commit into9.4from
mergify/bp/9.4/pr-50417

Conversation

@mergify
Copy link
Copy Markdown
Contributor

@mergify mergify Bot commented Apr 30, 2026

Proposed commit message

Three filestream tests reported WARNING: DATA RACE when run with -race, all caused by test-only mocks whose internal state was read from the test goroutine and written from the prospector / harvester goroutines without synchronization:

  • TestProspectorHarvesterUpdateIgnoredFiles raced on mockMetadataUpdater.table (read in checkOffset from assert.Eventually, written in UpdateMetadata / ResetCursor from fileProspector.onFSEvent).
  • TestFileWatcher raced on the in-memory log buffer returned by logp.NewInMemoryLocal and on t.Log (used by logptest.NewTestingLogger) when the watcher goroutine kept logging after the subtest had returned.
  • TestFilestreamTruncateBlockedOutput raced on mockClient.publishing between waitUntilPublishingHasStarted (test goroutine) and PublishAll (cursorPublisher goroutine).

All three are pre-existing on upstream/main, not regressions from a specific PR.

Fixes (test-only)

  • Add a sync.RWMutex to mockMetadataUpdater and lock all map accesses (RWMutex because assert.Eventually polling makes the read paths dominate). Replace the two direct testStore.table[id] accesses in TestOnRenameFileIdentity with new setRaw/get helpers.
  • In the affected TestFileWatcher subtests, wrap fw.Run(ctx) in a goroutine that closes a runDone channel on exit and wait for it before reading the log buffer / before the subtest returns.
  • Add a publishingStarted atomic.Bool to mockClient that PublishAll sets for non-empty batches (preserving the previous len(c.publishing) > 0 semantics) and that waitUntilPublishingHasStarted reads without holding mtx. Holding mtx would deadlock with the blocking ack handler used by TestFilestreamTruncateBlockedOutput, which keeps the lock during ackHandler.ACKEvents.

Drive-by cleanup (second commit)

While addressing pre-existing lint warnings in the touched files (per project convention), the mockClient cancel mechanism was reworked: the context.WithCancel + canceler context.CancelFunc field was replaced with an idempotent cancel() method backed by a done chan struct{} and sync.Once. blockingACKer now blocks on the channel instead of busy-spinning on ctx.Err() == nil, which kills a CPU-burning loop that pinned a core per blocked client. Other lint fixes: logp.L()logp.NewNopLogger() in prospector_test.go and minor staticcheck/gosec cleanups.

Out of scope

A fourth race surfaces only when the integration suite runs as a whole (e.g. TestParsersJavaElasticsearchLogs, TestParsersCStyleLog): multiline.patternReader.state is written by Close (called from a goroutine spawned by ctxtool.WithFunc when the input context is cancelled) and read by Next from the harvester read loop. That is a real production race in libbeat/reader/multiline/pattern.go and is not caused by the mocks, so it is intentionally left for a separate change.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works. Where relevant, I have used the stresstest.sh script to run them under stress conditions and race detector to verify their stability.
  • I have added an entry in ./changelog/fragments using the changelog tool. Test-only change; skip-changelog label applied.

How to test this PR locally

Reproduce each race on upstream/main, then verify it disappears on this branch.

cd filebeat

# 1 — TestProspectorHarvesterUpdateIgnoredFiles (unit, 100% reproducer on main)
go test -race -count=1 -timeout=2m \
  -run '^TestProspectorHarvesterUpdateIgnoredFiles$' \
  ./input/filestream/

# 2 — TestFileWatcher (unit, two subtest races: empty-files and duplicate-globs)
go test -race -count=1 -timeout=2m \
  -run '^TestFileWatcher$' \
  ./input/filestream/

# 3 — TestFilestreamTruncateBlockedOutput (integration, in isolation)
go test -race -tags integration -count=1 -timeout=2m \
  -run '^TestFilestreamTruncateBlockedOutput$' \
  ./input/filestream/

# 4 — Full integration suite minus the unit-style tests (the race in
# TestFilestreamTruncateBlockedOutput only surfaces here on main; the
# patternReader race remains and is out of scope, see commit message)
go test -race -tags integration -count=1 -timeout=20m \
  -skip '^TestProspectorHarvesterUpdateIgnoredFiles$|^TestFileWatcher$' \
  ./input/filestream/

# Stress the previously-flaky tests
../script/stresstest.sh --race ./input/filestream \
  '^TestProspectorHarvesterUpdateIgnoredFiles$' -p 16
../script/stresstest.sh --race --tags integration ./input/filestream \
  '^TestFilestreamTruncateBlockedOutput$' -p 16

Locally on this branch:

  • All four target tests pass under -race.
  • 1296 stress runs of TestProspectorHarvesterUpdateIgnoredFiles and 528 stress runs of TestFilestreamTruncateBlockedOutput with 0 failures.
  • The full integration suite still surfaces the unrelated multiline.patternReader race in TestParsersJavaElasticsearchLogs / TestParsersCStyleLog (pre-existing, called out above).

Related issues

Logs

Sample race trace removed by this PR (one of three):

==================
WARNING: DATA RACE
Read at 0x… by goroutine 15:
  filestream.(*mockMetadataUpdater).checkOffset()  prospector_test.go:721
  filestream.TestProspectorHarvesterUpdateIgnoredFiles.func2()  prospector_test.go:441
  testify/assert.Eventually.func1()

Previous write at 0x… by goroutine 13:
  filestream.(*mockMetadataUpdater).UpdateMetadata()  prospector_test.go:746
  filestream.(*fileProspector).onFSEvent()  prospector.go:371
  filestream.(*fileProspector).Run.func2()  prospector.go:341
==================

This is an automatic backport of pull request #50417 done by [Mergify](https://mergify.com).

* filebeat/input/filestream: fix data races in test mocks

Three filestream tests reported `WARNING: DATA RACE` when run with
`-race`, all caused by test-only mocks whose internal state was read
from the test goroutine and written from the prospector / harvester
goroutines without synchronization:

- `TestProspectorHarvesterUpdateIgnoredFiles` raced on
  `mockMetadataUpdater.table` (read in `checkOffset` from
  `assert.Eventually`, written in `UpdateMetadata` / `ResetCursor` from
  `fileProspector.onFSEvent`).
- `TestFileWatcher` raced on the in-memory log buffer returned by
  `logp.NewInMemoryLocal` and on `t.Log` (used by
  `logptest.NewTestingLogger`) when the watcher goroutine kept logging
  after the subtest had returned.
- `TestFilestreamTruncateBlockedOutput` raced on `mockClient.publishing`
  between `waitUntilPublishingHasStarted` (test goroutine) and
  `PublishAll` (cursorPublisher goroutine).

Fixes (test-only):

- Add a `sync.RWMutex` to `mockMetadataUpdater` and lock all map
  accesses (RWMutex because `assert.Eventually` polling makes the read
  paths dominate). Replace the two direct `testStore.table[id]`
  accesses in `TestOnRenameFileIdentity` with new `setRaw`/`get`
  helpers.
- In the affected `TestFileWatcher` subtests, wrap `fw.Run(ctx)` in a
  goroutine that closes a `runDone` channel on exit and wait for it
  before reading the log buffer / before the subtest returns.
- Add a `publishingStarted atomic.Bool` to `mockClient` that
  `PublishAll` sets for non-empty batches (preserving the previous
  `len(c.publishing) > 0` semantics) and that
  `waitUntilPublishingHasStarted` reads without holding `mtx`. Holding
  `mtx` would deadlock with the blocking ack handler used by
  `TestFilestreamTruncateBlockedOutput`, which keeps the lock during
  `ackHandler.ACKEvents`.

Also clean up `script/stresstest.sh` so `--tags` and `--race` may
appear in any order before `package_path`, reject duplicate `--tags`,
and document the ordering and a second example in the help text.

A fourth race surfaces only when the integration suite runs as a
whole (e.g. `TestParsersJavaElasticsearchLogs`,
`TestParsersCStyleLog`): `multiline.patternReader.state` is written by
`Close` (called from a goroutine spawned by `ctxtool.WithFunc` when the
input context is cancelled) and read by `Next` from the harvester
read loop. That is a real production race in
`libbeat/reader/multiline/pattern.go` and is not caused by the mocks,
so it is intentionally left for a separate change.

* filebeat/input/filestream: clean up lint warnings in touched test files

The previous commit ("fix data races in test mocks") modified
environment_test.go and prospector_test.go. golangci-lint reported
pre-existing warnings in those files; per the project convention of
fixing existing linter issues in any file you touch, address them now.

- prospector_test.go: replace 11 calls to the global logp.L() with
  logp.NewNopLogger(), matching the established pattern already used
  in the rest of the file (forbidigo).
- environment_test.go: replace WriteString(fmt.Sprint(...)) with
  fmt.Fprint(...) (staticcheck QF1012).
- environment_test.go: rewrite the missing-events scan to iterate over
  events directly so there is no `events[i]` indexed access (gosec
  G602 false positive eliminated).
- environment_test.go: drop the context.WithCancel + canceler field on
  mockClient in favour of an idempotent cancel() method backed by a
  done channel and sync.Once. blockingACKer now blocks on the channel
  instead of busy-spinning on ctx.Err(), which both removes the gosec
  G118 false positive and fixes a CPU-burning loop that pinned a core
  per blocked client. input_integration_test.go updated for the
  renamed method.

(cherry picked from commit ce8b373)
@mergify mergify Bot requested a review from a team as a code owner April 30, 2026 12:58
@mergify mergify Bot added the backport label Apr 30, 2026
@mergify mergify Bot requested review from khushijain21 and leehinman and removed request for a team April 30, 2026 12:58
@botelastic botelastic Bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 30, 2026
@github-actions github-actions Bot added Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team skip-changelog labels Apr 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /test : Run the Buildkite pipeline.

@infra-vault-gh-plugin-prod
Copy link
Copy Markdown

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic Bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 30, 2026
@mergify
Copy link
Copy Markdown
Contributor Author

mergify Bot commented May 4, 2026

This pull request has not been merged yet. Could you please review and merge it @orestisfl? 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant