[9.4](backport #50417) filebeat/input/filestream: fix data races in test mocks#50436
Open
mergify[bot] wants to merge 1 commit into9.4from
Open
[9.4](backport #50417) filebeat/input/filestream: fix data races in test mocks#50436mergify[bot] wants to merge 1 commit into9.4from
mergify[bot] wants to merge 1 commit into9.4from
Conversation
* filebeat/input/filestream: fix data races in test mocks
Three filestream tests reported `WARNING: DATA RACE` when run with
`-race`, all caused by test-only mocks whose internal state was read
from the test goroutine and written from the prospector / harvester
goroutines without synchronization:
- `TestProspectorHarvesterUpdateIgnoredFiles` raced on
`mockMetadataUpdater.table` (read in `checkOffset` from
`assert.Eventually`, written in `UpdateMetadata` / `ResetCursor` from
`fileProspector.onFSEvent`).
- `TestFileWatcher` raced on the in-memory log buffer returned by
`logp.NewInMemoryLocal` and on `t.Log` (used by
`logptest.NewTestingLogger`) when the watcher goroutine kept logging
after the subtest had returned.
- `TestFilestreamTruncateBlockedOutput` raced on `mockClient.publishing`
between `waitUntilPublishingHasStarted` (test goroutine) and
`PublishAll` (cursorPublisher goroutine).
Fixes (test-only):
- Add a `sync.RWMutex` to `mockMetadataUpdater` and lock all map
accesses (RWMutex because `assert.Eventually` polling makes the read
paths dominate). Replace the two direct `testStore.table[id]`
accesses in `TestOnRenameFileIdentity` with new `setRaw`/`get`
helpers.
- In the affected `TestFileWatcher` subtests, wrap `fw.Run(ctx)` in a
goroutine that closes a `runDone` channel on exit and wait for it
before reading the log buffer / before the subtest returns.
- Add a `publishingStarted atomic.Bool` to `mockClient` that
`PublishAll` sets for non-empty batches (preserving the previous
`len(c.publishing) > 0` semantics) and that
`waitUntilPublishingHasStarted` reads without holding `mtx`. Holding
`mtx` would deadlock with the blocking ack handler used by
`TestFilestreamTruncateBlockedOutput`, which keeps the lock during
`ackHandler.ACKEvents`.
Also clean up `script/stresstest.sh` so `--tags` and `--race` may
appear in any order before `package_path`, reject duplicate `--tags`,
and document the ordering and a second example in the help text.
A fourth race surfaces only when the integration suite runs as a
whole (e.g. `TestParsersJavaElasticsearchLogs`,
`TestParsersCStyleLog`): `multiline.patternReader.state` is written by
`Close` (called from a goroutine spawned by `ctxtool.WithFunc` when the
input context is cancelled) and read by `Next` from the harvester
read loop. That is a real production race in
`libbeat/reader/multiline/pattern.go` and is not caused by the mocks,
so it is intentionally left for a separate change.
* filebeat/input/filestream: clean up lint warnings in touched test files
The previous commit ("fix data races in test mocks") modified
environment_test.go and prospector_test.go. golangci-lint reported
pre-existing warnings in those files; per the project convention of
fixing existing linter issues in any file you touch, address them now.
- prospector_test.go: replace 11 calls to the global logp.L() with
logp.NewNopLogger(), matching the established pattern already used
in the rest of the file (forbidigo).
- environment_test.go: replace WriteString(fmt.Sprint(...)) with
fmt.Fprint(...) (staticcheck QF1012).
- environment_test.go: rewrite the missing-events scan to iterate over
events directly so there is no `events[i]` indexed access (gosec
G602 false positive eliminated).
- environment_test.go: drop the context.WithCancel + canceler field on
mockClient in favour of an idempotent cancel() method backed by a
done channel and sync.Once. blockingACKer now blocks on the channel
instead of busy-spinning on ctx.Err(), which both removes the gosec
G118 false positive and fixes a CPU-burning loop that pinned a core
per blocked client. input_integration_test.go updated for the
renamed method.
(cherry picked from commit ce8b373)
Contributor
🤖 GitHub commentsJust comment with:
|
|
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
6 tasks
orestisfl
approved these changes
Apr 30, 2026
Contributor
Author
|
This pull request has not been merged yet. Could you please review and merge it @orestisfl? 🙏 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proposed commit message
Three filestream tests reported
WARNING: DATA RACEwhen run with-race, all caused by test-only mocks whose internal state was read from the test goroutine and written from the prospector / harvester goroutines without synchronization:TestProspectorHarvesterUpdateIgnoredFilesraced onmockMetadataUpdater.table(read incheckOffsetfromassert.Eventually, written inUpdateMetadata/ResetCursorfromfileProspector.onFSEvent).TestFileWatcherraced on the in-memory log buffer returned bylogp.NewInMemoryLocaland ont.Log(used bylogptest.NewTestingLogger) when the watcher goroutine kept logging after the subtest had returned.TestFilestreamTruncateBlockedOutputraced onmockClient.publishingbetweenwaitUntilPublishingHasStarted(test goroutine) andPublishAll(cursorPublisher goroutine).All three are pre-existing on
upstream/main, not regressions from a specific PR.Fixes (test-only)
sync.RWMutextomockMetadataUpdaterand lock all map accesses (RWMutex becauseassert.Eventuallypolling makes the read paths dominate). Replace the two directtestStore.table[id]accesses inTestOnRenameFileIdentitywith newsetRaw/gethelpers.TestFileWatchersubtests, wrapfw.Run(ctx)in a goroutine that closes arunDonechannel on exit and wait for it before reading the log buffer / before the subtest returns.publishingStarted atomic.BooltomockClientthatPublishAllsets for non-empty batches (preserving the previouslen(c.publishing) > 0semantics) and thatwaitUntilPublishingHasStartedreads without holdingmtx. Holdingmtxwould deadlock with the blocking ack handler used byTestFilestreamTruncateBlockedOutput, which keeps the lock duringackHandler.ACKEvents.Drive-by cleanup (second commit)
While addressing pre-existing lint warnings in the touched files (per project convention), the
mockClientcancel mechanism was reworked: thecontext.WithCancel+canceler context.CancelFuncfield was replaced with an idempotentcancel()method backed by adone chan struct{}andsync.Once.blockingACKernow blocks on the channel instead of busy-spinning onctx.Err() == nil, which kills a CPU-burning loop that pinned a core per blocked client. Other lint fixes:logp.L()→logp.NewNopLogger()inprospector_test.goand minorstaticcheck/goseccleanups.Out of scope
A fourth race surfaces only when the integration suite runs as a whole (e.g.
TestParsersJavaElasticsearchLogs,TestParsersCStyleLog):multiline.patternReader.stateis written byClose(called from a goroutine spawned byctxtool.WithFuncwhen the input context is cancelled) and read byNextfrom the harvester read loop. That is a real production race inlibbeat/reader/multiline/pattern.goand is not caused by the mocks, so it is intentionally left for a separate change.Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration filesstresstest.shscript to run them under stress conditions and race detector to verify their stability.I have added an entry inTest-only change;./changelog/fragmentsusing the changelog tool.skip-changeloglabel applied.How to test this PR locally
Reproduce each race on
upstream/main, then verify it disappears on this branch.Locally on this branch:
-race.TestProspectorHarvesterUpdateIgnoredFilesand 528 stress runs ofTestFilestreamTruncateBlockedOutputwith 0 failures.multiline.patternReaderrace inTestParsersJavaElasticsearchLogs/TestParsersCStyleLog(pre-existing, called out above).Related issues
Logs
Sample race trace removed by this PR (one of three):
This is an automatic backport of pull request #50417 done by [Mergify](https://mergify.com).