Skip to content

SLCORE-1555 Run medium tests in parallel#1438

Draft
damien-urruty-sonarsource wants to merge 9 commits into
masterfrom
task/dam/SLCORE-1555-parallel-tests
Draft

SLCORE-1555 Run medium tests in parallel#1438
damien-urruty-sonarsource wants to merge 9 commits into
masterfrom
task/dam/SLCORE-1555-parallel-tests

Conversation

@damien-urruty-sonarsource
Copy link
Copy Markdown
Contributor

@damien-urruty-sonarsource damien-urruty-sonarsource commented Jul 23, 2025

SLCORE-1555


Summary by Gitar

  • Parallelized medium tests:
    • Updated medium-tests/pom.xml to enable test parallelization with forkCount set to 0.25C.
  • Test infrastructure improvements:
    • Migrated WebSocket server implementation in test-utils from Tomcat to Java-WebSocket and added NetworkUtils to manage dynamic port allocation.
    • Updated SonarLintBackendFixture and various medium tests to use URI for WebSocket endpoints and wait for configuration scope registration.
  • Analysis engine stability:
    • Improved AnalysisScheduler to wait for the analysis thread to join before stopping components, ensuring clean shutdowns.
    • Added @PreDestroy methods to FindingsSynchronizationService, WebSocketManager, FileExclusionService, and ConnectionSuggestionProvider for proper resource cleanup.
  • Test reliability:
    • Increased various Awaitility timeout durations across medium tests to reduce flakiness.
    • Configured SmartNotifications poll period for faster test execution.
    • Removed redundant JuliSLF4JDelegatingLog and associated test-utils.

This will update automatically on new commits.

@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch 2 times, most recently from af5870d to 96f72cd Compare July 23, 2025 10:10
@sonarqube-next
Copy link
Copy Markdown

Quality Gate failed Quality Gate failed

Failed conditions
3 New issues
81.4% Coverage on New Code (required ≥ 85%)

See analysis details on SonarQube

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE SonarQube for IDE

@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch 3 times, most recently from ac777c7 to 343fa95 Compare July 23, 2025 15:07
@sonarqubecloud
Copy link
Copy Markdown

🤖 Pull Request summary

Updates test infrastructure for better parallelization and dynamic port allocation.

Maven configuration: Added surefire plugin with forkCount=1C and reuseForks=true for parallel test execution
Test timeout adjustment: Increased assertion timeout from 1 to 2 seconds in EffectiveRulesMediumTests
WebSocket URI handling: Changed from String to URI type for WebSocket endpoints and switched to dynamic port allocation
New NetworkUtils utility: Added comprehensive port allocation logic with blocked port filtering and collision avoidance
Test cleanup: Removed unused EMBEDDED_SERVER capability imports

Review focus: The NetworkUtils port allocation logic is complex - verify the blocked ports list and collision detection work correctly across parallel test execution.

💬 Please send your feedback

Quality Gate Failed Quality Gate failed

Failed conditions
5 New issues

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch 2 times, most recently from d829266 to 66f0245 Compare December 24, 2025 14:00
@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch from 66f0245 to a6cc8a7 Compare January 2, 2026 08:29
@voz
Copy link
Copy Markdown

voz commented Apr 7, 2026

@damien-urruty-sonarsource can we close it or define path forward if we need to? Thx for helping to keep things tidy.

@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch 12 times, most recently from 7aa0417 to e5ac64e Compare May 11, 2026 15:47
@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch 2 times, most recently from e5ac64e to fcf750b Compare May 11, 2026 16:03
@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch from fcf750b to 434c9c5 Compare May 18, 2026 16:05
@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch from 434c9c5 to 88f3f1d Compare May 22, 2026 19:26
@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch from 88f3f1d to da95ada Compare May 25, 2026 07:40
@sonarqube-next
Copy link
Copy Markdown

We are canceling all commands, but this just changes the cancelation flag in the monitor. Analysis can still continue to run. We need to wait for proper termination of the analysis queue and thread before stopping components
@damien-urruty-sonarsource damien-urruty-sonarsource force-pushed the task/dam/SLCORE-1555-parallel-tests branch from fcfe06a to 9a4d13d Compare June 2, 2026 18:57
@sonarqube-next
Copy link
Copy Markdown

sonarqube-next Bot commented Jun 2, 2026

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Jun 2, 2026

CI failed: Integration test failures occurred in the `SonarCloudTests$TaintVulnerabilities` suite, likely due to race conditions or resource contention introduced by recent changes to parallel execution and WebSocket infrastructure.

Overview

One integration test suite failure was detected in the build, specifically within its.SonarCloudTests$TaintVulnerabilities. Given the PR scope, the failure is highly likely related to the introduction of parallel test execution or the refactoring of the WebSocket server implementation.

Failures

its.SonarCloudTests$TaintVulnerabilities (confidence: medium)

  • Type: test
  • Affected jobs: 79150086546
  • Related to change: yes
  • Root cause: The test suite encountered an org.apache.commons.exec.ExecuteException: Process exited with an error: 1. This typically indicates that a child process launched during the integration test failed, potentially due to race conditions, thread leakage, or incorrect cleanup of WebSocket resources now that tests are running in parallel.
  • Suggested fix: Inspect the detailed logs within its/tests/target/surefire-reports for the specific failing test case. Verify that the new WebSocketServer implementation and the analysis scheduler properly shut down resources between tests to prevent cross-contamination or resource exhaustion in the parallelized test environment.
Code Review 👍 Approved with suggestions 6 resolved / 7 findings

Enables parallel execution for medium tests by implementing robust WebSocket server management and analysis engine lifecycle improvements. Replace string concatenation with parameterized logging in the analysis queue post method to avoid performance overhead in hot paths.

💡 Quality: Debug logging with string concatenation in hot path

📄 backend/analysis-engine/src/main/java/org/sonarsource/sonarlint/core/analysis/AnalysisScheduler.java:112-113

Lines 112-113 in post() use string concatenation ("Post: " + Thread.currentThread().getName() + ...) which allocates strings even when debug logging is disabled. Consider using parameterized logging or guarding with LOG.isDebugEnabled().

✅ 6 resolved
Bug: NetworkUtils.ALREADY_ALLOCATED is not thread-safe for parallel tests

📄 test-utils/src/main/java/org/sonarsource/sonarlint/core/test/utils/server/NetworkUtils.java:31 📄 test-utils/src/main/java/org/sonarsource/sonarlint/core/test/utils/server/NetworkUtils.java:60 📄 test-utils/src/main/java/org/sonarsource/sonarlint/core/test/utils/server/NetworkUtils.java:68
The ALREADY_ALLOCATED set is a plain HashSet accessed from getNextAvailablePort without any synchronization. Since the purpose of this PR is to run tests in parallel, multiple test threads will concurrently call getNextAvailablePort, leading to race conditions: potential ConcurrentModificationException, lost updates, or duplicate port assignments.

Use a thread-safe set instead.

Bug: WebSocketServer.onError will NPE when conn is null

📄 test-utils/src/main/java/org/sonarsource/sonarlint/core/test/utils/server/websockets/WebSocketServer.java:105-107 📄 test-utils/src/main/java/org/sonarsource/sonarlint/core/test/utils/server/websockets/WebSocketServer.java:95-96 📄 test-utils/src/main/java/org/sonarsource/sonarlint/core/test/utils/server/websockets/WebSocketServer.java:100-101
The org.java_websocket.server.WebSocketServer#onError callback can be invoked with a null conn parameter for server-level errors (e.g., bind failures, selector exceptions). The current implementation unconditionally calls connections.get(conn) which will either throw NPE directly (null key in synchronized map) or return null and then call .setIsError(ex) on it.

Similarly, onClose and onMessage have no null-safety if the connection isn't in the map (though less likely to trigger).

Edge Case: Race in stop(): command taken from queue may never be cancelled

📄 backend/analysis-engine/src/main/java/org/sonarsource/sonarlint/core/analysis/AnalysisScheduler.java:60-74
In AnalysisScheduler, there is a race window between stop() and executeQueuedCommands() where a command can be orphaned (neither executed nor cancelled):

  1. Analysis thread: takeNextCommand() returns a command
  2. stop(): sets termination, calls executingCommand.getAndSet(null) → gets null (command not yet set), interrupts thread, removes all from queue
  3. Analysis thread: executingCommand.set(command), checks termination.get() != null → breaks

The command was removed from the queue (step 1) before removeAll() runs, and executingCommand was read as null before the thread set it. The command is never cancelled, so its CompletableFuture will never complete. Callers waiting on that future during shutdown could hang (until they timeout).

Suggested fix: after the while loop exits (before running the termination action), cancel the command that's still in executingCommand.

Edge Case: WebSocketServer.restart() doesn't clear stale connections

📄 test-utils/src/main/java/org/sonarsource/sonarlint/core/test/utils/server/websockets/WebSocketServer.java:51-56
connectionsBySocket is a field of the outer WebSocketServer class, shared across Impl instances. When restart() is called, a new Impl is created but old (closed) connections remain in connectionsBySocket. This means getConnections() will return stale closed connections after a restart, which could confuse test assertions that check connection count or iterate connections.

Considering this is test utility code, it's minor, but could lead to flaky tests when assertions like webSocketServer.getConnections().size() == 1 pass prematurely due to stale entries.

Edge Case: Race in stop(): dequeued command can escape cancellation

📄 backend/analysis-engine/src/main/java/org/sonarsource/sonarlint/core/analysis/AnalysisScheduler.java:74-76 📄 backend/analysis-engine/src/main/java/org/sonarsource/sonarlint/core/analysis/AnalysisScheduler.java:100-108
Between takeNextCommand() (line 74) and executingCommand.set(command) (line 101 inside shouldStopBeforeExecuting), there is a window where stop() can run and find: (1) executingCommand is null, so it won't cancel the command, and (2) the command is already removed from the queue, so removeAll() won't find it either. The commandDequeuedHook (line 75) widens this window further.

Although the analysisThread.interrupt() will eventually break out of takeNextCommand or the hook, and the while (termination.get() == null) check or shouldStopBeforeExecuting will catch it on the next iteration, the specific command that was dequeued during the race will proceed to shouldStopBeforeExecuting which does handle it correctly (it sees termination != null and cancels). So the command is actually handled — the real concern is the commandDequeuedHook: if the hook blocks (as it does in tests), stop() will call analysisThread.interrupt() which unblocks it, and then shouldStopBeforeExecuting cancels the command. This appears safe in practice due to join() waiting, but the design is fragile and hard to reason about.

...and 1 more resolved from earlier reviews

🤖 Prompt for agents
Code Review: Enables parallel execution for medium tests by implementing robust WebSocket server management and analysis engine lifecycle improvements. Replace string concatenation with parameterized logging in the analysis queue post method to avoid performance overhead in hot paths.

1. 💡 Quality: Debug logging with string concatenation in hot path
   Files: backend/analysis-engine/src/main/java/org/sonarsource/sonarlint/core/analysis/AnalysisScheduler.java:112-113

   Lines 112-113 in `post()` use string concatenation (`"Post: " + Thread.currentThread().getName() + ...`) which allocates strings even when debug logging is disabled. Consider using parameterized logging or guarding with `LOG.isDebugEnabled()`.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants