Fix CI Tests by isolating environment#158
Fix CI Tests by isolating environment#158ncurrie3 wants to merge 7 commits intoGoogleCloudPlatform:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request enhances test isolation and reliability by introducing a ForceRuntime configuration in the runtimebroker and ensuring tests use temporary directories for HOME and configuration files. Key changes include updating resolveManagerForOpts to respect the forced runtime and refactoring numerous tests to create isolated .scion environments. Feedback identifies a potential JSON marshaling issue on Windows when injecting file paths into strings and notes that using os.Chdir in test helpers prevents parallel test execution.
| // Isolate from repo .scion by changing CWD to a temp dir containing its own .scion | ||
| origWd, err := os.Getwd() | ||
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| tmpDir := t.TempDir() | ||
| if err := os.Chdir(tmpDir); err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| t.Cleanup(func() { | ||
| _ = os.Chdir(origWd) | ||
| }) |
There was a problem hiding this comment.
Changing the current working directory using os.Chdir is a global side effect that affects the entire process. This prevents tests from being run in parallel (using t.Parallel()) as they would interfere with each other's working directory. While t.Cleanup ensures the directory is restored, it does not solve the concurrency issue. A more robust approach would be to refactor the code under test to accept an explicit base directory for configuration discovery, or to use an environment variable that the configuration logic respects, rather than relying on the process's working directory.
There was a problem hiding this comment.
Good point. However, these tests are not currently running in parallel (they do not use t.Parallel()), so os.Chdir does not cause concurrency issues here. Refactoring the configuration discovery logic to accept an explicit base path would require changing production code in pkg/config, which is out of scope for this PR. We should consider that refactor in a future cleanup task
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a ForceRuntime configuration to the runtime broker to override runtime resolution and adds a check in the service manager to prevent race conditions during restarts. It also updates numerous tests to improve environment isolation by setting the HOME variable and using temporary directories. Review feedback recommends optimizing the ForceRuntime resolution by checking cached runtimes and using consistent logging. Additionally, it was noted that using os.Chdir in test helpers prevents parallel test execution.
| if s.config.ForceRuntime != "" { | ||
| if s.config.ForceRuntime == s.runtime.Name() { | ||
| return s.manager | ||
| } | ||
| slog.Warn("ForceRuntime does not match default runtime, falling back to settings resolution", "force", s.config.ForceRuntime, "default", s.runtime.Name()) | ||
| } |
There was a problem hiding this comment.
The ForceRuntime logic can be optimized by checking the auxiliaryRuntimes cache before falling back to settings resolution. This avoids unnecessary filesystem operations when the forced runtime has already been resolved and cached. Additionally, using the server's agentLifecycleLog (which is a *slog.Logger) instead of the global slog maintains consistency with the logging patterns used elsewhere in the package.
if s.config.ForceRuntime != "" {
if s.config.ForceRuntime == s.runtime.Name() {
return s.manager
}
s.auxiliaryRuntimesMu.RLock()
aux, ok := s.auxiliaryRuntimes[s.config.ForceRuntime]
s.auxiliaryRuntimesMu.RUnlock()
if ok {
return aux.Manager
}
s.agentLifecycleLog.Warn("ForceRuntime does not match default runtime, falling back to settings resolution", "force", s.config.ForceRuntime, "default", s.runtime.Name())
}There was a problem hiding this comment.
Great suggestion. I've updated the code to check the auxiliaryRuntimes cache first to avoid unnecessary settings resolution, and I've switched to using the server's agentLifecycleLog for consistency. Pushed to branch
Fixes #157
Granular Details of Fixes
This PR fixes failures in
make test-fastby ensuring that tests are isolated from the local environment and repository state.Changes made:
t.Setenv("HOME", t.TempDir())to tests inpkg/runtimebrokerandpkg/configto prevent them from reading real user configurations and active agents in~/.scion.runtimebrokerhandlers to avoid discovering the real repository's.sciondirectory.templates/defaultandtemplates/claudein test environments so that agent creation tests do not fail when trying to load templates.GIT_CONFIG_COUNTenvironment variables to allow bare repositories ininit_test.go, resolving Git security complaints.Why "Tests pass" is true
I have verified locally that
make test-fastpasses successfully after these changes. All tests inpkg/runtimebrokerthat were failing due to environment leakage are now passing reliably.Documentation
I have created a detailed issue description artifact describing the problem and solution (see linked issue). No changes were made to user-facing documentation as these are purely internal test fixes.