Skip to content

Resource check on event#122

Open
perk11 wants to merge 85 commits intomainfrom
resourse-check-on-event
Open

Resource check on event#122
perk11 wants to merge 85 commits intomainfrom
resourse-check-on-event

Conversation

@perk11
Copy link
Owner

@perk11 perk11 commented Dec 28, 2025

No description provided.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements event-driven resource checking to prevent the use of outdated resource check results. Instead of relying solely on periodic timer-based checks, the system now triggers resource availability checks immediately when services connect or disconnect.

Key Changes:

  • Replaced polling-based resource waiting with an event-driven channel-based system
  • Added immediate resource check triggering on service lifecycle events (connection/disconnection)
  • Upgraded Go version from 1.22 to 1.23 to support new language features (iter.Seq)

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
monitor_resources_test.go Updated test to remove outdated retry logic and added new test implementation for event-driven resource checking
monitor_resources.go Refactored monitoring to use select statement with timer and pause/resume channel, added broadcast functions for resource change events
management_api.go Updated to dereference resourcesAvailable map directly instead of through pointer
main_test.go Added new test case configuration for validating that outdated resource check results are not used
main.go Extended ResourceManager with new mutexes and channels for event-driven coordination, updated resource reservation to use channel-based waiting instead of sleep polling
go.mod Upgraded Go version to 1.23
.github/workflows/tests.yml Updated CI workflow to use Go 1.23

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


time.Sleep(3500 * time.Millisecond)

serviceOneHealthCheckResponse := getHealthcheckResponse(t, serviceTwoHealthCheckAddress)
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is named "serviceOneHealthCheckResponse" but it's actually fetching the response from serviceTwoHealthCheckAddress. This is confusing and should be renamed to "serviceTwoHealthCheckResponse" for clarity.

Copilot uses AI. Check for mistakes.
main.go Outdated
rm.storeRunningServiceNoLock(name, runningService)
if runningService.activeConnections == 0 {
log.Printf("[%s] All connections closed, sending resourceChange event", name)
rm.broadcastResourceChanges(maps.Keys(serviceConfigByName[name].ResourceRequirements))
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential deadlock: broadcastResourceChanges is called while serviceMutex is held (line 113). This method acquires resourceChangeByResourceMutex, which could lead to lock ordering issues if another goroutine tries to acquire these locks in a different order.

Copilot uses AI. Check for mistakes.
Comment on lines 141 to 148
//time.Sleep(1 * time.Second)
//statusResponse = getStatusFromManagementAPI(t, managementApiAddress)
//assertPortsAreClosed(t, []string{serviceOneHealthCheckAddress, serviceTwoHealthCheckAddress})
//verifyServiceStatus(t, statusResponse, serviceOneName, true, map[string]int{resourceName: 11})
//verifyServiceStatus(t, statusResponse, serviceTwoName, false, map[string]int{resourceName: 0})
//verifyTotalResourcesAvailable(t, statusResponse, map[string]int{resourceName: 11})
//verifyTotalResourceUsage(t, statusResponse, map[string]int{resourceName: 10})

Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commented-out code block (lines 141-147) should be removed if it's not needed. Leaving large blocks of commented code reduces code maintainability and can cause confusion about whether it should be used or not.

Suggested change
//time.Sleep(1 * time.Second)
//statusResponse = getStatusFromManagementAPI(t, managementApiAddress)
//assertPortsAreClosed(t, []string{serviceOneHealthCheckAddress, serviceTwoHealthCheckAddress})
//verifyServiceStatus(t, statusResponse, serviceOneName, true, map[string]int{resourceName: 11})
//verifyServiceStatus(t, statusResponse, serviceTwoName, false, map[string]int{resourceName: 0})
//verifyTotalResourcesAvailable(t, statusResponse, map[string]int{resourceName: 11})
//verifyTotalResourceUsage(t, statusResponse, map[string]int{resourceName: 10})

Copilot uses AI. Check for mistakes.
handleNotEnoughResource(requestingService, outputError, true, resource, currentlyAvailableAmount, resourceRequirements[resource])
return &resource
}
//todo: maxwaittime
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO comment without a clear action plan or issue reference. Consider either implementing this or adding a reference to a tracking issue, or removing the comment if it's not needed.

Suggested change
//todo: maxwaittime
// NOTE: no maximum wait time is enforced here; callers are expected to wait until resources become available.

Copilot uses AI. Check for mistakes.
main.go Outdated
runningService.stderrWriter.FinalFlush()
releaseResourcesWhenServiceMutexIsLocked(service.ResourceRequirements)
delete(resourceManager.runningServices, service.Name)
resourceManager.broadcastResourceChanges(maps.Keys(service.ResourceRequirements))
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential deadlock: broadcastResourceChanges is called while serviceMutex is held (line 1458). This is the same issue as line 113 - calling a method that acquires resourceChangeByResourceMutex while holding serviceMutex can lead to lock ordering issues and potential deadlocks.

Copilot uses AI. Check for mistakes.
if !ok {
panic("pauseResumeChan closed unexpectedly, crashing to avoid infinite loop")
}
timer.Stop()
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timer is stopped (line 28) but not drained. After calling Stop(), if the timer has already fired, the channel may still contain a value. This should be drained to avoid a race condition. Consider using a select with default case to drain the channel after stopping.

Suggested change
timer.Stop()
if !timer.Stop() {
select {
case <-timer.C:
default:
}
}

Copilot uses AI. Check for mistakes.
resourceManager.resourceChangeByResourceMutex.Unlock()
}

func UnpauseResourceAvailabilityMonitoring(resourceName string) {
Copy link

Copilot AI Dec 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function name is inconsistent with Go naming conventions. Public function names should be PascalCase, but should also follow consistent patterns. Consider renaming to "unpauseResourceAvailabilityMonitoring" (lowercase first letter) if this is meant to be package-private, or ensure it follows a consistent naming pattern with other public functions in the package.

Suggested change
func UnpauseResourceAvailabilityMonitoring(resourceName string) {
func unpauseResourceAvailabilityMonitoring(resourceName string) {

Copilot uses AI. Check for mistakes.
@perk11 perk11 changed the title Resourse check on event Resource check on event Dec 29, 2025
@perk11 perk11 force-pushed the resourse-check-on-event branch from 95af14b to 5549e36 Compare January 4, 2026 07:02
@perk11 perk11 force-pushed the resourse-check-on-event branch from 9e58013 to 96f4f99 Compare January 15, 2026 02:32
@perk11 perk11 requested a review from Copilot January 15, 2026 02:32
@perk11 perk11 force-pushed the resourse-check-on-event branch from 96f4f99 to c494ecf Compare January 15, 2026 02:33
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

README.md Outdated
- `listen_port`: Port the service listens on
- `is_running`: Whether the service is currently running
- `status`: stopped, waiting_for_resources, starting or running.
- `active_connections`: Number of active connections to the service
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation field 'active_connections' is outdated and no longer exists in the API. According to the changes in management_api.go, this should document 'waiting_connections' and 'proxied_connections' as separate fields instead.

Copilot uses AI. Check for mistakes.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 100 to 104
func (rm ResourceManager) broadcastResourceChanges(resources iter.Seq[string], recheckNeeded bool) {
for resource := range resources {
rm.broadcastResourceChangeWhenResourceChangeByResourceMutexIsLocked(resource, recheckNeeded)
}
}
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function broadcastResourceChanges calls broadcastResourceChangeWhenResourceChangeByResourceMutexIsLocked which, based on its name, expects resourceChangeByResourceMutex to already be locked. However, this function does not acquire the mutex before calling it. This function is called from incrementConnection (line 119 in main.go) and cleanUpStoppedServiceWhenServiceMutexIsLocked (line 1527 in main.go), both of which only hold serviceMutex. This will cause race conditions when accessing resourceChangeByResourceChans. Add mutex locking: acquire resourceChangeByResourceMutex at the start of this function and release it after the loop completes.

Copilot uses AI. Check for mistakes.
main.go Outdated
Comment on lines 748 to 751
if interrupted {
return nil, fmt.Errorf("interrupt signal was received")
}

Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a duplicate check of the one on lines 745-746. The second check is redundant and can be removed to simplify the code.

Suggested change
if interrupted {
return nil, fmt.Errorf("interrupt signal was received")
}

Copilot uses AI. Check for mistakes.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

main.go Outdated
}
resourceManager.serviceMutex.Lock()
cleanUpStoppedServiceWhenServiceMutexIsLocked(&serviceConfig, runningService, false)
cleanUpStoppedServiceWhenServiceMutexIsLocked(&serviceConfig, runningService, true)
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The third parameter 'true' is unclear without context. Consider extracting this to a named constant like 'shouldReleaseResources' to improve code readability.

Suggested change
cleanUpStoppedServiceWhenServiceMutexIsLocked(&serviceConfig, runningService, true)
shouldReleaseResources := true
cleanUpStoppedServiceWhenServiceMutexIsLocked(&serviceConfig, runningService, shouldReleaseResources)

Copilot uses AI. Check for mistakes.
handleNotEnoughResource(requestingService, outputError, true, resource, currentlyAvailableAmount, resourceRequirements[resource])
return &resource
}
//todo: maxwaittime
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Todo comment should be properly formatted with a capital 'T' and proper spacing: "TODO: max wait time".

Suggested change
//todo: maxwaittime
// TODO: max wait time

Copilot uses AI. Check for mistakes.
main.go Outdated
stopService(*findServiceConfigByName(earliestLastUsedService))
continue
}
//TODO: add resource amounts
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This TODO comment is vague. It should specify what resource amounts need to be added and where (e.g., "TODO: add resource amounts to log message").

Suggested change
//TODO: add resource amounts
// TODO: Add requested and currently available amounts for *missingResource to the following log message.

Copilot uses AI. Check for mistakes.
config.go Outdated
Amount int `json:"Amount"`
CheckCommand string `json:"CheckCommand"`
CheckIntervalMilliseconds uint `json:"CheckIntervalMilliseconds"`
CheckIntervalMilliseconds uint `json:"CheckWhenNotEnoughIntervalMilliseconds"`
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The struct field name 'CheckIntervalMilliseconds' doesn't match the JSON tag 'CheckWhenNotEnoughIntervalMilliseconds'. This will cause incorrect unmarshaling. The field name should be 'CheckWhenNotEnoughIntervalMilliseconds' to match the JSON tag.

Copilot uses AI. Check for mistakes.
verifyTotalResourceUsage(t, statusResponse, map[string]int{resourceName: 0})
verifyServiceStatus(t, statusResponse, serviceOneName, ServiceStateStopped, 0, 0, map[string]int{resourceName: 0})
verifyServiceStatus(t, statusResponse, serviceTwoName, ServiceStateStopped, 0, 0, map[string]int{resourceName: 0})
verifyResourceUsage(t, statusResponse, map[string]int{resourceName: 0}, map[string]int{resourceName: 1}, map[string]int{resourceName: 0}, map[string]int{resourceName: 0})
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple consecutive map[string]int parameters make this call hard to read and error-prone. Consider using a struct with named fields (e.g., ExpectedResourceUsage{Reserved: ..., Free: ..., Used: ..., Total: ...}) to improve clarity.

Copilot uses AI. Check for mistakes.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +835 to 837
"healthcheck timed out, not starting another healthcheck command: not enough time left for another HealthcheckInterval(%v) in StartupTimeout(%v)",
sleepDuration,
timeout,
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message format string references variable names instead of using formatting placeholders. The message should format the duration values properly instead of printing "%dms" and "%s" literally.

Copilot uses AI. Check for mistakes.
}
}
}
if !enoughOfResource && (!firstCheckNeeded || resourceConfig.CheckCommand == "") {
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the logic at line 1056, the condition checks (!firstCheckNeeded || resourceConfig.CheckCommand == "") which means resources without CheckCommand will immediately return as not enough resource without proper error logging when firstCheckNeeded is true. This seems inconsistent - resources without CheckCommand should be handled the same way regardless of the firstCheckNeeded flag.

Suggested change
if !enoughOfResource && (!firstCheckNeeded || resourceConfig.CheckCommand == "") {
if resourceConfig.CheckCommand == "" {
if !enoughOfResource {
handleNotEnoughResource(requestingService, outputError, false, resource, currentlyAvailableAmount, requiredAmount)
return &resource
}
} else if !firstCheckNeeded && !enoughOfResource {

Copilot uses AI. Check for mistakes.
Comment on lines +1130 to +1140
reservedAmount, ok := resourceManager.resourcesReserved[resource]
if !ok {
log.Printf(
"[%s] ERROR: Resource \"%s\" is missing from the list of the reserved resources. This shouldn't be happening",
requestingService,
resource,
)
reservedAmount = 0
}
enoughOfResource := requiredAmount <= currentlyAvailableAmount-reservedAmount
return currentlyAvailableAmount, enoughOfResource
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential race condition in accessing resourceManager.resourcesReserved at line 1130. This field is accessed without holding serviceMutex, but it's modified in other places while holding serviceMutex. This could lead to inconsistent reads or race detector warnings. The access should be protected by acquiring serviceMutex first.

Copilot uses AI. Check for mistakes.
Comment on lines +1061 to +1062
//This is not optimal since it is blocked until the first resource change, which could take longer than the consequent ones.
//But in a scenario when there are enough resources, we need to wait for them all anyway, and if it's not enough, well, we're still waiting.
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment "This is not optimal since it is blocked until the first resource change" describes a known limitation but doesn't explain the impact or why this approach was chosen. Consider expanding this comment to explain the tradeoffs and whether this is acceptable for the current use case.

Suggested change
//This is not optimal since it is blocked until the first resource change, which could take longer than the consequent ones.
//But in a scenario when there are enough resources, we need to wait for them all anyway, and if it's not enough, well, we're still waiting.
// This is not optimal since it is blocked until the first resource change for each
// resource, which could take longer than some of the subsequent changes. We accept
// this coarse‑grained blocking because:
// * this path is only used for resources with a CheckCommand, and the number of
// such resources per service is expected to be small;
// * keeping the logic as a simple "wait for the first change, then re‑evaluate"
// avoids more complex, concurrent coordination and reduces the chance of subtle
// race conditions in the resource manager; and
// * in the common case where enough resources eventually become available, the
// service must wait for all required resources anyway, so the additional
// latency from this blocking approach is acceptable for our current workloads.

Copilot uses AI. Check for mistakes.
}
}
time.Sleep(100 * time.Millisecond) //Give lmp time to catch up
//bug in the actual code here?
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment at line 198 has a potential issue. It states "bug in the actual code here?" which is unclear. This comment should either be clarified with specific details about what bug is suspected, or it should be removed if the bug was already addressed.

Suggested change
//bug in the actual code here?

Copilot uses AI. Check for mistakes.
}

func findFirstMissingResourceWhenServiceMutexIsLocked(resourceRequirements map[string]int, requestingService string, outputError bool) *string {
func findFirstMissingResourceWhenServiceMutexIsLocked(resourceRequirements map[string]int, requestingService string, outputError bool, firstCheckNeeded bool) *string {
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable name 'firstCheckNeeded' is confusing and doesn't clearly convey its purpose. Based on the usage, it appears to control whether to wait for the first check command result. A clearer name would be 'shouldWaitForCheckCommand' or 'needsInitialCheckCommand'.

Copilot uses AI. Check for mistakes.
Comment on lines +69 to +70
free = resourceManager.resourcesAvailable[resourceName]
}
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential race condition when accessing resourceManager.resourcesAvailable at line 69. This field should be protected by resourcesAvailableMutex according to the comment in the struct definition, but here it's accessed while holding serviceMutex. Either acquire resourcesAvailableMutex before reading, or ensure the locking strategy is documented and consistent.

Copilot uses AI. Check for mistakes.
Comment on lines +107 to +121
func (rm ResourceManager) broadcastFirstChangeIfMutexIsLocked(resourceName string) {
resourceChangeByResourceChans, ok := rm.checkCommandFirstChangeByResourceChans[resourceName]
if !ok {
return //map is not initialized for resources without CheckCommand, so it being missing is ok
}
sendSignalToChannels(resourceChangeByResourceChans, resourceName, "checkCommandFirstChangeByResourceChans", struct{}{})
}
func (rm ResourceManager) broadcastResourceChangeWhenResourceChangeByResourceMutexIsLocked(resourceName string, recheckNeeded bool) {
serviceChannels, ok := rm.resourceChangeByResourceChans[resourceName]
if !ok {
log.Printf("[Resource Monitor][%s] ERROR: resourceChangeByResourceChans map is not initialized", resourceName)
return
}
sendSignalToChannels(serviceChannels, resourceName, "resourceChangeByResourceChans", recheckNeeded)
}
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions use a receiver value type instead of a pointer receiver. Since they operate on and modify state in the ResourceManager (via locking and channel operations), they should use pointer receivers (*ResourceManager) for consistency with other methods like storeRunningService and incrementConnection.

Copilot uses AI. Check for mistakes.
Comment on lines +34 to +38
resourceManager.resourceChangeByResourceMutex.Lock()
if len(resourceManager.resourceChangeByResourceChans[resourceName]) > 0 || len(resourceManager.checkCommandFirstChangeByResourceChans[resourceName]) > 0 {
timer.Reset(checkInterval)
}
resourceManager.resourceChangeByResourceMutex.Unlock()
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition checks if there are items in the channels by checking the length, but channels can become empty between the check and the timer reset. This could lead to the timer not being reset when it should be. Consider using a more robust signaling mechanism or ensuring the check and reset are properly synchronized.

Copilot uses AI. Check for mistakes.
Comment on lines +1084 to +1089
resourceManager.resourceChangeByResourceMutex.Lock()
//This might be not necessary if there is no CheckCommand
for _, resourceToCheck := range resourceManager.checkCommandFirstChangeByResourceChans {
delete(resourceToCheck, requestingService)
}
resourceManager.resourceChangeByResourceMutex.Unlock()
Copy link

Copilot AI Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment at line 1085 states "This might be not necessary if there is no CheckCommand" which indicates uncertainty about the code's behavior. Either the cleanup is necessary or it isn't - this should be clarified and the comment updated to explain when and why this cleanup is needed.

Copilot uses AI. Check for mistakes.
@perk11 perk11 requested a review from Copilot February 7, 2026 04:09
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 619 to +626
runningService.manageMutex.Lock()
resourceManager.serviceMutex.Lock()
_, ok := resourceManager.runningServices[serviceConfig.Name]
if ok {
resourceManager.serviceMutex.Unlock()
log.Printf("[%s] ERROR: Trying to start a service while it is already present in the list of running services", serviceConfig.Name)
return nil, fmt.Errorf("service already started")
}
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This early-return path leaks runningService.manageMutex (it’s locked on line 619 and never unlocked if ok is true), which can deadlock future operations on this RunningService. Unlock runningService.manageMutex before returning (and consider creating/locking the mutex only after confirming the service isn’t already present).

Copilot uses AI. Check for mistakes.
Comment on lines 1460 to 1461
runningService.stdoutWriter.FinalFlush()
runningService.stderrWriter.FinalFlush()
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdoutWriter/stderrWriter can be nil (e.g., if runServiceCommand fails and returns nil writers, or if cleanup happens before they’re set). Calling FinalFlush() will panic. Guard these calls with nil checks (or ensure writers are always initialized to non-nil no-op implementations).

Suggested change
runningService.stdoutWriter.FinalFlush()
runningService.stderrWriter.FinalFlush()
if runningService.stdoutWriter != nil {
runningService.stdoutWriter.FinalFlush()
}
if runningService.stderrWriter != nil {
runningService.stderrWriter.FinalFlush()
}

Copilot uses AI. Check for mistakes.
Comment on lines +1015 to 1021
currentlyAvailableAmount,
enoughOfResource =
isEnoughResourceForServiceWithCheckCommandThatRan(
resource,
config.ResourcesAvailable[resource].Amount,
currentlyAvailableAmount,
resourceManager.resourcesReserved[resource],
requiredAmount,
)
} else {
log.Printf(
"[%s] Not enough %s to start. Total: %d, Reserved by running services: %d, Required: %d",
requestingService,
resource,
config.ResourcesAvailable[resource].Amount,
resourceManager.resourcesInUse[resource],
requiredAmount,
)
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment is not valid Go syntax as written and will not compile. It should be a single multi-assign statement (e.g., currentlyAvailableAmount, enoughOfResource = ...) with correct formatting/indentation.

Suggested change
currentlyAvailableAmount,
enoughOfResource =
isEnoughResourceForServiceWithCheckCommandThatRan(
resource,
config.ResourcesAvailable[resource].Amount,
currentlyAvailableAmount,
resourceManager.resourcesReserved[resource],
requiredAmount,
)
} else {
log.Printf(
"[%s] Not enough %s to start. Total: %d, Reserved by running services: %d, Required: %d",
requestingService,
resource,
config.ResourcesAvailable[resource].Amount,
resourceManager.resourcesInUse[resource],
requiredAmount,
)
currentlyAvailableAmount, enoughOfResource = isEnoughResourceForServiceWithCheckCommandThatRan(
resource,
requestingService,
requiredAmount,
)

Copilot uses AI. Check for mistakes.
Comment on lines +1029 to +1033
//This is not optimal since it is blocked until the first resource change, which could take longer than the consequent ones.
//But in a scenario when there are enough resources, we need to wait for them all anyway, and if it's not enough, well, we're still waiting.
for resource, changeChan := range firstChangeChanByResource {
<-changeChan
currentlyAvailableAmount, enoughOfResource := isEnoughResourceForServiceWithCheckCommandThatRan(
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blocks indefinitely waiting for changeChan. If the check command fails (execution/parsing error), checkResourceAvailabilityWithKnownCommand returns early and never broadcasts the ‘first change’ signal, so this can deadlock a request forever. Add a timeout/cancellation path (e.g., reuse the existing max-wait timer) and/or broadcast a first-change signal even on check errors so waiters can re-evaluate or fail fast.

Copilot uses AI. Check for mistakes.
Comment on lines +205 to +206
} else if (sortOrder === "proxied_connections") {
// proxied_connections is no longer a valid sort option but may exist in localStorage; this migration code can be removed in the future.
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous UI stored the sort key as active_connections (per the removed <option value=\"active_connections\">). Migrating proxied_connections won’t help users upgrading from the immediate prior version. Update the migration to handle active_connections (and any other previously valid values) so the UI doesn’t fall back incorrectly.

Suggested change
} else if (sortOrder === "proxied_connections") {
// proxied_connections is no longer a valid sort option but may exist in localStorage; this migration code can be removed in the future.
} else if (
sortOrder === "proxied_connections" ||
sortOrder === "active_connections"
) {
// proxied_connections and active_connections are no longer valid sort options but may exist in localStorage; this migration code can be removed in the future.

Copilot uses AI. Check for mistakes.
Comment on lines +461 to +464
assert.Equal(t, expectedReserved[resource], resourceInfo.ReservedByStartingServices, "Resource %s reserved by starting services", resource)
assert.Equal(t, expectedFree[resource], resourceInfo.Free, "Resource %s free", resource)
assert.Equal(t, expectedUsed[resource], resourceInfo.InUse, "Resource %s used", resource)
assert.Equal(t, expectedTotal[resource], resourceInfo.Total, "Resource %s total available", resource)
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indexing expectedFree/expectedUsed/expectedTotal by key without checking presence can silently treat missing keys as 0, producing misleading passes/failures if maps are mis-specified but the lengths happen to match. Prefer asserting key existence per resource (e.g., value, ok := expectedFree[resource] + assert.True(t, ok, ...)) before comparing.

Suggested change
assert.Equal(t, expectedReserved[resource], resourceInfo.ReservedByStartingServices, "Resource %s reserved by starting services", resource)
assert.Equal(t, expectedFree[resource], resourceInfo.Free, "Resource %s free", resource)
assert.Equal(t, expectedUsed[resource], resourceInfo.InUse, "Resource %s used", resource)
assert.Equal(t, expectedTotal[resource], resourceInfo.Total, "Resource %s total available", resource)
// Ensure all expected maps contain this resource key before comparison.
reservedExpected, ok := expectedReserved[resource]
assert.True(t, ok, "Expected reserved resources to contain key %s", resource)
freeExpected, ok := expectedFree[resource]
assert.True(t, ok, "Expected free resources to contain key %s", resource)
usedExpected, ok := expectedUsed[resource]
assert.True(t, ok, "Expected used resources to contain key %s", resource)
totalExpected, ok := expectedTotal[resource]
assert.True(t, ok, "Expected total resources to contain key %s", resource)
assert.Equal(t, reservedExpected, resourceInfo.ReservedByStartingServices, "Resource %s reserved by starting services", resource)
assert.Equal(t, freeExpected, resourceInfo.Free, "Resource %s free", resource)
assert.Equal(t, usedExpected, resourceInfo.InUse, "Resource %s used", resource)
assert.Equal(t, totalExpected, resourceInfo.Total, "Resource %s total available", resource)

Copilot uses AI. Check for mistakes.

With this configuration, Qwen and Automatic1111 can run at the same time. Assuming they do, a request for Gemma will unload the one least recently used. If they are currently in use, a request to Gemma will have to wait for one of the other services to free up.
`ResourcesAvailable` can include any resource metrics, CPU cores, multiple VRAM values for multiple GPUs, etc. These values are not checked against actual usage.
`CheckCommand` is a shell command that will ran any time a service requiring this resource needs to be started. If there is not enough resources, it will run every `CheckWhenNotEnoughIntervalMilliseconds` and any time a service using this resource stops until the value returned satisfies the amount of resource required.
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix grammar: 'will ran' should be 'will run'.

Suggested change
`CheckCommand` is a shell command that will ran any time a service requiring this resource needs to be started. If there is not enough resources, it will run every `CheckWhenNotEnoughIntervalMilliseconds` and any time a service using this resource stops until the value returned satisfies the amount of resource required.
`CheckCommand` is a shell command that will run any time a service requiring this resource needs to be started. If there is not enough resources, it will run every `CheckWhenNotEnoughIntervalMilliseconds` and any time a service using this resource stops until the value returned satisfies the amount of resource required.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,2 @@
todos in the code
make tests stable
Copy link

Copilot AI Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committing a generic todo.txt at repo root tends to become stale and isn’t actionable (no owners, links, or context). Consider moving these items into tracked issues, or expand this file with concrete references (file/line or issue IDs) so it remains useful.

Suggested change
make tests stable
[OWNER: qa-team] Stabilize flaky integration tests (see ISSUE-123 in tracker for details).

Copilot uses AI. Check for mistakes.
@perk11 perk11 force-pushed the resourse-check-on-event branch from 962d9ef to 8998f5f Compare February 7, 2026 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant