Conversation
selfupdate.go
Outdated
| log.Print("requesting reboot") | ||
| // TODO: change call from target.Reboot to something like target.ScheduleReboot | ||
| // which can asyncronouly perform the task instead of waiting | ||
| // for all the services to shutdown and then ack the reboot, | ||
| // othewise this causes a deadlock as the selfupdate service won't SIGTERM cleanly | ||
| // until the reboot ack is received, but won't receive the ack until all services shut down, | ||
| // causing a delayed reboot until SIGKILL kicks in. | ||
| if err := target.Reboot(); err != nil { | ||
| return fmt.Errorf("reboot: %v", err) | ||
| } |
There was a problem hiding this comment.
This one will need some thought
There was a problem hiding this comment.
Hmm, why will selfupdate not SIGTERM cleanly in this state? I would have expected it to. SIGINT and SIGTERM are hooked up to context cancellation in selfupdate’s main function.
There was a problem hiding this comment.
The problem here is that the Reboot() is blocking:
target.Reboot()calls- the
/reboothttp endpoint:
https://github.com/gokrazy/gokrazy/blob/8916805aa0c7fad3597670de2afd35c48f10f06c/update.go#L333-L365 that calls - the
killSupervisedServicesAndUmountPerm(signalDelay)func (here) that calls - the
killSupervisedServices()func (here) - the
killSupervisedServices(here) then issues the SIGTERM to all the processes (includingselfupdate), and waits for them to terminate. If they don't terminate cleanly, it issues a SIGKILL to all of them aftersignalDelay(100s), including forselfupdate. - So basically
selfupdateis blocked ontarget.Reboot(), which in turn is blocked on the selfupdate process to exit cleanly, which won't because is waiting ontarget.Reboot().
As such I can think of two options here:
- either we make selfupdate exit cleanly on SIGTERM by adding a ctx to cancel the
target.Reboot()call - or we add a
target.AsyncReboot(), to call in this selfupdate case, that passes anasync=truearg to the/rebootHTTP call, which acks the/rebootHTTP API call as 200 and runs thekillSupervisedServicesAndUmountPermasync in a separate goroutine, which should avoid the waiting
WDYT?
There was a problem hiding this comment.
I think handling target.Reboot getting interrupted by SIGTERM is the better option here, as it seems simpler and does not require increasing the API surface.
There was a problem hiding this comment.
Ok cool, created a PR for that here: gokrazy/updater#2
There was a problem hiding this comment.
Actually, I just noticed that there is another case where the blocking behavior of /reboot gets in the way: when updating gokrazy via Caddy (e.g. for using Tailscale for authentication, see gokrazy/gokrazy#257), we’ll also encounter an error, and a timeout.
Maybe we should give the asynchronous reboot a try. Do we need another parameter for that or could we just change how reboot works? Did you experiment with that?
(We can still introduce a context to gokrazy/updater independently.)
There was a problem hiding this comment.
Yes let's give it a go, see it implemented here:
main.go
Outdated
| ticker := time.NewTicker(frequency) | ||
|
|
||
| for { | ||
| for c := time.Tick(frequency); ; { |
There was a problem hiding this comment.
Changing this so then we get an immediate first check without waiting an extra frequency interval.
Also changing the skipWaiting to only skip the Jitter once, the first time.
These two changes combined still achieve the same initially defined behaviour for skipWaiting, but improve the first check also in cases where skipWaiting is set to false
to be able to cancel it from the caller side. Ref: gokrazy/selfupdate#4 (comment)
to be able to cancel it from the caller side. Ref: gokrazy/selfupdate#4 (comment)
…to be able to cancel operations from the caller side. Ref: gokrazy/selfupdate#4 (comment)
this updates to the latest gokrazy/updater to leverage the context change to allow cancelling the target.Reboot() call and finish performing the update cleanly. Context: #4 (comment)
82b56da to
0254e5b
Compare
0254e5b to
e096dd0
Compare
As part of the update system efforts, we are introducing the ability to specify plugins that allow to push and pull updates from many different sources (e.g. OCI, S3, ...) in addition to the natively supported HTTP file server source.
As such in gokrazy/gokrazy#173 we want to give
selfupdatethe ability to understand payload links that differ from HTTP and may be coming from a GUS update response.To be able to do this, selfupdate must give the user the ability to specify a plugin name (which will be used by
selfupdateto reference the plugin binary) and some credentials that the plugin will use to fetch the update from the source registries (think for example at the credentials necessary to fetch an OCI update payload from an OCI registry or at the AWS credentials necessary to pull a file from AWS S3).This PR aims at implementing this ability for
selfupdate.Note:
With this approach
selfupdatewill be able to load only one plugin at a time in addition to the already natively supported HTTP fetching mechanism.