Skip to content

fix: Do not consider cancelled tasks as failed#85

Open
johannwagner wants to merge 1 commit intomainfrom
fix/cancel-is-not-failed
Open

fix: Do not consider cancelled tasks as failed#85
johannwagner wants to merge 1 commit intomainfrom
fix/cancel-is-not-failed

Conversation

@johannwagner
Copy link
Copy Markdown
Contributor

Discussions, please!

@johannwagner
Copy link
Copy Markdown
Contributor Author

I would user-cancelled tasks not see as failed, because this is a valid action, if someone finds out, that the Diff looks silly or something. No need to poison the entire lock, imho.

If we are not merging the change, I would resolve the poisoning after the cancellation manually.

@Kek5chen
Copy link
Copy Markdown
Contributor

As we discussed, that decision was mainly made so that other tasks which depend on a lock of the cancelled task, are not able to start running during a deployment until manually cleared and verified to run safely.

This is mainly meant as a safety precaution which insists on another human verification step with the idea to improve rollout safety. It's supposed to slow down any automation chain failures that might arise on a continued poisonous rollout where one of the tasks in the series had been cancelled due to a problem. Giving the operator time to cancel the rest or decide next steps. After all a cancel usually means something was problematic and has to be investigated.

Now to the meat, as you had mentioned that tasks might get stuck in a New state which is a bit of an oversight at the moment imo:

Because of how this idea is supposed to work, it would make sense to introduce a way to cancel a "New" task from the Web Interface as well, so that an operator can resolve stuck tasks. That could make the intentions clearer. After all i've already changed that lock badges of poisoned locks are displayed red on a task (not sure if you noticed).

The best way to handle this is for the operator to clear the lock poison manually. This forces a decision and initiates thoughts on if the rest of the deployment is supposed to run or not.

Finally, Locks should be set as granularity as a deployment allows. If one sites deployment does not depend on the success on another sites deployment, locks can be fit more granularly like WRITE "site/a200".

If you have any other thoughts on this and think this safety concern isn't as real as I might consider it to be, then do say.

Copy link
Copy Markdown
Contributor

@Kek5chen Kek5chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR:

I'm in favor of this workflow, and adding a cancel button to tasks in a "New" state:

If we are not merging the change, I would resolve the poisoning after the cancellation manually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants