fix: Do not consider cancelled tasks as failed#85
Conversation
|
I would user-cancelled tasks not see as failed, because this is a valid action, if someone finds out, that the Diff looks silly or something. No need to poison the entire lock, imho. If we are not merging the change, I would resolve the poisoning after the cancellation manually. |
|
As we discussed, that decision was mainly made so that other tasks which depend on a lock of the cancelled task, are not able to start running during a deployment until manually cleared and verified to run safely. This is mainly meant as a safety precaution which insists on another human verification step with the idea to improve rollout safety. It's supposed to slow down any automation chain failures that might arise on a continued poisonous rollout where one of the tasks in the series had been cancelled due to a problem. Giving the operator time to cancel the rest or decide next steps. After all a cancel usually means something was problematic and has to be investigated. Now to the meat, as you had mentioned that tasks might get stuck in a New state which is a bit of an oversight at the moment imo: Because of how this idea is supposed to work, it would make sense to introduce a way to cancel a "New" task from the Web Interface as well, so that an operator can resolve stuck tasks. That could make the intentions clearer. After all i've already changed that lock badges of poisoned locks are displayed red on a task (not sure if you noticed). The best way to handle this is for the operator to clear the lock poison manually. This forces a decision and initiates thoughts on if the rest of the deployment is supposed to run or not. Finally, Locks should be set as granularity as a deployment allows. If one sites deployment does not depend on the success on another sites deployment, locks can be fit more granularly like If you have any other thoughts on this and think this safety concern isn't as real as I might consider it to be, then do say. |
Kek5chen
left a comment
There was a problem hiding this comment.
TLDR:
I'm in favor of this workflow, and adding a cancel button to tasks in a "New" state:
If we are not merging the change, I would resolve the poisoning after the cancellation manually.
Discussions, please!