quicker unhappy jobs

At present, an unhappy job finishes only when the hanging job timer times out.
The rationale is to allow sufficient time for a critical event to arrive and trigger an alarm condition. This makes sense, but it causes a few problems:
1. It reuses a timer that is intended for other purposes.
2. It is not configurable separately from the hanging job timer.
3. It is a long timer, so unhappy jobs are held in memory for a long time. This increases the number of concurrent jobs and could become a memory risk. At 50 jobs per second and a 30 second hanging job timer, up to 1500 jobs could be waiting to end. This affects our max jobs per worker setting.
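
The arithmetic behind the 1500 figure can be checked with a small sketch. It assumes the worst case in which every arriving job is unhappy and is held for the full timer duration. The function name and the 2 second comparison value are illustrative, not part of the existing code.

```python
def pending_unhappy_jobs(jobs_per_second, timeout_seconds):
    """Worst-case number of unhappy jobs held in memory at once.

    Assumes every job is unhappy and waits for the full timeout,
    so the count is simply arrival rate multiplied by hold time.
    """
    return jobs_per_second * timeout_seconds


# Figures from the issue: 50 jobs per second, 30 second hanging job timer.
print(pending_unhappy_jobs(50, 30))  # 1500

# A hypothetical shorter timer of 2 seconds, for comparison.
print(pending_unhappy_jobs(50, 2))   # 100
```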

One option is to add a dedicated configuration value for this timer.
Another option is to use the intra-event timer, which can be very short.
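
As a rough sketch of the first option, a dedicated setting could fall back to the hanging job timer when it is not set, so existing deployments keep their current behavior. The class and field names (`TimerConfig`, `unhappy_job_timeout_s`, `hanging_job_timeout_s`) are hypothetical, and the 30 second default is only the example value used above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimerConfig:
    # Example value taken from the discussion above; not a real default.
    hanging_job_timeout_s: float = 30.0
    # Hypothetical new setting; None means "not configured".
    unhappy_job_timeout_s: Optional[float] = None

    def effective_unhappy_timeout(self) -> float:
        """Return the timeout to apply to unhappy jobs.

        Falls back to the hanging job timer when no dedicated
        value is configured, preserving today's behavior.
        """
        if self.unhappy_job_timeout_s is None:
            return self.hanging_job_timeout_s
        return self.unhappy_job_timeout_s


default_cfg = TimerConfig()
short_cfg = TimerConfig(unhappy_job_timeout_s=2.0)
print(default_cfg.effective_unhappy_timeout())  # 30.0
print(short_cfg.effective_unhappy_timeout())    # 2.0
```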

A further thought is to allow the unhappy job to finish quickly, but still detect late critical events in the Job Gone Horribly Wrong state, which is entered if a "stray event" from a previous job arrives.
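
A minimal sketch of that idea follows, under the assumption that a finished job keeps a lightweight record that can still receive stray events. All names here (`JobState`, `Job`, `fail`, `on_event`, `alarm_raised`) are hypothetical and do not reflect the actual state model.

```python
from enum import Enum, auto


class JobState(Enum):
    IN_PROGRESS = auto()
    FINISHED_UNHAPPY = auto()
    GONE_HORRIBLY_WRONG = auto()


class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.state = JobState.IN_PROGRESS
        self.alarm_raised = False

    def fail(self):
        """Finish the job as unhappy immediately, without waiting on a timer."""
        self.state = JobState.FINISHED_UNHAPPY

    def on_event(self, is_critical):
        """Handle an event for this job.

        Events for an in-progress job are normal and ignored here.
        An event arriving after the job finished is a stray event and
        moves the job to GONE_HORRIBLY_WRONG. A critical stray event
        also raises the alarm.
        """
        if self.state is JobState.IN_PROGRESS:
            return
        self.state = JobState.GONE_HORRIBLY_WRONG
        if is_critical:
            self.alarm_raised = True


job = Job("job-1")
job.fail()                       # finishes quickly
job.on_event(is_critical=True)   # late critical event is still detected
print(job.state.name, job.alarm_raised)
```

The trade-off in this sketch is that alarm detection moves from the waiting period to the stray-event path, so the job itself no longer needs to be held for the full timer.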

 

quicker unhappy jobs #233
