* Add Check.last_start_rid field
* Fill Check.last_start_rid on every start event
* Clear Check.last_start on every "fail" event
* Clear Check.last_start on a success event if either of these is true:
  - the event's rid matches Check.last_start_rid
  - the event does not specify a rid

In human terms, the alerting logic is this: we track the execution time of the most recent "start" event only. Tracking the execution time of all concurrent "start" events, and sending alerts when *any* of them overshoots the time budget, would take a major redesign. So whenever we see a "start" event, the timer resets.

Example:

* 00:00 client sends a start signal with rid=A, timer starts
* 00:10 client sends a start signal with rid=B, timer resets
* 00:20 client sends a success signal with rid=A; the timer does not reset because rid A does not match the rid seen in the most recent start signal (which was B)
* 00:30 the grace time runs out, and the check's status shows as started + failed

At this point the check can be reset to a healthy state in three different ways:

* send a success signal with rid=B
* send a failure signal, with any rid value or without one
* send a success signal without a rid value
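The rules above can be modelled in a few lines. This is a minimal sketch, not the actual Django model or view code. The `Check` dataclass, the `ping()` and `start_overdue()` methods, and the `at()` / `replay_example()` helpers are hypothetical names chosen for illustration. The sketch treats rids as plain strings and reads the example's timestamps as minutes. Both of those choices are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Check:
    """Hypothetical, simplified stand-in for the real Check model."""

    grace: timedelta
    last_start: Optional[datetime] = None
    last_start_rid: Optional[str] = None  # assumption: rids modelled as str

    def ping(self, kind: str, now: datetime, rid: Optional[str] = None) -> None:
        if kind == "start":
            # Every start event resets the timer and records its rid.
            self.last_start = now
            self.last_start_rid = rid
        elif kind == "fail":
            # A failure always clears the timer, whatever its rid.
            self.last_start = None
        elif kind == "success":
            # A success clears the timer only if it carries no rid,
            # or if its rid matches the most recent start event.
            if rid is None or rid == self.last_start_rid:
                self.last_start = None

    def start_overdue(self, now: datetime) -> bool:
        # True once the most recent start has run past the grace time.
        return self.last_start is not None and now >= self.last_start + self.grace


T0 = datetime(2000, 1, 1)


def at(minutes: int) -> datetime:
    # Hypothetical helper: "00:10" in the example becomes at(10).
    return T0 + timedelta(minutes=minutes)


def replay_example() -> Check:
    # Replays the example timeline, assuming a 20-minute grace time.
    check = Check(grace=timedelta(minutes=20))
    check.ping("start", at(0), rid="A")
    check.ping("start", at(10), rid="B")    # timer resets
    check.ping("success", at(20), rid="A")  # ignored: A != B
    return check


if __name__ == "__main__":
    check = replay_example()
    print(check.start_overdue(at(30)))  # True
```

After the replay, any of the three recovery signals (`ping("success", ..., rid="B")`, `ping("fail", ...)` with or without a rid, or `ping("success", ...)` without a rid) sets `last_start` back to `None`. In this model, the only thing a mismatched rid can do is fail to stop the timer. It never starts or extends it.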
Pēteris Caune
1 year ago
7 changed files with 112 additions and 12 deletions