Problem
In the basic case, we want to time out or limit the rare bad request so we can maintain a good SLA. However, when problems happen (for example, a DB issue makes every database request take 110ms rather than 100ms), we don't want to fail 100% of requests. We would rather raise the timeout a bit while requests are slow, and move it back down when things normalize.
Idea
Move all limits from static numbers to a (min, max, rate of change) triple. For example, a timeout could normally sit at 100ms, be allowed to increase by 10ms per some unit while requests are slower than 100ms, but never be allowed to exceed 300ms. Then, when things settle down, requests time out at 100ms again.
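
The min/max/rate-of-change idea can be sketched as a small bounded value. This is only an illustration: `AdaptiveLimit`, `NewAdaptiveLimit`, `Raise`, and `Lower` are hypothetical names that do not exist in cep21/circuit, the "unit" that triggers each step is left to the caller, and the type is not safe for concurrent use.

```go
package main

import "time"

// AdaptiveLimit is a hypothetical helper (not part of cep21/circuit).
// It holds a limit that moves in fixed steps between Min and Max.
type AdaptiveLimit struct {
	Min, Max, Step time.Duration
	current        time.Duration
}

// NewAdaptiveLimit starts the limit at its lower bound.
func NewAdaptiveLimit(lo, hi, step time.Duration) *AdaptiveLimit {
	return &AdaptiveLimit{Min: lo, Max: hi, Step: step, current: lo}
}

// Current returns the limit currently in effect.
func (a *AdaptiveLimit) Current() time.Duration { return a.current }

// Raise moves the limit up by one Step, never past Max.
func (a *AdaptiveLimit) Raise() time.Duration {
	a.current += a.Step
	if a.current > a.Max {
		a.current = a.Max
	}
	return a.current
}

// Lower moves the limit down by one Step, never below Min.
func (a *AdaptiveLimit) Lower() time.Duration {
	a.current -= a.Step
	if a.current < a.Min {
		a.current = a.Min
	}
	return a.current
}
```

With `NewAdaptiveLimit(100*time.Millisecond, 300*time.Millisecond, 10*time.Millisecond)`, the first `Raise` yields 110ms, repeated raises stop at 300ms, and repeated `Lower` calls bring the timeout back to 100ms.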
Solution
Circuit open/close logic is defined inside https://github.com/cep21/circuit/blob/master/closers.go#L9 and the implementations listen to all the events on https://github.com/cep21/circuit/blob/master/metrics.go#L164
The function ShouldOpen is called when a circuit decides if it should open: https://github.com/cep21/circuit/blob/master/closers.go#L14
Right now, for hystrix, we open directly on error percentage: https://github.com/cep21/circuit/blob/master/closers/hystrix/opener.go#L140
Instead of opening on some threshold, it could detect why the circuit is failing (if it is because of too many timeouts or concurrency limits). If it is, it would modify the thread safe config on the circuit https://github.com/cep21/circuit/blob/master/circuit.go#L71 to increase the timeout. On concurrent Success, we can inspect the timeouts and lower the limit if things recover.
Similarly, on ErrConcurrencyLimitReject calls, we could increase the concurrency limits up to a point, and decrease them on Success without ErrInterrupt.
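
A rough sketch of that event-driven adjustment follows. Everything in it is an assumption made for illustration: `Tuner` and its fields are hypothetical, the method names only echo the event names used above, and their signatures are simplified rather than copied from the library's metrics interface. Writing the resulting values back into the circuit's thread safe config is left out, and lowering concurrency by one on every success is just one possible policy.

```go
package main

import (
	"sync"
	"time"
)

// Tuner is a hypothetical listener (not part of cep21/circuit) that
// widens limits on rejections and narrows them again on success.
type Tuner struct {
	mu                          sync.Mutex
	MinTimeout, MaxTimeout      time.Duration
	TimeoutStep                 time.Duration
	MinConcurrency, MaxConcurrency int64
	timeout                     time.Duration
	concurrency                 int64
}

// NewTuner starts both limits at their lower bounds.
func NewTuner(minT, maxT, step time.Duration, minC, maxC int64) *Tuner {
	return &Tuner{
		MinTimeout: minT, MaxTimeout: maxT, TimeoutStep: step,
		MinConcurrency: minC, MaxConcurrency: maxC,
		timeout: minT, concurrency: minC,
	}
}

// ErrTimeout raises the timeout by one step, capped at MaxTimeout.
func (t *Tuner) ErrTimeout() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.timeout += t.TimeoutStep
	if t.timeout > t.MaxTimeout {
		t.timeout = t.MaxTimeout
	}
}

// ErrConcurrencyLimitReject allows one more concurrent call, up to MaxConcurrency.
func (t *Tuner) ErrConcurrencyLimitReject() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.concurrency < t.MaxConcurrency {
		t.concurrency++
	}
}

// Success narrows the limits again. The timeout only drops when the
// request would also have fit inside the next lower timeout.
func (t *Tuner) Success(d time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if d <= t.timeout-t.TimeoutStep {
		t.timeout -= t.TimeoutStep
		if t.timeout < t.MinTimeout {
			t.timeout = t.MinTimeout
		}
	}
	if t.concurrency > t.MinConcurrency {
		t.concurrency--
	}
}

// Limits returns the timeout and concurrency limit currently in effect.
func (t *Tuner) Limits() (time.Duration, int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.timeout, t.concurrency
}
```

Comparing the successful request's duration against the next lower timeout avoids oscillation: the limit is not lowered while requests still need the extra headroom.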