-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PrometheusBasedResourceLimitChecks does not obey limits when the checked entry is not cached #3659
Comments
Not waiting for the query to complete is intentional because we want to rather accept a few operations that would exceed the limit than blocking (and eventually failing) otherwise appropriate operations. We could introduce a (very small) timeout instead of immediately falling back to accepting the operation but this value would need to be adapted to the concrete setup (and the expected responsiveness of Prometheus) by the admin. IMHO this will require quite some experience and testing to find a suitable value. From our point of view this simply was not worth the effort when we created the code. WDYT? |
The problem is that this occurs now every time the cache entry is not available, by default after every 60 seconds. And after this minute there is again accepted requests over the limit until new cache entry is present. At a very slow pace tenant this limit check will never trigger actually with current defaults. But if this is intentional, would it be ok to add a configuration flag to change the behaviour, and keep this short wait period for waiting that missing cache entry gets created what I started to implement already? I'm not so sure does the timeout value need to be optimally short, some second or two would suffice in my opinion similar value as the whole Prometheus query timeout is, which I would rather rely on as the backoff timeout here too, the Prometheus query triggers timeout and any other errors there might arise sooner. |
For such a client you will most likely not be concerned about exceeding the data limit ;-) I would be ok to add a configuration option for a duration that will be waited for the cache entry to become available instead of immediately falling back to the default value. This option would need to be off by default then and should be documented appropriately, i.e. describing the impact that enabling this option might have (as I described above). Would that work for you? |
I need to admit that after pondering over the current implementation and the rationale for it, I've started to look into this issue from totally different perspective actually. And adding a new enforced shorter timeout value for the cache get isn't necessarily the best option for any of us. But I didn't notice this tiny detail earlier on how Caffeine cache is initialised currently;
And how do you see such an idea that |
I'd rather not change the semantics of the |
Yes this is something that suffices for my needs at least, to harden exceeding especially the transfer limit slightly. I will prepare PR for this. |
…cached item refresh time duration
@harism can this be closed? |
Yes on my behalf at least, I'm happy with the addition of suggested configurable cache timeout divisor which is now implemented already. |
PrometheusBasedResourceLimitChecks
methodisMessageLimitReached
for example does not give any timeout for cache read needed situation on its code line 204 - this causes the byte sized payload limit to reset at the cache invalidate interval and the retrieve request will go to theelse
branch always and let payload pass through even over the limits.Should this line 204
value.isDone()
check be changed with something having a short wait timeout too? Or is this intentional not to wait at all for the cache entry getting available?The text was updated successfully, but these errors were encountered: