🐛 [BUG] - use built-in metric to create Latency SLO On Dynatrace #342
Comments
Hi @GeoffroyLatourDK, thanks for reporting this behavior. Have you actually tried using these built-in metrics? If so, could you share the output and error message(s). Ideally I'd like to reproduce the issue. |
Hello @lvaylet, in fact the problem is more that the number of events doesn't seem to match. In the screenshot below, I'm using the builtin:service.errors.client.successCount metric, which lets me count the successful calls. There's a big difference between the 6k83 calls on one side and the 60 calls on the other. |
Can you share your SLO definition, either as YAML or JSON? |
Hello @lvaylet, of course!
|
Thanks @GeoffroyLatourDK. Then can you also enable debug mode and share the output? For example by setting the DEBUG environment variable: $ DEBUG=1 slo-generator compute -f <SLO_CONFIG_PATH> -c <SHARED_CONFIG_PATH>
[...] In the meantime, I am trying my best to get my hands on a Dynatrace environment. |
Looking at your SLO definition, can you also share what the |
Hello @lvaylet, have you had some time to investigate this issue? |
Hi @GeoffroyLatourDK. Apologies for the late reply. I was on vacation and off the grid. I do not see anything suspicious with your SLO definition. This being said, I am surprised by the huge difference between the expected (6k83) and actual (60) values. That is two orders of magnitude! Are we really looking at the same metric? With the same filters (or absence of filters)? Over the same duration? Debug mode lets us check the actual requests to the Dynatrace API. For example on lines 67, 68 and 69 of
Have you tried running these queries in the Dynatrace UI to confirm you get the same values? Have you also tried setting the |
On an unrelated topic, I just noticed this performance warning at line 324 in the debug output:
Most probably no consequence on the output but worth considering anyway. |
Hi again, I also noticed that your SLO definition sets |
Hello, as an update: I checked a few more parameters in the UI and found the Fold transformation parameter. When I change it from auto to count, I get the same result as the SLO Generator output. For the moment I cannot explain the huge difference between the SLO Generator SLI and the Dynatrace SLI. I will post another update soon :) |
Hi @GeoffroyLatourDK, any update to share? |
Hello @lvaylet, as far as my investigation goes, this is not a bug but a precision problem on the Dynatrace side. When you request data through the API for "large" periods of time, Dynatrace does not send all the data, only averages over a hundred or so time slots. For example, over a 28-day period, Dynatrace sends us the average response time for each 6-hour slot. In my case, a 6-hour slot may contain many peaks above my threshold, but these are not taken into account because the average stays below the threshold. Finally, I don't know whether Dynatrace does this for every kind of built-in metric or behaves differently with other types of metrics; we only work with built-in metrics. So we can close the issue :) |
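For readers landing on this thread, the effect described above can be reproduced with a few lines of Python. This is only a minimal sketch with made-up numbers, not the slo-generator or Dynatrace implementation: the sample values, the 500 ms threshold, the 6-point window and the helper names `count_good_bad` and `downsample_avg` are all hypothetical, and it assumes the threshold method simply classifies each returned data point as good or bad.

```python
from statistics import mean


def count_good_bad(values, threshold):
    """Classify each data point as good (<= threshold) or bad (> threshold)."""
    good = sum(1 for v in values if v <= threshold)
    return good, len(values) - good


def downsample_avg(values, window):
    """Replace each group of `window` consecutive points by its average."""
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]


# Hypothetical response times in ms: one 900 ms peak in every group of 6 points.
raw = [200, 200, 900, 200, 200, 200] * 4
threshold = 500  # hypothetical latency threshold in ms

raw_good, raw_bad = count_good_bad(raw, threshold)

# Coarse view: every 6 raw points collapsed into a single averaged point.
coarse = downsample_avg(raw, window=6)
coarse_good, coarse_bad = count_good_bad(coarse, threshold)

print(f"raw:    {raw_good} good / {raw_bad} bad")
print(f"coarse: {coarse_good} good / {coarse_bad} bad")
```

With the fine-grained series, the four peaks are counted as bad events. After averaging, each window lands well below the threshold, so every peak disappears and the SLI looks perfect, which matches the behavior reported above for long query periods.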
SLO Generator Version
v2.3.4
Python Version
3.10.11
What happened?
In the documentation, the threshold method is shown with an ext: metric from a OneAgent or ActiveGate extension:
ext:app.request_latency
Is this mandatory, or can we use built-in metrics like these?
builtin:service.response.client
builtin:service.keyRequest.response.time
It would be great to add a list of the metrics we can use to the documentation :)
What did you expect?
I expected to get a valid result using these two built-in metrics:
builtin:service.response.client
builtin:service.keyRequest.response.time
Relevant log output
No response