Hi,

I'm encountering the log message `[error] [opentelemetry] snappy decompression failed`. We suspect it is related to the large volume of metrics (1M+) being sent to Fluent Bit via Prometheus remote write, but that is only a guess for the moment. After investigating, I believe this error originates in the function `flb_snappy_uncompress_framed_data(...)`. We want to dig deeper, but the log message alone does not give us enough information to do so.

Observed Problem
The function `flb_snappy_uncompress_framed_data(...)` returns several distinct error codes (-1 to -6) that could help diagnose different decompression issues. However, the calling code appears to discard them: any non-zero return value is mapped to a generic -1. This removes the detail needed for precise error handling and makes troubleshooting considerably harder.

Code Reference
In the snippet below, the function's return code is first captured:

fluent-bit/plugins/in_prometheus_remote_write/prom_rw_prot.c
Lines 159 to 165 in a14fcfc

However, immediately afterwards, the code maps any non-zero return value to -1, disregarding the specific error codes that `flb_snappy_uncompress_framed_data(...)` provides:

fluent-bit/plugins/in_prometheus_remote_write/prom_rw_prot.c
Lines 166 to 169 in a14fcfc
Suggestion

Retaining the specific error codes returned by `flb_snappy_uncompress_framed_data(...)` would enable more precise logging and debugging, replacing the current guesswork with a more systematic approach. My suggestion is simply to extend the current log message to include the error code verbatim. There is no need to map the codes to human-readable strings: with the raw code in the log, anyone debugging can look it up in the source.

If you think this would be a valuable improvement, I'd be happy to contribute the change. Please let me know your thoughts.
Thank you!