Loss of Specific Error Codes in flb_snappy_uncompress_framed_data Leading to Debugging Challenges #9587

Open
thewillyhuman opened this issue Nov 12, 2024 · 0 comments


Hi,

I’m encountering the log message [error] [opentelemetry] snappy decompression failed. We suspect it is related to large volumes of metrics (1M+) being sent to Fluent Bit via Prometheus remote write, but that is only a guess for the moment. After investigating, I believe the error originates in the function flb_snappy_uncompress_framed_data(...). We would like to dig deeper into it, but currently we can't.

Observed Problem

The function flb_snappy_uncompress_framed_data(...) returns several specific error codes (-1 to -6) that could help diagnose different decompression issues. However, the calling code appears to collapse any non-zero return code into a generic -1, which discards that granularity and makes troubleshooting harder.

Code Reference

In the code snippet below, we see the initial capture of the function's return code:

    int ret;

    ret = flb_snappy_uncompress_framed_data(input_buffer,
                                            input_size,
                                            output_buffer,
                                            output_size);

However, immediately after this, the code maps any non-zero return value to -1, disregarding the specific error codes that flb_snappy_uncompress_framed_data(...) provides:

    if (ret != 0) {
        flb_error("[opentelemetry] snappy decompression failed");
        return -1;
    }

Suggestion

Retaining the specific error codes returned by flb_snappy_uncompress_framed_data(...) would enable more precise logging and debugging, replacing the current guesswork with a more systematic approach. My suggestion is as simple as extending the current log message to include the error code verbatim. There is no need to map it to a human-readable string, since the raw code is already enough to locate the failing path in the source code.
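
As a rough illustration of the suggestion, here is a minimal standalone sketch. It does not use the Fluent Bit headers: fake_uncompress_framed_data() is a hypothetical stand-in for flb_snappy_uncompress_framed_data(...), format_snappy_error() and decompress_and_report() are made-up helper names, and the "(ret=%d)" suffix is only one possible wording. In the real code the change would presumably amount to passing ret to the existing flb_error(...) call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for flb_snappy_uncompress_framed_data(); it only
 * simulates a failure so this sketch compiles without Fluent Bit. The code
 * -3 is arbitrary and chosen purely for illustration. */
static int fake_uncompress_framed_data(const char *in, size_t in_len)
{
    (void) in;
    return in_len == 0 ? -3 : 0;
}

/* Hypothetical helper: builds the log line with the raw return code
 * appended. Returns the number of characters written, as snprintf() does. */
static int format_snappy_error(char *buf, size_t size, int ret)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "[opentelemetry] snappy decompression failed (ret=%d)",
                    ret);
}

/* Hypothetical caller mirroring the snippet in this issue. It still returns
 * -1 to its own caller, so only the log message changes. */
static int decompress_and_report(const char *in, size_t in_len,
                                 char *msg, size_t msg_size)
{
    int ret;

    ret = fake_uncompress_framed_data(in, in_len);
    if (ret != 0) {
        format_snappy_error(msg, msg_size, ret);
        fprintf(stderr, "%s\n", msg);
        return -1;
    }

    return 0;
}
```

Calling decompress_and_report() with an empty input makes the stand-in fail, and the log line then carries the code that the stand-in returned. The return value seen by the caller stays -1, so existing error handling would not need to change.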

If you think this would be a valuable improvement, I’d be happy to contribute to implementing this change. Please let me know your thoughts.

Thank you!
