Bug: wasm log aws.appmesh.ingress_http_stats and aws.appmesh.egress_http_stats panicked #484
Saw a slightly different error today.

I rolled back a service and gateway to v1.25.1.0. This is not happening in that version of Envoy.
Hi @tnsardesai, thanks for the bug report. We are looking into it. To help us better understand the issue, could you share a bit more information about your mesh setup? You mentioned this is only observed under high load; do you by any chance have a snapshot of the Envoy stats (since Envoy crashed, you might not be able to retrieve it)? Have you tried any other versions of the Envoy images, for example v1.25.3.0 and onward? Lastly, if you have Premium Support, feel free to engage them to cut us a ticket.
The crash is from within proxy-wasm-rust-sdk. The bytes returned are not valid UTF-8.

This is easily reproducible with the following code, see the playground:

```rust
#![allow(unused)]
fn main() {
    use std::str;

    // some bytes, in a vector
    let problematic_bytes = vec![
        49, 32, 192, 167, 192, 162, 37, 50, 53, 50, 55, 37, 50, 53, 50, 50,
        44, 32, 49, 53, 52, 46, 51, 56, 46, 49, 55, 50, 46, 50, 52, 51,
        44, 32, 49, 53, 46, 49, 53, 56, 46, 52, 55, 46, 49, 52, 49,
    ];
    // panics: the bytes above are not valid UTF-8
    let result = str::from_utf8(&problematic_bytes).unwrap();
}
```

I submitted a bug report to the proxy-wasm SDK: proxy-wasm/proxy-wasm-rust-sdk#217. Meanwhile, I wonder if there is any breaking change introduced in Envoy that would produce non-UTF-8 bytes.
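For illustration, a panic like this can be avoided by not unwrapping the strict conversion. Below is a minimal sketch (not the SDK's actual fix) using the same bytes as the reproduction above: `str::from_utf8` returns an `Err` that can be handled, and `String::from_utf8_lossy` replaces invalid sequences with U+FFFD instead of failing.

```rust
use std::str;

fn main() {
    // Same bytes as in the reproduction: mostly ASCII, but 0xC0 (192) is
    // never a valid UTF-8 lead byte, so strict decoding must fail.
    let problematic_bytes = vec![
        49, 32, 192, 167, 192, 162, 37, 50, 53, 50, 55, 37, 50, 53, 50, 50,
        44, 32, 49, 53, 52, 46, 51, 56, 46, 49, 55, 50, 46, 50, 52, 51,
        44, 32, 49, 53, 46, 49, 53, 56, 46, 52, 55, 46, 49, 52, 49,
    ];

    // Strict conversion errors instead of panicking when we don't unwrap.
    assert!(str::from_utf8(&problematic_bytes).is_err());

    // Lossy conversion never fails; invalid sequences become U+FFFD.
    let lossy = String::from_utf8_lossy(&problematic_bytes);
    println!("{}", lossy);
    // The ASCII tail (IP addresses) survives intact.
    assert!(lossy.contains("154.38.172.243"));
}
```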
Thanks for your response. I previously thought it was due to high load, but looking at the data over 7 days it is happening at all times of the day. I believe this is triggered by specific bot or attacker requests with bad headers: all of these errors are on our externally exposed virtual gateways and applications, and we are not seeing this in any internal applications.

I have rolled back all our applications and gateways that see this error to v1.25.1.0. I could try upgrading some of our applications next week to help pinpoint which version broke this, if that will aid your investigation. I have already engaged AWS support; case ID 170303481101460 for your reference.

Can you also take a look at the error in #484 (comment)? It has a different stack trace. I also saw another log today which is different from the other two.
We have noticed an increase in non-UTF-8 bytes in our access logs. For example, right around the time of this stack trace I see this access log.
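As a side note, flagging such entries can be done with the standard library alone. A minimal sketch (the helper name and sample lines are assumptions, not from the thread) that reports where a raw log line stops being valid UTF-8:

```rust
use std::str;

// Return the byte offset at which a raw log line stops being valid UTF-8,
// or None when the whole line is valid.
fn first_invalid_utf8_offset(line: &[u8]) -> Option<usize> {
    match str::from_utf8(line) {
        Ok(_) => None,
        Err(e) => Some(e.valid_up_to()),
    }
}

fn main() {
    let good: &[u8] = b"GET /index.html 200";
    // "GET " followed by bytes that are not valid UTF-8
    let bad: &[u8] = &[71, 69, 84, 32, 192, 167];

    assert_eq!(first_invalid_utf8_offset(good), None);
    assert_eq!(first_invalid_utf8_offset(bad), Some(4));
    println!("bad line becomes invalid at byte offset 4");
}
```

`Utf8Error::valid_up_to` makes it cheap to split a line into a valid prefix and the problematic tail for further inspection.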
@tnsardesai this is different than the UTF-8 bug from the initial comment, and it looks like a bug in
I created #485 for the other bug! Over the weekend I saw that bug even on v1.25.1.0, so it is definitely different; I haven't seen the bug in this issue over the weekend on v1.25.1.0. This week I plan on upgrading Envoy versions to help pinpoint the exact version of Envoy where things broke!
Hey, I upgraded a service from
Hi @tnsardesai, just so you know, we are working on an Envoy patch release. It will contain the fix for this issue and #485. I cannot provide a specific date, but it will be soon.
This issue is fixed by Envoy release v1.27.3.0 #486
Summary
What are you observing that doesn't seem right?
Seeing a panic
I also see a similar log for `aws.appmesh.egress_http_stats`:

```
[93][critical][wasm] [source/extensions/common/wasm/context.cc:1157] wasm log aws.appmesh.egress_http_stats: panicked at /codebuild/output/src3353/src/s3/00/wasm/cargo-project/vendor/proxy-wasm/src/hostcalls.rs:1192:42:
```
Steps to Reproduce
What are the steps you can take to reproduce this issue?
No idea, but I'm pretty sure this happens only under high load: looking at our logs, I see this error only on services that receive a high amount of traffic.
Are you currently working around this issue?
How are you currently solving this problem?
Even after this log appears, the container keeps functioning and responding to requests, so I am not sure what the impact of this panic is. We are also seeing that Envoy metrics are not being reported, so that might be the impact.
Additional context
Anything else we should know?
This started after we updated our sidecar to 1.27.2.0.