You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Summary
What are you observing that doesn't seem right?
Seeing high number of 503s when making cross region requests via a virtual gateway
Steps to Reproduce
What are the steps you can take to reproduce this issue?
The diagram below should describe our architecture
service-A in us-east-1 wants to call service-B in us-west-1
us-west-1 service-B is setup using virtual service, virtual router, virtual node. The virtual node provider is using cloudmap service discovery type and is setup using and ECS fargate service
us-east-1 service-B is also setup using virtual service, virtual router, virtual node. This virtual node provider is using DNS service discovery type and is pointing to the address of the NLB for virtual gateway
us-east-1 service-A's virtual node has setup service-B as a backend.
errors are intermittent. We have not noticed any pattern right now. We are also seeing this in dev so I doubt this has anything to do with scale.
NLB metrics show a high number of loadbalancer reset and a small number of client reset
Are you currently working around this issue?
How are you currently solving this problem?
We are not :(. Attempting different configurations of timeouts, retries and outlier detection to minimize the number of errors.
Additional context
Anything else we should know?
internal support case id - 12565299101
Attachments
envoy debug logs from one of the failed cross region request. extract-2023-04-21T16_31_02.134Z.csv.txt
Main error I see is remote address:10.24.19.114:80,TLS error: 33554536:system library:OPENSSL_internal:Connection reset by peer 33554464:system library:OPENSSL_internal:Broken pipe which suggests that envoy in us-east-1 (client) is not closing the connection with NLB (even though idle timeout in envoy is set to 150s), so NLB is sending a TCP RST once 350 seconds have passed and it receives a new request. Any help debugging this would be appreciated
The text was updated successfully, but these errors were encountered:
Summary
What are you observing that doesn't seem right?
Seeing high number of 503s when making cross region requests via a virtual gateway
Steps to Reproduce
What are the steps you can take to reproduce this issue?
The diagram below should describe our architecture
Are you currently working around this issue?
How are you currently solving this problem?
We are not :(. Attempting different configurations of timeouts, retries and outlier detection to minimize the number of errors.
Additional context
Anything else we should know?
internal support case id - 12565299101
Attachments
envoy debug logs from one of the failed cross region request.
extract-2023-04-21T16_31_02.134Z.csv.txt
Main error I see is
remote address:10.24.19.114:80,TLS error: 33554536:system library:OPENSSL_internal:Connection reset by peer 33554464:system library:OPENSSL_internal:Broken pipe
which suggests that envoy in us-east-1 (client) is not closing the connection with NLB (even though idle timeout in envoy is set to 150s), so NLB is sending a TCP RST once 350 seconds have passed and it receives a new request. Any help debugging this would be appreciatedThe text was updated successfully, but these errors were encountered: