Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Support separate HTTP Codes in RetryPolicy #451

Open
mkielar opened this issue Dec 29, 2022 · 1 comment
Open

Feature Request: Support separate HTTP Codes in RetryPolicy #451

mkielar opened this issue Dec 29, 2022 · 1 comment

Comments

@mkielar
Copy link

mkielar commented Dec 29, 2022

If you want to see App Mesh implement this idea, please upvote with a 👍.

Tell us about your request
When #7 was first published, there were some bold statements about supporting configuration of retryPolicy where one could specify HTTP Codes separately by passing values like 4xx, 5xx and so on, alongside gateway-error and others. See this excerpt from #7:

        ...
        "http": [
            "server-error",          // retriable_status_codes: [ "500", "501", "505", "506", "507", 508", "509", "510", "511" ]
            "gateway-error",         // retriable_status_codes: [ "502", "503, "504" ]
            "client-error" ,         // retriable_status_codes: [ "409" ]
            "stream-error",          // retry_on: refused-stream (h2)
            "<1xx|2xx|3xx|4xx|5xx>", // retriable_status_codes: "xx" is expanded to valid IANA HTTP codes
        ],
        ...

and a quote, which sounds like somebody was really eager to implement this:

Most interesting event in the schema is the HTTP code expansion field. If field such as “1xx” is added to the list of http events to retry on, “xx” will be expanded to a full list of IANA supported HTTP codes.

However, in the final release none of this seems to be available, and the attempt to pass values like 5xx or even explicit 502 results in below error (this is a terraform error, but it looks like a fault returned by AWS API):

Error: error updating App Mesh route: BadRequestException: HttpEvent must be one of [server-error,gateway-error,client-error,stream-error,reset].

This is a feature request, to:

  1. Implement the feature as it was designed in Retry Policy #7
  2. More importantly
    • implement the possibility to pass separate HTTP Codes
    • implement some "special" indicator to explicitly allow retry-on-timeout (e.g. timeout).

Which integration(s) is this request for?
We're using AppMesh with ECS Fargate Services, but this is not relevant. The configuration should be available for any kind of service.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We are wrapping several legacy, 3rd party applications with weird and obsolete APIs with AppMesh. We're provisioning Virtual Nodes, Services, Routers, and Router Routes for them. In Router Routes we set up different retry/timeout configurations for our internal clients (AppMesh Services) to "talk" to these 3rd parties. We have multiple Routes provisioned for each 3rd party, matching different endpoints by HTTP Method and Path Prefix and applying different retry / timeout rules. This works pretty well.

However, these apps do not really conform to REST standard, and sometimes they respond with 5xx or 4xx errors. after actually performing requests and modifying data internally in a way that will cause failure if the request is retried (please, don't ask)...

What we'd like to do, is to configure our Virtual Router Routes in a way, that would allow Envoys to perform retry on timeouts, and 504 errors from upstream, but not on 502 and 503, because they're "in use".

Are you currently working around this issue?
We currently only use stream-error for http retry events, and that disables retry-on-timeout. We then implement retry mechanisms in code, which completely contradicts the idea of using AppMesh.

@mkielar
Copy link
Author

mkielar commented Dec 29, 2022

See:

Looks like Envoy already supports both cases:

  1. The xx expansion is available by specifying values like 5xx or retriable-4xx.
  2. The selective configuration should be doable by using retriable-status-codes and a comma-separated list of HTTP Status Codes.

When looking at Envoy, it feels like a decision was made for AppMesh to provide unnecessary abstraction of the feature, preventing users from utilizing full functionality. I'm a bit confused trying to understand that decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant