Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

route cache control #421

Open
wbpcode opened this issue Oct 6, 2024 · 16 comments
Open

route cache control #421

wbpcode opened this issue Oct 6, 2024 · 16 comments

Comments

@wbpcode
Copy link
Contributor

wbpcode commented Oct 6, 2024

In the Envoy, the route cache will be cleared by default if a header is modified by the wasm filter. It would be better to provide a way to control the behavior by the wasm filter self.

But seems this require the proxy wasm to provide a new ABI?

@PiotrSikora
Copy link
Member

We had proxy_clear_route_cache call in the v0.1.0, but it was removed in v0.2.0 because it was very Envoy specific and error-prone, since each change to HTTP headers had to be followed by a call to proxy_clear_route_cache, and if the plugin author missed it, then it would break Envoy's routing logic (which I believe could result in security issues because of misrouted requests).

This functionality has been replaced by automatic clearing of that cache when HTTP headers are updated.

What would work better with the manual control? Which use cases don't work with the automatic clearing?

@wbpcode
Copy link
Contributor Author

wbpcode commented Oct 6, 2024

@PiotrSikora I personally think automatic clearing is not a good choice in practice.

We may change the host, path, headers in a filter because various reasons (for example, rewrite the path according to custom filter configuration). But we only actually want to refresh the route in limited scenarios.

Automatic clearing actually prohibit the host/path updating in the filter because it may result in unexpected 404.

The correct way is to clear the route cache only when it's required explicitly. (Most envoy filters work as this way, except wasm and lua now.)

@PiotrSikora
Copy link
Member

PiotrSikora commented Oct 6, 2024

The correct way is to clear the route cache only when it's required explicitly. (Most envoy filters work as this way, except wasm and lua now.)

And what happens when the route cache isn't cleared when it should (either by mistake or intentionally by a malicious plugin)?

@wbpcode
Copy link
Contributor Author

wbpcode commented Oct 6, 2024

The correct way is to clear the route cache only when it's required explicitly. (Most envoy filters work as this way, except wasm and lua now.)

And what happens when the route cache isn't cleared when it should (either by mistake or intentionally by a malicious plugin)?

As I said before, the route cache should only be cleared when the filter require it explicitly. In all other cases, the initial matched route will be used.

Basically, we only use this feature when we cannot select the correct route according to client requests and need some additional inputs from the filter (like user info from auth filters.)

@leonm1
Copy link

leonm1 commented Oct 11, 2024

I'm trying to understand the feature request in here.

Do plugin authors want users to intentionally route a request to a host/path that does not resolve in the Envoy's route configuration?

For example,

  • User requests https://www.example.com/a.png, which is routed through an Envoy proxy with a wasm filter.
  • The proxy routes www.example.com/a.png to cluster upstream-1.
  • The proxy-wasm plugin changes the :host header to internal-only-domain.example
    • Note: the internal-only-domain.example does not resolve with public DNS resolvers, nor does it exist in the Envoy route map, but it is expected by the origin server.
    • The plugin does not clear the route cache.
  • Alice's request is sent to upstream-1 with the host internal-only-domain.example and path /a.png

@wbpcode
Copy link
Contributor Author

wbpcode commented Oct 12, 2024

@leonm1 yeah, you are right 👍

@leonm1
Copy link

leonm1 commented Oct 15, 2024

I went looking for prior art, and I also found a few related (non-wasm) Envoy issues:

It seems as though at least making this configurable is important.

I also took a look at how Apache Traffic Server handles this. It appears like ATS supports both:

  • "global" plugins are called after initial route is mapped and requires manually refreshing the route cache, while
  • "remap" plugins are called during route selection and automatically effect the route selection

Setting safe defaults is important. As Piotr points out, changing the host without refreshing the cached route can be dangerous in some circumstances. It sounds like refreshing the route cache by default may be a safe default for plugin authors, but we could allow plugin authors to opt-in to refreshing the cached route. This would apply to both ATS and Envoy, and I assume similar behavior applies to other proxies as well (though I did not check).

ext_proc also has an enum which can be set on a per-request basis.

@PiotrSikora given the use cases and prior art above, what do you think about adding a knob to let plugin authors choose?

@wbpcode does this satisfy the feature request?

@wbpcode
Copy link
Contributor Author

wbpcode commented Oct 16, 2024

Basically, I think keep the same default behavior with envoy is good enough. But at least a control flag is better than no control anyway.

@PiotrSikora
Copy link
Member

PiotrSikora commented Oct 16, 2024

@leonm1 thanks for digging into this! However, I think that we should step back a bit, and ask what's the intended use case, instead of focusing on Envoy's implementation details.

Is the feature request to:

  1. modify HTTP request headers without affecting selected upstream cluster (generic feature),
  2. be able to select different upstream cluster (again, generic feature),
  3. something else that's only achievable by re-picking selected route?

Also, regarding route cache in Envoy, the official documentation says that it must be cleared when changes to headers would affect routing, and since plugin authors generally don't (and shouldn't) know all the possible configurations the plugin is going to be deployed to, they cannot know whether the route cache should be cleared or not. See https://github.com/envoyproxy/envoy/blob/f86d1294e764af4429732d322c8a8a9fedead8a2/envoy/http/filter.h#L361-L365:

clearRouteCache() - Clears the route cache for the current request. This must be called when a filter has modified
the headers in a way that would affect routing.

IMHO, this decision should be done in proxy's configuration and not in the plugin itself (unless we want to make plugins deployment-specific).

Notably, envoyproxy/data-plane-api#569 added match_incoming_request_route to gRPC/JSON transcoder filter to disable automatic clearing of the route cache, so to keep things consistent within Envoy, that's probably the easiest and best way to address this.

@wbpcode
Copy link
Contributor Author

wbpcode commented Oct 16, 2024

I will try to remove the automatic refresh in the Envoy's implementation and add the foreign functions to clear the route cache.

By this way, the wasm plugin could get similar logic with the Envoy's native extension. The plugin could refreh the route according to it's plugin configuration or code.

@PiotrSikora
Copy link
Member

I will try to remove the automatic refresh in the Envoy's implementation and add the foreign functions to clear the route cache.

By this way, the wasm plugin could get similar logic with the Envoy's native extension. The plugin could refreh the route according to it's plugin configuration or code.

Erm, this is the opposite of what I suggested.

Also, please remember that:

  1. Changing defaults will break existing plugins.
  2. As mentioned before, there was proxy_clear_route_cache in Proxy-Wasm ABI v0.1.0, but based on the feedback from Istio plugin developers (@kyessenov?), we removed it in ABI v0.2.0 and made this behavior automatic. It was 5 years ago, so I don't recall the details now, but we shouldn't be flipping back-and-forth based on edge cases that can be addressed differently.
  3. IMHO, it should be proxy operators, and not plugins authors, that control whether this mechanism can be bypassed.

@kyessenov
Copy link
Collaborator

@PiotrSikora automatic on/off for clearing route cache will not work for selective re-routing, e.g. as it happens with JWT authentication filter that may or may not impact routing via dynamic metadata. So that means we do need an explicit host call. That can co-exist with the plugin-level config to turn on/off routing automatically. So in the end a solution probably needs:

  • a new field in the Wasm filter to control whether the route cache is cleared always/never.
  • a new host call to explicitly clear it when the above is set to "never".

@wbpcode
Copy link
Contributor Author

wbpcode commented Oct 17, 2024

If we hope the wasm extension has similar experience with the native one, then the framework shouldn't provide any automatical refresh. It should only be decided by the plugin configuration and the plugin logic.

Basically, the new current work basically is additional fee of the 5 years agos decision.
I think this is similar to the stop iteration support.

@wbpcode
Copy link
Contributor Author

wbpcode commented Oct 17, 2024

Is it possible to bring the proxy_clear_route_cache back? And we can also add the stop iteration support back.
All these changes could be placed in a new ABI version?

Or proxy-wasm community now treat it as a envoy-specifc feature so we cannot bring it back?

@PiotrSikora
Copy link
Member

If we hope the wasm extension has similar experience with the native one, then the framework shouldn't provide any automatical refresh.

Wasm isn't a native extension, but an external one (same as Lua and ext_proc).

If you look at those two, then Lua always clears route cache when HTTP headers are modified (same as Wasm), and ext_proc is clearing it by default (again, same as Wasm), but that behavior can be disabled in xDS configuration.

At the very least, you should try to match that behavior (i.e. what @kyessenov described above), instead of introducing a completely different one.

It should only be decided by the plugin configuration and the plugin logic.

Again, this can bypass security constrains on various routes, so this should not be something controlled only by the possibly untrusted plugin.

But to address some use cases, I think we can agree that this behavior could be disabled in xDS configuration.

Is it possible to bring the proxy_clear_route_cache back?
[...]
Or proxy-wasm community now treat it as a envoy-specifc feature so we cannot bring it back?

This is very much Envoy-specific feature and should be treated as such, so it should be added as a custom hostcall and then we can add wrappers for it to various SDKs. The good news about it is that it doesn't require a new ABI version.

And we can also add the stop iteration support back. All these changes could be placed in a new ABI version?

This will be addressed by the new ABI version (see: proxy-wasm/spec#63).

@wbpcode
Copy link
Contributor Author

wbpcode commented Oct 18, 2024

If you look at those two, then Lua always clears route cache when HTTP headers are modified (same as Wasm), and ext_proc is clearing it by default (again, same as Wasm), but that behavior can be disabled in xDS configuration.

I also don't think that lua do the correct thing but we can never change it becasue the lua is stable now. Ext_proc will only clear the route cache when the external plugin require it.

  enum RouteCacheAction {
    // The default behavior is to clear the route cache only when the
    // :ref:`clear_route_cache <envoy_v3_api_field_service.ext_proc.v3.CommonResponse.clear_route_cache>`
    // field is set in an external processor response.
    DEFAULT = 0;

    // Always clear the route cache irrespective of the clear_route_cache bit in
    // the external processor response.
    CLEAR = 1;

    // Do not clear the route cache irrespective of the clear_route_cache bit in
    // the external processor response. Setting to RETAIN is equivalent to set the
    // :ref:`disable_clear_route_cache <envoy_v3_api_field_extensions.filters.http.ext_proc.v3.ExternalProcessor.disable_clear_route_cache>`
    // to true.
    RETAIN = 2;
  }

Again, this can bypass security constrains on various routes, so this should not be something controlled only by the possibly untrusted plugin.

Basically when you load a untrust code, you can hard to prohbit it to do bad thing completely.

Specific to this feature, for the untrust plugin, we can also cannot expect it's modification is trust. Then refresh route based on the untrust modifiction automatically typically won't improve the safety. (So, at least the automatically refresh should be removed, this could be guard by a long-live runtime flag.)

And considering the acutal requirement from the pratice, route refresh is necessary, so, we will need a custom host call to do that. (We will add it as foreign call for now.)

We may also want disable the refresh completely for some plugins in the future. This could be treat as independent feature request of the Envoy.

Anyway, I think we should care the actual requirements first. Because at most time, we run the trust plugin that developed by ourselves. After that, we then could consider to make it safer by adding limit to specific plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants