
3scale batcher policy #685

Merged
merged 23 commits into from
May 9, 2018

Conversation

Contributor

@davidor davidor commented Apr 27, 2018

This PR introduces a new policy. Its goal is to reduce latency and increase throughput by significantly reducing the number of requests made to the 3scale backend. To achieve that, this policy caches authorization statuses and batches reports, which requires some trade-offs regarding rate-limit accuracy. I've tried to document everything in the README included in this PR.

I've tagged this as WIP because there are some things that need to be fixed. I've documented them in a TODO section included in the first lines of the new policy module. I decided to open the PR because I'd appreciate some early feedback, @mikz

I need to perform more tests to check that the policy is indeed fast and correct, but my first tests are really promising. The performance of the policy depends heavily on the cache hit ratio. For use cases with a relatively low number of services, apps, etc., this could easily bring a big improvement over the APIcast policy by sacrificing some accuracy when applying rate limits.

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch 9 times, most recently from a0a36ab to 7fe0c01 on April 30, 2018 16:13
Contributor

@mikz mikz left a comment


👍 good stuff.

We should set some goals so we know whether we are there or not and can choose the right compromise between speed and safety.

@@ -168,6 +173,40 @@ function _M:authorize(...)
return call_backend_transaction(self, auth_uri, authorize_options(using_oauth), ...)
end

local function add_transaction(transactions, index, cred_type, cred, reports)
transactions['transactions[' .. index .. '][' .. cred_type .. ']'] = cred
Contributor

We should use string.format for this to minimize string allocation.
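A minimal sketch of what that could look like (illustrative only, not the code that ended up in the diff):

local fmt = string.format

local function add_transaction(transactions, index, cred_type, cred, reports)
  -- one formatted key instead of several '..' concatenations
  transactions[fmt('transactions[%s][%s]', index, cred_type)] = cred
  -- ... the usage deltas from 'reports' would be added with the same pattern ...
end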

Contributor Author

Right 👍
Fixed.

@@ -120,6 +121,10 @@ local function auth_path(using_oauth)
'/transactions/authorize.xml'
end

local function report_path()
Contributor

This could be just a module property instead of a function, right?
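A small sketch of the idea, assuming the report path is constant (illustrative; the exact layout may differ in the final code):

-- the report path takes no arguments, so it can be a plain module field
local _M = {
  report_path = '/transactions.xml',
}

-- callers then read backend_client.report_path instead of calling report_path()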

Contributor Author

Right 👍
Fixed.

stored in the 3scale backend database. In summary, going over the defined usage
limits is easier. The APIcast policy reports to 3scale backend every time it
receives a request. Reports are asynchronous and that means that we can go over
the limits for a brief window of time. On the other hand, this policy reports
Contributor

I'd like to explore a different strategy for reporting.

Right now there is a timer to report every N seconds.

I could imagine a strategy where it would report continuously.
Let's say there is a request and it triggers a backend call. Then another request comes and, because the backend call is still active, its report would be added to shmem. Then, when the backend call finishes, it can collect the cached reports and issue a new call.

So basically it would just cache the parallel calls and keep them 1:1 with the backend.

All these optimizations have compromises, and to choose the right one we should have some target. There are plenty of ways to do this, with different trade-offs and performance characteristics. We should define some bar we want to reach and choose the correct way to get there.

Fully caching everything and then reporting to the backend in one batch definitely has the best performance, but also the highest chance of going wrong.
This continuous reporting would not give a 10x gain, maybe just 2x or 3x, but it would be safer and more accurate.
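A rough sketch of that continuous strategy (illustrative pseudocode; the batcher and backend helpers are hypothetical):

local reporting = false  -- per-worker flag: is a report call to backend in flight?

local function report_continuously(batcher, backend, report)
  batcher:add(report)           -- every report goes to shmem first
  if reporting then return end  -- the in-flight call will pick it up when it finishes

  reporting = true
  local pending = batcher:flush()  -- drain everything accumulated so far
  while #pending > 0 do
    backend:report(pending)        -- one backend call per accumulated batch
    pending = batcher:flush()      -- anything cached while we were reporting
  end
  reporting = false
end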

Member

@eguzki eguzki May 2, 2018

The backend report scheduling strategy is a hard problem. Definitely, trial and error with different approaches is the best way to come up with the best solution within our constraints.

I think it may depend on the cache hit ratio. When the hit ratio is high, the best approach might be waiting as long as we can to achieve the highest batch size in the allowed time window. On the other hand, when the hit ratio is low, waiting does not make sense: keeping memory, file descriptors, and timers (through fd's) is expensive.

We could implement a dynamic algorithm that changes strategies depending on the cache hit ratio, which we could measure.

Member

On event-based systems, locks are very painful, with unaffordable performance penalties.

I suggest we do not use shared resources and IPC mechanisms. Each worker keeps its own state. Performance and low latency are the advantages. The drawback is increased backend traffic, since the batching level is lower. Again, it depends on the cache hit ratio: it's good when the hit ratio is high, but it does not make sense in high cache-miss scenarios.

local service_id = service.id
local credentials = context.credentials

ensure_report_timer_on(self, service_id, backend)
Contributor

We definitely have to expose some way to init/destroy policies so they can do stuff like this. Global policies can do this in init_worker, and we need some mechanism for local policies too, including destructors, because they can get GC'd.
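For context, a hedged sketch of what a per-worker ensure_report_timer_on (called in the diff above) could look like; the batcher accessors and fields here are hypothetical:

local function ensure_report_timer_on(self, service_id, backend)
  if self.report_timers[service_id] then return end

  local ok, err = ngx.timer.every(self.batch_reports_seconds, function()
    local reports = self.reports_batcher:get_all(service_id)
    if #reports > 0 then backend:report(reports) end
  end)

  if ok then
    self.report_timers[service_id] = true
  else
    ngx.log(ngx.ERR, 'failed to create the report timer: ', err)
  end
end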

-- - Timeouts in locks are not handled properly. The request simply fails when
-- this happens.
--
-- - Evaluate using a local hash and semaphores in the reports batcher instead
Contributor

A local hash would be lost when the worker crashes. Maybe use a shdict, but still parallelize the collection/reporting across workers?

Using https://github.com/openresty/lua-nginx-module#ngxshareddictrpush we could immediately add reports to the list, and they could be processed by several workers.

Maybe, to limit congestion and improve cache locality, we could have a semaphore to signal how many workers should be running.

rpush returns the length of the list. We could simply call semaphore:post(1) when that length is over some batch size. That would wake up the timer.every loop waiting on the semaphore via semaphore:wait(t). When the timer loop reaches an empty list, it can go back to semaphore:wait() to wait for more elements in the list.

Just food for thought. There are some nice algorithms we could implement, like https://elixir-lang.org/blog/2016/07/14/announcing-genstage/
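A rough sketch of that shdict + semaphore pipeline (illustrative; the dict name and batch size are made up, and an ngx.semaphore instance only coordinates threads within a single worker):

local semaphore = require('ngx.semaphore')

local reports = ngx.shared.batched_reports  -- hypothetical lua_shared_dict
local sema = semaphore.new()
local BATCH_SIZE = 100

-- producer: runs in the request phases
local function push_report(serialized_report)
  local len = reports:rpush('pending', serialized_report)
  if len and len >= BATCH_SIZE then
    sema:post(1)  -- wake up the consumer once the batch is big enough
  end
end

-- consumer: runs inside a timer.every loop
local function drain(backend)
  local ok = sema:wait(1)  -- wait up to 1s for a signal before checking the list
  if not ok then return end

  local report = reports:lpop('pending')
  while report do
    backend:report(report)
    report = reports:lpop('pending')
  end
end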

end
else
if cached_auth.status == 200 then
self.reports_batcher:add(service_id, credentials, usage)
Contributor

This could go to the :log phase so the client does not see the impact.

Contributor Author

Good idea 👍
Why log instead of post_action?

Contributor Author

We can't do that @mikz

resty.lock.lock() calls ngx.sleep(), which cannot be run in the log phase:
https://github.com/openresty/lua-resty-lock/blob/master/lib/resty/lock.lua#L151
https://github.com/openresty/lua-nginx-module#ngxsleep

I realized it because I saw errors like this one in the logs:

[error] 7127#7127: *150 failed to run log_by_lua*: /usr/local/openresty/lualib/resty/lock.lua:151: API disabled in the context of log_by_lua*

Contributor

It would be possible to do that by creating a timer.at(0) to run the work in the background, but we can explore that later.
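A sketch of that workaround (illustrative): the zero-delay timer runs detached from the request, in a context where resty.lock is allowed to yield.

local function add_to_batcher_in_background(reports_batcher, service_id, credentials, usage)
  local ok, err = ngx.timer.at(0, function()
    -- safe to take the resty.lock here; we are no longer in log_by_lua
    reports_batcher:add(service_id, credentials, usage)
  end)

  if not ok then
    ngx.log(ngx.ERR, 'failed to schedule the batched report: ', err)
  end
end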

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch 2 times, most recently from a368a16 to 339089f on May 2, 2018 09:08
@mikz
Contributor

mikz commented May 2, 2018 via email

@andrewdavidmackenzie
Member

Three related questions David:

  • did you have it as a goal to reduce (gateway) latency (I didn't, but that's OK :-) )?
  • do you think it actually will reduce latency?
  • will you be doing any latency measurements at scale to confirm/deny improvement?

@davidor
Contributor Author

davidor commented May 2, 2018

@andrewdavidmackenzie The main goal was to increase throughput by reducing the backend load, but I think latencies will improve too.

Regarding measurements, I did some performance tests on an OpenShift cluster with the traffic profile designed by @eguzki. The results were very good: >10x throughput with a TTL for auths of 10s (per gateway) and 10s of batching (per APIcast worker). I plan to add more tests, make some minor changes, and check that everything works as expected. After that, I'll repeat the benchmarks and paste the results here.

@andrewdavidmackenzie
Member

andrewdavidmackenzie commented May 2, 2018 via email

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch 8 times, most recently from 61c0992 to b1185c7 on May 7, 2018 14:27
@davidor
Contributor Author

davidor commented May 8, 2018

I tried this and it works quite well.

I performed the tests in an OpenShift cluster with APIcast and the 3scale backend deployed. The cluster has 4 c4.xlarge compute nodes. I configured the policy to cache auths for 10 seconds (per APIcast) and batch reports for 10 seconds (per APIcast worker).

I tested with a traffic pattern designed by @eguzki. It creates 3 services, with 100 apps and 3 metrics for each of those services. All requests report 1, 2, or 3 metrics.

Using this policy, it’s possible to achieve 18k rps, which is more than 10x what we get without using this policy. Of course, as explained in the design document added in this PR, the comparison is not fair because of the trade-offs that this policy makes regarding the accuracy of rate limits. However, given the significant improvement in throughput, I believe this policy can be very useful in some use cases.

I think there’s still room for improvement, and I’d like to try different strategies for batching the reports, as @mikz and @eguzki suggested. However, I’d like to merge this PR first. I think the code is now in a mergeable state and well tested. @mikz, I’d like you to do a final review before merging to confirm. After merging this, we’ll be able to iterate and try other strategies, and even evaluate whether it would make sense to include config params to select different batching strategies. For example, some users might prefer per-worker batching to achieve higher throughput at the risk of losing some reports if a specific worker dies for some reason.

@mikz The majority of the changes since your review are in the last 4 commits. I included a README explaining the trade-offs of this policy, and added unit tests for the policy module (3scale_batcher_spec.lua) and integration tests (apicast-policy-3scale-batcher.t). I also cleaned up the code in other classes a bit, added some documentation, and addressed your comments.

Let me know what you think @mikz

@davidor davidor changed the title from "[WIP] 3scale batcher policy" to "3scale batcher policy" on May 8, 2018
@@ -159,6 +159,13 @@ http {

lua_shared_dict limiter 1m;

# These shared dictionaries are only used in the 3scale batcher policy.
# This is not ideal, but they'll need to be here until we allow policies to
# modify this template.
Contributor

Just for the record: #477

end
end

local function format_transactions(reports_batch)
Contributor

@mikz mikz May 9, 2018

I'm not sure this method should live in the backend client.
It depends on ReportsBatch.

IMO this client should be unaware of any other objects and just work with plain tables (and possibly a __tostring metamethod on them).

edit: or some API exposed by this backend client that returns tables that serialize properly.
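A sketch of what that decoupling could look like, assuming the caller hands the client an array of plain report tables (field names are illustrative):

local fmt = string.format

-- reports = { { user_key = '...', usage = { hits = 2, ... } }, ... }
local function format_transactions(reports)
  local transactions = {}

  for i, report in ipairs(reports) do
    local index = i - 1
    transactions[fmt('transactions[%s][user_key]', index)] = report.user_key

    for metric, delta in pairs(report.usage) do
      transactions[fmt('transactions[%s][usage][%s]', index, metric)] = delta
    end
  end

  return transactions
end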

self.reports_batcher:add(to_batch.service_id, to_batch.credentials, to_batch.usage)
elseif backend_status >= 400 and backend_status < 500 then
local rejection_reason = rejection_reason_from_headers(backend_res.headers)
self.auths_cache:set(service_id, credentials, usage, backend_status, rejection_reason)
Contributor

@mikz mikz May 9, 2018

If service_id, credentials, and usage are always passed together to the auths_cache, then they might deserve their own object. And that could be reused for the reports batcher?

This looks like clear parameter coupling and makes the parameter list really long.

Contributor Author

Yes, I thought about that too. I think there are some refactoring opportunities there. Maybe we could extract some classes like Report or Authorization. I'd rather leave this for a future PR though.
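For illustration, the kind of value object being discussed might look like this (hypothetical; not part of this PR):

local Transaction = {}
Transaction.__index = Transaction

function Transaction.new(service_id, credentials, usage)
  return setmetatable(
    { service_id = service_id, credentials = credentials, usage = usage },
    Transaction)
end

-- callers could then pass a single argument:
--   self.auths_cache:set(transaction, backend_status, rejection_reason)
--   self.reports_batcher:add(transaction)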

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch from 4a25e31 to 3cf29ba on May 9, 2018 15:23
local add_ok, add_err = self.storage:safe_add(key, deltas[metric])

if not add_ok then
if add_err == 'no_memory' then
Contributor

Contributor Author

Sorry 🤦‍♂️

Good catch! Should be fixed now.

@davidor davidor force-pushed the split-3scale-auth-and-rep-policy branch from 3cf29ba to ecfa88e on May 9, 2018 15:38
Contributor

@mikz mikz left a comment

👍

@davidor davidor merged commit 7b06e3e into master May 9, 2018
@davidor davidor deleted the split-3scale-auth-and-rep-policy branch May 9, 2018 15:59
@davidor davidor added this to the 3.3 milestone Jun 1, 2018