Hanging when performing read_multi against large number of keys #941

Open
ghost opened this issue Dec 9, 2022 · 4 comments · May be fixed by #942
ghost commented Dec 9, 2022

We're seeing an issue where the dalli client hangs when trying to read ~100-200K keys in a single call. We were able to reproduce it with the following code and cache store configuration.

It looks similar to #776, but we wanted to confirm and potentially add our use case given the age of that issue. We're going to mitigate on our end by dropping the batch size to 10K keys per call.
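The mitigation can be sketched as a small client-side helper (the name `batched_read_multi` and the 10,000-key batch size are our own choices, not anything dalli provides):

```ruby
# Mitigation sketch: cap the number of keys per read_multi call by slicing
# the key list into fixed-size batches and merging the partial results.
def batched_read_multi(cache, keys, batch_size: 10_000)
  keys.each_slice(batch_size).reduce({}) do |results, batch|
    results.merge(cache.read_multi(*batch))
  end
end

# Usage: works with any object responding to read_multi, e.g.
#   batched_read_multi(Rails.cache, keys)
```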

  • ruby 2.7.6p219
  • Rails 6.0.5.1
  • dalli (3.2.3)

  config.cache_store = :mem_cache_store,
    ENV.fetch('MEMCACHE_SERVERS', '127.0.0.1:11211'),
    { namespace: "comp-#{Rails.env}" }
require 'timeout'

key_count = 2**10

while key_count < 10_000_000 do
    prefix = (rand * 2**64).to_i.to_s(32)
    puts "Prepping #{key_count} keys"
    keys = key_count.times.map { |i| "some_long_string:#{prefix}:#{i}" }
    t = Time.now
    puts "Starting set multi #{t}"
    Rails.cache.fetch_multi(*keys) { 0 }
    diff = Time.now - t
    puts "Completed in #{diff}s"
    t = Time.now
    puts "Starting multi get #{t}"
    Timeout.timeout(diff) {
        Rails.cache.read_multi(*keys) # read_multi ignores a block
    }
    puts "Completed in #{Time.now - t}s"
    key_count *= 2
end
Prepping 65536 keys
Starting set multi 2022-12-09 17:36:08 +0000
Completed in 6.119117072s
Starting multi get 2022-12-09 17:36:14 +0000
Completed in 1.0013925s
Prepping 131072 keys
Starting set multi 2022-12-09 17:36:15 +0000
Completed in 12.960816923s
Starting multi get 2022-12-09 17:36:28 +0000
Traceback (most recent call last):
        41: from bin/rails:9:in `<main>'
        40: from bin/rails:9:in `require'
        39: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/commands.rb:18:in `<top (required)>'
        38: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/command.rb:46:in `invoke'
        37: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/command/base.rb:69:in `perform'
        36: from /usr/local/lib/ruby/gems/2.7.0/gems/thor-1.2.1/lib/thor.rb:392:in `dispatch'
        35: from /usr/local/lib/ruby/gems/2.7.0/gems/thor-1.2.1/lib/thor/invocation.rb:127:in `invoke_command'
        34: from /usr/local/lib/ruby/gems/2.7.0/gems/thor-1.2.1/lib/thor/command.rb:27:in `run'
        33: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/commands/runner/runner_command.rb:42:in `perform'
        32: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/commands/runner/runner_command.rb:42:in `load'
        31: from railscache.rb:16:in `<top (required)>'
        30: from /usr/local/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
        29: from railscache.rb:17:in `block in <top (required)>'
        28: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:387:in `read_multi'
        27: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:686:in `instrument'
        26: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/notifications.rb:180:in `instrument'
        25: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
        24: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/notifications.rb:180:in `block in instrument'
        23: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:686:in `block in instrument'
        22: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:388:in `block in read_multi'
        21: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache/strategy/local_cache.rb:131:in `read_multi_entries'
        20: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache/mem_cache_store.rb:159:in `read_multi_entries'
        19: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:367:in `with'
        18: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache/mem_cache_store.rb:159:in `block in read_multi_entries'
        17: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `get_multi'
        16: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `tap'
        15: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:108:in `block in get_multi'
        14: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:19:in `process'
        13: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/ring.rb:78:in `lock'
        12: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:20:in `block in process'
        11: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:32:in `setup_requests'
        10: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `make_getkq_requests'
         9: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `each'
         8: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:48:in `block in make_getkq_requests'
         7: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `request'
         6: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `synchronize'
         5: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:18:in `block in request'
         4: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:35:in `request'
         3: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:210:in `pipelined_get'
         2: from /usr/local/lib/ruby/2.7.0/forwardable.rb:235:in `write'
         1: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write'
/usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write': execution expired (Timeout::Error)
ghost commented Dec 9, 2022

We're also able to reproduce this without Rails via the following:

require 'timeout'
require 'dalli'

key_count = 2**10

dalli_client = Dalli::Client.new(ENV.fetch('MEMCACHE_SERVERS'), timeout: 0.5, error_when_over_max_size: true)
while key_count < 10_000_000 do
    prefix = (rand * 2**64).to_i.to_s(32)
    puts "Prepping #{key_count} keys"
    keys = key_count.times.map { |i| "some_long_string:#{prefix}:#{i}" }
    t = Time.now
    puts "Starting set multi #{t}"
    keys.each do |k|
        dalli_client.set(k, 0)
    end
    diff = Time.now - t
    puts "Completed in #{diff}s"
    t = Time.now
    puts "Starting multi get #{t}"
    Timeout.timeout(diff) {
        dalli_client.get_multi(keys)
    }
    puts "Completed in #{Time.now - t}s"
    key_count *= 2
end
Prepping 524288 keys
Starting set multi 2022-12-09 17:25:48 +0000
Completed in 40.370745713s
Starting multi get 2022-12-09 17:26:28 +0000
Completed in 1.500120584s
Prepping 1048576 keys
Starting set multi 2022-12-09 17:26:30 +0000
Completed in 81.036222996s
Starting multi get 2022-12-09 17:27:51 +0000
Traceback (most recent call last):
        20: from dalli.rb:20:in `<main>'
        19: from /usr/local/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
        18: from dalli.rb:21:in `block in <main>'
        17: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `get_multi'
        16: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `tap'
        15: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:108:in `block in get_multi'
        14: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:19:in `process'
        13: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/ring.rb:78:in `lock'
        12: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:20:in `block in process'
        11: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:32:in `setup_requests'
        10: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `make_getkq_requests'
         9: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `each'
         8: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:48:in `block in make_getkq_requests'
         7: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `request'
         6: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `synchronize'
         5: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:18:in `block in request'
         4: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:35:in `request'
         3: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:210:in `pipelined_get'
         2: from /usr/local/lib/ruby/2.7.0/forwardable.rb:235:in `write'
         1: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write'
/usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write': execution expired (Timeout::Error)

@petergoldstein (Owner) commented
@pcorliss-provi I strongly suspect it's the same issue as #776. I don't have a mitigation for that yet, other than limiting the number of keys fetched per get_multi call. It would be possible to implement that batching internal to the client (i.e. break bigger requests into batches of ~10K keys), although it's slightly more complicated because of the ring, and I haven't done that. If you'd like to submit a PR, it would be welcome.

@marvinthepa commented
@petergoldstein Funny that I seem to have discovered, on the same day as @pcorliss-provi, that we are affected by this / #776 as well. Increasing sndbuf and rcvbuf is a good workaround for now, but I would like to fix this properly.
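For reference, the buffer workaround can be expressed directly as client options (assuming dalli's sndbuf/rcvbuf socket options, which are passed through to setsockopt; the 4 MB values are illustrative, not tuned):

```ruby
require 'dalli'

# Workaround sketch: enlarge the kernel socket buffers so more of the
# pipelined request/response fits before TCP backpressure kicks in.
client = Dalli::Client.new(
  ENV.fetch('MEMCACHE_SERVERS', '127.0.0.1:11211'),
  sndbuf: 4 * 1024 * 1024, # SO_SNDBUF, bytes
  rcvbuf: 4 * 1024 * 1024  # SO_RCVBUF, bytes
)
```

This only raises the threshold at which the hang occurs; it does not remove the underlying deadlock.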

I have started implementing interleaved reading/writing (which I suspect is what you mean by "batching internal to the client"), but it needs some polishing. If you want, I can push a rough solution so we can discuss it. In particular, I'm not quite sure what you mean by "it's slightly more complicated because of the ring".

For now, do you want to close this as duplicate in favor of #776?

@petergoldstein (Owner) commented
@marvinthepa Yes, interleaved reading/writing is basically what I was thinking. The ring (the data structure that distributes keys among the memcached servers) makes it more complicated, since the buffer problem is per memcached server. Ideally the interleaving would occur per server, but that requires breaking the ring abstraction, so it may be simpler to ignore this detail and batch as if there were a single server.

And yes, we can close this issue as a duplicate.
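The interleaving idea discussed above can be sketched as follows (hypothetical names, not dalli's actual implementation): select on the socket for both directions and drain the response while the request is still being written, so neither kernel buffer can deadlock the client.

```ruby
require 'socket'
require 'timeout'

# Sketch of interleaved reading/writing on one socket. `expected_bytes` is
# how much response we expect back; a real client would parse the protocol
# instead.
def write_while_reading(sock, payload, expected_bytes, timeout: 5)
  pending  = payload.dup
  response = +''
  while !pending.empty? || response.bytesize < expected_bytes
    write_set = pending.empty? ? nil : [sock]
    readable, writable = IO.select([sock], write_set, nil, timeout)
    raise Timeout::Error, 'socket not ready' if readable.nil? && writable.nil?
    if writable && !writable.empty?
      # Non-blocking write: send as much of the request as the buffer allows.
      n = sock.write_nonblock(pending, exception: false)
      pending = pending.byteslice(n, pending.bytesize - n) if n.is_a?(Integer)
    end
    if readable && !readable.empty?
      # Drain whatever part of the response has already arrived.
      chunk = sock.read_nonblock(65_536, exception: false)
      response << chunk if chunk.is_a?(String)
    end
  end
  response
end
```

Per the discussion above, this would ideally run per server socket beneath the ring; batching as if there were a single server is the simpler alternative.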

marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Dec 10, 2022
marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Dec 12, 2022
marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Nov 1, 2024
fixes petergoldstein#776.
fixes petergoldstein#941.

When reading a large number of keys, memcached starts sending the
response before dalli has finished sending the request.

Because we did not start reading the response until we were finished
writing the request, this could lead to the following problem:

* the receive buffer (rcvbuf) would fill up
* due to TCP backpressure, memcached would stop sending (okay, as we are
  not reading anyway), but also stop reading
* the send buffer (sndbuf) would also fill up
* as we were using a blocking write without a timeout, we would block
  forever (at least with ruby < 3.2, which introduced IO timeouts, see
  petergoldstein#967)

This is addressed by using IO.select on the sockets for both read and
write, and thus starting to read as soon as data is available.
marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Nov 2, 2024