Hanging when performing read_multi against large number of keys #941

ghost · 2022-12-09T17:32:11Z

We're seeing an issue where the dalli client hangs trying to read ~100-200K keys at once. We were able to repro with the following code and a cache store configuration.

It looks similar to #776 but wanted to confirm and potentially add our use case given the age of that issue. We're going to mitigate on our end by dropping the batch size to 10K keys per call.

ruby 2.7.6p219
Rails 6.0.5.1
dalli (3.2.3)

  config.cache_store = :mem_cache_store,
    ENV.fetch('MEMCACHE_SERVERS', '127.0.0.1:11211'),
    { :namespace => "comp-#{Rails.env}" }

require 'timeout'

key_count = 2**10

while key_count < 10_000_000 do
    prefix = (rand * 2**64).to_i.to_s(32)
    puts "Prepping #{key_count} keys"
    keys = key_count.times.map { |i| "some_long_string:#{prefix}:#{i}" }
    t = Time.now
    puts "Starting set multi #{t}"
    Rails.cache.fetch_multi(*keys) { 0 }
    diff = Time.now - t
    puts "Completed in #{diff}s"
    t = Time.now
    puts "Starting multi get #{t}"
    Timeout::timeout(diff) {
        Rails.cache.read_multi(*keys) { 0 }
    }
    puts "Completed in #{Time.now - t}s"
    key_count *= 2
end

Prepping 65536 keys
Starting set multi 2022-12-09 17:36:08 +0000
Completed in 6.119117072s
Starting multi get 2022-12-09 17:36:14 +0000
Completed in 1.0013925s
Prepping 131072 keys
Starting set multi 2022-12-09 17:36:15 +0000
Completed in 12.960816923s
Starting multi get 2022-12-09 17:36:28 +0000
Traceback (most recent call last):
        41: from bin/rails:9:in `<main>'
        40: from bin/rails:9:in `require'
        39: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/commands.rb:18:in `<top (required)>'
        38: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/command.rb:46:in `invoke'
        37: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/command/base.rb:69:in `perform'
        36: from /usr/local/lib/ruby/gems/2.7.0/gems/thor-1.2.1/lib/thor.rb:392:in `dispatch'
        35: from /usr/local/lib/ruby/gems/2.7.0/gems/thor-1.2.1/lib/thor/invocation.rb:127:in `invoke_command'
        34: from /usr/local/lib/ruby/gems/2.7.0/gems/thor-1.2.1/lib/thor/command.rb:27:in `run'
        33: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/commands/runner/runner_command.rb:42:in `perform'
        32: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/commands/runner/runner_command.rb:42:in `load'
        31: from railscache.rb:16:in `<top (required)>'
        30: from /usr/local/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
        29: from railscache.rb:17:in `block in <top (required)>'
        28: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:387:in `read_multi'
        27: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:686:in `instrument'
        26: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/notifications.rb:180:in `instrument'
        25: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
        24: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/notifications.rb:180:in `block in instrument'
        23: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:686:in `block in instrument'
        22: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:388:in `block in read_multi'
        21: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache/strategy/local_cache.rb:131:in `read_multi_entries'
        20: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache/mem_cache_store.rb:159:in `read_multi_entries'
        19: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:367:in `with'
        18: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache/mem_cache_store.rb:159:in `block in read_multi_entries'
        17: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `get_multi'
        16: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `tap'
        15: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:108:in `block in get_multi'
        14: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:19:in `process'
        13: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/ring.rb:78:in `lock'
        12: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:20:in `block in process'
        11: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:32:in `setup_requests'
        10: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `make_getkq_requests'
         9: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `each'
         8: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:48:in `block in make_getkq_requests'
         7: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `request'
         6: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `synchronize'
         5: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:18:in `block in request'
         4: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:35:in `request'
         3: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:210:in `pipelined_get'
         2: from /usr/local/lib/ruby/2.7.0/forwardable.rb:235:in `write'
         1: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write'
/usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write': execution expired (Timeout::Error)

The text was updated successfully, but these errors were encountered:

ghost · 2022-12-09T17:33:57Z

Also able to repro without rails via the following:

require 'timeout'
require 'dalli'

key_count = 2**10

dalli_client = Dalli::Client.new( ENV.fetch('MEMCACHE_SERVERS'), timeout: 0.5, error_when_over_max_size: true)
while key_count < 10_000_000 do
    prefix = (rand * 2**64).to_i.to_s(32)
    puts "Prepping #{key_count} keys"
    keys = key_count.times.map { |i| "some_long_string:#{prefix}:#{i}" }
    t = Time.now
    puts "Starting set multi #{t}"
    keys.each do |k|
        dalli_client.set(k, 0)
    end
    diff = Time.now - t
    puts "Completed in #{diff}s"
    t = Time.now
    puts "Starting multi get #{t}"
    Timeout::timeout(diff) {
        dalli_client.get_multi(keys)
    }
    puts "Completed in #{Time.now - t}s"
    key_count *= 2
end

Prepping 524288 keys
Starting set multi 2022-12-09 17:25:48 +0000
Completed in 40.370745713s
Starting multi get 2022-12-09 17:26:28 +0000
Completed in 1.500120584s
Prepping 1048576 keys
Starting set multi 2022-12-09 17:26:30 +0000
Completed in 81.036222996s
Starting multi get 2022-12-09 17:27:51 +0000
Traceback (most recent call last):
        20: from dalli.rb:20:in `<main>'
        19: from /usr/local/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
        18: from dalli.rb:21:in `block in <main>'
        17: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `get_multi'
        16: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `tap'
        15: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:108:in `block in get_multi'
        14: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:19:in `process'
        13: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/ring.rb:78:in `lock'
        12: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:20:in `block in process'
        11: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:32:in `setup_requests'
        10: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `make_getkq_requests'
         9: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `each'
         8: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:48:in `block in make_getkq_requests'
         7: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `request'
         6: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `synchronize'
         5: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:18:in `block in request'
         4: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:35:in `request'
         3: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:210:in `pipelined_get'
         2: from /usr/local/lib/ruby/2.7.0/forwardable.rb:235:in `write'
         1: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write'
/usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write': execution expired (Timeout::Error)

petergoldstein · 2022-12-09T18:19:06Z

@pcorliss-provi I strongly suspect it's the same issue as #776 . I don't have a mitigation for that yet, other than limiting the number of keys fetched at a time using multi_get. It's possible to implement that batching internal to the client (i.e. break bigger batches into batches of ~10k, although it's slightly more complicated because of the ring), but I haven't done that. If you'd like to submit a PR it would be welcome.

marvinthepa · 2022-12-09T23:00:05Z

@petergoldstein Funny that I seem to have discovered on the same day as @pcorliss-provi that we are affected by this / #776 as well. Increasing sndbuf and rcvbuf is a good workaround for now, but I would like to fix this properly.

I have started implementing interleaved reading/writing (which is what I suspect you mean by "batching internal to the client"), but it needs some polishing. If you want I can push a rough solution so we can have a discussion. I am especially not quite sure what you mean by "it's slightly more complicated because of the ring".

For now, do you want to close this as duplicate in favor of #776?

petergoldstein · 2022-12-09T23:14:03Z

@marvinthepa Yes, interleaved reading/writing is basically what I was thinking. The ring (the data structure that distributes among the memcached servers) makes it more complicated since the buffer problem is per-memcached server. So ideally the interleaving would occur per server, but that requires breaking the ring abstraction. So it may be simpler to ignore this detail and batch as if it's a single server.

And yes, we can close this issue as a duplicate.

fixes petergoldstein#776. fixes petergoldstein#941. When reading a large number of keys, memcached starts sending the response when dalli is not yet finished sending the request. As we did not start reading the response until we were finished writing the request, this could lead to the following problem: * the receive buffer (rcvbuf) would fill up * due to TCP backpressure, memcached would stop sending (okay as we are not reading anyway), but also stop reading * the send buffer (sndbuf) would also fill up * as we were using a blocking write without timeout, we would block forever (at least with ruby < 3.2, which introduces IO::Timeout, see petergoldstein#967) This is addressed by using IO::select on the sockets for both read and write, and thus start reading as soon as data is available.

marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Dec 10, 2022

WIP: first attempt at petergoldstein#776 / petergoldstein#941

d31d9b2

marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Dec 10, 2022

WIP: first attempt at petergoldstein#776 / petergoldstein#941

2a0bcd2

marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Dec 10, 2022

WIP: first attempt at petergoldstein#776 / petergoldstein#941

350d7b4

marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Dec 12, 2022

WIP: first attempt at petergoldstein#776 / petergoldstein#941

03ad88e

marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Dec 12, 2022

WIP: first attempt at petergoldstein#776 / petergoldstein#941

dac656d

marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Nov 1, 2024

WIP: first attempt at petergoldstein#776 / petergoldstein#941

1030225

marvinthepa pushed a commit to marvinthepa/dalli that referenced this issue Nov 1, 2024

WIP: first attempt at petergoldstein#776 / petergoldstein#941

b84ceae

marvinthepa linked a pull request Nov 2, 2024 that will close this issue

interleave read and write on pipelined_get (#776, #941) #942

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hanging when performing read_multi against large number of keys #941

Hanging when performing read_multi against large number of keys #941

ghost commented Dec 9, 2022 •

edited by ghost

Loading

ghost commented Dec 9, 2022

petergoldstein commented Dec 9, 2022

marvinthepa commented Dec 9, 2022

petergoldstein commented Dec 9, 2022

Hanging when performing read_multi against large number of keys #941

Hanging when performing read_multi against large number of keys #941

Comments

ghost commented Dec 9, 2022 • edited by ghost Loading

ghost commented Dec 9, 2022

petergoldstein commented Dec 9, 2022

marvinthepa commented Dec 9, 2022

petergoldstein commented Dec 9, 2022

ghost commented Dec 9, 2022 •

edited by ghost

Loading