Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: repeated keys in static_map after a sequence of insertions and deletions #606

Open
1 task done
amiralirz opened this issue Oct 2, 2024 · 4 comments
Open
1 task done
Labels
P1: Should have Necessary but not critical topic: static_map Issue related to the static_map topic: static_set Issue related to the static_set type: bug Something isn't working

Comments

@amiralirz
Copy link

amiralirz commented Oct 2, 2024

Is this a duplicate?

Type of Bug

Silent Failure

Describe the bug

I've been seeing repeated keys when I retrieve all the keys and values from cuco::staic_map. This is the explanation that I've found for it.

Assume that k1 and k2 map to the same position on the map and cause a collision. if we insert at k1, then insert at k2, remove k1 and then insert at k2 again, with the current probing schemes, there will be 2 instances of k2 at the map. That's because when we remove k1, an erased key sentinel will be left at it's position and when we want to insert k2 again, the insert function sees the erased key sentinel first and doesn't check any further to see if k2 is placed anywhere else. as a result, there will be 2 k2s at our map.

How to Reproduce

  1. Construct a map
  2. find 2 keys k1 and k2 that collide with each other
  3. insert something at k1
  4. insert something at k2
  5. remove k1
  6. insert something at k2
  7. retrieve all the keys at the map

Result: There will be 2 instances of k2 when we retrieve all the elements from the map.

Expected behavior

At any time there should be only one instance of k2 in the map.

Reproduction link

No response

Operating System

Ubuntu 22.04.4 LTS

nvidia-smi output

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.100 Driver Version: 550.100 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:01:00.0 Off | Off |
| 0% 49C P8 19W / 500W | 31MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

NVCC version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

@amiralirz amiralirz added the type: bug Something isn't working label Oct 2, 2024
@sleeepyjack sleeepyjack added topic: static_map Issue related to the static_map topic: static_set Issue related to the static_set labels Oct 2, 2024
@sleeepyjack
Copy link
Collaborator

Hi @amiralirz,
Thank you for bringing this up!

This indeed is a solid bug and a tricky one to solve, too.

When the second k2 is inserted at the tombstone of k1, it has no clue about the other occurrence of k2 that is already present in the map, so we end up with two occurrences of k2 in the map. This is technically an invalid state as per the data structure's definition.

If we want to retrieve k2 from the map we would still get the correct result, i.e., only the first occurrence of the map will be returned. The probing method ensures that it will return after the first hit.

retrieve_all on the other hand doesn't use the probing method, but instead uses an array compaction mechanism, discarding any empty or erased slots. In this case, both k2 versions will be copied to the output, which results in the error you are seeing.

There's another problem here: In your example, if we erase k2 again, only the first occurrence will be replaced with a tombstone while the second one is still in the map. If we now probe for k2 again, we will get a positive lookup result, although we just erased k2.

@sleeepyjack
Copy link
Collaborator

I just clarified the bug description a bit. Hope that's ok @amiralirz

@sleeepyjack
Copy link
Collaborator

Dissecting the issue a bit more, I can come up with two possible solutions that involve changing the way how insert handles tombstones. We currently try to reclaim previously erased slots by treating them as regular empty slots during insertion. Unfortunately this causes the abovementioned bug to emerge by allowing two instances of k2 to be present in the map. Here are my proposed options:

Option 1: Don't reclaim tombstone slots

By treating tombstone slots as non-empty and also non-matching slots during insert, we eliminate the scenario where k2 is inserted at a tombstone slot, not knowing another instance of k2 is present.
While this is an easy fix, the downside of this approach is that we reduce the map's future capacity with every erased entry. A user could rehash the map after every N-many deletions to get rid of the tombstones periodically, however, this is a rather expensive operation.

Option 2: If a tombstone is found, keep looking for k2

If we want to reclaim the tombstone we have to make sure no other instance of k2 is present in the map. So we have to keep probing until we find a proper empty slot. In that case, we can safely insert k2 at the tombstone location. This is a bit more complex from an implementation perspective and I'm not yet 100% convinced this could lead to any other intricate edge case scenarios.

@PointKernel
Copy link
Member

Option 2: If a tombstone is found, keep looking for k2

I'm inclined to proceed with this solution, but as @sleeepyjack pointed out, the second insertion of k2 will be quite expensive.

@PointKernel PointKernel added the P1: Should have Necessary but not critical label Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1: Should have Necessary but not critical topic: static_map Issue related to the static_map topic: static_set Issue related to the static_set type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants