Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENHANCEMENT]: Add hasher/comparator adaptor to better support mapping table use case #476

Open
3 tasks
PointKernel opened this issue May 7, 2024 · 3 comments
Labels
good first issue Good for newcomers P2: Nice to have Desired, but not necessary type: improvement Improvement / enhancement to an existing function

Comments

@PointKernel
Copy link
Member

PointKernel commented May 7, 2024

Is your feature request related to a problem? Please describe.

Many hash-based implementations rely on the mapping table method to handle large keys/values that are expensive to copy or move around. Every time users need to write the mapping table hasher/key_equal adaptors on their own, they shouldn't have to.

Describe the solution you'd like

As proposed by @esoha-nvidia, to better serve all users in this scenario, cuco could provide two handy utilities to the user:

  1. mapping_table_hash wrapper constructed from a span (a pointer to the beginning of the original data and a size) and the original hasher
  2. mapping_table_key_eq wrapper constructed from a span and the original key comparator

TODO:

  • Add two utilities (probably in include/cuco/utility/mapping_table.cuh)
  • Update the example accordingly
  • Update godbolt link in README

Describe alternatives you've considered

Seeking help for better naming

  • Is the mapping_table_ prefix descriptive enough? Any other suggestions?
  • STL also use pred/predicate to refer to key comparators but I found mapping_table_key_pred/mapping_table_pred can be misleading since we support conditional operations like insert_if and contains_if where the condition is also named as predicate
  • mapping_table_equal good enough? key_eq is the STL canonical but the _eq abbreviation is ambiguous.
@PointKernel PointKernel added good first issue Good for newcomers P2: Nice to have Desired, but not necessary type: improvement Improvement / enhancement to an existing function labels May 7, 2024
@esoha-nvidia
Copy link

The complication with this adaptor is that the key_equal function depends on the which table is in the input.

For example, say we are doing a bulk_insert followed by a bulk_lookup. And imagine, like in the case of a database, the insert is primary keys from one table and the lookup is foreign keys from another table.

When you do the insert, the key_equal function must perform primary_keys[index_to_insert] == primary_keys[slots_index]. But when we do the lookup, we must perform lookup_keys[index_to_lookup] == primary_keys[slots_index]. Above we have:

mapping_table_key_eq wrapper constructed from a pointer to the beginning of the original data, a size and the original key comparator

So which will be the "pointer to the beginning of the original data"? &primary_keys[0] or &lookup_keys[0]? It's impossible.

One solution would be, for example, to store both inside the equality functor and then make the index a tuple that indicates which table we should use for lookup. This is really wasteful of memory and also quite slow. The problem here is that when we are doing an insert versus a lookup, we know which table should be used for comparison, but that knowledge is thrown away because the equality operator is unable to use it.

I don't have a great solution for this, which is a major reason why I was unable to use cuCo. One possibility is to perhaps make an adapter around an existing hash table that would let you swap the equality operator. For example:

my_hash_table.bulk_insert(keys_to_insert); // Insert using default equality operator
mapping_table_key_eq lookup_table_eq(lookup_table); // Create a new equality operator.
my_hash_table.template with_eq<lookup_table_eq>(lookup_table_eq).bulk_lookup(lookup_table);

The with_eq function is creating a "view" of the original slots but with a new equality operator. The implementation is basically just a reinterpret_cast that modifies a template parameter, so it has no runtime cost. I can't think of a more elegant solution that still adheres to the STL hashing paradigm.

@njones93531
Copy link
Contributor

As mentioned in slack, I would like to work on this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers P2: Nice to have Desired, but not necessary type: improvement Improvement / enhancement to an existing function
Projects
None yet
Development

No branches or pull requests

3 participants