How to do incremental update with new rows where update_key is not set #380

ArcqStrato · 2024-09-18T15:53:38Z

Suppose I have an Oracle table with two columns, ID (the primary key) and MDATUM (the update key).

ID	MDATUM
1	NULL
2	NULL
3	2024-09-18 02:00:00.000

Now I upload this table to Snowflake.

Next, one row is added to the table (ID=4), and one row is updated (ID=2), which means it gets an MDATUM:

so it looks like this:

ID	MDATUM	note
1	NULL
2	2024-09-18 03:00:00.000	this row is updated
3	2024-09-18 02:00:00.000
4	NULL	this row is new

How can get those two new rows into Snowflake using an incremental load?

When I load with

--mode incremental \
--primary-key 'ID' \
--update-key 'MDATUM' \

then I get this table:

ID	MDATUM	note
1	NULL
2	2024-09-18 03:00:00.000	correctly updated
3	2024-09-18 02:00:00.000

As you can see, ID=3 is updated as expected. The new row (ID=4), on the other hand, is not inserted into the target table.

I believe this is in line with your documentation.

However, from a practical standpoint, this is not helpful. My example above is a real-life example from our production table. And it makes sense: If a new row is added then that's not an update, and thus the update_key is empty. Only if a row is modified then the update_key is set, which makes sense.

I therefore believe there should be another load strategy that loads "new records after max(update_key)" plus "new records with primary_key larger then max(primary_key).

A workaround is I can make run Sling twice:

First, I load using these parameters:

--mode incremental \
--primary-key 'ID' \
--update-key 'MDATUM' \

This gets me all rows where the update_key MDATUM is newer then max(MDATUM) in my target table, i.e., all updated rows.

Then I run Sling again and this time I use:

--mode incremental \
--primary-key 'ID' \
--update-key 'ID' \

This should get me all IDs where the ID is higher then max(ID) in my target table, i.e., all new rows.

I feel my use-case is quite plausible and should be quite widespread.

What are your thoughts on this? And do you see a problem with my workaround?

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to do incremental update with new rows where update_key is not set #380

How to do incremental update with new rows where update_key is not set #380

ArcqStrato commented Sep 18, 2024

How to do incremental update with new rows where update_key is not set #380

How to do incremental update with new rows where update_key is not set #380

Comments

ArcqStrato commented Sep 18, 2024