snowplow_unified v0.5.0 | Duplicate rows in snowplow_unified_dim_iso_639_2t causing duplicate session and user rows in derived tables #93
Labels
status:needs_triage
type:bug
Describe the bug
The seed file snowplow_unified_dim_iso_639_2t appears to contain duplicate rows (see screenshot).
When our this-run tables are built, they join to this seed file on the language code. Because the seed has two rows for the same code, the join returns two rows for each matching record, which produces duplicate session and user rows in the derived tables.
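To illustrate the fan-out described above, here is a minimal sketch using an in-memory SQLite database. The table names, column names, and language codes are made up for the example and are not the package's actual schema or seed contents.

```python
import sqlite3

# Hypothetical tables: "sessions" stands in for a this-run table,
# "dim_lang" stands in for the language seed. All values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table sessions (session_id text, lang_code text);
    create table dim_lang (iso_639_2t_code text, name text);

    insert into sessions values ('s1', 'xx'), ('s2', 'yy');

    -- 'xx' appears twice in the lookup table, like the duplicated seed rows
    insert into dim_lang values
        ('xx', 'Language X'),
        ('xx', 'Language X (alt)'),
        ('yy', 'Language Y');
    """
)

# Joining on the language code returns one output row per matching seed row
rows = conn.execute(
    """
    select s.session_id, d.name
    from sessions s
    left join dim_lang d on s.lang_code = d.iso_639_2t_code
    order by s.session_id, d.name
    """
).fetchall()

session_ids = [session_id for session_id, _ in rows]
print(rows)                    # two input sessions, three output rows
print(session_ids.count("s1")) # session 's1' now appears twice
```

Two sessions go in and three rows come out, because the session whose code is duplicated in the lookup table matches both seed rows.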
Steps to reproduce
This is the PR that caused it: dd539fc
Everything ran fine on the version we used previously. After upgrading to 0.5.0, this became a breaking change for us.
(https://hellofresh.slack.com/archives/D07FWKLMEG1/p1732790755671019)
Snowplow uses https://www.loc.gov/standards/iso639-2/php/code_list.php as the source for this seed, and that list itself contains duplicates.
Expected results
The seed file should be cleaned and deduplicated before it is ingested.
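As a rough sketch of what such a cleanup could look like, the snippet below finds duplicated codes in a CSV and keeps the first row for each code. The column name `iso_639_2t_code`, the sample rows, and the helper names `duplicate_keys` and `dedupe` are assumptions for illustration only; the real seed's columns and the right row to keep may differ.

```python
import csv
import io
from collections import Counter

# Invented sample data standing in for the seed file
SAMPLE_SEED = """iso_639_2t_code,name
xx,Language X
xx,Language X (alt)
yy,Language Y
"""


def duplicate_keys(rows, key):
    """Return the sorted key values that occur in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def dedupe(rows, key):
    """Keep only the first row seen for each key value, preserving order."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


seed_rows = list(csv.DictReader(io.StringIO(SAMPLE_SEED)))
dupes = duplicate_keys(seed_rows, "iso_639_2t_code")
clean_rows = dedupe(seed_rows, "iso_639_2t_code")

print(dupes)            # codes that appear more than once
print(len(clean_rows))  # number of rows left after deduplication
```

Keeping the first row is an arbitrary choice here. Which of the duplicated rows should survive is a decision for the maintainers.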
Actual results
The seed file contains duplicates.
Screenshots and log output
System information
The contents of your packages.yml file: # contents goes here
Which database are you using dbt with?
The output of dbt --version:
The operating system you're using: macOS
The output of python --version: Python 3.9.19
Additional context
Are you interested in contributing towards the fix?