snowplow_unified v0.5.0 | Duplicate rows in snowplow_unified_dim_iso_639_2t causing duplicate session and user rows in derived tables #93

Open
1 of 6 tasks
akashgangulyhf opened this issue Nov 28, 2024 · 0 comments
Labels
status:needs_triage Needs maintainer triage. type:bug Bugs or weaknesses. The issue has to contain steps to reproduce.

Describe the bug

The seed file snowplow_unified_dim_iso_639_2t seems to contain duplicate rows (see screenshot).
When our this run tables are created, they are joined to this seed file on language code. Because the seed has two rows for the same code, the join returns two rows for each match, which duplicates session and user rows in the derived tables.
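The fan-out described above can be reproduced in isolation. The following is a minimal sketch using an in-memory SQLite database, not the package's actual models: the table names `sessions` and `lang_seed`, the column names, and the rows are all made up for illustration.

```python
import sqlite3

# Toy reproduction of a join fan-out. Table names, column names and rows
# are hypothetical; they are not taken from the snowplow_unified package.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (session_id TEXT, lang_code TEXT);
    CREATE TABLE lang_seed (code TEXT, name TEXT);
    INSERT INTO sessions VALUES ('s1', 'aaa'), ('s2', 'bbb');
    -- 'aaa' appears twice in the seed, mimicking the reported duplicate
    INSERT INTO lang_seed VALUES
        ('aaa', 'Name A'), ('aaa', 'Name A (alt)'), ('bbb', 'Name B');
""")

joined = conn.execute("""
    SELECT s.session_id, l.name
    FROM sessions s
    LEFT JOIN lang_seed l ON s.lang_code = l.code
    ORDER BY s.session_id, l.name
""").fetchall()

session_ids = [row[0] for row in joined]
# Session 's1' matches both 'aaa' seed rows, so it is emitted twice.
print(len(session_ids), "rows for", len(set(session_ids)), "sessions")
```

Two input sessions come out as three joined rows, which is the same shape of duplication reported here for the derived session and user tables.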

image

Steps to reproduce

This is the PR that caused it: dd539fc
It ran fine on the previous version we had, but after moving to 0.5.0 we hit this breaking change.
(https://hellofresh.slack.com/archives/D07FWKLMEG1/p1732790755671019)
Snowplow builds this seed from https://www.loc.gov/standards/iso639-2/php/code_list.php, which itself contains duplicates:
image
image
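One way to confirm which codes are repeated is to count the key column of the seed CSV. This is a rough sketch: the helper `duplicate_keys`, the column name `code`, and the sample rows are hypothetical, and the real seed's header may differ.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text, key_column):
    """Return {key: count} for every key that occurs more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[key_column] for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample; replace with the contents of the real seed file and
# the name of its actual language-code column.
sample = "code,name\naaa,Name A\naaa,Name A (alt)\nbbb,Name B\n"
dupes = duplicate_keys(sample, "code")
print(dupes)
```

An empty result would mean the key column is unique. Inside a dbt project, the built-in `unique` test on the seed's code column should flag the same condition.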

Expected results

The seed file should be cleaned and deduplicated before it is ingested.
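As a sketch of what such a cleaning step could look like, the snippet below keeps the first row seen for each code and drops later repeats. The helper `dedupe_rows`, the column names, and the sample data are assumptions for illustration. Keeping the first occurrence is an arbitrary choice here, and which of the duplicate rows should survive is for the maintainers to decide.

```python
import csv
import io


def dedupe_rows(csv_text, key_column):
    """Keep only the first row for each value of key_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    seen = set()
    for row in reader:
        key = row[key_column]
        if key in seen:
            continue  # a row with this code was already written
        seen.add(key)
        writer.writerow(row)
    return out.getvalue()


# Made-up sample with one repeated code.
sample = "code,name\naaa,Name A\naaa,Name A (alt)\nbbb,Name B\n"
cleaned = dedupe_rows(sample, "code")
print(cleaned)
```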

Actual results

The seed file contains duplicates.

Screenshots and log output

System information

The contents of your packages.yml file:

# contents goes here

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • databricks
  • other (specify: ____________)

The output of dbt --version:

snowflake: 1.8.3

The operating system you're using:
macOS

The output of python --version:
Python 3.9.19

Additional context

Are you interested in contributing towards the fix?

@akashgangulyhf akashgangulyhf added the type:bug Bugs or weaknesses. The issue has to contain steps to reproduce. label Nov 28, 2024
@github-actions github-actions bot added the status:needs_triage Needs maintainer triage. label Nov 28, 2024