Data quality observation for the Data Vault layer.
Installation
# packages.yml - to be updated once the package is published
packages:
  - package: <TBD>/dq_vault
    version: [">=0.1.0", "<1.0.0"]
# dbt_project.yml
on-run-end:
  - '{{ dq_vault.store_test_results_json(results) }}'
vars:
  dq_vault__enable_store_test: true # or false
  dq_vault__raw_db: 'your_custom_db or target.database'
  dq_vault__raw_schema: 'your_custom_schema or target.schema'
  dq_vault__selected_model_rules:
    - hub: ['hub']
    - sat: ['sat','satellite']
    - link: ['link','tlink','t_link','lnk','tlnk','t_lnk']
    - pit: ['pit']
    - bridge: ['bridge']
    - xts: ['xts']
In the above:
- dq_vault__enable_store_test (bool): Set to true to have the package capture the test results at the end of a dbt command run. Defaults to false, which does nothing.
- dq_vault__raw_db (string): The database where the raw test log table (RAW_TEST) is created.
- dq_vault__raw_schema (string): The schema where the raw test log table (RAW_TEST) is created.
- dq_vault__selected_model_rules (list): The mapping used to select the Data Vault models ONLY; it currently relies on the model name. The order of the items in the list matters.
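The effect of list order can be illustrated with a short Python sketch. The package itself is written as dbt macros, so this is not its implementation: the function name `resolve_vault_type` is hypothetical, and the case-insensitive substring match is an assumption about how a model name is compared with the keywords. It only shows why order matters when a name could match more than one rule.

```python
from typing import Optional

# Same shape as dq_vault__selected_model_rules: an ordered list of
# single-key mappings from vault type to model-name keywords.
SELECTED_MODEL_RULES = [
    {"hub": ["hub"]},
    {"sat": ["sat", "satellite"]},
    {"link": ["link", "tlink", "t_link", "lnk", "tlnk", "t_lnk"]},
    {"pit": ["pit"]},
    {"bridge": ["bridge"]},
    {"xts": ["xts"]},
]


def resolve_vault_type(model_name: str, rules=SELECTED_MODEL_RULES) -> Optional[str]:
    """Return the first vault type whose keyword occurs in the model name.

    ASSUMPTION: case-insensitive substring matching; the package's real
    matching logic may differ.
    """
    name = model_name.lower()
    for rule in rules:  # list order decides which rule wins
        for vault_type, keywords in rule.items():
            if any(keyword in name for keyword in keywords):
                return vault_type
    return None  # not recognised as a Data Vault model


print(resolve_vault_type("hub_customer"))       # hub
print(resolve_vault_type("sat_customer_link"))  # sat (listed before link)
print(resolve_vault_type("stg_orders"))         # None
```

Under these assumptions, moving the link rule ahead of the sat rule would classify sat_customer_link as a link instead, which is the kind of ambiguity the ordering resolves.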
Currently there are 4 built-in test types based on the test name:
- Duplication: generic test name contains 'unique'
- Reconciliation:
  - singular test
  - generic test name contains 'equality' or 'equal'
- Reference: generic test name contains 'reference', 'relationship'
- Unknown: default test type
The test type can be set explicitly:
- Using test config
  models:
    - name: my_model
      tests:
        - my_test:
            test_type: duplication
- Using test meta
  models:
    - name: my_model
      tests:
        - my_test:
            meta:
              test_type: duplication
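The naming rules for the four built-in test types, together with the test_type setting, can be summarised in a small Python sketch. This is an illustration only and not the package's code: the function name `classify_test` is hypothetical, and it assumes that an explicit test_type (from config or meta) takes precedence over the name-based rules and that names are matched as case-insensitive substrings.

```python
from typing import Optional

# Keyword rules for generic test names, checked in the order listed.
KEYWORD_RULES = [
    ("duplication", ("unique",)),
    ("reconciliation", ("equality", "equal")),
    ("reference", ("reference", "relationship")),
]


def classify_test(test_name: str, is_singular: bool = False,
                  explicit_type: Optional[str] = None) -> str:
    """Pick a test type for one dbt test.

    ASSUMPTIONS: an explicit test_type wins over the name-based rules,
    and name matching is a case-insensitive substring check.
    """
    if explicit_type:
        return explicit_type.lower()
    if is_singular:
        return "reconciliation"  # singular tests count as reconciliation
    name = test_name.lower()
    for test_type, keywords in KEYWORD_RULES:
        if any(keyword in name for keyword in keywords):
            return test_type
    return "unknown"  # default test type


print(classify_test("unique_hub_customer_hk"))                # duplication
print(classify_test("relationships_link_order_customer"))     # reference
print(classify_test("not_null_hub_customer_hk"))              # unknown
print(classify_test("my_test", explicit_type="duplication"))  # duplication
```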
Listing the custom built-in macros in the packages
The integration_tests directory contains a dbt project which tests the macros, models, etc. in the dq-vault package. An integration test typically involves making 1) a new seed file, 2) a new model file, and 3) a generic test to assert the anticipated behaviour.
For an example integration test, check out the tests for the get_datavault_type macro:
- Macro definition
- Seed or Model file with fake data
- A generic test to assert the macro works as expected
Once you've added all of these files, you should be able to run the following, assuming you are in the integration_tests folder:
dbt deps --target {your_target}
dbt seed --target {your_target}
dbt run --target {your_target} --model {your_model_name}
dbt test --target {your_target} --model {your_model_name}
Alternatively, at the root repo folder (/dq-vault):
chmod +x run_test.sh
./run_test.sh {your_target} {your_models}
If the tests all pass, then you're good to go! All tests will be run automatically when you create a PR against this repo.
- Quick Start (if you have already set up the local dev environment):
  - Start the shell
    cd /path/to/dq-vault/integration_tests
    python3 -m poetry shell
  - Some sample commands:
    # Build models and capture test results, refreshing the resources (raw_tests table)
    dbt build --exclude source:run_result_log+ tag:failed tag:sample_custom --vars '{dq_vault__enable_store_test: true, fresh: true}'
    # Build integration tests/models with potentially failing cases
    dbt build --select tag:failed tag:sample_custom --vars '{dq_vault__enable_store_test: true}'
    # Build the dq-vault main models, downstream of the test result log table
    dbt build --select source:run_result_log+
- Prerequisites:
  - Install Python 3.9.6+ as recommended (specified in pyproject.toml).
    The commands below assume your Python alias is python3. You don't need the alias if your environment has only one Python version.
  - Install poetry
    python3 -m pip install poetry
- Set up the local dev environment
  - Set working dir
    cd /path/to/dq-vault
  - Install dependencies
    python3 -m poetry install
  - Start shell (equivalent to activating the virtualenv)
    python3 -m poetry shell
  - Install dev dependencies
    poe git-hooks # Yes, it's poe, it's not a spelling mistake :)
  Now you can go on to play with dbt!
- Verify the installed dbt version
  dbt --version
- Copy profiles to the '.dbt' dir (create it if it does not exist) under the user's home dir.
  # Linux/macOS
  mkdir -p ~/.dbt
  cp ./profiles/profiles.yml ~/.dbt/profiles.yml
  NOTE: To simplify development, here we put the real password value (not using env_var) in the profiles.yml after copying.
- Check dbt configs:
  cd integration_tests
  dbt debug [--profiles-dir /path/to/profiles-dir]
- Run your model
  dbt deps
  dbt seed
  dbt run
To exit the shell:
exit # Enter