dbt-audit-helper-ext

Extended Audit Helper solution 💪

This repository provides a collection of macros that extend data validation workflows with:

Historical Logging: Automatically saving detailed validation results into a designated DWH table for comprehensive audit tracking
Latest Summary Reporting: Maintaining a concise, up-to-date summary table for quick insights into the current state of validations
Codegen and Scripts: Automating repetitive tasks to simplify workflows, which is particularly valuable for migration projects

Data Warehouses:

❄️ Snowflake (default)
☁️ BigQuery

Installation

Add the package to your packages.yml file:

packages:
  - package: infinitelambda/audit_helper_ext
    version: [">=0.1.0", "<1.0.0"]
    # keep an eye on the latest version, and change it accordingly

Or use the latest version from git:

packages:
  - git: "https://github.com/infinitelambda/dbt-audit-helper-ext"
    revision: <release version or tag> # 0.1.0

And run dbt deps to install the package!

Initialize the resources:
```
dbt deps
dbt run -s audit_helper_ext
```
This step creates the log table (validation_log) and the summary view on top of it (validation_log_report).
Generate the validation macros:

Check the /scripts directory for all the codegen utilities.

Firstly, we need to determine the location (database and schema) of the source tables:

** If all source tables are in the same location, we can use environment variables to set these values:
```
export SOURCE_SCHEMA=MY_SOURCE_SCHEMA
export SOURCE_DATABASE=MY_SOURCE_DATABASE
```
** If there are multiple locations, we can configure the location inside each dbt model's config block:
```
{{
  config(
    ...
    audit_helper__source_database = 'MY_SOURCE_DATABASE',
    audit_helper__source_schema = 'MY_SOURCE_SCHEMA'
  )
}}
...
```
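The two options above amount to a lookup with a fallback. The sketch below is illustrative only: `resolve_source_location` is a hypothetical helper, not part of this package, and it assumes that a value set in a model's config block takes precedence over the environment variables, which is not something the package documentation states here.

```python
import os


def resolve_source_location(model_config, env=None):
    """Return (database, schema) for the source table of one model.

    Assumption (not confirmed by the package): model-level config wins,
    and the SOURCE_DATABASE / SOURCE_SCHEMA variables are the fallback.
    """
    env = os.environ if env is None else env
    database = model_config.get("audit_helper__source_database") or env.get("SOURCE_DATABASE")
    schema = model_config.get("audit_helper__source_schema") or env.get("SOURCE_SCHEMA")
    return database, schema


# Example with an explicit env mapping so the result does not depend on the shell.
shared_env = {"SOURCE_DATABASE": "MY_SOURCE_DATABASE", "SOURCE_SCHEMA": "MY_SOURCE_SCHEMA"}
print(resolve_source_location({}, env=shared_env))
print(resolve_source_location({"audit_helper__source_schema": "OTHER_SCHEMA"}, env=shared_env))
```

Passing the environment as a plain mapping keeps the sketch easy to test; in a real shell session the default `os.environ` would be used.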
Now we can generate the validation macro files. Let's say we need to validate all models in the 03_mart directory:
```
python dbt_packages/audit_helper_ext/scripts/create_validation_macros.py models/03_mart
```
Or validate only a specific model, for example 03_mart/dim_sales:
```
python dbt_packages/audit_helper_ext/scripts/create_validation_macros.py \
  models/03_mart \
  dim_sales
```
Finally, check out the macros/validation directory in your dbt project!

Validation Strategy

This repo contains macros that save historical validation results into a DWH table (validation_log), together with the latest summary table (validation_log_report).

There are 3 main types of validation:

Count (count, source)
Column by Column (all_col, source)
Row by Row (full, source)

Additionally, there is a 4th type, upstream_row_count (source), which helps put the validation results in context. For example, the matched rate might be 100%, but if there were 0 updates in the upstream models, there were no updates in the final table either, so we cannot say with certainty that it was a perfect match.

The validation strategy may vary from project to project. In this package, we suggest a first approach that we have used successfully in a real-life migration project (Informatica to dbt).

Context: Our dbt project has 3 layers (staging, intermediate, and mart). Each mart model has an independent set of upstream models, that is, an isolated pipeline per mart model. We want to validate mart models only.

Goal: 100% matched rate ✅, >=99% is still good 🟡, and below 99% is unacceptable ❌
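As a rough illustration of how these thresholds could be applied when reading the report, here is a minimal sketch. The function name `assess_validation` and the status labels are hypothetical and are not produced by the package. The "inconclusive" branch reflects the upstream_row_count caveat described earlier: a 100% match with 0 upstream updates cannot be called a perfect match with certainty.

```python
def assess_validation(match_rate_pct, upstream_row_count=None):
    """Map a matched rate (in percent) to a status label.

    Labels are hypothetical: "pass" (100%), "acceptable" (>= 99%),
    "fail" (< 99%), and "inconclusive" when the rate is 100% but the
    upstream models reported 0 updated rows.
    """
    if match_rate_pct >= 100.0:
        if upstream_row_count == 0:
            return "inconclusive"
        return "pass"
    if match_rate_pct >= 99.0:
        return "acceptable"
    return "fail"


for rate, upstream in [(100.0, 250), (100.0, 0), (99.4, 250), (97.0, 250)]:
    print(rate, upstream, assess_validation(rate, upstream))
```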

Pre-requisites: 2 consecutive snapshots (e.g. Day1, Day2) of both source data and mart tables

Flow:

Freeze the source data, so we have source__YYYYMMD1 and source__YYYYMMD2, mart__YYYYMMD1 and mart__YYYYMMD2
Scenario 1: Validate the fresh run against D1
- Configure source yml to use source__YYYYMMD1
- Run dbt to build the mart tables, called mart_dbt
- Run validation macros to compare between mart_dbt vs mart__YYYYMMD1 👍
Scenario 2: Validate the incremental run against D2 based on D1
- Configure source yml to use source__YYYYMMD2
- Clone mart__YYYYMMD1 to mart_dbt so that dbt starts from the D1 data (e.g. clone_relation)
- Run dbt incrementally to build the mart tables
- Run validation macros to compare between mart_dbt vs mart__YYYYMMD2 👍👍
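The two scenarios above can be summarised as a small plan that pairs each source snapshot with the mart snapshot it is compared against. This is only a sketch of the naming and ordering: `snapshot_name` and `validation_plan` are hypothetical helpers, and the `source`, `mart`, and `mart_dbt` names simply follow the placeholders used in the flow.

```python
from datetime import date


def snapshot_name(prefix, day):
    """Build a snapshot relation name such as source__20240101."""
    return f"{prefix}__{day:%Y%m%d}"


def validation_plan(day1, day2):
    """Describe the fresh-run and incremental-run scenarios as data."""
    return [
        {
            "scenario": "fresh run against D1",
            "source": snapshot_name("source", day1),
            "clone_into_mart_dbt": None,  # mart_dbt is built from scratch
            "compare": ("mart_dbt", snapshot_name("mart", day1)),
        },
        {
            "scenario": "incremental run against D2 based on D1",
            "source": snapshot_name("source", day2),
            "clone_into_mart_dbt": snapshot_name("mart", day1),
            "compare": ("mart_dbt", snapshot_name("mart", day2)),
        },
    ]


for step in validation_plan(date(2024, 1, 1), date(2024, 1, 2)):
    print(step)
```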

Finally, check the validation log report and decide on the next steps:

🛩️ Sample report table on Snowflake:

💡 Optionally, build a Sheet to communicate the outcome to the client; here is the BigQuery+GGSheet sample:

Demo

dbt-audit-helper Extension - First Version - Watch Video

How to Contribute

dbt-audit-helper-ext is an open-source dbt package. Whether you are a seasoned open-source contributor or a first-time committer, we welcome and encourage you to contribute code, documentation, ideas, or problem statements to this project.

👉 See CONTRIBUTING guideline

🌟 And finally, kudos to our beloved OG Contributors who originally developed the macros and scripts in this package: @William, @Duc, @Csabi, @Adrien & @Dat

About Infinite Lambda

Infinite Lambda is a cloud and data consultancy. We build strategies, help organizations implement them, and pass on the expertise to look after the infrastructure.

We are an Elite Snowflake Partner, a Platinum dbt Partner, and a two-time Fivetran Innovation Partner of the Year for EMEA.

Naturally, we love exploring innovative solutions and sharing knowledge, so go ahead and:

🔧 Take a look around our Git

✏️ Browse our tech blog

We are also chatty, so:

👀 Follow us on LinkedIn

👋🏼 Or just get in touch
