Welcome to the Advanced Testing workshop of Coalesce 2023!
You've been using dbt for a while, and are comfortable with the out-of-the-box dbt tests:
- unique
- not_null
- accepted_values
- relationships
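If you need a refresher, these generic tests are applied to a model's columns in a YAML file. Here's a minimal sketch (the model and column names are placeholders, not models from the workshop project):

```yaml
# models/schema.yml -- placeholder model and column names, for illustration only
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique        # no duplicate order_id values
          - not_null      # no missing order_id values
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:            # every customer_id exists in the customers model
              to: ref('customers')
              field: customer_id
```

Running `dbt test` (or `dbt build`) executes all of them.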
But... a part of you wonders if there are other tests you should be using. What are some best practices? dbt Community Slack... documentation... on-demand courses... blog posts... there are so many places and so little time!
Well, if you've decided to join us for this workshop - you've come to the right place.
We'll hold hands and walk through advanced testing concepts together. By the end of it, you'll be an expert in all the tools at your disposal and can rest easy knowing how to improve your data quality checks.
Prerequisites
This workshop assumes that you're familiar with dbt and, at a minimum, know how to:
- Apply and run out-of-the-box dbt tests
- Execute dbt commands
- Use git functionality
- Write basic SQL
It's a bonus if:
- You understand languages like Jinja (templating) or Python (scripting)
- You've written some basic Jinja or Python statements in the past
If you'll be joining us live in person, you'll be given access to a project for this workshop. You can also bring your own dbt project!
Verify that you're set up and ready to go once you have access to a project:
- Navigate to your dbt Cloud account
- Click the cog in the top right corner and go to Profile Settings
- Click on User Profile > Credentials in the left sidebar
- Click on the project name you want to set credentials for
- Click Edit in the bottom right corner of the pane that opens
- Configure your development credentials:
  - Scroll to the Development Credentials section in the pane
  - Change these configurations:

    | configuration | value |
    | ------------- | ----- |
    | schema | Set this to `dbt_[first_initial][last_name]`. Example: `dbt_cberger` |
    | target | Set this to `dev` |

  - Click Save
- Verify your connection:
  - Click the Develop tab in the top menu bar
  - Create a new branch named `coalesce_[first_initial][lastname]`. Example: `coalesce_cberger`
  - Run `dbt deps` if needed to install dependencies (you'll be informed via the UI)
  - Try running `dbt build` from the command bar at the bottom of the UI
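If you're following along in dbt Core rather than dbt Cloud, the Development Credentials above live in `profiles.yml` instead of the UI. A minimal sketch, assuming a Snowflake connection; every value below is a placeholder, and the profile name must match the `profile:` key in the project's `dbt_project.yml`:

```yaml
# ~/.dbt/profiles.yml -- dbt Core only; dbt Cloud manages these credentials in the UI
my_workshop_profile:        # assumed name; must match profile: in dbt_project.yml
  target: dev               # same target name as configured above
  outputs:
    dev:
      type: snowflake
      account: <your_snowflake_account>
      user: <your_username>
      password: <your_password>
      role: <your_role>
      database: <your_database>
      warehouse: <your_warehouse>
      schema: dbt_cberger   # dbt_[first_initial][last_name]
      threads: 4
```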
Before completing this workshop, there are some prerequisites that you'll need:
- A Repository: Ideally, with the files and folders contained in this workshop. To make a copy, fork this repository.
- dbt: This workshop is written using dbt Cloud, so a Cloud account is ideal. If you're using dbt Core, you'll need to pay attention to the differences to complete this workshop on your own. Ideally, you'll also want to work with the dbt version this project was built with (dbt v1.6).
- Some Data: This project is written on top of Snowflake and uses the publicly available TPC-H data set, which is included in a Snowflake trial account. A truncated version of the data set has also been included in this project as CSV files if you're using a different data platform; it's located in the `_resources` folder.

Important: We don't suggest seeding the CSV files. Though they are truncated, they still contain a significant number of rows. As a best practice, use seed functionality only for small, static datasets; that's the specific purpose seeds are performant for.
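For contrast, an appropriate use of seeds is a tiny, static lookup file checked into the repo. A sketch, with a made-up project and file name rather than anything from this workshop:

```yaml
# dbt_project.yml (excerpt) -- hypothetical seed config for a small lookup file
seeds:
  my_project:                  # replace with your project's name
    country_codes:             # loads seeds/country_codes.csv
      +column_types:
        country_code: varchar(2)
```

`dbt seed` would load that small CSV quickly; the much larger TPC-H extracts above are better loaded with your data platform's own loading tools.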
If you don't have a data platform set up yet, don't worry. These guides are great for getting you set up:

BigQuery
- Instructions for setting up a free BigQuery account
- Instructions for loading CSV files into BigQuery
- dbt Cloud quickstart for BigQuery
- Starter instructions for accessing the TPC-H dataset yourself
- Instructions for adding the data from this project:
  - Download each .CSV file from the `_resources/tpch_dataset` folder
  - In the BigQuery UI's Explorer pane, click the three dots next to your project name
  - Click Create dataset
  - For Dataset ID, type `raw_tpch`
  - Click Create dataset
  - You should now see your dataset listed under your project name. Click the three dots next to the dataset.
  - Click Create table
  - Choose Upload as the "Create table from" option
  - Click Browse under "Select file"
  - Upload each file you downloaded:
    - For the table name, use the file name without the extension. Some file names have `_100mb` appended. Omit this.
    - Make sure to check Auto detect under Schema
Snowflake
- The TPC-H data set is included with a Snowflake trial account (as part of the sample data share), so no additional loading is required.
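However you load the raw data, a dbt project typically reads it through a source definition. The snippet below is only a generic illustration of that pattern, using the `raw_tpch` dataset created above; the source and table names in the workshop project itself may differ:

```yaml
# models/staging/_sources.yml -- generic illustration; names may not match the workshop project
version: 2

sources:
  - name: tpch
    schema: raw_tpch          # e.g. the BigQuery dataset created above
    tables:
      - name: orders
      - name: customer
```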
Verification if using this repository and dbt Cloud
- Navigate to your dbt Cloud account
- Click the cog in the top right corner and go to Profile Settings
- Click on User Profile > Credentials in the left sidebar
- Click on the project name you want to set credentials for
- Click Edit in the bottom right corner of the pane that opens
- Configure your development credentials:
  - Scroll to the Development Credentials section in the pane
  - Change these configurations:

    | configuration | value |
    | ------------- | ----- |
    | schema | Set this to `dbt_[first_initial][last_name]`. Example: `dbt_cberger` |
    | target | Set this to `dev` |

  - Click Save
- Verify your connection:
  - Click the Develop tab in the top menu bar
  - Create a new branch named `coalesce_[first_initial][lastname]`. Example: `coalesce_cberger`
  - Run `dbt deps` if needed to install dependencies (you'll be informed via the UI)
  - Try running `dbt build` from the command bar at the bottom of the UI
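A quick note on `dbt deps`: it installs whatever packages are pinned in the project's `packages.yml`. The entry below is only an example of the file's format, not a claim about what this workshop project actually uses:

```yaml
# packages.yml -- example format only; check the workshop repository for the real package list
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```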
You're ready to start the workshop! There's nothing else to do until the workshop begins.
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Join the dbt community for more help and hang with other data practitioners
- Find dbt events near you
- Check out the blog for the latest news on dbt's development and best practices