Dbt supercharges sql with macros
dbt supercharges SQL with macros
With dbt you can leverage the full power of Jinja templating language in your SQL scripts. These code blocks are compiled into valid statements before being executed on a target database. Let me show you some useful tricks to use with the CORE DWH module.
Flexible row filtering
You have already seen filter_rows()
macro invocation in place of WHERE expression. It is used in every staging model. It helps fetch only relevant account rows and limit the number of rows for development and testing purposes if a table is large enough (especially useful for fact tables – sessions, hits, conversions).
See the stg_ym_sessions_facts code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
with source as (
select
{{ surrogate_key(["account_id",
"clientids_id",
"dates_id",
"traffic_id",
"locations_id",
"devices_id"])
}} as id
, f.account_id
, f.clientids_id
, f.dates_id
, f.traffic_id
, f.locations_id
, f.devices_id
, f.sessions
, f.bounces
, f.pageviews
, f.duration
, gd.dt
, gd.ts
from {{ source('metrika', 'sessions_facts') }} as f
left join {{ ref('stg_general_dates') }} as gd
on gd.id = f.dates_id
{{ filter_rows(
account_id=var('account_id_metrika'),
last_number_of_days=true,
ts_field='dt'
) }}
)
select * from source
It compiles to a target script as follows:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
with source as (
select
HASHBYTES(
'SHA2_256',
concat(
coalesce(cast(account_id as NVARCHAR), ''),
'-',
coalesce(cast(clientids_id as NVARCHAR), ''),
'-',
coalesce(cast(dates_id as NVARCHAR), ''),
'-',
coalesce(cast(traffic_id as NVARCHAR), ''),
'-',
coalesce(cast(locations_id as NVARCHAR), ''),
'-',
coalesce(cast(devices_id as NVARCHAR), '')
)
) as id
, f.account_id
, f.clientids_id
, f.dates_id
, f.traffic_id
, f.locations_id
, f.devices_id
, f.sessions
, f.bounces
, f.pageviews
, f.duration
, gd.dt
, gd.ts
from "master"."core_ci"."metrika_sessions_facts" as f
left join "master"."core_ci_staging"."stg_general_dates" as gd
on gd.id = f.dates_id
where 1 = 1
and account_id in (21600)
and 1 = 1
)
select * from source
Here is the source code of macro filter_rows itself:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- filter data for selected accounts, resize for dev, ci pipelines
{% macro filter_rows(
account_id=none,
last_number_of_days=none,
ts_field='dt'
) -%}
{#- prepare expression to filter on according account_id -#}
{% if account_id -%}
{%- set filter_account_id = 'account_id in (' ~ account_id ~ ')' -%}
{% else -%}
{%- set filter_account_id = '1 = 1' -%}
{%- endif -%}
{#- prepare expression to filter only last N days of data (e.g. last 3 days) -#}
{%- if target.name in ['dev', 'ci'] and last_number_of_days -%}
{%- set limit_rows = ts_field ~ ' >= dateadd(day, ' ~ -var('filter_days_of_data') ~ ', convert(date, getdate()))' -%}
{%- else -%}
{%- set limit_rows = '1 = 1' -%}
{%- endif -%}
{#- prepare final filter expression -#}
where 1 = 1
and {{ filter_account_id }}
and {{ limit_rows }}
{%- endmacro -%}
As a result WHERE expression consists of several parts:
- Filtering on account id if one is passed (or using 1 = 1 if none is passed)
- Limit the number of rows to 3 last days for
dev
andci
environments where turned on (fact tables)
Source systems account identifiers are supplied at project level in dbt_project.yml as variables:
1
2
3
4
5
6
7
8
9
10
11
vars:
filter_days_of_data: 3
mybi_dbt_core:
account_id_metrika: null
account_id_b24: null
account_id_direct: null
account_id_facebook: null
account_id_adwords: null
account_id_mytarget: null
account_id_amocrm: null
account_id_ga: null
When you import the CORE DWH module you simply override these variables with desired values.