Metadata Reference
AutomateDV is metadata driven. On this page, we provide an overview of how to provide and store the data for AutomateDV
macros .
For all other metadata and configurations, please refer to
the dbt configurations reference .
For further details about how to use the macros in this section, see table templates .
Approaches
This page will describe just one way of providing metadata to the macros. There are many ways to do it, and it comes
down to user and organisation preference.
Note
The macros do not care how the metadata parameters get provided, as long as they are of the correct type.
Parameter data types definitions are available on the macros page. The approaches below are simply our
recommendations, which we hope provide a good balance of manageability and readability.
All approaches for the same structure will produce the same result/output, the only difference is how the metadata is provided.
It is worth noting that with larger projects, metadata management gets increasingly harder and can become unwieldy.
See the problem with metadata for a more detailed discussion.
We can reduce the impact of this problem by providing the metadata for a given model, in the model itself. This approach
does have the drawback that the creation of models is significantly less copy-and-paste, but the metadata management
improvements are usually worth it.
Per-model - Variables
You may also provide metadata on a per-model basis. This is useful if you have a large amount of metadata.
Example
Per-Model - YAML strings
If you want to provide metadata inside the model itself, but find yourself disliking the variables format for larger collections
of metadata or certain data types (e.g. dict literals), then providing a YAML string inside a block
set
assignment is a good alternative to using multiple individual set
assignments. This approach takes advantage of
the fromyaml()
built-in jinja function provided by dbt, which is
documented here .
The below example for a Hub is a little excessive for the small amount of metadata provided, so there is also a stage
example provided to help better convey the difference.
Warning
dbt does not yet provide any syntax checking in these YAML strings, often leading to confusing and
misleading error messages. If you find that variables which are extracted from the YAML string are empty, it is an
indicator that the YAML did not compile correctly, and you should check your formatting; including indentation.
Examples
Staging
Parameters
stage macro parameters
All components Only Source (ref style) Only Source (source style) Only hashing Only derived Only null key columns Only ranked Exclude Columns flag
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46 {% - set yaml_metadata -%}
source_model: raw_source
hashed_columns:
CUSTOMER_HK: CUSTOMER_ID
CUST_CUSTOMER_HASHDIFF:
is_hashdiff: true
columns:
- CUSTOMER_DOB
- CUSTOMER_ID
- CUSTOMER_NAME
- "!9999-12-31"
CUSTOMER_HASHDIFF:
is_hashdiff: true
columns:
- CUSTOMER_ID
- NATIONALITY
- PHONE
derived_columns:
RECORD_SOURCE: "!STG_BOOKING"
EFFECTIVE_FROM: BOOKING_DATE
null_columns:
required:
- CUSTOMER_ID
optional:
- CUSTOMER_REF
- NATIONALITY
ranked_columns:
AUTOMATE_DV_RANK:
partition_by: CUSTOMER_ID
order_by: BOOKING_DATE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{% set source_model = metadata_dict [ "source_model" ] %}
{% set derived_columns = metadata_dict [ "derived_columns" ] %}
{% set null_columns = metadata_dict [ "null_columns" ] %}
{% set hashed_columns = metadata_dict [ "hashed_columns" ] %}
{% set ranked_columns = metadata_dict [ "ranked_columns" ] %}
{{ automate_dv.stage ( include_source_columns = true ,
source_model = source_model ,
derived_columns = derived_columns ,
null_columns = null_columns ,
hashed_columns = hashed_columns ,
ranked_columns = ranked_columns ) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14 {% - set yaml_metadata -%}
source_model: raw_source
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{% set source_model = metadata_dict [ "source_model" ] %}
{{ automate_dv.stage ( include_source_columns = true ,
source_model = source_model ,
derived_columns = none ,
null_columns = none ,
hashed_columns = none ,
ranked_columns = none ) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15 {% - set yaml_metadata -%}
source_model:
raw_source_name: source_table_name
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{% set source_model = metadata_dict [ "source_model" ] %}
{{ automate_dv.stage ( include_source_columns = true ,
source_model = source_model ,
derived_columns = none ,
null_columns = none ,
hashed_columns = none ,
ranked_columns = none ) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30 {% - set yaml_metadata -%}
source_model: raw_source
hashed_columns:
CUSTOMER_HK: CUSTOMER_ID
CUST_CUSTOMER_HASHDIFF:
is_hashdiff: true
columns:
- CUSTOMER_DOB
- CUSTOMER_ID
- CUSTOMER_NAME
- "!9999-12-31"
CUSTOMER_HASHDIFF:
is_hashdiff: true
columns:
- CUSTOMER_ID
- NATIONALITY
- PHONE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{% set source_model = metadata_dict [ "source_model" ] %}
{% set hashed_columns = metadata_dict [ "hashed_columns" ] %}
{{ automate_dv.stage ( include_source_columns = false ,
source_model = source_model ,
derived_columns = none ,
null_columns = none ,
hashed_columns = hashed_columns ,
ranked_columns = none ) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18 {% - set yaml_metadata -%}
source_model: raw_source
derived_columns:
RECORD_SOURCE: "!STG_BOOKING"
EFFECTIVE_FROM: BOOKING_DATE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{% set source_model = metadata_dict [ "source_model" ] %}
{% set derived_columns = metadata_dict [ "derived_columns" ] %}
{{ automate_dv.stage ( include_source_columns = false ,
source_model = source_model ,
derived_columns = derived_columns ,
null_columns = none ,
hashed_columns = none ,
ranked_columns = none ) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21 {% - set yaml_metadata -%}
source_model: raw_source
null_columns:
required:
- CUSTOMER_ID
optional:
- CUSTOMER_REF
- NATIONALITY
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{% set source_model = metadata_dict [ "source_model" ] %}
{% set null_columns = metadata_dict [ "null_columns" ] %}
{{ automate_dv.stage ( include_source_columns = false ,
source_model = source_model ,
derived_columns = none ,
null_columns = null_columns ,
hashed_columns = none ,
ranked_columns = none ) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19 {% - set yaml_metadata -%}
source_model: raw_source
ranked_columns:
AUTOMATE_DV_RANK:
partition_by: CUSTOMER_ID
order_by: BOOKING_DATE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{% set source_model = metadata_dict [ "source_model" ] %}
{% set ranked_columns = metadata_dict [ "ranked_columns" ] %}
{{ automate_dv.stage ( include_source_columns = false ,
source_model = source_model ,
derived_columns = none ,
null_columns = none ,
hashed_columns = none ,
ranked_columns = ranked_columns ) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28 {% - set yaml_metadata -%}
source_model: raw_source
hashed_columns:
CUSTOMER_HK: CUSTOMER_ID
CUSTOMER_DETAILS_HASHDIFF:
is_hashdiff: true
exclude_columns: true
columns:
- PRICE
CUSTOMER_HASHDIFF:
is_hashdiff: true
columns:
- CUSTOMER_ID
- NATIONALITY
- PHONE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{% set source_model = metadata_dict [ "source_model" ] %}
{% set hashed_columns = metadata_dict [ "hashed_columns" ] %}
{{ automate_dv.stage ( include_source_columns = false ,
source_model = source_model ,
derived_columns = none ,
null_columns = none ,
hashed_columns = hashed_columns ,
ranked_columns = none ) }}
Hubs
Parameters
hub macro parameters
Per-model - YAML strings Per-Model - Variables
Single Source Multi Source Composite NK
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15 {% - set yaml_metadata -%}
source_model: stg_web_customer_hashed
src_pk: CUSTOMER_HK
src_nk: CUSTOMER_ID
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{{ automate_dv.hub ( src_pk = metadata_dict [ "src_pk" ],
src_nk = metadata_dict [ "src_nk" ],
src_ldts = metadata_dict [ "src_ldts" ],
src_source = metadata_dict [ "src_source" ],
source_model = metadata_dict [ "source_model" ]) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17 {% - set yaml_metadata -%}
source_model:
- stg_web_customer_hashed
- stg_crm_customer_hashed
src_pk: CUSTOMER_HK
src_nk: CUSTOMER_ID
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{{ automate_dv.hub ( src_pk = metadata_dict [ "src_pk" ],
src_nk = metadata_dict [ "src_nk" ],
src_ldts = metadata_dict [ "src_ldts" ],
src_source = metadata_dict [ "src_source" ],
source_model = metadata_dict [ "source_model" ]) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17 {% - set yaml_metadata -%}
source_model: stg_customer_hashed
src_pk: CUSTOMER_HK
src_nk:
- CUSTOMER_ID
- CUSTOMER_DOB
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{{ automate_dv.hub ( src_pk = metadata_dict [ "src_pk" ],
src_nk = metadata_dict [ "src_nk" ],
src_ldts = metadata_dict [ "src_ldts" ],
src_source = metadata_dict [ "src_source" ],
source_model = metadata_dict [ "source_model" ]) }}
Single Source Multi Source Composite NK
{% - set source_model = "stg_customer_hashed" -%}
{% - set src_pk = "CUSTOMER_HK" -%}
{% - set src_nk = "CUSTOMER_ID" -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.hub ( src_pk = src_pk , src_nk = src_nk , src_ldts = src_ldts ,
src_source = src_source , source_model = source_model ) }}
{% - set source_model = [ "stg_web_customer_hashed" , "stg_crm_customer_hashed" ] -%}
{% - set src_pk = "CUSTOMER_HK" -%}
{% - set src_nk = "CUSTOMER_ID" -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.hub ( src_pk = src_pk , src_nk = src_nk , src_ldts = src_ldts ,
src_source = src_source , source_model = source_model ) }}
{% - set source_model = "stg_customer_hashed" -%}
{% - set src_pk = "CUSTOMER_HK" -%}
{% - set src_nk = [ "CUSTOMER_ID" , "CUSTOMER_DOB" ] -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.hub ( src_pk = src_pk , src_nk = src_nk , src_ldts = src_ldts ,
src_source = src_source , source_model = source_model ) }}
Links
Parameters
link macro parameters
Per-model - YAML strings Per-Model - Variables
Single Source Multi Source
{% - set source_model = "v_stg_orders" -%}
{% - set src_pk = "LINK_CUSTOMER_NATION_HK" -%}
{% - set src_fk = [ "CUSTOMER_HK" , "NATION_HK" ] -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.link ( src_pk = src_pk , src_fk = src_fk , src_ldts = src_ldts ,
src_source = src_source , source_model = source_model ) }}
{% - set source_model = [ "v_stg_orders" , "v_stg_transactions" ] -%}
{% - set src_pk = "LINK_CUSTOMER_NATION_HK" -%}
{% - set src_fk = [ "CUSTOMER_HK" , "NATION_HK" ] -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.link ( src_pk = src_pk , src_fk = src_fk , src_ldts = src_ldts ,
src_source = src_source , source_model = source_model ) }}
Transactional links
(also known as non-historised links)
Parameters
t_link macro parameters
Per-model - YAML strings Per-Model - Variables
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25 {% - set yaml_metadata -%}
source_model: v_stg_transactions
src_pk: TRANSACTION_HK
src_fk:
- CUSTOMER_HK
- ORDER_HK
src_payload:
- TRANSACTION_NUMBER
- TRANSACTION_DATE
- TYPE
- AMOUNT
src_eff: EFFECTIVE_FROM
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{{ automate_dv.t_link ( src_pk = metadata_dict [ "src_pk" ],
src_fk = metadata_dict [ "src_fk" ],
src_payload = metadata_dict [ "src_payload" ],
src_eff = metadata_dict [ "src_eff" ],
src_ldts = metadata_dict [ "src_ldts" ],
src_source = metadata_dict [ "src_source" ],
source_model = metadata_dict [ "source_model" ]) }}
{% - set source_model = "v_stg_transactions" -%}
{% - set src_pk = "TRANSACTION_HK" -%}
{% - set src_fk = [ "CUSTOMER_HK" , "ORDER_HK" ] -%}
{% - set src_payload = [ "TRANSACTION_NUMBER" , "TRANSACTION_DATE" , "TYPE" , "AMOUNT" ] -%}
{% - set src_eff = "EFFECTIVE_FROM" -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.t_link ( src_pk = src_pk , src_fk = src_fk , src_ldts = src_ldts ,
src_payload = src_payload , src_eff = src_eff ,
src_source = src_source , source_model = source_model ) }}
Satellites
Parameters
sat macro parameters
Per-model - YAML strings Per-Model - Variables
Standard Hashdiff Aliasing Exlude Payload Columns
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25 {% - set yaml_metadata -%}
source_model: v_stg_orders
src_pk: CUSTOMER_HK
src_hashdiff: CUSTOMER_HASHDIFF
src_payload:
- NAME
- ADDRESS
- PHONE
- ACCBAL
- MKTSEGMENT
- COMMENT
src_eff: EFFECTIVE_FROM
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{{ automate_dv.sat ( src_pk = metadata_dict [ "src_pk" ],
src_hashdiff = metadata_dict [ "src_hashdiff" ],
src_payload = metadata_dict [ "src_payload" ],
src_eff = metadata_dict [ "src_eff" ],
src_ldts = metadata_dict [ "src_ldts" ],
src_source = metadata_dict [ "src_source" ],
source_model = metadata_dict [ "source_model" ]) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27 {% - set yaml_metadata -%}
source_model: v_stg_orders
src_pk: CUSTOMER_HK
src_hashdiff:
source_column: CUSTOMER_HASHDIFF
alias: HASHDIFF
src_payload:
- NAME
- ADDRESS
- PHONE
- ACCBAL
- MKTSEGMENT
- COMMENT
src_eff: EFFECTIVE_FROM
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{{ automate_dv.sat ( src_pk = metadata_dict [ "src_pk" ],
src_hashdiff = metadata_dict [ "src_hashdiff" ],
src_payload = metadata_dict [ "src_payload" ],
src_eff = metadata_dict [ "src_eff" ],
src_ldts = metadata_dict [ "src_ldts" ],
src_source = metadata_dict [ "src_source" ],
source_model = metadata_dict [ "source_model" ]) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23 {% - set yaml_metadata -%}
source_model: v_stg_orders
src_pk: CUSTOMER_HK
src_hashdiff: CUSTOMER_HASHDIFF
src_payload:
exclude_columns: true
columns:
- NAME
- PHONE
src_eff: EFFECTIVE_FROM
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{{ automate_dv.sat ( src_pk = metadata_dict [ "src_pk" ],
src_hashdiff = metadata_dict [ "src_hashdiff" ],
src_payload = metadata_dict [ "src_payload" ],
src_eff = metadata_dict [ "src_eff" ],
src_ldts = metadata_dict [ "src_ldts" ],
src_source = metadata_dict [ "src_source" ],
source_model = metadata_dict [ "source_model" ]) }}
Standard Hashdiff Aliasing Exlude Payload Columns
1
2
3
4
5
6
7
8
9
10
11
12 {% - set source_model = "v_stg_orders" -%}
{% - set src_pk = "CUSTOMER_HK" -%}
{% - set src_hashdiff = "CUSTOMER_HASHDIFF" -%}
{% - set src_payload = [ "NAME" , "ADDRESS" , "PHONE" , "ACCBAL" , "MKTSEGMENT" , "COMMENT" ] -%}
{% - set src_eff = "EFFECTIVE_FROM" -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.sat ( src_pk = src_pk , src_hashdiff = src_hashdiff ,
src_payload = src_payload , src_eff = src_eff ,
src_ldts = src_ldts , src_source = src_source ,
source_model = source_model ) }}
1
2
3
4
5
6
7
8
9
10
11
12 {% - set source_model = "v_stg_orders" -%}
{% - set src_pk = "CUSTOMER_HK" -%}
{% - set src_hashdiff = { "source_column" : "CUSTOMER_HASHDIFF" , "alias" : "HASHDIFF" } -%}
{% - set src_payload = [ "NAME" , "ADDRESS" , "PHONE" , "ACCBAL" , "MKTSEGMENT" , "COMMENT" ] -%}
{% - set src_eff = "EFFECTIVE_FROM" -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.sat ( src_pk = src_pk , src_hashdiff = src_hashdiff ,
src_payload = src_payload , src_eff = src_eff ,
src_ldts = src_ldts , src_source = src_source ,
source_model = source_model ) }}
1
2
3
4
5
6
7
8
9
10
11
12 {% - set source_model = "v_stg_orders" -%}
{% - set src_pk = "CUSTOMER_HK" -%}
{% - set src_hashdiff = "CUSTOMER_HASHDIFF" -%}
{% - set src_payload = { "exclude_columns" : "true" , "columns" : [ "NAME" , "PHONE" ]} -%}
{% - set src_eff = "EFFECTIVE_FROM" -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.sat ( src_pk = src_pk , src_hashdiff = src_hashdiff ,
src_payload = src_payload , src_eff = src_eff ,
src_ldts = src_ldts , src_source = src_source ,
source_model = source_model ) }}
Hashdiff aliasing allows you to set an alias for the HASHDIFF
column.
Read more
Effectivity Satellites
Parameters
eff_sat macro parameters
Per-model - YAML strings Per-Model - Variables
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24 {% - set yaml_metadata -%}
source_model: v_stg_order_customer
src_pk: ORDER_CUSTOMER_HK
src_dfk:
- ORDER_HK
src_sfk: CUSTOMER_HK
src_start_date: START_DATE
src_end_date: END_DATE
src_eff: EFFECTIVE_FROM
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{{ automate_dv.eff_sat ( src_pk = metadata_dict [ "src_pk" ],
src_dfk = metadata_dict [ "src_dfk" ],
src_sfk = metadata_dict [ "src_sfk" ],
src_start_date = metadata_dict [ "src_start_date" ],
src_end_date = metadata_dict [ "src_end_date" ],
src_eff = metadata_dict [ "src_eff" ],
src_ldts = metadata_dict [ "src_ldts" ],
src_source = metadata_dict [ "src_source" ],
source_model = metadata_dict [ "source_model" ]) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14 {% - set source_model = "v_stg_order_customer" -%}
{% - set src_pk = "ORDER_CUSTOMER_HK" -%}
{% - set src_dfk = [ "ORDER_HK" ] -%}
{% - set src_sfk = "CUSTOMER_HK" -%}
{% - set src_start_date = "START_DATE" -%}
{% - set src_end_date = "END_DATE" -%}
{% - set src_eff = "EFFECTIVE_FROM" -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.eff_sat ( src_pk = src_pk , src_dfk = src_dfk , src_sfk = src_sfk ,
src_start_date = src_start_date , src_end_date = src_end_date ,
src_eff = src_eff , src_ldts = src_ldts , src_source = src_source ,
source_model = source_model ) }}
Multi-Active Satellites (MAS )
Parameters
ma_sat macro parameters
Per-model - YAML strings Per-Model - Variables
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23 {% - set yaml_metadata -%}
source_model: v_stg_orders
src_pk: CUSTOMER_HK
src_cdk:
- CUSTOMER_PHONE
src_payload:
- CUSTOMER_NAME
src_hashdiff: HASHDIFF
src_eff: EFFECTIVE_FROM
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{% - endset -%}
{% set metadata_dict = fromyaml ( yaml_metadata ) %}
{{ automate_dv.ma_sat ( src_pk = metadata_dict [ "src_pk" ],
src_cdk = metadata_dict [ "src_cdk" ],
src_payload = metadata_dict [ "src_payload" ],
src_hashdiff = metadata_dict [ "src_hashdiff" ],
src_eff = metadata_dict [ "src_eff" ],
src_ldts = metadata_dict [ "src_ldts" ],
src_source = metadata_dict [ "src_source" ],
source_model = metadata_dict [ "source_model" ]) }}
1
2
3
4
5
6
7
8
9
10
11
12 {% - set source_model = "v_stg_customer" -%}
{% - set src_pk = "CUSTOMER_HK" -%}
{% - set src_cdk = [ "CUSTOMER_PHONE" , "EXTENSION" ] -%}
{% - set src_hashdiff = "HASHDIFF" -%}
{% - set src_payload = [ "CUSTOMER_NAME" ] -%}
{% - set src_eff = "EFFECTIVE_FROM" -%}
{% - set src_ldts = "LOAD_DATETIME" -%}
{% - set src_source = "RECORD_SOURCE" -%}
{{ automate_dv.ma_sat ( src_pk = src_pk , src_cdk = src_cdk , src_hashdiff = src_hashdiff ,
src_payload = src_payload , src_eff = src_eff , src_ldts = src_ldts ,
src_source = src_source , source_model = source_model ) }}
Extended Tracking Satellites (XTS )
Parameters
xts macro parameters
Per-model - YAML strings Per-Model - Variables
Tracking a single satellite Tracking multiple satellites
{ %- set source_model = "v_stg_customer" -% }
{ %- set src_pk = "CUSTOMER_HK" -% }
{ %- set src_satellite = { "SATELLITE_CUSTOMER" : { "sat_name" : { "SATELLITE_NAME" : "SATELLITE_NAME_1" } , "hashdiff" : { "HASHDIFF" : "CUSTOMER_HASHDIFF" }}}
{ %- set src_ldts = "LOAD_DATETIME" -% }
{ %- set src_source = "RECORD_SOURCE" -% }
{{ automate_dv . xts ( src_pk = src_pk , src_satellite = src_satellite , src_ldts = src_ldts ,
src_source = src_source , source_model = source_model ) }}
1
2
3
4
5
6
7
8
9
10
11
12
13
14 { %- set source_model = "v_stg_customer" -% }
{ %- set src_pk = "CUSTOMER_HK" -% }
{ %- set src_satellite = "SAT_CUSTOMER_DETAILS" : {
"sat_name" : { "SATELLITE_NAME" : "SATELLITE_NAME_1" } ,
"hashdiff" : { "HASHDIFF" : "CUSTOMER_HASHDIFF" }
} , "SAT_ORDER_DETAILS" : {
"sat_name" : { "SATELLITE_NAME" : "SATELLITE_NAME_2" } ,
"hashdiff" : { "HASHDIFF" : "ORDER_HASHDIFF" }
}} -% }
{ %- set src_ldts = "LOAD_DATETIME" -% }
{ %- set src_source = "RECORD_SOURCE" -% }
{{ automate_dv . xts ( src_pk = src_pk , src_satellite = src_satellite , src_ldts = src_ldts ,
src_source = src_source , source_model = source_model ) }}
Understanding the src_satellite
parameter
The src_satellite
parameter provides the means to define the satellites which the XTS tracks.
The mapping matches columns in the stage layer (using the stage
macro) to the SATELLITE_NAME
and HASHDIFF
columns in the XTS .
...
"sat_name": {"SATELLITE_NAME": "SATELLITE_NAME_1"}
...
In the above example we expect a SATELLITE_NAME_1
column in the Stage, defined as follows:
...
derived_columns:
SATELLITE_NAME_1: "!SAT_CUSTOMER_DETAILS"
...
This works exactly the same way for the HASHDIFF
column, as defined by:
...
"hashdiff": {"HASHDIFF": "ORDER_HASHDIFF"}
...
For example, the 'Tracking multiple satellites' metadata examples above would produce the XTS in the table below, given
the following derived columns:
...
derived_columns:
SATELLITE_NAME_1: "!SAT_CUSTOMER_DETAILS"
SATELLITE_NAME_2: "!SAT_ORDER_DETAILS"
...
CUSTOMER_HK
HASHDIFF
SATELLITE_NAME
LOAD_DATE
SOURCE
B8C37E...
3C598...
SAT_CUSTOMER_DETAILS
1993-01-01
*
.
.
.
.
.
.
.
.
.
.
FED333...
6C958...
SAT_ORDER_DETAILS
1993-01-01
*
Point-In-Time (PIT ) Tables
Parameters
pit macro parameters
Bridge tables
Parameters
bridge macro parameters
Reference Tables
Parameters
ref_table macro parameters
When metadata gets stored in the dbt_project.yml
, you can probably foresee the file getting very large for bigger
projects. If your metadata gets defined and stored in each model, it becomes harder to generate and develop with, but it
can be easier to manage. Model-level metadata alleviates the issue, but will not completely solve it.
Whichever approach gets chosen, metadata storage and retrieval is difficult without a dedicated tool. To help manage
large amounts of metadata, we recommend the use of third-party enterprise tools such as WhereScape, Matillion, or Erwin
Data Modeller.
In the future, dbt will likely support better ways to manage metadata at this level, to put off the need for a tool a
little longer. Discussions are already ongoing , and we hope to be able to
advise on better ways to manage metadata in the future.