Point In Time (PIT) tables
A Point-In-Time table is a query assistant structure, part of the Business Vault, meant to improve the performance of loading and creating the information marts. Given a supplied list of dates/timestamps in an As of Date table, the PIT table will identify the relevant records from each Satellite for that specific date/timestamp and record the Hash Key and the LDTS value of that Satellite record. By identifying the coordinates of the relevant records at each point-in-time a priori, the information marts queries can make use of equi-joins which offer a significant boost in performance.
The recommendation is to use the PIT table when referencing at least two Satellites and especially when the Satellites have different rates of update.
Structure¶
Our Point-In-Time structures will contain:
Source Model (source_model)¶
This is the name of the parent Hub that contains the primary key (src_pk) and to which the Satellites are connected to.
Primary Key (src_pk)¶
A primary key (or surrogate key) which is usually a hashed representation of the natural key. This will be the primary key used by the parent Hub.
Load Date/Timestamp (src_ldts)¶
This is a string with the name of the Hub's Load Date/Timestamp column
As of Date Table (as_of_dates_table)¶
The as_of_dates_table
describes the history needed to construct the PIT table as a list of dates. This is where you would
supply the name of your As of Date table.
Satellites (satellites)¶
This is a dictionary that contains the metadata for the Satellites in subject. It will have three levels of keys.
The first level key is the name of the Satellite in uppercase.
The second level keys will be pk and ldts.
The third level key will be 'PK' and 'LDTS'. The expected value for the 'PK' key is the Hash Key column name of the Satellite (e.g. CUSTOMER_HK). The expected value for the 'LDTS' key is the Load Date/Timestamp column name of the Satellite (e.g. LOAD_DATETIME).
Stage Models (stage_tables_ldts)¶
This is a dictionary that contains the names of the Load Date/Timestamp columns for each stage table sourcing the Satellites.
The keys in the dictionary will be the stage table names (e.g. 'STG_CUSTOMER_DETAILS), whereas the values will be the name of the Load Date/Timestamp column for that stage table (e.g. 'LOAD_DATETIME')
Tip
To see a full example of how the metadata needs to be defined for a PIT object, please check the PIT section on the metadata page.
Creating PIT models¶
Create a new dbt model as before. We'll call this one pit_customer
.
1 2 3 4 5 |
|
To create a PIT model, we simply copy and paste the above template into a model named after the PIT we are creating. AutomateDV will generate a PIT using parameters provided in the next steps.
Materialisation¶
PIT tables should use the pit_incremental
materialisation, as they will be remade with every new As of Date table.
Adding the metadata¶
Let's look at the metadata we need to provide to the pit macro.
Parameter | Value |
---|---|
source_model | hub_customer |
src_pk | CUSTOMER_HK |
as_of_dates_table | AS_OF_DATE |
satellites | {"SAT_CUSTOMER_DETAILS": |
{"pk": {"PK": "CUSTOMER_HK"}, {"ldts": {"LDTS": "LOAD_DATETIME"}} | |
} | |
{"SAT_CUSTOMER_LOGIN": | |
{"pk": {"PK": "CUSTOMER_HK"}, {"ldts": {"LDTS": "LOAD_DATETIME"}} | |
} | |
stage_tables_ldts | {"STG_CUSTOMER_DETAILS": "LOAD_DATETIME", |
"STG_CUSTOMER_LOGIN": "LOAD_DATETIME"} | |
src_ldts | LOAD_DATETIME |
Source table¶
Here we will define the metadata for the source_model. We will use the HUB_CUSTOMER that we built before.
1 2 3 |
|
Primary Key¶
Next we need add the Hub's Primary Key column
1 2 3 4 |
|
Load Date/Timestamp¶
Next, we add the Load Date/Timestamp column name of the parent Hub
1 2 3 4 5 |
|
As of Date Table¶
Next, we provide the table name of the PIT's As of Date table.
1 2 3 4 5 6 |
|
Satellites metadata¶
Here we add the Satellite related details (i.e. the Primary/Hash Key and the Load Date/Timestamp column names)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
Stage metadata¶
Finally, we add Satellites' stage table names and their Load Date/Timestamp column names
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
|
Now, our model should look like the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
|
Note
See our metadata reference for more details on how to provide metadata to PITs.
Running dbt¶
With our model complete and our YAML written, we can run dbt to create our pit_customer
table.
dbt run -s +pit_customer
The resulting Point in Time table would look like this:
CUSTOMER_HK | AS_OF_DATE | SAT_CUSTOMER_DETAILS_PK | SAT_CUSTOMER_DETAILS_LDTS | SAT_CUSTOMER_LOGIN_PK | SAT_CUSTOMER_LOGIN_LDTS |
---|---|---|---|---|---|
HY67OE... | 2021-11-01 | HY67OE... | 2020-06-05 | 000000... | 1900-01-01 |
RF57V3... | 2021-11-01 | RF57V3... | 2017-04-24 | RF57V3... | 2021-04-01 |
. | . | . | . | . | . |
. | . | . | . | . | . |
HY67OE... | 2021-11-15 | HY67OE... | 2021-11-09 | HY67OE... | 2021-11-14 |
RF57V3... | 2021-11-15 | RF57V3... | 2017-04-24 | RF57V3... | 2021-04-01 |
. | . | . | . | . | . |
. | . | . | . | . | . |
HY67OE... | 2021-11-31 | HY67OE... | 2021-11-09 | HY67OE... | 2021-11-30 |
RF57V3... | 2021-11-31 | RF57V3... | 2021-11-20 | RF57V3... | 2021-04-01 |