ELT materialization policies in DataStage
Materialization policies define the number and types of database artifacts (views or tables) that
are created for intermediate data when you run a DataStage flow in ELT mode. The policy selected
impacts execution performance and intermediate data volume. Execution in ELT mode uses DBT, so ELT
materialization policies overlap with DBT
materializations. Configure settings in your flow's output connector to define target dataset
materialization. When a target dataset gets recreated with table action = replace
,
it is materialized as a default DBT materialization table, while table action =
append
uses a custom DBT materialization that implements various write modes supported by
DataStage, including insert, update, delete, and more.
Generate nested SQL
Generate nested SQL is the default materialization policy. When selected, the flow builds separate single SQL statements for each output table. Each statement contains nested SQL statements for all upstream nodes of the output table. This approach delegates all optimization work to a database engine.
The Nested Query
materialization policy typically avoids creating any
intermediate database artifacts. This simplifies cleanup and avoids leftovers after execution in
case of failure. If an input connector uses a custom SQL statement, then nesting is not possible,
because the relational algebra model cannot be built. In this case, you must select a policy that
generates intermediate views, or the intermediate data is not materialized.
In some cases, flows will have slower performance with this policy. In a flow with multiple output connectors that share a large number of the same upstream stages, the SQL statements generated for each of the output connectors will contain the same logic, which will be executed multiple times by the database. To avoid this repetitive execution, the Advanced policy is recommended.
Advanced
This policy performs similarly to Generate nested SQL, but outperforms other policies when a flow has a high average number of links per stage.
Link as view
All links of a flow are materialized as views and intermediate data is removed after execution.
Link as table
All links of a flow are materialized as tables and intermediate data is removed after execution. This policy performs similarly to Generate nested SQL because the database engine applies internal optimization algorithms when it builds output tables. Querying intermediate data has less performance impact than in Link as view.
Choosing a policy
In cases where flows have multiple output tables to build in parallel, the Advanced policy may perform better than Generate nested SQL, which does best in most cases. The performance of these policies also varies based on database engine.