0 / 0
ELT materialization policies in DataStage

ELT materialization policies in DataStage

Materialization policies define the number and types of database artifacts (views or tables) that are created for intermediate data when you run a DataStage flow in ELT mode. The policy selected impacts execution performance and intermediate data volume. Execution in ELT mode uses DBT, so ELT materialization policies overlap with DBT materializations. Configure settings in your flow's output connector to define target dataset materialization. When a target dataset gets recreated with table action = replace, it is materialized as a default DBT materialization table, while table action = append uses a custom DBT materialization that implements various write modes supported by DataStage, including insert, update, delete, and more.

Generate nested SQL

Generate nested SQL is the default materialization policy. When selected, the flow builds separate single SQL statements for each output table. Each statement contains nested SQL statements for all upstream nodes of the output table. This approach delegates all optimization work to a database engine.

The Nested Query materialization policy typically avoids creating any intermediate database artifacts. This simplifies cleanup and avoids leftovers after execution in case of failure. If an input connector uses a custom SQL statement, then nesting is not possible, because the relational algebra model cannot be built. In this case, you must select a policy that generates intermediate views, or the intermediate data is not materialized.

In some cases, flows will have slower performance with this policy. In a flow with multiple output connectors that share a large number of the same upstream stages, the SQL statements generated for each of the output connectors will contain the same logic, which will be executed multiple times by the database. To avoid this repetitive execution, the Advanced policy is recommended.

Advanced

The Advanced materialization policy creates a single SQL statement for a set of connected nodes that change data cardinality, or cardinality changers, in order to avoid repetitive computations when building output data. The following image shows how these nodes get combined:
Figure 1. Advanced materializion example
An example showing how a set of connected nodes gets combined to avoid repeating computation.
The stages defined as cardinality changers are grouped into a single database artifact. The SQL statement for that artifact includes the statements for the cardinality changers and their upstream stages. Other stages in the flow use simple SELECT statements and are not grouped.

This policy performs similarly to Generate nested SQL, but outperforms other policies when a flow has a high average number of links per stage.

Link as view

All links of a flow are materialized as views and intermediate data is removed after execution.

Link as table

All links of a flow are materialized as tables and intermediate data is removed after execution. This policy performs similarly to Generate nested SQL because the database engine applies internal optimization algorithms when it builds output tables. Querying intermediate data has less performance impact than in Link as view.

Choosing a policy

In cases where flows have multiple output tables to build in parallel, the Advanced policy may perform better than Generate nested SQL, which does best in most cases. The performance of these policies also varies based on database engine.

Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more