DataStage environments
Last updated: Dec 09, 2024

Control how your DataStage jobs run on the runtime engine by configuring environments. You can run DataStage jobs in environments on IBM Cloud or you can run jobs locally by setting up environments with your own DataStage remote runtime engines.

DataStage environments on IBM Cloud

IBM® DataStage® offers three PX environments that you can use to run your jobs. A job uses the Default DataStage PX S runtime by default. However, before you run the flow as a job, you can change the environment to any of the three that are available.

The three IBM Cloud runtimes consume capacity unit hours (CUHs), and that consumption is tracked. Only the time it takes to run jobs is tracked. Creating, configuring, and updating flows on the DataStage canvas does not use any CUHs.

When you create a job in which to run a DataStage flow, you can select one of the following preset environments:

Name                    | Hardware configuration
Default DataStage PX S  | 1 Conductor: 2 vCPU and 8 GB RAM
Default DataStage PX M  | 1 Conductor: 4 vCPU and 16 GB RAM
Default DataStage PX L  | 1 Conductor: 8 vCPU and 32 GB RAM

The Default DataStage PX S runtime is used when you run a job to extract, transform, and load data in DataStage, unless you select a different environment. For complex jobs with large data sets, select an environment with more vCPU and memory to increase capacity. The default environments use 2 partitions.
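As a rough illustration of how the preset sizes relate to one another, the following Python sketch encodes the table above and picks the smallest preset that satisfies a given vCPU and memory requirement. The class and function names (`PresetEnvironment`, `smallest_preset`) are hypothetical helpers for this example and are not part of any DataStage API; only the environment names and hardware figures come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PresetEnvironment:
    """One row of the preset environment table (hypothetical helper type)."""
    name: str
    vcpu: int
    ram_gb: int


# Values copied from the preset table; listed from smallest to largest.
PRESETS = (
    PresetEnvironment("Default DataStage PX S", 2, 8),
    PresetEnvironment("Default DataStage PX M", 4, 16),
    PresetEnvironment("Default DataStage PX L", 8, 32),
)


def smallest_preset(min_vcpu: int, min_ram_gb: int) -> Optional[PresetEnvironment]:
    """Return the smallest preset that meets both requirements, or None."""
    for env in PRESETS:
        if env.vcpu >= min_vcpu and env.ram_gb >= min_ram_gb:
            return env
    return None  # no preset is large enough


if __name__ == "__main__":
    choice = smallest_preset(min_vcpu=3, min_ram_gb=8)
    print(choice.name if choice else "No preset fits; consider a custom environment")
```

How much vCPU and memory a given job needs is an assumption you supply; the sketch only compares your estimate against the documented preset sizes.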

To update the environment that you want to use:

  • On the DataStage canvas, select the run settings icon and select the environment that you want to use.
  • Select a job, edit the job configuration, and on the run settings tab, change the environment.

Administrators can create new environments for IBM Cloud to specify environment variables and change the number of partitions.

DataStage environments on remote runtime engines

By using a remote runtime engine, you can run jobs in an environment that is not managed by IBM. With a DataStage remote runtime engine, you can use on-premises applications and databases and run jobs locally. An administrator can configure DataStage remote runtime engines at the project level. Developers with Editor or Admin access to a project with a DataStage remote runtime engine can run jobs in that environment.

After you select a remote environment as the default environment for a project, you can use only remote environments in that project. You cannot switch back to using IBM Cloud environments for that project's DataStage jobs.

Remote environments provide the following benefits:
  • Run workloads and process data locally
  • Avoid data transfer costs
  • Increase security by keeping data local to your cloud environment
  • Use DataStage features from Cloud Pak for Data such as User-defined stages, the Java Integration stage, Before/after job routines, and more, without maintaining a full Cloud Pak for Data install

Remote environments do not support connectors that need a driver upload, vaults, or the Data service connector. Several connectors are supported only via flow connection.

For more information, see DataStage Anywhere.

Running a flow

You can create a job in which to run your DataStage flow:

  • Directly on the DataStage canvas by clicking the run icon from the DataStage toolbar (the default name of a job that runs a flow is the flow's name appended with .DataStage job).
  • From your project’s DataStage flows page by selecting the DataStage flow and clicking the Action menu and selecting New job.
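The default naming rule for jobs created from the canvas can be sketched as a one-line Python function. The function name `default_job_name` and the sample flow name are hypothetical and serve only to show the pattern; this is not a DataStage API call.

```python
def default_job_name(flow_name: str) -> str:
    """Build the default job name: the flow's name with '.DataStage job' appended."""
    return f"{flow_name}.DataStage job"


if __name__ == "__main__":
    # 'orders_etl' is a made-up flow name used only for illustration.
    print(default_job_name("orders_etl"))
```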

When you run a job to extract, transform, or load data in DataStage, a Default DataStage PX S runtime is started automatically and is listed as an active runtime on the Environments page of your project. You can change the environment by selecting the run settings icon on the DataStage canvas or by selecting a job from the Jobs tab and changing the settings there.

Monitor monthly billing

You must be an IBM Cloud account owner or administrator to see resource usage information.

To see the monthly charges, the amount of CUH used, the number of VPCs used, and the number of users for your service instance, go to the Cloud Usage Dashboard. For each instance, click Manage > Billing and Usage > Usage, click View Instances next to the service name, and then click View instance next to the instance name.

Runtime logs for jobs

To view the accumulated logs for a DataStage job:

  1. From the project’s Jobs page, click the DataStage job for which you want to see logs.
  2. Click the job run. You can view the job log, copy the log to clipboard, or download the log.