Setting up before-job and after-job subroutines in DataStage
Before-job and after-job subroutines run built-in routines before a job starts or after it completes, for example running a shell script before the job runs or generating a report after the job finishes successfully. A return code of 0 from the subroutine indicates success. Any other return code indicates failure and causes an unrecoverable error when the job runs.
To set up a before-job or after-job subroutine, complete the following steps.
- Open a DataStage® flow, then click the Settings icon.
- On the Settings page, click Before/after-job subroutines.
- Specify a before-job subroutine, an after-job subroutine, or both. Then, click Save.
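As an illustration of the return-code contract described above, a minimal before-job shell script might look like the following sketch. The staging directory /tmp/dsjob_staging is a hypothetical path chosen for illustration, not a DataStage convention.

```shell
#!/bin/sh
# Hypothetical before-job script: DataStage treats exit code 0 as success
# and any other exit code as an unrecoverable job error.
STAGING_DIR="/tmp/dsjob_staging"   # hypothetical path for illustration

# If the setup step fails, propagate a nonzero code so the job aborts.
mkdir -p "$STAGING_DIR" || exit 1

echo "before-job setup complete"
exit 0
```

Because the job aborts on any nonzero return code, every command whose failure should stop the job needs an explicit `|| exit 1` (or equivalent) rather than being allowed to fail silently.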
Using custom Python code in subroutines
You can install Python packages to run scripts in before-job and after-job subroutines.
- Open a DataStage flow, go to Settings, and click Before/after-job subroutines.
- Under Built-in subroutine to execute, choose Execute shell command.
- In the Shell command text box, enter a command that creates a directory for your modules under /px-storage and a command that installs the modules you need. The following example installs modules into the directory /px-storage/pip_modules.
mkdir -p /px-storage/pip_modules && pip3 install <modulename1> <modulename2> --target /px-storage/pip_modules
- Save and execute the flow.
- To enable non-root users to run your script, add the module directory to the Python path at the top of your Python script. Following the previous example:
import sys
sys.path.append("/px-storage/pip_modules")
- Replace the command in the Shell command text box with a command that calls the Python script by its file path. This example calls the script test_data.py from /ds-storage/ing_py_env:
python3 /ds-storage/ing_py_env/test_data.py
- Save and execute the flow.
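The sys.path technique above can be exercised end to end with a stand-in module directory. In this sketch, /tmp/pip_modules and the hand-written module greet.py replace the real /px-storage/pip_modules directory and pip-installed packages; they are assumptions for demonstration only.

```shell
# Create a stand-in module directory (replaces /px-storage/pip_modules).
mkdir -p /tmp/pip_modules
printf 'MESSAGE = "module import works"\n' > /tmp/pip_modules/greet.py

# Run a script that appends the directory to sys.path, as in the doc,
# so the module resolves even for users without site-packages access.
python3 - <<'EOF'
import sys
sys.path.append("/tmp/pip_modules")
import greet
print(greet.MESSAGE)
EOF
```

Running this prints `module import works`, confirming that the appended path is searched at import time.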
Using cpdctl in before-job and after-job subroutines
You can use the cpdctl command-line interface in your before-job and after-job subroutines.
The cpdctl binary is in the /px-storage/tools/cpdctl directory. To update cpdctl to a newer version, download the release that you want from https://github.com/IBM/cpdctl/releases, then use the following command to copy the binary into the runtime pod:
oc cp cpdctl ds-px-default-ibm-datastage-px-runtime-7d77747cfc-sjngt:/px-storage/tools/cpdctl/cpdctl
To execute cpdctl commands, complete the following steps.
- Open a DataStage flow, go to Settings, and click Before/after-job subroutines.
- Under Built-in subroutine to execute, choose Execute shell command.
- In the Shell command text box, enter the cpdctl command that you want to run, for example:
cpdctl project list
- If you want to run the flow from the canvas, you can create a local parameter on your job canvas.
- On the canvas, click Add parameters, and then Create parameter.
- In the Name field, specify the parameter name as $ENABLE_CPDCTL.
- Choose the String type and enter 1 in the Default value field.
- If you want to use the cpdctl command line to run your job instead, configure the job with the ENABLE_CPDCTL variable, supplying your own project and job names:
cpdctl dsjob run --project-name <project-name> --job <job-name> --env ENABLE_CPDCTL=1
- Save and run the job with the specified environment option ENABLE_CPDCTL=1.
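If the same flow is sometimes run without cpdctl enabled, one defensive pattern (an assumption for illustration, not documented DataStage behavior) is to have the shell command check the ENABLE_CPDCTL value before calling cpdctl. In this sketch, the echo lines stand in for a real cpdctl invocation:

```shell
# Guard the cpdctl call on the ENABLE_CPDCTL job parameter.
# The echo commands are placeholders for real cpdctl calls.
ENABLE_CPDCTL="${ENABLE_CPDCTL:-0}"
if [ "$ENABLE_CPDCTL" = "1" ]; then
    echo "running: cpdctl project list"
else
    echo "cpdctl disabled for this run"
fi
```

With this guard, a run that omits --env ENABLE_CPDCTL=1 skips the cpdctl call instead of failing the subroutine with a nonzero return code.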