Oracle Database for DataStage connector (DataStage)

Use the Oracle Database for DataStage connector in DataStage® to access an Oracle database system and perform read, write, and load functions.

Prerequisite

Create the connection. For instructions, see Connecting to a data source in DataStage and the Oracle Database for DataStage connection.

Configuring the Oracle Database for DataStage connector as a source for reading data

To configure the connector to read rows in an Oracle table or view, you must specify the source table or view or define a complete SELECT statement or PL/SQL block.

The available properties for the Read mode depend on whether you choose Select or PL/SQL mode.

Reading data from the Oracle Database for DataStage database
  1. From the job design canvas, double-click the Oracle Database for DataStage connector.
  2. On the Output tab, click Usage.
  3. Set Read mode to Select or PL/SQL.
  4. If you set Read mode to Select, use one of these methods to specify the source of the data:
    • Set Generate SQL at runtime to Yes and then enter the name of the table or view in the Table name property. Use the syntax schema_name.table_name, where schema_name is the owner of the table. If you do not specify schema_name, the connector uses the schema that belongs to the user who is currently connected. From the Table scope property, you can choose which part of the table to access.
    • Set Generate SQL at runtime to No, and then specify the SELECT statement in the Select statement property (see the sample statements after these steps).
    • Set Generate SQL at runtime to No, and then enter the fully qualified file name of the file that contains the SQL statement in the Select statement property.
  5. If you set Read mode to PL/SQL, use one of these methods to specify the source of data:
    • Enter the PL/SQL block manually in the PL/SQL block property.
    • Enter the fully qualified file name of the file that contains the PL/SQL block in the PL/SQL block property.

    The PL/SQL block that you specify must begin with the keyword DECLARE or BEGIN and must end with the keyword END, and you must enter a semicolon after the END keyword (see the sample block after these steps).

  6. You can configure optional properties:
    • In the Session tab:
      • Prefetch row count: Set this property to a value other than 1 to prefetch that number of rows from the Oracle database ahead of processing.
      • Prefetch buffer size: Enter a buffer size; the connector prefetches as many records as fit in a buffer of that size.
      • Transfer BFILE contents: Transfer the contents of the external file, instead of the file location, when the connector transfers BFILE column values.

      You can specify Before/After SQL statements that run once before or after any data is processed, or that run once on each node before or after any data is processed.

    • In the Transaction tab:
      • Isolation level: Specify the degree to which the data that is being accessed by the Oracle Database for DataStage connector is locked or isolated from other concurrently executing transactions, units of work, or processes.
      • Record count: The number of records per transaction. The value 0 means all available records.
      • Mark end of wave: Choose how the connector handles batch processing.
  7. Click Save.
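
The following sketches show the shape of each read mode; the schema, table, and column names (MYSCHEMA.EMPLOYEES, EMP_ID, EMP_NAME) are hypothetical placeholders for objects in your own database.

  -- Select mode: a complete SELECT statement for the Select statement property
  SELECT EMP_ID, EMP_NAME
  FROM MYSCHEMA.EMPLOYEES
  WHERE EMP_ID > 1000;

  -- PL/SQL mode: a block that begins with DECLARE or BEGIN and ends with END;
  DECLARE
    v_count NUMBER;
  BEGIN
    SELECT COUNT(*) INTO v_count FROM MYSCHEMA.EMPLOYEES;
  END;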

Reading partitioned data

In a job that uses multiple nodes, each node that is specified for the stage reads a distinct subset of data from the source table.

Reading partitioned data from the Oracle Database for DataStage database
  1. On the Output tab, set Enable partitioned reads to Yes.
  2. Set Read mode to Select, and then define the SELECT statement that the connector uses at run time:
    • Set Generate SQL at runtime to Yes, and then enter the name of the table or view in the Table name property. Use the syntax schema_name.table_name, where schema_name is the owner of the table. If you do not specify schema_name, the connector uses the schema that belongs to the user who is currently connected. The connector automatically generates and runs the SELECT statement.

      To read data from a particular partition of a partitioned table, set the Table scope property to Single partition and specify the name of the partition in the Partition name property. The connector then automatically adds a PARTITION(partition_name) clause to the generated SELECT statement (see the example after these steps). To read data from a particular subpartition of a composite-partitioned table, set the Table scope property to Single subpartition.

    • Set Generate SQL at runtime to No, and then specify the SELECT statement in the Select statement property. You can enter the SQL statement or enter the fully qualified file name of the file that contains the SQL statement.
  3. Choose the Partitioned read method that you want to use. The default partitioning method is Rowid range.
    • If you choose the Rowid range or Minimum and maximum range partitioned read method, in the Partition or subpartition name for partitioned reads property, specify the name of the partition or subpartition that the partitioned read method uses.

      If you do not specify a value for the Partition or subpartition name for partitioned reads property, the connector uses the entire table as input for the partitioned read method.

    • If you choose the Modulus or the Minimum and maximum range partitioned read method, in the Column name for partitioned reads property, enter the name of the column from the source table to use for the method. The column must be an existing column in the table, must be of the NUMBER(p) data type, where p is the precision, and must have a scale of zero.
    • In the Table name for partitioned reads property, specify the name of the table that the partitioned read method uses to define the subsets of data that each node reads from the source table. If you do not specify a table name, the connector uses the value of the Generate SQL at runtime property to determine the table name. If Generate SQL at runtime is set to Yes, the connector uses the table name that is specified in the Table name property. If Generate SQL at runtime is set to No, the connector looks at the SELECT statement that is specified in the Select statement property and uses the first table name that is specified in the FROM clause.
  4. Click Save.
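
For example, if Table scope is set to Single partition with the partition name SALES_Q1, the generated statement takes a form similar to this sketch (MYSCHEMA.SALES and SALES_Q1 are hypothetical names):

  SELECT * FROM MYSCHEMA.SALES PARTITION (SALES_Q1);

For the Modulus and Minimum and maximum range methods, a column that is defined as, for example, ORDER_ID NUMBER(10) satisfies the requirement of a NUMBER(p) data type with a scale of zero.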

Configuring the Oracle Database for DataStage connector as a target for writing data

To configure the connector to write rows to an Oracle table or writable view, you must specify the target table or view or define the SQL statements or PL/SQL block.

Writing data to the Oracle Database for DataStage database
  1. From the job design canvas, double-click the Oracle Database for DataStage connector.
  2. On the Input tab, click Usage.
  3. Specify how the Oracle Database for DataStage connector writes data to an Oracle table or writable view:
    Table 1. Methods for writing data to an Oracle table or writable view

    Automatically generate the SQL at run time:
    • Set Generate SQL at runtime to Yes.
    • Set Write mode to Insert, Insert new rows only, Update, Delete, Insert then update, Update then insert, or Delete then insert.
    • Enter the name of the target table in the Table name property.

    Enter the SQL manually (see the sample statements after these steps):
    • Set Generate SQL at runtime to No.
    • Set Write mode to Insert, Insert new rows only, Update, Delete, Insert then update, Update then insert, or Delete then insert.
    • Enter SQL statements in the fields that correspond to the write mode that you selected.

    Specify a PL/SQL block:
    • Set the Write mode to PL/SQL.
    • Enter the PL/SQL block in the PL/SQL block property.

      The PL/SQL block must begin with the keyword DECLARE or BEGIN and end with the keyword END. You must include a semicolon character after the END keyword.

    Bulk load the data:
    • Set Write mode to Bulk load.
    • Enter the name of the table in the Table name property. Use the syntax schema_name.table_name, where schema_name is the owner of the table. If you do not specify schema_name, the connector uses the schema that belongs to the user who is currently connected.
  4. Click Save.
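
As a sketch of manually entered SQL, DataStage connectors conventionally reference input link columns with ORCHESTRATE.column_name placeholders; the object names here (MYSCHEMA.EMPLOYEES, EMP_ID, EMP_NAME) are hypothetical:

  -- Insert statement for the Insert write mode
  INSERT INTO MYSCHEMA.EMPLOYEES (EMP_ID, EMP_NAME)
  VALUES (ORCHESTRATE.EMP_ID, ORCHESTRATE.EMP_NAME);

  -- Update statement for the Update write mode
  UPDATE MYSCHEMA.EMPLOYEES
  SET EMP_NAME = ORCHESTRATE.EMP_NAME
  WHERE EMP_ID = ORCHESTRATE.EMP_ID;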

Configuring bulk loading of data

When you use the Oracle Database for DataStage connector to bulk load data to an Oracle database, you can enable or disable constraints and triggers. You can also configure the date cache, manage indexes, set options for bulk record loading, and enable manual mode.

Setting up the bulk load
  1. In the Write mode property, choose the Bulk load method and specify the table to write to in the Table name property.
  2. Configure the connector to disable constraints before it bulk loads data and enable constraints after it bulk loads data:
    1. Set Perform operations before bulk load to Yes.
    2. Set Disable constraints to Yes.
    3. Set Perform operations after bulk load to Yes.
    4. Set Enable constraints to Yes.
    5. In the Exceptions table name field, enter the name of the exceptions table.

      If the exceptions table does not exist, the connector creates it.

      If the exceptions table already exists, the connector deletes any data that is in the table and then uses it (the SQL sketch after these steps shows the equivalent statements).

  3. Configure the connector to disable triggers before it bulk loads data and enable triggers after it bulk loads data:
    1. Set Perform operations before bulk load to Yes.
    2. Set Disable triggers to Yes.
    3. Set Perform operations after bulk load to Yes.
    4. Set Enable triggers to Yes.
  4. To control how to handle table indexes during a bulk load, set the Index maintenance option property.
  5. To rebuild indexes after a bulk load:
    1. Set Perform operations after bulk load to Yes.
    2. Set Rebuild indexes to Yes.
    3. Optional: To enable or disable parallelism and logging to the redo log when the index is rebuilt, specify nondefault values for the Parallel clause and Logging clause properties.

      By default, parallel and logging clauses are not included in the ALTER INDEX statement.

    4. Optional: To stop the job if an index rebuild statement fails, set Fail on error for index rebuilding to Yes.

      If an index rebuild fails, the connector logs a fatal error.

  6. If you plan to bulk load data into tables that contain DATE or TIMESTAMP columns, enable and configure the date cache:
    1. Set Use Oracle date cache to Yes.
    2. Optional: In the Cache size property, enter the maximum number of entries that the cache stores.
    3. Optional: Set Disable cache when full to Yes.

      When the number of entries in the cache reaches the number that is specified in the Cache size property and the next lookup in the cache results in a miss, the cache is disabled.

  7. Set options to control bulk record loading:
    1. Set Array size to a value from 1 to 999,999,999.

      The default is 2,000.

    2. Set the Allow concurrent load sessions property to Yes.
    3. Set Buffer size to a value from 4 to 100,240, which represents the buffer size in KB.

      The default is 1,024.

  8. To enable manual mode:
    1. Set Manual mode to Yes.
    2. Optional: In the Directory for data and control files property, specify a directory to save the control and data files to.
    3. Optional: In the Control file name property, specify a name for the control file.
    4. In the Data file name property, specify the name of the data file.

      If you do not specify a value for the data file name, the connector generates the name in the servername_tablename.dat format.

    5. In the Load options property, specify the bulk load options to include in the control file that the connector generates.

      The value contains parameters that are passed to the Oracle SQL*Loader utility when the utility is invoked to process the control and data files. The default value is OPTIONS(DIRECT=FALSE,PARALLEL=TRUE).

      The DIRECT=FALSE parameter tells the Oracle SQL*Loader to use the conventional path load instead of the direct path load. The PARALLEL=TRUE parameter tells the utility that the data can be loaded in parallel from multiple concurrent sessions.

      The word OPTIONS and the parentheses must be included in the value that is specified for the property. The connector saves this property value to the generated control file and does not check the syntax of the value (a sketch of a generated control file follows these steps).
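
For illustration, a control file that the connector generates in manual mode might look similar to the following sketch; the file, schema, table, and column names are hypothetical, and the exact layout that the connector produces can differ:

  OPTIONS(DIRECT=FALSE,PARALLEL=TRUE)
  LOAD DATA
  INFILE 'myserver_mytable.dat'
  APPEND
  INTO TABLE MYSCHEMA.MYTABLE
  FIELDS TERMINATED BY ','
  (EMP_ID, EMP_NAME)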
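The connector issues its constraint, trigger, and index operations internally; the following Oracle statements only sketch the equivalent SQL for steps 2, 3, and 5, with hypothetical object names:

  -- Before the bulk load: disable constraints and triggers
  ALTER TABLE MYSCHEMA.MYTABLE DISABLE CONSTRAINT MYTABLE_PK;
  ALTER TABLE MYSCHEMA.MYTABLE DISABLE ALL TRIGGERS;

  -- After the bulk load: re-enable them; rows that violate the constraint
  -- are recorded in the exceptions table
  ALTER TABLE MYSCHEMA.MYTABLE ENABLE CONSTRAINT MYTABLE_PK
    EXCEPTIONS INTO MYSCHEMA.EXCEPTIONS;
  ALTER TABLE MYSCHEMA.MYTABLE ENABLE ALL TRIGGERS;

  -- Rebuild an index with nondefault Parallel and Logging clauses
  ALTER INDEX MYSCHEMA.MYTABLE_PK_IDX REBUILD PARALLEL 4 NOLOGGING;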

Writing partitioned data

In a job that uses multiple nodes, records that arrive on the input link of the connector are distributed across multiple nodes. Then, the records are written in parallel from all of the nodes to the target database.

The default partition type is Auto, which selects the partition type based on the various settings for the stages in the job. Instead of using Auto, it is better to select a partition type based on your knowledge about the actual data and the target table that the connector writes to at run time.

Setting a partitioning method
  1. On the job design canvas, double-click the Oracle Database for DataStage connector.
  2. On the Input tab, click the Partitioning tab.
  3. Choose the Partition type:
    Table 2. Partitioning types for the Oracle Database for DataStage connector
    • Auto: DataStage determines the best partitioning method to use depending on the type of stage and on what the previous stage in the job has done.
    • Entire: Every instance of a stage on every processing node receives the complete data set as input. You are most likely to use this partitioning method with stages that create lookup tables from their input.
    • Random: Records are randomly distributed across all processing nodes. Like round robin, random partitioning can rebalance the partitions of an input data set to guarantee that each processing node receives an approximately equal-sized partition.
    • Round robin: The first record goes to the first processing node, the second to the second processing node, and so on. When DataStage reaches the last processing node in the system, it starts over. This method is useful for resizing partitions of an input data set that are not equal in size.
    • Same: The stage performs no repartitioning and takes as input the partitions that are output by the preceding stage. Records stay on the same processing node; that is, they are not redistributed. DataStage uses this method when passing data between stages in your job.
    • Modulus: Partitioning is based on a key column modulo the number of partitions. The modulus partitioner assigns each record of the input data set to an output partition as determined by a specified key field.
    • Hash: Partitioning is based on a function of one or more columns (the hash partitioning keys) in each record. Records with the same values for all hash key fields are assigned to the same processing node.
    • Range: Divides a data set into approximately equal-sized partitions, each of which contains records with key column values within a specified range. This method is also useful for ensuring that related records are in the same partition.