You can combine tables from multiple data sources into a single virtual table, which
creates a unified definition that contains the columns and data from all participating data sources.
These tables are referred to as grouped tables.
Segmentation is vertical (either a subset or superset of columns based on a selection of chosen
columns). You can run queries against the resulting virtual table the same way that you would query
any of the base tables.
Tables can be grouped into a single virtualized table if the tables' names as resolved by the
driver are the same (an exact match, case-sensitive). The table grouping semantics considers columns
with the same name (case-insensitive) to match. The data type promotion follows precedence
rules.
You must ensure that the grouping of tables is logical, especially concerning the handling of
data types of the result grouped table. In some cases, corresponding columns might have different
data types. Rules are used to determine the result data types of the grouped table. For more
information, see Rules for result data types.
To prevent unexpected results, you must be careful when you group tables into a single
virtualized table. The following recommendations can be considered when you create a grouped table,
although nothing is restricted.
- Ensure that the columns with the same name from all the tables have compatible data types and
compatible business usage. The domain of data in all tables must be compatible.
- Group tables from the same type of data source to ensure consistent data semantics.
- When you group tables from different types of data sources, ensure consistent data semantics.
- Ensure that all tables in the group have most columns with the same names. Ensure that matching
columns have compatible data types.
- Do not include columns that are defined on complex types.
Note: Some data sources, such as
Hive, use lowercase default for tables names.
These tables cannot be grouped with tables from other data sources that have uppercase as the
default unless delimited identifiers are used to ensure an exact match on table names.
About this task
These steps describe how to virtualize data without business terms. If you want to govern your
virtual data, see Governing virtual data in Watson Query.
Procedure
- On the navigation menu, click .
The service menu opens to the Data
sources page by default.
- On the service menu, click
and click the Tables tab.
The list of connections appears in the default
Explore view. Click a connection, schema, or table to select it and preview the
contents.
The list of available tables in your connections appears
in the List view. You can filter the listed
tables by adding filters on the Data sources page. Additionally, you can search
for tables by name, schema, column, or business term.
Watson Query can automatically suggest matched tables that can be combined into a single
virtual table.
- To combine tables into a single virtual object, click Settings
and select Group tables with identical names in the
settings menu.
The display shows the suggested grouping of tables, and you can fine-tune the participating
source tables on the Review step.
The list of available tables includes tables in which the read permission is not granted
(non-readable tables).
- If you create a virtual table that includes a non-readable table, select one of the
following options to make your virtual table queryable.
- Option 1 Click Edit grouped tables on the table action menu
and remove the non-readable table from the group that you are virtualizing.
- Option 2 Ensure that the user who is assigned to the data source connection can
access the non-readable table. This step can happen before or after the table is
virtualized.
- Option 3 Create a data source connection in Watson Query to access the same
data source with credentials for a separate user that does have read-access to the table. As a
result, the list of additional table instances appears when you select the Group tables
with identical names option. Before virtualizing tables, you must still manually remove
non-readable tables discovered under the restricted user's connection (as described in option
1).
- Select the table that you want to virtualize and click Add to
cart.
- Click View cart to view your
selections. From this window, you can also edit the table and schema names or remove a
selection from your
cart.
- Select the appropriate action to assign the virtualized table.
Assign to |
When to use this option |
Project |
Select Project if you created the virtual table to use in a specific
project. Then, choose the appropriate project. The table also appears in Virtualized
data. |
Virtualized data |
Select Virtualized data if the table was not created to use in a
specific project. This setting is the default if no projects exist. |
- Select Publish to catalog if you also
want to publish to a selected catalog.
A list of available catalogs is shown in the
drop-down menu. Each catalog is tagged as
Governed or
Not
governed.
Note: You must have at least one catalog in
IBM Knowledge
Catalog.
You must have permission to publish to a
catalog. An administrator can enable whether all virtual objects are published to a selected
governed catalog, which prevents a user from publishing to a specified
catalog.
- Specify a schema in the Schema
field.
You can also create a schema by following these
steps.
- If you have the Watson Query
Engineer or User role, leave the
Schema field as default to create a schema with your user ID.
- If you have the Watson Query
Manager role, leave the
Schema field as default to create a schema with your user ID or enter the new
schema name in the Schema field.
-
Click Virtualize to complete the process.
When the status window appears, you can select to view your virtualized data
or virtualize more data.
- Click View virtualized data to see your newly
created tables.