0 / 0
Data type STRING in Hive tables is assigned CLOB data type in Data Virtualization
Last updated: Nov 26, 2024
Data type STRING in Hive tables is assigned CLOB data type in Data Virtualization

When you virtualize tables in a Hive data source, Data Virtualization converts data with type STRING to type CLOB instead of type Varchar when it exceeds the maximum string length.

Symptoms

When you virtualize a Hive table that contains data type STRING that exceeds the default maximum string length (MaxStringSize), the data type of the column is converted to CLOB.

Hive does not give a maximum length for STRING types; therefore, Data Virtualization virtualizes at a specific length to avoid truncation. However, you can adjust the maximum length.

The following example is from a Hive table with strings where you can adjust the column type to a length suitable for aggregation.

describe table TESTDATA.BASIC_STRING;

                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
c1                              SYSIBM    INTEGER                      4     0 Yes
c_chr5                          SYSIBM    CHARACTER                    5     0 Yes
c_vchr10                        SYSIBM    VARCHAR                     10     0 Yes
c_str                           SYSIBM    CLOB                     65535     0 Yes
  4 record(s) selected.
The following query fails because of the CLOB data type.
select "c_str", sum("c1") from TESTDATA.BASIC_STRING group by "c_str";
SQL0134N  Improper use of a string column, host variable, constant, or function "c_str".  SQLSTATE=42907

Resolving the problem

You can alter the column that causes the failure to reduce its size to something that can be used in aggregation. In the following example, VARCHAR(200) is used but you can pick what is appropriate for the maximum length of your data.
alter nickname TESTDATA.BASIC_STRING alter column "c_str" local type VARCHAR(200);
The same query now succeeds.
select "c_str", sum("c1") from TESTDATA.BASIC_STRING group by "c_str";
c_str                                                                                                                      2
------------------------------------------------------------------------------------------------------------ ---------------
                                                                                                                         112
                                                                                                                         382
something a little longer to test string, where we do support pushdown for aggregation                                   244
a                                                                                                                        126
9995-12-31 23:59:59.999999                                                                                               124
A STRING                                                                                                                 118
-                                                                                                                        114
0005-01-01 00:00:00.000001                                                                                               122
  8 record(s) selected.