CHAID node

Last updated: Feb 11, 2025

CHAID node (SPSS Modeler)

CHAID, or Chi-squared Automatic Interaction Detection, is a classification method for building decision trees by using chi-square statistics to identify optimal splits.

CHAID first examines the crosstabulations between each of the input fields and the outcome, and tests for significance by using a chi-square independence test. If more than one of these relationships is statistically significant, CHAID selects the input field that is the most significant (smallest p value). If an input has more than two categories, the categories are compared with one another, and those that show no difference in the outcome are collapsed together. Merging proceeds by successively joining the pair of categories that shows the least significant difference, and it stops when all remaining categories differ at the specified testing level. For nominal input fields, any categories can be merged; for ordinal input fields, only contiguous categories can be merged.
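The selection and merging steps described above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the SPSS Modeler implementation: the function names (chi2_sf, crosstab, p_value, best_predictor, merge_categories), the toy data, and the default alpha of 0.05 are all assumptions made for the example, and refinements that full CHAID implementations typically apply, such as Bonferroni-adjusted p values, are omitted.

```python
import math
from collections import Counter
from itertools import combinations

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (closed forms for integer df)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    tail = math.erfc(chi / math.sqrt(2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def crosstab(xs, ys):
    """Count outcomes per input category: {category: Counter(outcome -> n)}."""
    table = {}
    for x, y in zip(xs, ys):
        table.setdefault(x, Counter())[y] += 1
    return table

def p_value(table):
    """p value of the Pearson chi-square independence test for a crosstab."""
    col_totals = Counter()
    for counts in table.values():
        col_totals.update(counts)
    n = sum(col_totals.values())
    df = (len(table) - 1) * (len(col_totals) - 1)
    if df <= 0:  # a single row or column carries no evidence of dependence
        return 1.0
    stat = 0.0
    for counts in table.values():
        row_total = sum(counts.values())
        for outcome, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (counts[outcome] - expected) ** 2 / expected
    return chi2_sf(stat, df)

def best_predictor(inputs, target, alpha=0.05):
    """Pick the input with the smallest p value, or None if none is significant."""
    p_values = {name: p_value(crosstab(xs, target)) for name, xs in inputs.items()}
    name = min(p_values, key=p_values.get)
    return name if p_values[name] <= alpha else None

def merge_categories(table, alpha=0.05, ordinal=False):
    """Join the least significantly different pair until all pairs differ at alpha."""
    groups = [((cat,), counts) for cat, counts in sorted(table.items())]
    while len(groups) > 1:
        if ordinal:  # only contiguous categories may be merged
            pairs = [(i, i + 1) for i in range(len(groups) - 1)]
        else:        # nominal: any pair may be merged
            pairs = list(combinations(range(len(groups)), 2))

        def pair_p(pair):
            return p_value({0: groups[pair[0]][1], 1: groups[pair[1]][1]})

        i, j = max(pairs, key=pair_p)
        if pair_p((i, j)) <= alpha:
            break  # every remaining pair differs at the testing level
        merged = (groups[i][0] + groups[j][0], groups[i][1] + groups[j][1])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.insert(i, merged)
    return [cats for cats, _ in groups]

# Toy data (hypothetical): "plan" is related to the outcome, "region" is not.
target = (["yes"] * 20 + ["no"] * 20) * 2 + ["yes"] * 38 + ["no"] * 2
inputs = {
    "plan": ["A"] * 40 + ["B"] * 40 + ["C"] * 40,
    "region": ["N", "S"] * 60,
}
chosen = best_predictor(inputs, target)
merged_groups = merge_categories(crosstab(inputs["plan"], target))
```

On this toy data, plan is chosen as the split field, and its categories A and B, which have identical outcome distributions, are merged into one group while C stays separate.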

Exhaustive CHAID is a modification of CHAID that does a more thorough job of examining all possible splits for each predictor but takes longer to compute.

Requirements

Target and input fields can be continuous or categorical. Nodes can be split into two or more subgroups at each level. Any ordinal fields that are used in the model must have numeric storage (not string). If necessary, the Reclassify node can be used to convert them.

Strengths

Unlike the C&R Tree and QUEST nodes, CHAID can generate nonbinary trees, which means that some splits have more than two branches. For this reason, CHAID tends to create a wider tree than the binary growing methods. CHAID works for all types of inputs, and it accepts both case weights and frequency variables.

Customized layers

You can customize the properties for the CHAID node to specify fields that the CHAID algorithm must use when it determines where to split the decision tree. When the SPSS Modeler flow runs, the decision tree uses the field that is specified for that layer when it splits. You can specify fields for multiple layers to control each split of the decision tree.

You can use custom layers to control the growth of the decision tree. This control is especially useful when you know your dataset well or have some predefined decision rules.
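Conceptually, a customized layer overrides the usual field selection at one level of the tree. The following Python sketch illustrates that idea only; it is not the SPSS Modeler API. The function name choose_split_field, the custom_layers mapping, and the numbering of layers from 1 at the root are hypothetical, and the sketch assumes that layers without a specified field fall back to choosing the most significant field.

```python
def choose_split_field(layer, p_values, custom_layers, alpha=0.05):
    """Return the field to split on at a given layer (1 = first split), or None."""
    forced = custom_layers.get(layer)
    if forced is not None:
        return forced  # a field specified for this layer takes precedence
    # Assumed fallback: the most significant field, if it passes the test level.
    field = min(p_values, key=p_values.get)
    return field if p_values[field] <= alpha else None

# Hypothetical p values for two input fields at some node.
p_values = {"plan": 0.00001, "region": 0.4}
custom_layers = {1: "region"}  # force the first split to use "region"

first_split = choose_split_field(1, p_values, custom_layers)
second_split = choose_split_field(2, p_values, custom_layers)
```

Here the first layer splits on region because it was specified, even though plan is more significant, and the second layer falls back to plan.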

To use customized layers, you must enable and configure them: