Text Mining node (SPSS Modeler)

Use the Text Mining node to identify relevant concepts and patterns in text data. Text mining is an iterative process: when you run the Text Mining node, the extraction engine reads through the text data, identifies the relevant concepts, and assigns a type to each. You can then review the extraction results in the Text Analytics Workbench, fine-tune the extraction process, and rerun the Text Mining node to produce and evaluate new results.

Add a Data Asset node that points to hotelSatisfaction.csv.
From the Text Analytics category on the node palette, add a Text Mining node, connect it to the Data Asset node you added in the previous step, and double-click it to open its properties.
Under Fields, select Comments for the Text field and select id for the ID field.
Note: Only the Text field is required.

Figure 2. Text Mining node properties
Under Copy resources from, select Text analysis package, click Select Resources, and then load Hotel Satisfaction (English).tap, choosing Topic + Opinion for Current category set(s).
A text analysis package (TAP) is a predefined set of libraries and advanced linguistic and nonlinguistic resources, which are bundled with one or more sets of predefined categories. If no text analysis package is relevant for your application, you can instead select Resource template under Copy resources from. A resource template is a predefined set of libraries and advanced linguistic and nonlinguistic resources that were fine-tuned for a particular domain or usage.
Figure 3. Text Mining node properties
Under Build models, verify that Build interactively (category model nugget) is selected. Later, when you run the node, this option starts the Text Analytics Workbench, which is an interactive interface where you can explore and fine-tune the extraction results.
Under Begin session by, select Extracting concepts and text links. The Extracting concepts option extracts only concepts, whereas text link analysis (TLA) extraction outputs both concepts and text links, which are connections between topics (such as service, personnel, and food) and opinions.
Under Expert, select Accommodate spelling for a minimum word character length of. This option applies a fuzzy grouping technique that helps group commonly misspelled words or closely spelled words under one concept. The fuzzy grouping algorithm temporarily strips double or triple consonants and all vowels (except the first one) from extracted words. It then compares them to see whether they're the same. For example, location and locattoin are grouped.
Figure 4. Text Mining node properties
Click Save.
Run the Text Mining node to open the Text Analytics Workbench, and then proceed to the next section of this tutorial.
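
The fuzzy grouping rule described for the Accommodate spelling option can be illustrated with a short Python sketch. This is not the extraction engine's actual implementation. It is a minimal approximation that assumes ASCII letters, treats y as a consonant, and collapses repeated consonants before dropping vowels. The function names fuzzy_key and group_spellings, and the min_length value of 5, are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

VOWELS = set("aeiou")

def fuzzy_key(word):
    """Reduce a word to a comparison key (illustrative approximation only)."""
    word = word.lower()
    # Collapse double or triple runs of the same consonant into one letter.
    word = re.sub(r"([b-df-hj-np-tv-z])\1+", r"\1", word)
    # Drop every vowel except the first one in the word.
    kept = []
    seen_vowel = False
    for ch in word:
        if ch in VOWELS:
            if seen_vowel:
                continue
            seen_vowel = True
        kept.append(ch)
    return "".join(kept)

def group_spellings(words, min_length=5):
    """Group words whose fuzzy keys match.

    min_length is a hypothetical stand-in for the minimum word character
    length setting: shorter words are only grouped on exact matches.
    """
    groups = defaultdict(list)
    for w in words:
        if len(w) >= min_length:
            key = ("fuzzy", fuzzy_key(w))
        else:
            key = ("exact", w.lower())
        groups[key].append(w)
    return list(groups.values())

if __name__ == "__main__":
    print(fuzzy_key("location"), fuzzy_key("locattoin"))
    print(group_spellings(["location", "locattoin", "room"]))
```

Under these assumptions, location and locattoin reduce to the same key, so they land in one group, while a short word such as room is left on its own.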