Text Mining node
You can use the Text Mining node for text mining, which is an iterative process that identifies relevant concepts and patterns in the text data. When you run the Text Mining node, the extraction engine reads through the text data, identifies the relevant concepts, and assigns a type to each. You can then review the extraction results by using the Text Analytics Workbench to fine-tune the extraction process. You can rerun the Text Mining node to produce new results, and then evaluate the new results.
- Add a Data Asset node that points to hotelSatisfaction.csv.
- From the Text Analytics category on the node palette, add a Text Mining node, connect it to the Data Asset node you added in the previous step, and double-click it to open its properties.
- Under Fields, select Comments for the Text field and select id for the ID field. Note: Only the Text field is required.
- Under Copy resources from, select Text analysis package, click Select Resources, and then load Hotel Satisfaction (English).tap (with Current category set(s) = Topic + Opinion). A text analysis package (TAP) is a predefined set of libraries and advanced linguistic and nonlinguistic resources, which are bundled with one or more sets of predefined categories. If no text analysis package is relevant for your application, you can instead select Resource template under Copy resources from. A resource template is a predefined set of libraries and advanced linguistic and nonlinguistic resources that were fine-tuned for a particular domain or usage.
- Under Build models, check that Build interactively (category model nugget) is selected. Later when you run the node, this option starts Text Analytics Workbench, which is an interactive interface where you can explore and fine-tune the extraction results.
- Under Begin session by, select Extracting concepts and text links. The option Extracting concepts extracts only concepts, whereas text link analysis (TLA) extraction outputs both concepts and text links, which are connections between topics (such as service, personnel, and food) and opinions.
- Under Expert, select Accommodate spelling for a minimum word character length of. This option applies a fuzzy grouping technique that helps group commonly misspelled words or closely spelled words under one concept. The fuzzy grouping algorithm temporarily strips double or triple consonants and all vowels (except the first one) from extracted words. It then compares them to see whether they're the same. For example, location and locattoin are grouped.
- Click Save.
- Run the Text Mining node to open the Text Analytics Workbench, and then proceed to the next section of this tutorial.
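The fuzzy grouping technique described under Expert can be illustrated with a short Python sketch. This is not the extraction engine's actual implementation; it only mirrors the description in the step above. The function names `fuzzy_key` and `group_words`, the `min_length` default of 5, and the choice to collapse repeated consonants before removing vowels are all assumptions made for illustration.

```python
import re
from collections import defaultdict

VOWELS = set("aeiou")

def fuzzy_key(word: str) -> str:
    """Build a comparison key for a word (hypothetical helper)."""
    w = word.lower()
    # Collapse double or triple consonants into a single consonant.
    w = re.sub(r"([b-df-hj-np-tv-z])\1+", r"\1", w)
    # Drop every vowel except the first one that appears.
    out = []
    seen_vowel = False
    for ch in w:
        if ch in VOWELS:
            if not seen_vowel:
                out.append(ch)
                seen_vowel = True
        else:
            out.append(ch)
    return "".join(out)

def group_words(words, min_length=5):
    """Group words whose fuzzy keys match (hypothetical helper).

    Words shorter than min_length are only grouped with exact matches.
    The default of 5 is an assumption, not a documented product value.
    """
    groups = defaultdict(list)
    for word in words:
        if len(word) >= min_length:
            key = ("fuzzy", fuzzy_key(word))
        else:
            key = ("exact", word.lower())
        groups[key].append(word)
    return dict(groups)

if __name__ == "__main__":
    for key, members in group_words(["location", "locattoin", "room"]).items():
        print(key, members)
```

In this sketch, location and locattoin both reduce to the same key, so they land in one group, while room is below the minimum length and stays on its own. Raising the minimum length makes grouping more conservative, because short words are more likely to collide once their vowels are removed.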