Linguistic resources
SPSS Modeler uses an extraction process that relies on linguistic resources. These resources determine how the text data is processed and how concepts, types, and sometimes patterns are extracted from it.
The linguistic resources can be divided into different types:
- Category sets
  Categories are groups of closely related ideas and patterns. Text data is assigned to categories through a scoring process.
- Libraries
  Libraries are the building blocks for both templates and text analysis packages (TAPs). Each library is made up of several dictionaries, which are used to define and manage terms, synonyms, and exclude lists. Libraries are delivered individually and are also prepackaged in templates and TAPs.
- Templates
  Templates consist of a set of libraries and some advanced linguistic and nonlinguistic resources. These resources form a specialized set that is adapted to a particular domain or context, such as product opinions.
- Text analysis packages (TAPs)
  A text analysis package is a predefined template that is bundled with one or more category sets. Because a TAP stores the categories together with the resources that were used to generate them, you can reuse it to apply the same categories and resources to other flows.
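The containment relationships described above can be sketched as a small data model. This is only a conceptual illustration, not the SPSS Modeler API or its file format: every class and field name below is hypothetical and chosen to mirror the wording of the definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dictionary:
    """Hypothetical: a named list of entries (terms, synonyms, or excludes)."""
    name: str
    entries: list[str] = field(default_factory=list)


@dataclass
class Library:
    """A library is made up of several dictionaries."""
    name: str
    dictionaries: list[Dictionary] = field(default_factory=list)


@dataclass
class Template:
    """A template is a set of libraries plus some advanced resources."""
    name: str
    libraries: list[Library] = field(default_factory=list)
    # Advanced resources are treated as opaque labels in this sketch.
    advanced_resources: list[str] = field(default_factory=list)


@dataclass
class CategorySet:
    """A category set groups related categories."""
    name: str
    categories: list[str] = field(default_factory=list)


@dataclass
class TextAnalysisPackage:
    """A TAP bundles a template with one or more category sets."""
    name: str
    template: Template
    category_sets: list[CategorySet]

    def __post_init__(self) -> None:
        # The definition requires at least one category set.
        if not self.category_sets:
            raise ValueError("A TAP needs at least one category set")
```

In this sketch, reusing a TAP in another flow amounts to passing the same `TextAnalysisPackage` object around, because the template and the category sets travel together.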
Project assets for Text Analytics
You can save Text Analytics assets as project assets to create your own custom linguistic resources. You can reuse these assets to work more efficiently in your flows or share them to collaborate with others. The following resource types can be saved as project assets:
- Templates
- Libraries
- Text analysis packages (TAPs)
Category sets are not saved as project assets. To save any modification you make to a category set, you must download and save the category set or save it as part of a TAP.
For more information about projects and assets, see Projects and Managing assets in projects.
Downloading linguistic resources
You can download linguistic resources to manage them directly or to share them across teams. Each resource type is downloaded with its own file extension:
- Templates (.lrt)
- Libraries (.lib)
- Text analysis packages (.tap)
- Category sets (.xlsx)
- Within your project, go to the Assets tab and expand SPSS Modeler Components. The project assets for Text Analytics are sorted by type.
- Find the project asset that you want to download.
- Click the Options icon and select Download.
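If you keep downloaded resources in a shared folder, a short script can help sort them by type. The following is a minimal sketch that uses only the file extensions listed above and the rule that category sets are not saved as project assets. The function and variable names are hypothetical, and the check looks at the file name only, so any other spreadsheet with an .xlsx extension would also be reported as a category set.

```python
from pathlib import Path

# Extension -> (resource type, can be saved as a project asset).
# The extensions come from the list of downloadable resources.
RESOURCE_TYPES = {
    ".lrt": ("template", True),
    ".lib": ("library", True),
    ".tap": ("text analysis package", True),
    ".xlsx": ("category set", False),
}


def describe_resource(filename: str) -> tuple[str, bool]:
    """Return the resource type and project-asset flag for a file name."""
    suffix = Path(filename).suffix.lower()
    if suffix not in RESOURCE_TYPES:
        raise ValueError(f"Not a known linguistic resource file: {filename}")
    return RESOURCE_TYPES[suffix]
```

For example, `describe_resource("opinions.lrt")` returns `("template", True)`, while a file named `survey_categories.xlsx` is reported as a category set that cannot be stored as a project asset.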
Custom linguistic resources
SPSS Modeler has a default set of specialized linguistic resources. You can use these linguistic resources to benefit from research and fine-tuning for specific languages and specific applications. However, these linguistic resources might not be optimized for your context or your data. You can edit and save your changes to these linguistic resources to optimize the extraction process for your flow.
You can also create and import custom linguistic resources that are uniquely fine-tuned to your organization's data. You can use local files to share these linguistic resources between users and projects. You can add a template, library, or TAP as a project asset from a local file.
You can upload libraries and templates while you are working in the Text Analytics Workbench:
- Go to the Resource Editor tab.
- Click the Options icon and select Load library or Change template.
- Click Import, and then browse to a library or template file, or drag and drop it.
- Enter details about the asset, and click Add.
- Click Apply.
For a custom TAP, you must upload it within the Text Mining node before you run your flow. For more information, see Uploading a custom asset in a Text Mining node.
For a custom category set, you can also upload it within the Text Analytics Workbench. For more information, see Reusing custom category sets.