Introduction to modeling
A model is a set of rules, formulas, or equations that can be used to predict an outcome based on a set of input fields or variables. For example, a financial institution might use a model to predict whether loan applicants are likely to be good or bad risks, based on information that is already known about them.
Preview the tutorial
Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.
The ability to predict an outcome is the central goal of predictive analytics, and understanding the modeling process is the key to using flows in Watson Studio.
The model in this example shows how a bank can predict whether future loan applicants might default on their loans. The model is built from the records of existing customers who previously took loans from the bank and whose data is stored in the bank's database. It uses those records to estimate how likely an applicant is to default.
An important part of any model is the data that goes into it. The bank maintains a database of historical information on customers, including whether they repaid the loans (Credit rating = Good) or defaulted (Credit rating = Bad). The bank wants to use this existing data to build the model. The following fields are used:
Field name | Description |
---|---|
Credit_rating | Credit rating: 0=Bad, 1=Good, 9=missing values |
Age | Age in years |
Income | Income level: 1=Low, 2=Medium, 3=High |
Credit_cards | Number of credit cards held: 1=Less than five, 2=Five or more |
Education | Level of education: 1=High school, 2=College |
Car_loans | Number of car loans taken out: 1=None or one, 2=More than two |
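The fields above store categories as numeric codes, so a record has to be decoded before it can be read as labels. The following is a minimal Python sketch, assuming each row of tree_credit.csv holds the coded values listed in the table. The `CODES` dictionary and the `decode_record` function are hypothetical names used only for illustration; they are not part of SPSS Modeler.

```python
# Hypothetical helper: decode the coded values described in the field table.
# The field names and codes come from the table; the dictionary layout and
# the function name are assumptions made for illustration.
CODES = {
    "Credit_rating": {0: "Bad", 1: "Good"},  # 9 means a missing value
    "Income": {1: "Low", 2: "Medium", 3: "High"},
    "Credit_cards": {1: "Less than five", 2: "Five or more"},
    "Education": {1: "High school", 2: "College"},
    "Car_loans": {1: "None or one", 2: "More than two"},
}

MISSING_CREDIT_RATING = 9


def decode_record(record: dict) -> dict:
    """Return a copy of record with coded fields replaced by their labels.

    Age is kept as a number. A Credit_rating of 9 becomes None (missing).
    """
    decoded = {}
    for field, value in record.items():
        if field == "Credit_rating" and value == MISSING_CREDIT_RATING:
            decoded[field] = None
        elif field in CODES:
            decoded[field] = CODES[field][value]
        else:
            decoded[field] = value  # numeric fields such as Age pass through
    return decoded


if __name__ == "__main__":
    # A made-up record, not taken from tree_credit.csv.
    row = {"Credit_rating": 1, "Age": 35, "Income": 2,
           "Credit_cards": 1, "Education": 2, "Car_loans": 1}
    print(decode_record(row))
```

Mapping the value 9 to `None` keeps missing credit ratings distinct from the two real outcomes, which matters when the field is used as the target of the model.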
This example uses a decision tree model, which classifies records (and predicts a response) by using a series of decision rules.
For example, this decision rule classifies a record as having a good credit rating when the income is in the medium range and the number of credit cards is less than five.
IF income = Medium
AND cards <5
THEN -> 'Good'
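The rule can be read as an ordinary conditional over the coded fields. This Python sketch assumes the codes from the field table (Income 2 = Medium, Credit_cards 1 = Less than five). The function name `apply_rule` is hypothetical, and returning `None` for records the rule does not cover is an illustrative choice: in a real tree, other branches would classify those records.

```python
# Minimal sketch of the example decision rule:
#   IF income = Medium AND cards < 5 THEN 'Good'
# Codes follow the field table: Income 2 = Medium, Credit_cards 1 = Less than five.
# The function name and the None result are assumptions for illustration.
from typing import Optional

INCOME_MEDIUM = 2
CARDS_LESS_THAN_FIVE = 1


def apply_rule(record: dict) -> Optional[str]:
    """Return 'Good' if the record satisfies the rule, otherwise None."""
    if (record["Income"] == INCOME_MEDIUM
            and record["Credit_cards"] == CARDS_LESS_THAN_FIVE):
        return "Good"
    return None  # rule does not apply; another branch of the tree would decide


if __name__ == "__main__":
    print(apply_rule({"Income": 2, "Credit_cards": 1}))  # Good
    print(apply_rule({"Income": 3, "Credit_cards": 1}))  # None
```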
Using a decision tree model, you can analyze the characteristics of the two groups of customers (those with a good credit rating and those with a bad one) and predict the likelihood of loan defaults.
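One simple way to turn a group of historical customers into a likelihood is to take the share of records in that group with a good rating. This Python sketch illustrates the idea under stated assumptions: the `good_rate` function and the `history` records are hypothetical and invented for illustration, not taken from tree_credit.csv, and SPSS Modeler may compute its own confidence values differently.

```python
# Sketch: estimate the likelihood of a good rating inside one group of customers
# as the share of historical records in that group with Credit_rating = 1 (Good).
# The records below are invented for illustration; they are not from tree_credit.csv.
from typing import Callable, Iterable


def good_rate(records: Iterable[dict], in_group: Callable[[dict], bool]) -> float:
    """Fraction of records in the group whose Credit_rating is 1 (Good).

    Records with Credit_rating 9 (missing) are skipped.
    """
    group = [r for r in records if in_group(r) and r["Credit_rating"] != 9]
    if not group:
        raise ValueError("no labelled records in this group")
    return sum(r["Credit_rating"] == 1 for r in group) / len(group)


def medium_income_few_cards(record: dict) -> bool:
    """Group selected by the example rule: Income = Medium, fewer than five cards."""
    return record["Income"] == 2 and record["Credit_cards"] == 1


history = [
    {"Income": 2, "Credit_cards": 1, "Credit_rating": 1},
    {"Income": 2, "Credit_cards": 1, "Credit_rating": 1},
    {"Income": 2, "Credit_cards": 1, "Credit_rating": 0},
    {"Income": 2, "Credit_cards": 1, "Credit_rating": 9},  # missing, skipped
    {"Income": 1, "Credit_cards": 2, "Credit_rating": 0},  # outside the group
]

if __name__ == "__main__":
    print(good_rate(history, medium_income_few_cards))
```

Here three labelled records fall in the group and two of them are Good, so the estimate is two thirds; the likelihood of default in that group is the remaining third.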
While this example uses a CHAID (Chi-squared Automatic Interaction Detection) model, it is intended as a general introduction, and most of the concepts apply broadly to other modeling types in SPSS Modeler.
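As the name suggests, CHAID chooses its splits with a chi-squared test of independence between a candidate input field and the target. This Python sketch computes only Pearson's chi-squared statistic for one contingency table. It is not how SPSS Modeler implements CHAID: full CHAID also merges similar categories and compares fields by Bonferroni-adjusted p-values, none of which is shown. The counts and the `chi_squared` name are hypothetical.

```python
# Sketch of the statistic behind CHAID's splits: Pearson's chi-squared for a
# contingency table of input-field categories (rows) by target categories (columns).
# Category merging and p-value adjustment are not shown. Counts are hypothetical.


def chi_squared(table: list[list[int]]) -> float:
    """Pearson chi-squared statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count if the field and the target were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # rows: Credit_cards = Less than five, Five or more; columns: Bad, Good
    counts = [[10, 30], [30, 10]]
    print(chi_squared(counts))  # 20.0
```

A larger statistic means the field separates good and bad ratings more strongly than chance would, which is why such a field is a candidate for the next split. A table whose rows have identical proportions gives 0.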
Sample files
This example uses the flow that is named Introduction to Modeling, available in the example project you previously imported. The data file that is used in this example project is tree_credit.csv.
To open the Introduction to Modeling flow, follow these steps:
- Open the Example Project.
- Scroll down to the Modeler flows section, click View all, and select the Introduction to Modeling flow.
The Introduction to Modeling flow demonstrates the basic steps to build, browse, evaluate, and score the model. Read the following lessons to learn more about each step.