## Shiny practice items 'Statistics and business analytics', module 8

This app allows you to practice various aspects covered in this module.

Please make sure that you have fully completed the following tasks before continuing with this app:

1. Read the module's learning plan

2. Watch the topic videos and review the lecture notes

3. Work through the SPSS How to guide

The module has the following objectives:

1. Specify a logit regression model to explain/predict probabilities

2. Interpret a logit regression through log-odds and odds ratios

3. Express the results of a logit analysis using baseline odds

4. Evaluate the fit of a logit regression model

5. Use SPSS to estimate a regression model with a categorical dependent variable

Please proceed with the practice items by clicking on the links in the top bar of this app. It is not necessary to complete these items in order. We encourage you to work together with a classmate!

There are three types of practice items: theory ('TH'), SPSS ('SP') and multiple choice ('MC').

You should complete these practice items before joining the lab meeting corresponding to this module.

MBA program, Peter Ebbes

#### Purpose

##### Use SPSS to build a logit regression model to predict whether an insurance claim is fraudulent or not. Let's do it!

In this module you learned about logistic regression. This class of models can explain and predict categorical variables. In module 2 we introduced the insurance fraud mini-case. We were able to use a series of bivariate statistical analyses to characterize which claims were more likely to be fraudulent (e.g. modules 4 and 5), but what is really needed is a prediction tool that allows the company to score insurance claims by their risk of being fraudulent. The techniques you learned in this module can be used for that.

Based on our univariate and bivariate statistical analyses in earlier modules, we can now start building a regression model that predicts whether a claim is fraudulent or not. For instance, the bivariate analyses showed that fraud rates differ strongly across claim types (e.g. module 5). The univariate statistics showed that claim amount and income have long-tailed distributions, so taking their natural logarithm is probably a good idea.
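To illustrate why logging a long-tailed variable helps, here is a minimal Python sketch. The income values are made up for illustration and do not come from the mini-case data; in SPSS the same transformation would be done with a computed variable using the natural log.

```python
import math

# Hypothetical incomes in dollars (illustration only, not from the mini-case data).
incomes = [20_000, 35_000, 50_000, 80_000, 1_200_000]

# Natural log of each income, as used for 'LN(income)' in the model.
ln_incomes = [math.log(x) for x in incomes]

# On the raw scale the largest value dominates; on the log scale it does not.
raw_ratio = max(incomes) / min(incomes)
log_ratio = max(ln_incomes) / min(ln_incomes)

print(f"max/min on raw scale: {raw_ratio:.1f}")
print(f"max/min on log scale: {log_ratio:.2f}")
```

The log scale pulls the extreme value much closer to the rest, so a single very large claim or income has less influence on the estimates.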

Build a logit regression model for the probability that a claim is fraudulent ('fraudulent') as a function of the following predictors: 'claim_type', 'gender', 'retire', 'LN(claim_amount)', and 'LN(income)'.

Use SPSS and the data from the insurance fraud mini-case (module 2, SPSS file 'mini_case_insurance_fraud_web.sav').

#### Question 1

What is your estimated model? Write down the equation on scratch paper in terms of log-odds.

#### Question 2

Comment on the model fit.
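As a reminder of how common fit measures relate to each other, the following Python sketch computes the model chi-square, the Cox & Snell and Nagelkerke R-squared values, and a classification hit rate. All numbers are hypothetical and the function names are our own; in practice you would read these quantities from your SPSS output rather than compute them by hand.

```python
import math

def logit_fit_summary(ll_null: float, ll_model: float, n: int) -> dict:
    """Fit measures from the log-likelihoods of the intercept-only model
    (ll_null) and the estimated model (ll_model), with n observations."""
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    # Largest value Cox & Snell can reach; Nagelkerke rescales by it.
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    return {
        "minus_2ll_model": -2.0 * ll_model,
        "model_chi_square": 2.0 * (ll_model - ll_null),
        "cox_snell_r2": cox_snell,
        "nagelkerke_r2": cox_snell / max_cox_snell,
    }

def hit_rate(actual, predicted_prob, cutoff=0.5):
    """Share of cases classified correctly at the given probability cutoff."""
    correct = sum(
        1 for y, p in zip(actual, predicted_prob) if (p >= cutoff) == (y == 1)
    )
    return correct / len(actual)

# Hypothetical values, for illustration only.
summary = logit_fit_summary(ll_null=-100.0, ll_model=-80.0, n=200)
example_hit_rate = hit_rate([1, 0, 0, 1], [0.8, 0.3, 0.6, 0.4])

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print(f"hit rate: {example_hit_rate:.2f}")
```

When fraud is rare, compare the hit rate with the share of the largest category: a model that never predicts fraud can still classify most claims correctly.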

#### Question 3

Discuss the interpretation of the odds ratios for (a) Wind/Hail claim type, (b) Fire/Smoke claim type, (c) Income (in log dollars).
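The sketch below shows how a logit coefficient translates into an odds ratio, and how to read a coefficient on a logged predictor. The coefficients are hypothetical and the function names are our own; use the estimates from your own SPSS output when answering the question.

```python
import math

def odds_ratio(coefficient: float) -> float:
    """Odds ratio for a one-unit increase in a predictor (Exp(B) in SPSS)."""
    return math.exp(coefficient)

def odds_multiplier_for_percent_change(coefficient: float, percent: float) -> float:
    """Factor by which the odds change when the original (unlogged) variable
    behind a logged predictor increases by `percent` percent."""
    return (1.0 + percent / 100.0) ** coefficient

# Hypothetical coefficients, for illustration only.
b_dummy = 0.40       # dummy for one claim type versus the reference category
b_ln_income = -0.25  # coefficient on LN(income)

print(odds_ratio(b_dummy))
print(odds_ratio(b_ln_income))
print(odds_multiplier_for_percent_change(b_ln_income, 10.0))
```

An odds ratio for a claim-type dummy compares the odds of fraud for that claim type with the odds for the reference category, holding the other predictors fixed. For a logged predictor, a one-unit increase in the log is a large change in the original variable, which is why a percentage change is often easier to communicate.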

#### Question 4

Which of the variables in the equation is most important in explaining/predicting whether or not a claim is fraudulent?

#### Question 5

Consider the following two claims that are filed at the company. Use SPSS to predict the probability that each is fraudulent. Here we are only asking for a point prediction. Based on your prediction, what would you do with the claim?

Claim 1: Theft/Vandalism claim of an amount of 4 log dollars, filed by a female, not retired, with income 2.5 log dollars

Claim 2: Wind/Hail claim of an amount of 6.5 log dollars, filed by a male, not retired, with income 9 log dollars
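The mechanics of a point prediction can be sketched in Python as follows. Every coefficient below is hypothetical, and the coding choices (male and not retired as reference categories, any claim type not listed as the reference category) are assumptions made for this illustration only. Your own SPSS estimates and coding will differ, so the probabilities this sketch produces are not the answers to this question.

```python
import math

# HYPOTHETICAL coefficients, for illustration only. Replace them with the
# values in the 'B' column of your own SPSS output. The reference categories
# (gender = male, retire = no, unlisted claim types) are assumptions as well.
coefficients = {
    "intercept": -1.0,
    "claim_type": {"Wind/Hail": 0.3, "Fire/Smoke": 0.6, "Theft/Vandalism": 0.9},
    "female": 0.2,
    "retired": -0.4,
    "ln_claim_amount": 0.5,
    "ln_income": -0.3,
}

def predict_fraud_probability(claim_type, female, retired,
                              ln_claim_amount, ln_income, coefs=coefficients):
    """Point prediction: build the log-odds, then convert to a probability."""
    log_odds = (
        coefs["intercept"]
        + coefs["claim_type"].get(claim_type, 0.0)  # reference category adds 0
        + coefs["female"] * female
        + coefs["retired"] * retired
        + coefs["ln_claim_amount"] * ln_claim_amount
        + coefs["ln_income"] * ln_income
    )
    return 1.0 / (1.0 + math.exp(-log_odds))

claim_1 = predict_fraud_probability("Theft/Vandalism", female=1, retired=0,
                                    ln_claim_amount=4.0, ln_income=2.5)
claim_2 = predict_fraud_probability("Wind/Hail", female=0, retired=0,
                                    ln_claim_amount=6.5, ln_income=9.0)
print(f"claim 1: {claim_1:.3f}, claim 2: {claim_2:.3f}")
```

The two steps are always the same: plug the claim's characteristics into the estimated log-odds equation, then apply the logistic function to obtain a probability between 0 and 1.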

#### Purpose

##### Test your knowledge about the subjects of this module. Let's do it!

1. Which of the following aspects need to hold for logit regressions?

2. A researcher estimates that the odds of bankruptcy for a startup are 4.161. Which of the following expressions is true?

3. You estimate that your probability of winning a game of chance is 0.779. Which of the following expressions is true?

4. You analyze a survey of 1265 higher-income (top half of incomes) persons in Japan. For each person, you analyze whether (1) or not (0) they own an espresso machine (12.9% of the sample own one). You have three explanatory variables: gender (0 female, 1 male), income (in 1000 euro), and whether the person owns a home cinema (1 yes, 0 no). Which of the following statements is correct?

5. Consider the same scenario as in question 4. What is the estimated probability of owning an espresso machine for a male with an income of 50000 euros who does not own a home cinema?
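Questions 2 and 3 rely on the relationship between odds and probabilities. As a refresher, here is a minimal Python sketch of the two conversions. The function names are our own and the example values are deliberately different from those in the questions.

```python
def odds_to_probability(odds: float) -> float:
    """Probability implied by given odds: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)

def probability_to_odds(p: float) -> float:
    """Odds implied by a probability: odds = p / (1 - p)."""
    return p / (1.0 - p)

print(odds_to_probability(3.0))   # 0.75
print(probability_to_odds(0.75))  # 3.0
```

Odds of 3 mean the event is three times as likely to happen as not to happen, which corresponds to a probability of 0.75.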