## Shiny practice items 'Statistics and business analytics', module 5

This app allows you to practice various aspects covered in this module.

Please make sure that you have fully completed the following tasks before continuing with this app:

1. Read the module's learning plan

2. Watch the topic videos and review the lecture notes

3. Work through the SPSS how-to guide

The module has the following objectives:

1. Use cross tabs and a chi-square test to describe the relationship between two categorical variables

2. Perform a bivariate correlation analysis

3. Graphically summarize the relationship between two variables when each variable has the same scale level

4. Carry out bivariate statistical analyses using SPSS

5. Construct effective graphs [[ Shiny app item only ]]

Please proceed with the practice items by clicking on the links in the top bar of this app. It is not necessary to complete these items in order. We encourage you to work together with a classmate!

There are three types of practice items: theory ('TH'), SPSS ('SP') and multiple choice ('MC').

You should complete these practice items before joining the lab meeting corresponding to this module.

MBA program
Peter Ebbes

#### Question

Data visualization is a very important task in applied data science. We have already seen many examples in the modules covered so far. Whenever you can plot it, plot it!

But bad graphs can also mislead the reader. A bad graph may hide data, display data inaccurately, or present data in a cluttered and confusing manner. For each of the following graphs, try to identify what the problem with the graph is. Based on that, how would you improve the graph?

At the bottom of this page we provide a brief summary of the lessons learned from these curious examples!

#### Purpose

##### Use SPSS to carry out the chi-square test for a cross tab. Work through the following managerial/research questions to further practice your SPSS skills. Let's do it!

In this topic you learned the chi-square test for cross tabs. In an earlier module you already learned how to describe the relationship between two categorical variables: you can compute a cross tab (contingency table) or draw a clustered or segmented bar chart. Here we learn how to generalize the results to the population. For this you again need the chi-square test statistic, which we also used for a frequency table (one categorical variable). Fortunately, SPSS can easily perform such analyses for you! It is always a good idea to write down the 6 steps of hypothesis testing on scratch paper, even when you use SPSS. Practice conducting a chi-square test with SPSS following the scenario below from the insurance fraud mini-case (module 2, SPSS file 'mini_case_insurance_fraud_web.sav').
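If you would like to check by hand what the chi-square test for a cross tab computes, the following is a minimal Python sketch. It is not part of the SPSS workflow of this module, and the counts are made up for illustration only (they are not taken from the insurance fraud data). The function name `chi_square_2x2` is our own. The sketch computes the Pearson chi-square statistic without a continuity correction, and the P-value shortcut it uses is valid only for a 2x2 table (1 degree of freedom).

```python
import math

# Hypothetical 2x2 cross tab with made-up counts (NOT the mini-case data).
# Rows: two groups; columns: (claim not fraudulent, claim fraudulent).
observed = [
    [180, 20],  # group A
    [170, 30],  # group B
]


def chi_square_2x2(table):
    """Return expected counts, Pearson chi-square statistic and P-value (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under independence: (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all four cells
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # For 1 degree of freedom, the upper-tail chi-square probability
    # equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return expected, stat, p_value


expected, stat, p_value = chi_square_2x2(observed)
print("Expected counts:", expected)
print("Chi-square statistic:", round(stat, 3))
print("P-value:", round(p_value, 3))
```

The statistic printed here should correspond to the Pearson chi-square value for the same table; for larger tables the degrees of freedom change and the P-value shortcut above no longer applies.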

#### Question 1

On p2 of module 5's lecture slides we posed a couple of questions regarding which claims are more or less likely to be fraudulent. Let's investigate here whether men or women are more likely to file a fraudulent claim. Or is there no difference? Use SPSS to investigate this. What do you conclude?

#### Question 2

Continuing the previous question, how about town size? Are fraudulent claims more likely to happen in smaller or larger towns? Or is there no difference? Use SPSS to investigate this. What do you conclude?

#### Purpose

##### Test your knowledge about the subjects of this module. Let's do it!

1. For which of the following scenarios can you compute a correlation coefficient as we learned it?

2. Which of the following data visualization approaches can be used to graph the relationship between a categorical variable and a quantitative variable?

3. A researcher reports that the length of hair (measured in inches) correlates with the score (0-100) on the final exam in Psychology 101 at a large undergrad school: the longer the hair, the higher the exam score (r = 0.34, P-value = 0.00). The researcher recommends that all students grow their hair before the next exam. Which of the following statements best describes this scenario?

4. Considering the chi-square test we discussed in the lecture (see lecture slides module 5 p9), we computed that E5 is 930. Which of the following statements regarding this number is true?

5. Joe collects data on the profit margin of retail stores (expressed on a 0-100 percentage scale) and the stores' distances to the town center (in miles) for a random sample of stores. He reports to the management of the retail chain that when retail stores are closer to the town center, the profit margins are higher. Conversely, stores that are farther away from the town center tend to have lower profit margins (r = -0.26, P-value = 0.18). He also examines the relationship between the two variables using a scatter plot and does not observe anything unusual such as a non-linear pattern. Which of the following statements is true?
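As a reminder of what the correlation coefficient r in items 3 and 5 measures, here is a minimal Python sketch that computes a Pearson correlation for two quantitative variables. The data are made up for illustration and do not come from any of the items above, and the function name `pearson_r` is our own. The sketch also computes the usual t statistic with n - 2 degrees of freedom for testing whether the population correlation is zero; we assume, but have not verified, that this is the test behind the P-values quoted above.

```python
import math

# Hypothetical paired observations (made-up numbers for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
n = len(x)

# t statistic for H0: population correlation = 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print("r =", round(r, 3))
print("t =", round(t_stat, 3), "with", n - 2, "degrees of freedom")
```

Note that r only summarizes a linear association in the sample; whether the association generalizes to the population, and whether it says anything about cause and effect, are separate questions.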