## Shiny practice items 'Statistics and business analytics', module 3

This app allows you to practice various aspects covered in this module.

Please make sure that you have fully completed the following tasks before continuing with this app:

1. Read the module's learning plan

2. Watch the topic videos and review the lecture notes

3. Work through the SPSS How to guide

The module has the following objectives:

1. Describe the basic idea of statistical hypothesis tests

2. Choose which test to use given the scale level of the variable

3. Carry out statistical hypothesis tests for population parameters by hand and with SPSS

4. Calculate sample sizes

Please proceed with the practice items by clicking on the links in the top bar of this app. It is not necessary to complete these items in order. We encourage you to work together with a classmate!

There are three types of practice items: theory ('TH'), SPSS ('SP') and multiple choice ('MC').

You should complete these practice items before joining the lab meeting corresponding to this module.

MBA program
Peter Ebbes

#### Scenario

Suppose you just finished your studies at HEC Paris and you landed this fantastic job. It pays well. On top of that, you will receive a lucrative quarterly bonus if your work improves customer satisfaction. Once a quarter, your manager receives the results from a random sample of customers who indicate their satisfaction with the company's product and services on a 0-100 scale.

Your first evaluation moment (and bonus?) is almost there! The target is to improve over the average satisfaction from last year, which was 75. There comes the email from your manager. You open it, and find that the average satisfaction is... 79!

#### Question 1

Of course, you deserve your first bonus! But you also know that your manager is strong with statistics. In your own words, is your first bonus a done deal based on the information provided in the scenario above? Why? Why not?

#### Question 2

Your manager computed the P-value and lets you know that the P-value is 0.0004. Can you now celebrate? Why? Why not?

#### Question 3

Suppose that the P-value was 0.88 instead. Does your conclusion change? Why? Why not?

#### Question 4

Suppose the P-value was 0.049. Should you get your bonus? Why? Why not?

#### Question 5

Suppose that the sample produced an average satisfaction of 75.8 and a P-value of 0.002. Would you celebrate? Why? Why not?

#### Question 1

Use a $\chi^{2}$-test for the scenario above.

#### Question 2

Use a t-test for the scenario above to test a hypothesis about $\pi$, where $\pi$ indicates the proportion of students who prefer the 9am class.

#### Case 1

Did the proportion of married people change from 2010? Assume married is captured as 1=married and 2=not married, and that the proportion of married people in 2010 was 0.53.

Well, what did you write down? Here we have a categorical variable, so we are talking about proportions. In the population, the proportion is represented by the symbol $\pi$, and hypotheses always need to be stated in population terms. Here we are wondering whether the proportion of married people changed from 0.53. So, we would write $H_{0}: \pi = 0.53$ and $H_{1}: \pi \neq 0.53$. We can use the Z-test statistic for this question, but we need to take the correct one (i.e. the one for categorical data), which is $Z = \frac{p-\pi}{\sqrt{\pi(1-\pi)/n}}$. Do you remember what $p$ stands for? Note that we could potentially also use the $\chi^{2}$-statistic here. In that case, I would write the null hypothesis as $H_{0}: \pi_{1} = 0.53, \pi_{2} = 0.47$, where 1=married, 2=not married. See also item 3 of this app.
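
To make the formula concrete, here is a minimal Python sketch of the Z-test for a proportion. The sample numbers (490 married respondents out of 1000) and the function names are made up for illustration and do not come from the course data; the P-value is two-sided, matching $H_{1}: \pi \neq 0.53$.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def one_proportion_z(count, n, pi0):
    """Return (Z, two-sided P-value) for H0: pi = pi0."""
    p = count / n                          # sample proportion
    se = math.sqrt(pi0 * (1.0 - pi0) / n)  # standard error under H0
    z = (p - pi0) / se
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


# Hypothetical 2020 sample (not from the course data): 490 of 1000 married.
z, p_value = one_proportion_z(count=490, n=1000, pi0=0.53)
print(f"Z = {z:.3f}, two-sided P-value = {p_value:.4f}")
```

With these made-up numbers, Z is roughly -2.5, so $H_{0}$ would be rejected at $\alpha = 0.05$. A different sample could of course lead to a different conclusion.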

#### Case 2

Did income grow, where the average income in 2010 was 34000 euros?

Here, we are dealing with a quantitative variable, and we learned about testing a hypothesis about the population mean $\mu$ for such a scenario. The null hypothesis has to be a statement about the population, i.e. a Greek letter, and here we wonder whether the mean is bigger than 34000 euros. So, $H_{0}: \mu = 34000$ and $H_{1}: \mu > 34000$. Here it would also be fine to write down $H_{1}: \mu \neq 34000$, which would give a more conservative test (meaning it is harder to reject the null hypothesis, so stronger evidence from the data is needed to reject $H_{0}$). As for the previous question, we would use a Z-test statistic, but now for a quantitative variable, i.e. $Z = \frac{\bar{x}-\mu}{S / \sqrt{n}}$, where $\bar{x}$ is the sample mean and $S$ is the sample standard deviation, both computed from the 2020 data.
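
The difference between the one-sided and the two-sided alternative can be sketched in a few lines of Python. The summary statistics below ($\bar{x} = 35200$, $S = 12000$, $n = 400$) are hypothetical and chosen only so that the arithmetic is easy to follow; the function names are illustrative as well.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def one_sample_z(x_bar, s, n, mu0):
    """Return (Z, upper-tail P-value, two-sided P-value) for H0: mu = mu0."""
    z = (x_bar - mu0) / (s / math.sqrt(n))
    p_upper = 1.0 - normal_cdf(z)                  # for H1: mu > mu0
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))  # for H1: mu != mu0
    return z, p_upper, p_two_sided


# Hypothetical 2020 summary statistics (not from the course data).
z, p_upper, p_two_sided = one_sample_z(x_bar=35200, s=12000, n=400, mu0=34000)
print(f"Z = {z:.2f}")
print(f"P-value for H1: mu > 34000  = {p_upper:.4f}")
print(f"P-value for H1: mu != 34000 = {p_two_sided:.4f}")
```

For a positive Z, the two-sided P-value is twice the one-sided one, which is why the two-sided test is the more conservative choice.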

#### Case 3

Whether employment status changed, where employment status is measured as 1=employed (excl. self-employed), 2=self-employed, 3 = unemployed, 4=other (e.g. retired). In 2010, the percentages were 62, 10, 10, and 18, respectively.

Employment status is a categorical variable with four categories. Like Case 1, we again need to test hypotheses about (population) proportions. We have four $\pi$'s in the null hypothesis. More specifically, $H_{0}: \pi_{1} = 0.62, \pi_{2} = 0.10, \pi_{3} = 0.10, \pi_{4} = 0.18$, where the subscripts are defined above in the question text. Note that the $\pi$'s are proportions, not percentages! For the alternative hypothesis we could write $H_{1}:$ at least one of the equalities in $H_{0}$ does not hold. Here, we have to use the $\chi^{2}$-test statistic, which is given by $\chi^{2} = (O_{1}-E_{1})^{2}/E_{1} + (O_{2}-E_{2})^{2}/E_{2} + (O_{3}-E_{3})^{2}/E_{3} + (O_{4}-E_{4})^{2}/E_{4}$. The $O$'s are the observed frequencies in the 2020 sample (e.g. $O_{1}$ is the number of people in your sample who are employed) and the $E$'s are computed as the proportions under $H_{0}$ times the sample size.
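
As a rough illustration of how the $O$'s and $E$'s combine, the following Python sketch computes the $\chi^{2}$-statistic for a hypothetical 2020 sample of 1000 people. The observed counts are invented for this example; the critical value 7.815 is the standard table value for 3 degrees of freedom at $\alpha = 0.05$.

```python
def chi_square_statistic(observed, null_proportions):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    n = sum(observed)
    expected = [pi * n for pi in null_proportions]  # E = proportion under H0 * n
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return chi2, df, expected


# Proportions under H0 (the 2010 values), in the order of the question text.
null_proportions = [0.62, 0.10, 0.10, 0.18]

# Hypothetical 2020 counts (not from the course data); they sum to n = 1000.
observed = [600, 120, 90, 190]

chi2, df, expected = chi_square_statistic(observed, null_proportions)

# Table value of the chi-square distribution for df = 3 and alpha = 0.05.
CRITICAL_VALUE = 7.815
reject_h0 = chi2 > CRITICAL_VALUE

print(f"Expected counts: {expected}")
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
print(f"Reject H0 at alpha = 0.05: {reject_h0}")
```

With these invented counts the statistic stays below the critical value, so $H_{0}$ would not be rejected. SPSS reports a P-value instead of asking you to look up a critical value, but the decision is the same.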

#### Case 4

Are people travelling more or less often internationally compared to 2010? Assume that international travel is measured in your dataset as the exact number of trips (0, 1, 2, 3, etc.), and that the average number of international trips in 2010 was 3.8.

As always, first identify the measurement level of the variable in question. Here, international travel is measured as the exact number, so it is a quantitative variable. That means we can compute the mean and the hypothesis will be about the population mean $\mu$. More specifically, given the situation in 2010, the null hypothesis is $H_{0}: \mu = 3.8$ and the alternative hypothesis is $H_{1}: \mu \neq 3.8$. The test statistic is the same as in Case 2.

#### Purpose

##### Use SPSS to test hypotheses about the population. Work through the following managerial/research questions to further practice your SPSS skills. Let's do it!

In this module you learned about a hypothesis test for a mean and for proportions. SPSS can easily perform such tests for you (of course, you can also do them by hand, which is always a fun activity for a rainy Sunday afternoon!). It is always a good idea to write down the 6 steps on scratch paper, even when using SPSS. Practice conducting a t-test and a chi-square test with SPSS following two scenarios from the insurance claims and fraud mini-case (module 2, SPSS file 'mini_case_insurance_fraud_web.sav').

#### Question 1

Suppose the managers at the insurance company wanted to know whether the average claim amount this time period changed from the previous time period. The accounting department reports that the average claim last year was $63500. What would your conclusion be?

#### Question 2

Similarly, management was concerned about the distribution of claim types this year. Knowing what claims to expect helps the insurance company to plan their risk. One manager argued that there was a trend that the Theft/Vandalism claims went down, as well as Contamination claims, whereas other claim types (in particular wind and hail damage) went up (relatively speaking). The manager prepared the data from last year: W/H 22%, Water 14%, F/S 23%, Contamination 10%, T/V 31%. Is there evidence in the data to support the manager's claim?

#### Topic

#### Five multiple choice practice questions

#### Lecture

##### Module 3, all topics

#### Purpose

##### Test your knowledge about the subjects of this module. Let's do it!

1. The mayor of a city wants to assess the impact of the city's new public transportation system. She collects data on travel time (in minutes) from 130 persons and finds that $\bar{X} = 23.4$ and $S^{2} = 70.56$. Given the stakes, the mayor would like to have a confidence interval of plus or minus 1 minute. What sample size do you recommend using a confidence level of 99% (round to the nearest integer)?

2. Which of the following expressions best describes the significance level $\alpha$ or its use?

3. Which of the following tests would you recommend to test $H_{0}: \pi_{1}=\pi_{2}=\pi_{3}=1/3$?

4. Joe finds in his sample that 100 users like the product (category 1) and 100 users do not like the product (category 2). His hypothesis is that, in the population, 60% like the product and 40% do not like the product. He uses a chi-square test to investigate this. What is the correct value for $E_{1}$?

5. The approval rate of the president went down (measured as a grade on a 0-100 scale). The opposition is bragging about it on TV. The study interviews 1000 individuals who are representative of the citizens of the country. Before believing the opposition, what would you ideally need to know?