5160196 Statistics Fall 1999
Teachers : François Bellavance and Jean-Claude Lebrun
Problem 1. (12 points)
From October 18 to 25, 1999, a « La Presse SOM » survey was carried out among 505 Montrealers to measure their level of satisfaction with the administration of mayor Pierre Bourque as well as their opinion on the « one island, one city » project. For the question « The mayor is currently serving his second mandate. Would you say that you currently trust him more, as much as or less than during his first mandate? », we obtained the following results :
Mother tongue         Trust more   Trust as much   Trust less
French                        32             156          118
English (or other)            40              50           59


Note : 50 people (18 francophones and 32 anglophones or others) did not know or did not answer the question and thus were not entered in this table.
Questions :

a) In Montreal, is there a significant difference in the level of confidence granted to the mayor between francophones and anglophones (or others)? Use an α = 5% level and verify, at least 3 TIMES, that there are no errors in the transcription of the data in EXCEL or MINITAB! (2 points)

b) If you observed a significant link, briefly describe this link. If you did not observe a significant link, briefly explain why. (4 points)
During the same survey, for the question, « If a referendum was held on the island of Montreal, would you vote yes or no to the following question : do you want to replace the 29 municipalities of the island of Montreal by one city, i.e. one island, one city? », we obtained the following results :
                      Answer to the question
Mother tongue         Yes     No
French                161    145
English (or other)     64    102

Note : 33 people (18 francophones and 15 anglophones or others) did not know or did not answer the question and thus were not entered in this table.
We want to verify the hypothesis that in Montreal, the proportion of francophones in favour of the « one island, one city» project is different from the proportion of anglophones (or others).
Questions :

c) Formulate the hypotheses for this problem. (1 point)

d) In the sample, what are the respective proportions of francophones and anglophones (or others) in favour of the « one island, one city » project? (1 point)

e) Obtain the p-value for the test of the hypotheses formulated in c) and give your conclusion at the α = 5% level. (4 points)
For problems 2 to 5, see the EXCEL data file « jan2000.xls ». Before using this file to answer the questions, be sure to save at least one copy of it on your hard disk under another name or in another directory.
Context and data file description :
A firm of « head hunters » offers its services to recruit the best managers, either from inside or from outside the company. In the business world, several people claim that managers hired from outside the company perform better than those recruited from within the company.
A team of researchers asked 150 United States managers, chosen at random, to take part in a small study in order to verify this claim. The sample was obtained using a simple random draw with a 91% rate of participation. A comparison of some demographic characteristics between the participants and the non-participants did not reveal any significant difference between these two groups. In other words, we can say that the sample obtained is probably not biased.
The file « jan2000.xls » contains the data collected for each of the 150 participants (source : Foster D.P. et al., Business Analysis Using Regression: A Casebook, Springer-Verlag, New York, 1998). The detailed content of the file and the description of the measured variables are as follows :
Column   Variable name   Description
A        Id              Anonymous identification number of the participants (1 to 150)
B        Performance     Performance score assigned by the research team
C        Salary          Manager's annual salary in thousands of U.S. dollars (note that a higher salary is an indicator of a higher level in the company, i.e. closer to top management)
D        Years           Manager's years of experience
E        ExtInt          Variable indicating the manager's origin : 1 = External and 0 = Internal
F        Origin          Same variable as column E, but not numerically coded
G        PerfExt         Performance score of the managers recruited from outside the company
H        PerfInt         Performance score of the managers recruited from within the company
I        SalaryExt       Annual salary of the managers recruited from outside the company
J        SalaryInt       Annual salary of the managers recruited from within the company

Note that the data in columns G and H are the same as the ones in column B, but grouped by the managers’ origin. Also, the data in columns I and J are the same as the ones in column C, but are grouped by the managers’ origin.
Problem 2. (10 points)
Questions :

a) Using EXCEL (or MINITAB), obtain the minimum, the maximum, the mean and the standard deviation of the salaries of the 150 managers in the sample. (4 points)
Minimum :
Maximum :
Mean :
Standard deviation :

b) Using EXCEL (or MINITAB), obtain the 95% confidence interval for the mean salary of United States managers and briefly interpret that interval. (4 points)

c) Based on the confidence interval for the mean salary found in b), determine whether the p-value for testing the hypotheses H_{0} : μ = 70 against H_{1} : μ ≠ 70, where μ represents the true mean salary of United States managers in thousands of U.S. dollars, would be higher than, lower than or equal to 5%. Briefly justify your answer. (2 points)
Problem 3. (15 points)
Questions :
Although several people in the business world claim that managers recruited from outside the company perform better, we believe that in the United States the proportion of these managers is lower than 50%. Using our sample, we want to verify this last assertion : in the United States, the proportion of managers recruited from outside the company is lower than 50%.

a) Formulate precisely the hypotheses H_{0} and H_{1} that we want to test in this problem. (1 point)

b) In the sample, what is the proportion of managers recruited from outside the company? (2 points)

c) Using EXCEL (or MINITAB), obtain the p-value corresponding to the hypotheses formulated in a) and give your conclusion at the α = 5% level. (3 points)

d) Using EXCEL (or MINITAB), obtain the 95% confidence interval for the proportion of United States managers recruited from outside the company and briefly interpret this interval. (4 points)

e) Would you have been able to test the hypotheses formulated in a) if, instead of taking a simple random sample, the research team had used a stratified sampling design with the managers recruited externally as the first stratum and the managers recruited within the company as the second stratum? Briefly justify your answer. (5 points)
Problem 4. (8 points)
Questions :

a) What are the means and the standard deviations of the performance scores for the groups of managers recruited internally and externally respectively? (4 points)
External mean of performance: standard deviation :
Internal mean of performance: standard deviation :

b) We now want to verify the hypothesis that, on average, managers recruited from outside the company obtain higher performance scores than those recruited from within the company. We first carried out a test on the variances in order to choose the appropriate statistical test to compare the means. We obtained the following results for the test on the variances :
H_{0} : equal variances   H_{1} : unequal variances   p-value = 0.309.
Using EXCEL (or MINITAB), find the p-value corresponding to the hypothesis on the means that we want to verify and briefly comment on the results at the α = 5% level. (4 points)
Problem 5. (15 points)
Before undertaking a multiple linear regression analysis, it is important to examine the scatterplots between all the variables as well as the correlation coefficients.
[Figure : scatterplots between the variables for all managers (axes include Salary, Years and Origin)]

Correlations (Pearson), with the p-value under each coefficient
          Performance   Salary
Salary          0.684
                0.000
Years           0.068   -0.323
                0.410    0.000
To better analyse and understand this data set and the relations between the variables, it is also useful to examine the scatterplots between the performance, the salary and the years of experience while identifying the two groups of managers on each plot. The Pearson correlation coefficients between these variables were also calculated separately for each of the two groups.
[Figure : scatterplots involving Salary and Years, with the two groups identified (O = external, + = internal)]

Managers recruited from outside the company (external)
Correlations (Pearson), with the p-value under each coefficient
          Performance   Salary
Salary          0.736
                0.000
Years           0.150    0.174
                0.245    0.175

Managers recruited within the company (internal)
Correlations (Pearson), with the p-value under each coefficient
          Performance   Salary
Salary          0.642
                0.000
Years           0.276    0.014
                0.009    0.899

Questions :

a) According to the graphs and the Pearson correlation coefficients, we note that for all the managers taken together there is a significant negative linear relation (r = -0.323) between the salary and the number of years of experience. How would you explain this relation which, at first sight, seems somewhat unexpected? (5 points)
Using SAS software, we have obtained a summary of all the multiple linear regression models characteristics. The results are the following :
N = 150   Regression Models for Dependent Variable: Performance

Number in   R-square     Adjusted       C(p)        Variables in Model
Model                    R-square
1           0.46737553    0.46377672    30.84091    SALARY
1           0.05674645    0.05037311   167.17715    EXTINT
1           0.00458214   -0.00214366   184.49664    YEARS

2           0.56021507    0.55423160     2.01651    SALARY YEARS
2           0.48849467    0.48153541    25.82897    SALARY EXTINT
2           0.11058043    0.09847948   151.30330    YEARS EXTINT

3           0.56026479    0.55122914     4.00000    SALARY YEARS EXTINT

Questions :

b) Which of the simple and multiple linear regression models seems to be the best, and why? (4 points)

c) Using EXCEL (or MINITAB), obtain the regression equation for the best model found in b) and briefly interpret the coefficients of this model as well as the coefficient of determination R^{2}. (6 points)
Solutions :
Problem 1.

a) 2 by 3 contingency table. Yes, there is a significant difference since the p-value = 0.000007 < 0.05. So, we reject the hypothesis H_{0} : there is no link between mother tongue (francophones versus anglophones or others) and the level of confidence granted to the mayor by Montrealers. (Cramer's coefficient = 0.2288)

b) The proportion of Montrealers who have less confidence in the mayor is similar for francophones and anglophones (38.56% and 39.60% respectively). However, only 10.46% of francophones have more confidence in the mayor, compared with 26.85% of anglophones. On the other hand, the proportion of Montrealers who have the same level of confidence in the mayor is 50.98% for francophones and 33.56% for anglophones.
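
The test in a) and the row percentages quoted in b) can be reproduced by hand. The following is a minimal sketch using only the Python standard library rather than EXCEL or MINITAB; it assumes a Pearson chi-square test of independence without continuity correction, and it relies on the fact that for exactly 2 degrees of freedom the upper tail of the chi-square distribution equals exp(-x/2).

```python
import math

# Observed counts from the survey table
# (rows: French, English or other; columns: trust more, as much, less).
observed = [
    [32, 156, 118],
    [40, 50, 59],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E,
# where E = row total * column total / n under independence.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1) * (3 - 1) = 2

# Closed-form upper tail, valid ONLY because df == 2.
p_value = math.exp(-chi2 / 2)

# Row percentages used to describe the link in b).
row_pct = [
    [100 * obs / total for obs in row]
    for row, total in zip(observed, row_totals)
]

print(f"chi2 = {chi2:.3f}, df = {df}, p-value = {p_value:.6f}")
for label, pcts in zip(["French", "English (or other)"], row_pct):
    print(label, [f"{p:.2f}%" for p in pcts])
```

The shortcut for the p-value would have to be replaced by a general chi-square tail function for any table that is not 2 by 3 (or 3 by 2).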

c) H_{0} : p_{francophones} = p_{anglophones} vs H_{1} : p_{francophones} ≠ p_{anglophones}.

d) Francophones : 161/306 = 52.61% ; anglophones (or others) : 64/166 = 38.55%.

e) p-value = 0.0035 < α = 0.05. Consequently, we reject the hypothesis H_{0}. So, in Montreal, the proportion of francophones in favour of the « one island, one city » project is significantly different from the proportion of anglophones (or others) in favour of the project.
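
As an illustration of how this p-value can be obtained, here is a short sketch of a two-sided test for the difference between two proportions. It assumes the usual large-sample z-test with a pooled proportion under H_{0} (the software used in the course may present the same test as a chi-square test on the 2 by 2 table); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the referendum table (Yes, Yes + No) for each group.
yes_fr, n_fr = 161, 161 + 145
yes_en, n_en = 64, 64 + 102

p_fr = yes_fr / n_fr   # sample proportion, francophones
p_en = yes_en / n_en   # sample proportion, anglophones (or others)

# Pooled proportion under H0: p_fr == p_en.
p_pool = (yes_fr + yes_en) / (n_fr + n_en)
se = sqrt(p_pool * (1 - p_pool) * (1 / n_fr + 1 / n_en))

z = (p_fr - p_en) / se
# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"p_fr = {p_fr:.4f}, p_en = {p_en:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```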
Problem 2.

a) Minimum = 48, Maximum = 103, Mean = 71.63 and standard deviation = 10.704

b) 95% CI (69.906 ; 73.360). We can state, with only a 5% risk of error, that the true mean salary of United States managers lies between $69,906 and $73,360.

c) The p-value for testing these hypotheses would be > 0.05 because the hypothesized value of 70 (i.e. $70,000) is included in the 95% confidence interval.
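
The link between the interval in b) and the test in c) can be checked from the summary statistics alone. The sketch below is an illustration, not the course's procedure: it hardcodes 1.976 as the tabulated 0.975 quantile of Student's t distribution with 149 degrees of freedom (an assumption, since the Python standard library has no t quantile function), and it uses the rounded mean 71.63, so the bounds differ from the reported ones in the third decimal.

```python
from math import sqrt

n = 150
mean = 71.63     # thousands of U.S. dollars (rounded value from a)
sd = 10.704
t_crit = 1.976   # ASSUMPTION: tabulated t quantile, 0.975, df = 149

se = sd / sqrt(n)                 # standard error of the mean
lower = mean - t_crit * se
upper = mean + t_crit * se

# Two-sided test of H0: mu = 70 at the 5% level.
mu0 = 70
t_stat = (mean - mu0) / se
inside_interval = lower <= mu0 <= upper
reject_h0 = abs(t_stat) > t_crit

print(f"95% CI = ({lower:.3f} ; {upper:.3f})")
print(f"t = {t_stat:.3f}, 70 inside CI: {inside_interval}, reject H0: {reject_h0}")
```

The two booleans always disagree: the value 70 lies inside the 95% interval exactly when |t| is below the critical value, that is, when the two-sided p-value exceeds 5%.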
Problem 3.

a) H_{0} : p_{external} ≥ 50% against H_{1} : p_{external} < 50%.

b) Proportion in the sample = 41.33% (the midpoint of the confidence interval given in d)).

c) p-value = 0.0169 < α = 0.05. Consequently, we reject the hypothesis H_{0}. So, the proportion of managers recruited from outside the company is significantly lower than 50%.

d) 95% CI (33.45% ; 49.21%). We can state, with only a 5% risk of error, that the true proportion of managers recruited from outside the company lies between 33.45% and 49.21%.
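
The p-value and the interval above can be reproduced with a large-sample z-test and a Wald interval. This is only a sketch: the count of 62 externally recruited managers out of 150 is an assumption inferred from the midpoint of the reported interval, not a figure read from the data file.

```python
from math import sqrt
from statistics import NormalDist

n = 150
x_external = 62   # ASSUMPTION: inferred from the interval midpoint (41.33%)
p0 = 0.50         # value of the proportion under H0

p_hat = x_external / n

# One-sided (lower-tail) z-test: the standard error uses p0 under H0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)

# 95% Wald confidence interval: the standard error uses p_hat.
z_crit = NormalDist().inv_cdf(0.975)
half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
print(f"95% CI = ({100 * ci[0]:.2f}% ; {100 * ci[1]:.2f}%)")
```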

e) No. With a stratified sampling design in which the two strata are the managers recruited from within and from outside the company, the researchers predetermine the number of managers to sample in each stratum and thus automatically fix the percentage of managers recruited from outside the company (and within the company) in the total sample. The sample proportion then reflects the design rather than the population, so it cannot be used to test the hypotheses.
Problem 4.

a) External performance mean : 6.32 standard deviation : 1.342
Internal performance mean : 5.60 standard deviation : 1.518

b) H_{0} : μ_{external} ≤ μ_{internal} vs H_{1} : μ_{external} > μ_{internal}. Test to compare two means with equal variances (because we do not reject the equality of variances, p-value = 0.309 > 0.05) : p-value = 0.001665 < α = 0.05. Consequently, we reject the hypothesis H_{0}. Thus, the mean performance score of the managers hired from outside the company is significantly higher than the mean performance score of the managers recruited from within the company.
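
For readers without the data file, the pooled-variance t-test can be approximated from the summary statistics in a). The sketch below makes two assumptions that are not stated in the solution: the group sizes (62 external and 88 internal managers, inferred from the proportion in Problem 3) and the use of rounded means and standard deviations. It therefore gives a p-value close to, but not identical to, the 0.001665 obtained from the raw data. The tail probability is computed by numerical integration because the Python standard library has no Student t distribution.

```python
from math import sqrt, lgamma, exp, log, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(t, df, width=60.0, steps=20000):
    """P(T > t) by Simpson's rule on [t, t + width]; steps must be even.

    The probability beyond t + width is negligible for the df used here.
    """
    h = width / steps
    total = t_pdf(t, df) + t_pdf(t + width, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(t + k * h, df)
    return total * h / 3

# Summary statistics; the group sizes are an ASSUMPTION (see lead-in).
n_ext, mean_ext, sd_ext = 62, 6.32, 1.342
n_int, mean_int, sd_int = 88, 5.60, 1.518

df = n_ext + n_int - 2
# Pooled variance: weighted average of the two sample variances.
sp2 = ((n_ext - 1) * sd_ext**2 + (n_int - 1) * sd_int**2) / df
se = sqrt(sp2 * (1 / n_ext + 1 / n_int))

t_stat = (mean_ext - mean_int) / se
p_value = t_upper_tail(t_stat, df)   # one-sided: H1 is mu_ext > mu_int

print(f"t = {t_stat:.3f}, df = {df}, one-sided p-value = {p_value:.5f}")
```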
Problem 5.

a) By analysing the scatterplots, we observe that the « externals » have on average a higher salary than the « internals » while having on average fewer years of experience. Also, when we look separately at the relation between the salaries and the years of experience for the « externals » and the « internals », the link is no longer significant (externals : r = 0.174, p-value = 0.175 ; internals : r = 0.014, p-value = 0.899).

b) The model with salaries and years of experience. Compared with the other models, this model has the greatest value of R^{2}_{adjusted} (55.4%) and the smallest value of C_{p} (2.02).
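
The two selection criteria can be recomputed from the R-square column of the SAS summary. The sketch below assumes the standard definitions, adjusted R^{2} = 1 - (1 - R^{2})(n - 1)/(n - p - 1) and Mallows' C_{p} = SSE_{p}/MSE_{full} - n + 2(p + 1), where p is the number of explanatory variables in the model; C_{p} is rewritten in terms of R-squares because the sums of squares themselves are not given. The function names are illustrative.

```python
n = 150
K_FULL = 3                 # number of predictors in the full model
R2_FULL = 0.56026479       # R-square of SALARY YEARS EXTINT

def adjusted_r2(r2, p):
    """Adjusted R-square for a model with p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def mallows_cp(r2, p):
    """Mallows' Cp, using SSE_p / MSE_full = (1 - r2) / (1 - R2_FULL) * (n - K_FULL - 1)."""
    return (1 - r2) / (1 - R2_FULL) * (n - K_FULL - 1) - n + 2 * (p + 1)

# (R-square, number of predictors) as listed in the SAS output.
models = {
    "SALARY": (0.46737553, 1),
    "EXTINT": (0.05674645, 1),
    "YEARS": (0.00458214, 1),
    "SALARY YEARS": (0.56021507, 2),
    "SALARY EXTINT": (0.48849467, 2),
    "YEARS EXTINT": (0.11058043, 2),
    "SALARY YEARS EXTINT": (0.56026479, 3),
}

for name, (r2, p) in models.items():
    print(f"{name:22s} adj R2 = {adjusted_r2(r2, p):.8f}  Cp = {mallows_cp(r2, p):.5f}")

# Model with the largest adjusted R-square.
best = max(models, key=lambda name: adjusted_r2(*models[name]))
print("Best by adjusted R-square:", best)
```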

c) Performance = -2.9206 + 0.1093 × salary + 0.1215 × years of experience. R^{2} = 56.02% : 56.02% of the observed variability in the managers' performance scores is explained by the salaries and the years of experience. According to the model, for a fixed number of years of experience, each additional thousand dollars of salary is associated with an increase of 0.1093 in the performance score. Also, for a fixed salary, each additional year of experience is associated with an increase of 0.1215 in the performance score.
5160196 Final exam – January 2000
