Table 6 – Sources of information for league expansions and initiatives
3.6.1 Summary of Data Included
Table 7 shows a summary of the data included in this study.
Sport | Data
American Football | 1970 – 2008 excluding 1982 and 1986 due to strikes.
Baseball | 1961 – 2008 excluding 1981, 1994 and 1995 due to strikes.
Basketball | 1962 – 2008 excluding 1998 due to strike.
Ice Hockey | 1967 – 2008 excluding 1994 and 2004 due to strikes.
Table 7 – Summary of Data
3.7 Ethical Considerations
A full ethical checklist was completed for this study.
There were no participants in the study other than the researcher. There were therefore no ethical considerations with regard to vulnerable groups and no requirements to consider while gathering data, either relating to observing participants or obtaining their consent. Furthermore, there were no ethical considerations regarding the type of data being gathered, such as the taking of body samples or the use of physically challenging procedures.
The data used to produce the analysis is publicly available, as it relates to the historical results of sports leagues and the policies that governed the leagues at the time. There are therefore no issues regarding data confidentiality or the disposal of data once the study is complete.
3.8 Competitive Balance Measures
This study will examine five different measures of CB. Table 8 shows these:
Measure | Definition
SD |
NSD |
GC | See Diagram 1 below for pictorial representation
HHI |
FCCR |
Table 8 – Summary of the five measures of CB used in this study
[Diagram labels: Cumulative Curve, Default values, Area]
Diagram 1 – Pictorial representation of GC
3.8.1 Standard Deviation
The 0.5 term in the SD calculation replaces the mean used in the usual calculation of standard deviation. This is because in North American sports there is almost always a winner and a loser, and in the small number of games that are tied the result is recorded as half a win and half a loss. The mean win percentage across all teams is therefore 0.5.
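As an illustration of the SD measure, the following Python sketch computes the standard deviation of win percentages around the fixed mean of 0.5. It assumes that the sum of squared deviations is divided by the number of teams, which may differ from the exact divisor used in table 8. The function name and the four-team league are hypothetical.

```python
import math


def sd_around_half(win_pcts):
    """Standard deviation of win percentages measured around 0.5.

    Assumption: the squared deviations are averaged over the number of
    teams N; a sample version would divide by N - 1 instead.
    """
    n_teams = len(win_pcts)
    return math.sqrt(sum((w - 0.5) ** 2 for w in win_pcts) / n_teams)


# Hypothetical four-team league (illustrative values, not study data).
example_league = [0.60, 0.55, 0.45, 0.40]
sd_value = sd_around_half(example_league)

# A perfectly balanced league has no spread around 0.5.
balanced_sd = sd_around_half([0.5, 0.5, 0.5, 0.5])
```

A larger value indicates a wider spread of win percentages and therefore a less balanced league.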
3.8.2 Normalised Standard Deviation
To allow comparison across sports where different numbers of games are played, the SD observed as our first measure is compared with the SD expected from an ideal league playing the same number of games. The ideal league, in which every team has an equal chance of winning each game, would have an SD of 0.5/√n, where n is the number of games played by each team. The normalised measure is the ratio of the observed SD to this ideal SD, as defined in table 8.
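The normalisation can be sketched in Python as follows. The sketch assumes the ideal SD takes the conventional form 0.5/√n and that the NSD is the ratio of the observed SD to that value. The function names and the input values are hypothetical.

```python
import math


def ideal_sd(games_per_team):
    """SD of win percentage expected when every game is an even contest."""
    return 0.5 / math.sqrt(games_per_team)


def normalised_sd(observed_sd, games_per_team):
    """Ratio of the observed SD to the ideal SD for the same season length."""
    return observed_sd / ideal_sd(games_per_team)


# Hypothetical league: observed SD of 0.08 over a 100-game season.
nsd_value = normalised_sd(0.08, 100)
```

A value of 1 would indicate a league as balanced as the ideal league, and values above 1 indicate less balance than the ideal.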
3.8.3 Gini Coefficient
The GC is calculated by placing the observations in ascending order of win percentage and plotting against the default values. In this study the default values are values where all teams have a win percentage of 50%. The measure is then defined as twice the area shown in Diagram 1 and takes a value between 0 and 1.
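A minimal Python sketch of this calculation is given below. It assumes the conventional Lorenz-curve construction: the cumulative share of win percentage is plotted for teams in ascending order, and the area between this curve and the straight line obtained when every team has a win percentage of 50% is doubled. The function name and example values are hypothetical.

```python
def gini_coefficient(win_pcts):
    """Twice the area between the equal-league line and the cumulative curve."""
    ordered = sorted(win_pcts)  # ascending order of win percentage
    n_teams = len(ordered)
    total = sum(ordered)
    cumulative = 0.0
    area_under_curve = 0.0
    for w in ordered:
        previous = cumulative
        cumulative += w / total
        # Trapezoid between consecutive points of the cumulative curve.
        area_under_curve += (previous + cumulative) / 2 / n_teams
    # The equal-league line encloses an area of 0.5.
    return 2 * (0.5 - area_under_curve)


# Hypothetical four-team league (illustrative values, not study data).
gc_value = gini_coefficient([0.60, 0.55, 0.45, 0.40])
gc_balanced = gini_coefficient([0.5, 0.5, 0.5, 0.5])
```

Under this construction a perfectly balanced league gives a GC of 0.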
3.8.4 Herfindahl-Hirschman Index
This has been adapted from industry where it measures the competitiveness of a particular market. It is calculated from the sum of squares of the market shares of the firms in the market. When related to professional sports it is calculated from the sum of squares of the percentages of games won by each team in the league.
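The calculation can be sketched in Python as follows. The sketch assumes that each team's "market share" is its share of all wins in the league, so that the shares sum to 1 and a perfectly balanced league of N teams has an HHI of 1/N. The function name and example values are hypothetical.

```python
def hhi(win_pcts):
    """Sum of squared shares of wins, with shares summing to 1."""
    total = sum(win_pcts)
    return sum((w / total) ** 2 for w in win_pcts)


# Hypothetical four-team league (illustrative values, not study data).
hhi_value = hhi([0.60, 0.55, 0.45, 0.40])

# A perfectly balanced four-team league gives 1/4.
hhi_balanced = hhi([0.5, 0.5, 0.5, 0.5])
```

Because the balanced value 1/N falls as the number of teams grows, the HHI under this definition is sensitive to league size as well as to balance.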
3.8.5 Five Club Concentration Ratio
The FCCR is a simplified HHI that focuses on the number of wins achieved by the best five clubs in the league and compares it to the number of games in the league. It is a worthwhile measure because in the North American sports leagues five clubs represent 15-20% of the clubs. Whilst the choice of the top five clubs, as opposed to the top three or top seven, is an arbitrary one, it is the one most commonly used and will also be used here. Further studies may investigate the merits of different numbers of clubs, or of percentages of the league, to determine an optimum value for this kind of measure.
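A short Python sketch of the FCCR follows. It assumes that the measure is the wins of the top five clubs divided by the total number of wins in the league, which equals the number of games when ties are counted as half a win each. The function name, the `top_n` parameter and the ten-club league are hypothetical.

```python
def fccr(wins, top_n=5):
    """Share of all league wins achieved by the top_n clubs."""
    ordered = sorted(wins, reverse=True)
    return sum(ordered[:top_n]) / sum(wins)


# Hypothetical ten-club league (illustrative win totals, not study data).
league_wins = [60, 58, 55, 52, 50, 48, 45, 42, 40, 50]
fccr_value = fccr(league_wins)

# Changing top_n shows how the arbitrary choice of five affects the measure.
top_three_value = fccr(league_wins, top_n=3)
```

In a perfectly balanced ten-club league the top five clubs would hold exactly half of the wins, so the value of 0.55 here indicates a modest concentration of wins.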
3.9 Regression
To assess the impacts of each of the governance changes in each sport a linear regression equation will be constructed for each measure of CB. As there are five measures of CB and four sports this will result in twenty regression equations. For example the regression equation for Baseball for the HHI measure will be:
$$HHI_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \beta_5 X_{5t} + \varepsilon_t$$
where:
$HHI$ = Herfindahl-Hirschman Index
$X_2$ = Luxury Tax dummy variable
$X_3$ = Revenue Sharing dummy variable
$X_4$ = Expansion Team Existence dummy variable
$X_5$ = Young Team Existence dummy variable
For a more complete discussion of the regression technique with sports in mind, see Downward and Dawson (2000). For a full mathematical approach see Hair et al. (1995). A brief summary is given below.
Regression analysis estimates relationships between a dependent variable (a variable that the analyst is interested in explaining) and a series of explanatory variables (that the analyst believes directly influence the dependent variable).
A linear regression model will be of the form:
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + \varepsilon_t$$
$Y_t$ is the dependent variable. In the Baseball example above it is the HHI.
$X_2$ to $X_4$ are the explanatory variables. In the Baseball example above there are four variables of this type ($X_2$ to $X_5$): the existence of a luxury tax, a revenue sharing agreement, an expansion team and a young team. Because these explanatory variables are qualitative in nature, they are represented as dummy variables that take a value of 1 when they are “on” and 0 otherwise.
The β values are the coefficients and show the influence of the explanatory variables in the model. They are determined by the regression analysis and can have either positive or negative signs. For a dummy variable, a negative sign indicates that the dependent variable is lower when the variable is scored ‘1’ rather than ‘0’.
Of particular interest is the coefficient β1, the constant. It represents the expected value of the dependent variable when all of the explanatory dummy variables are scored ‘0’.
The other β values are called partial slope coefficients. Each shows the change in the dependent variable associated with a unit change in the relevant explanatory variable, holding the other explanatory variables constant.
Linear regression analysis uses the principle of “least squares” to estimate the β coefficients. This is achieved by minimising the sum of the squared differences between the values of the dependent variable predicted by the estimated coefficients and its actual values. Squared values are used because the differences can be positive or negative and would otherwise cancel out.
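The least squares principle can be illustrated with a short Python sketch using NumPy. The eight observations and the two dummy variables below are hypothetical and are not taken from the study data; the study itself built its models in SPSS.

```python
import numpy as np

# Hypothetical CB measure for eight seasons (illustrative values only).
cb_measure = np.array([0.07, 0.06, 0.08, 0.09, 0.10, 0.05, 0.06, 0.07])
dummy_a = np.array([0, 0, 0, 1, 1, 0, 0, 0])  # e.g. an expansion team exists
dummy_b = np.array([0, 0, 0, 0, 0, 1, 1, 0])  # e.g. a league initiative is in force

# Design matrix: a column of ones for the constant, then the dummy variables.
X = np.column_stack([np.ones(len(cb_measure)), dummy_a, dummy_b])

# Least squares: the coefficients that minimise the sum of squared residuals.
coefficients, _, _, _ = np.linalg.lstsq(X, cb_measure, rcond=None)

residuals = cb_measure - X @ coefficients
sum_squared_residuals = float(residuals @ residuals)
```

With these values the constant is 0.07, which is the mean of the seasons in which both dummies are 0. The two slope coefficients, 0.025 and -0.015, are the differences between that baseline and the mean of the seasons in which each dummy is 1.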
For each coefficient it is also necessary to test whether it is significantly different from zero, that is, whether the estimate is unlikely to have arisen by chance if the true coefficient were zero. This is done by calculating the t-value for each coefficient. The null hypothesis of $\beta_n = 0$ is tested against the alternative hypothesis of $\beta_n \neq 0$ using the test statistic
$$t = \frac{b_n - \beta_n}{se(b_n)}$$
where $b_n$ is the estimate of the coefficient, $\beta_n$ is its value under the null hypothesis and $se(b_n)$ is the standard error of the estimate. Since $\beta_n = 0$ under the null hypothesis, the t-value is the estimate of the coefficient divided by its standard error.
Given that the standard error should ideally be as small as possible, the larger this t-statistic is in absolute value the more reliable the estimate of the coefficient. The coefficient estimators are normally distributed, with a mean equal to the true value of the coefficient, but because their variance has to be estimated from sample data the t-distribution is used in place of the normal distribution. Statistical tables can then be used to assess their reliability. For example, the null hypothesis can be rejected at approximately the 5% significance level if the t-statistic is greater than 2 in absolute value. The significance level is the probability of rejecting the null hypothesis in error, that is, when it is in fact true.
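The t-value calculation can be sketched in Python with NumPy as follows. The data are hypothetical, and the standard errors are computed with the usual ordinary least squares formula, in which the residual variance is estimated as the residual sum of squares divided by the sample size minus the number of coefficients.

```python
import numpy as np

# Hypothetical data for eight seasons (illustrative values, not study data).
y = np.array([0.07, 0.06, 0.08, 0.09, 0.10, 0.05, 0.06, 0.07])
dummy_a = np.array([0, 0, 0, 1, 1, 0, 0, 0])
dummy_b = np.array([0, 0, 0, 0, 0, 1, 1, 0])
X = np.column_stack([np.ones(len(y)), dummy_a, dummy_b])

n_obs, n_coef = X.shape

# Coefficient estimates from the normal equations (X'X) b = X'y.
xtx = X.T @ X
estimates = np.linalg.solve(xtx, X.T @ y)

# Residual variance and the standard error of each coefficient.
residuals = y - X @ estimates
residual_variance = (residuals @ residuals) / (n_obs - n_coef)
standard_errors = np.sqrt(np.diag(residual_variance * np.linalg.inv(xtx)))

# Under the null hypothesis the coefficient is 0, so t = estimate / se.
t_values = estimates / standard_errors

# Rule of thumb from the text: |t| > 2 is roughly the 5% level.
significant = np.abs(t_values) > 2
```

The rule of thumb is only approximate. With a sample as small as this one the exact critical value from the t-distribution is larger than 2, so statistical tables or software would be used in practice.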
The overall fit of the model can be tested using the R² value, defined as the regression sum of squares divided by the total sum of squares. The variation explained by the model is thus compared with the total variation in the dependent variable. The higher this ratio the better the model, with an R² value of 1 indicating a model that explains all the variance.
The hypothesis that R² > 0 can then be tested (i.e. that the regression model explains more of the variation than the mean of the dependent variable alone) by calculating the F test statistic as follows:
$$F = \frac{SS_{regression} / df_{regression}}{SS_{residual} / df_{residual}}$$
In the above equation $SS_{regression}$ and $SS_{residual}$ are the regression and residual sums of squares, $df_{regression}$ (the regression degrees of freedom) is the number of estimated coefficients (including the constant) minus 1 and $df_{residual}$ (the residual degrees of freedom) is the sample size minus the number of estimated coefficients (including the constant). In this study the overall strength of the model is not the focus. The study examines the variables in the models and whether the measures of CB behave the same as each other.
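As an illustration, R² and the F statistic can be computed in Python with NumPy as shown below. The data are hypothetical, and the F statistic is taken to be the standard ratio of the regression mean square to the residual mean square, using the degrees of freedom described above.

```python
import numpy as np

# Hypothetical data for eight seasons (illustrative values, not study data).
y = np.array([0.07, 0.06, 0.08, 0.09, 0.10, 0.05, 0.06, 0.07])
dummy_a = np.array([0, 0, 0, 1, 1, 0, 0, 0])
dummy_b = np.array([0, 0, 0, 0, 0, 1, 1, 0])
X = np.column_stack([np.ones(len(y)), dummy_a, dummy_b])

n_obs, n_coef = X.shape
estimates, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ estimates

# Decompose the total variation into regression and residual parts.
ss_total = float(np.sum((y - y.mean()) ** 2))
ss_residual = float(np.sum((y - fitted) ** 2))
ss_regression = ss_total - ss_residual

r_squared = ss_regression / ss_total

df_regression = n_coef - 1      # estimated coefficients minus 1
df_residual = n_obs - n_coef    # sample size minus estimated coefficients
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)
```

For these values R² is about 0.846 and the F statistic is 13.75 on 2 and 5 degrees of freedom.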
The explanatory variables can be tested for collinearity. Collinear variables are ones where there is a linear relationship between them, so that they explain the same variance in any model. This makes it difficult or impossible to determine which variable is causing an observed effect.
To test whether variables are collinear a tolerance statistic for each variable is calculated as follows:
$$\text{Tolerance}_i = 1 - R_i^2$$
where $R_i^2$ is the coefficient of determination of the regression of variable i on all the other explanatory variables. If this tolerance statistic is below 0.1 it suggests that collinearity exists for that variable. The analysis in this study will therefore include this collinearity test.
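The tolerance check can be sketched in Python with NumPy. The sketch assumes the standard definition of tolerance, 1 minus the R² obtained when one explanatory variable is regressed on all the others plus a constant. The function names and the three dummy variables are hypothetical; the third dummy is a deliberate copy of the first to show a collinear case.

```python
import numpy as np


def r_squared(y, X):
    """Coefficient of determination of a least squares fit of y on X."""
    estimates, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ estimates
    deviations = y - y.mean()
    return 1.0 - (residuals @ residuals) / (deviations @ deviations)


def tolerances(explanatory):
    """Tolerance (1 - R^2) of each column regressed on all other columns."""
    n_obs, n_vars = explanatory.shape
    result = []
    for i in range(n_vars):
        others = np.delete(explanatory, i, axis=1)
        design = np.column_stack([np.ones(n_obs), others])
        result.append(1.0 - r_squared(explanatory[:, i], design))
    return result


# Hypothetical dummy variables for eight seasons (not study data).
dummy_a = np.array([0, 0, 0, 1, 1, 0, 0, 0], dtype=float)
dummy_b = np.array([0, 0, 0, 0, 0, 1, 1, 0], dtype=float)
dummy_c = dummy_a.copy()  # perfectly collinear with dummy_a

tolerance_values = tolerances(np.column_stack([dummy_a, dummy_b, dummy_c]))
flagged = [t < 0.1 for t in tolerance_values]
```

Here the first and third variables have a tolerance of effectively 0 and are flagged, while the second has a tolerance of about 0.89 and is not.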
3.10 Conclusion
The methodology chapter described how this study will attempt to answer the research questions posed. Sections 3.2 and 3.3 discussed the philosophical position of the study, concluding that the ontology is realist and the epistemology is hypothetico-deductivist.
Section 3.4 defined the hypotheses to be used and section 3.5 outlined the data available, listing the league initiatives and relevant league expansions.
Section 3.6 discussed data reliability with the conclusion that strike seasons should be excluded from the analysis as each strike season is different in length and would have to be modelled using its own dummy variable rather than having one dummy variable covering all strike seasons.
The ethical considerations of the study were discussed in section 3.7 with the conclusion that there are no issues to consider.
Section 3.8 introduced and defined the five measures of CB to be used in this study.
Finally section 3.9 outlined the technique of regression. This section concluded with the recommendation that regression is the correct method to produce the analysis required to answer the research questions.
4 Results
4.1 Introduction
Chapter 4 will present the results of the study. Section 4.2 will give an overview of the data manipulation process and the outputs produced as a result. Section 4.3 will then present a series of graphs showing the values for the measures combined with indicators for when the initiative variables and other explanatory variables took place. These will be described to give an overview of the data before regression modelling.
Section 4.4 will present the bulk of the analysis and will focus on the regression models produced. The first part of section 4.4 will present the results in their entirety for Baseball and will talk through them step by step. The second part of section 4.4 will then extend this to present the results for the other sports in a summarised way.
Section 4.5 will present an overall summary of the results by drawing out the key findings and presenting these clearly and concisely. It will use an anomaly seen in the Basketball results to highlight a particular issue. Section 4.6 will identify the limitations of this study while Section 4.7 will tackle some of these and make recommendations for future research that may be undertaken as a result of the analysis produced. Section 4.8 will finally draw conclusions from the key findings.
4.2 Data Manipulation
The initial downloads from the source website specified in section 3.6 were MS Excel files containing win percentages for each club for all seasons, one file for each sport. The data from seasons not being used, including strike seasons, was removed, leaving just the seasons listed in table 7 in section 3.6.1. Dummy variables were created for the league initiatives, for league expansions and for the existence of “young” teams, taking care to ensure that a team that changed cities was not classified as either an expansion team or a young team.
The five measures of CB were then created for each season. These completed the data manipulation in MS Excel. The data files were then imported into the statistical software package SPSS where the relevant regression models were built and the relevant statistics produced.
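The season filtering and dummy variable steps described above can be sketched in Python. The Baseball date range and strike seasons are those listed in table 7. The initiative start year is a hypothetical placeholder chosen for illustration and is not taken from the study.

```python
# Baseball seasons from table 7: 1961 to 2008, excluding strike seasons.
STRIKE_SEASONS = {1981, 1994, 1995}
seasons = [year for year in range(1961, 2009) if year not in STRIKE_SEASONS]

# Hypothetical first season of a league initiative (illustrative only).
HYPOTHETICAL_INITIATIVE_START = 2000

# Dummy variable: 1 in seasons where the initiative is in force, 0 otherwise.
initiative_dummy = [
    1 if year >= HYPOTHETICAL_INITIATIVE_START else 0 for year in seasons
]

# The mean of a dummy variable is the proportion of seasons scored 1.
share_of_seasons = sum(initiative_dummy) / len(seasons)
```

This leaves 45 Baseball seasons, and the mean of a dummy variable can be read directly as the proportion of those seasons in which it was "on".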
4.3 Descriptive Statistics
4.3.1 Baseball
Diagram 2 – Baseball Data Graph and CB measures
Diagram 2 shows the measures and the variables for Baseball. The NSD is plotted on the secondary axis to allow the other measures to be seen more easily. Below the chart is a series of coloured bars showing when each initiative and other activity occurred.
The league initiatives of Luxury Tax and Revenue Share are recent whereas most of the expansions took place between 1960 and 1980. There are no young teams immediately after the expansion of 1993 because both the 1994 and 1995 seasons were lost to strikes.
Analysis of the diagram suggests that the SD and GC measures are very similar and that the NSD measure shows the same pattern, which is unsurprising as it is derived from the SD measure. The HHI and FCCR measures behave differently: both are decreasing, suggesting that CB has improved. This will be discussed in more detail later.
Further analysis of the diagram would suggest that there are peaks in the measures that correspond with the expansions and young team variables. This would suggest that they have a negative impact on CB. This will be investigated fully in the regression models.
4.3.2 Basketball
Diagram 3 – Basketball Data graph and CB measures
Again the NSD is on the secondary axis. As before the SD, NSD and GC measures show very similar patterns and the HHI and FCCR are completely different.
Unlike with Baseball, it is very difficult to see any clear correlation with expansion or young teams, but for the HHI and FCCR measures the presence of a salary cap appears to have had a significant effect. This will be explored further.
Low Salary Cap is defined as less than $20m.
Medium Salary Cap is defined as between $20m and $40m.
High Salary Cap is defined as greater than $40m.
4.3.3 American Football
Diagram 4 – American Football Data graph and CB measures
The NSD line is shown on the secondary axis.
As with Baseball and Basketball the lines for SD, NSD and GC are very similar and are different to HHI and FCCR which are generally decreasing over time. It is very difficult to see any obvious correlations between any of the league initiatives or the other events such as expansion or the existence of young teams.
Low Salary Cap is defined as being below $100m.
High Salary Cap is defined as being above $100m.
4.3.4 Ice Hockey
Diagram 5 – Ice Hockey Data graph and CB measures
The NSD is shown on the secondary axis.
As with all other sports the SD, NSD and GC lines are similar to each other and the HHI and FCCR are different and decrease over time. The periods 1968-1976 and 1992-1993 seem to show peaks in the SD, NSD and GC measures. These appear to coincide with the existence of young teams in the league and to a lesser extent, expansions. The FCCR is lowest over the most recent 5 years when there has been a salary cap.
4.4 Regression Models
Section 4.4.1 will show the regression models for Baseball. The tables will be presented in full and the discussion will begin by describing the contents of the tables before analysing the results. Sections 4.4.2 – 4.4.4 will show the models for the other three sports. The tables will be presented in the same format so there will be no need to describe the table itself. A full analysis of the results shown in the tables will be given in each case. Baseball has been chosen as the first sport as it is the most commonly analysed in the literature.
4.4.1 MLB (Baseball)
Table 9 shows descriptive statistics relating to the dependent variables used in the models.
Dependent Variable | Mean | Standard Deviation
SD | 0.072 | 0.011
NSD | 1.820 | 0.280
GC | 0.079 | 0.012
HHI | 0.040 | 0.006
FCCR | 0.234 | 0.031
Table 9 – Dependent Variables Used in Baseball Models
Table 10 shows the explanatory variables used in the models. Since all are dummy variables, the mean of each is simply the proportion of seasons in which it took a value of “1”. In 11% of seasons there was an Expansion team and in 20% a Young team. In 20% of seasons there was a Luxury Tax present and in 27% there was Revenue Sharing.
Explanatory Variable | Mean
Expansion Team | 0.11
Young Team | 0.20
Luxury Tax | 0.20
Revenue Share | 0.27
Table 10 – Explanatory Variables Used in Baseball Models
Table 11 shows the coefficients for each of the explanatory variables in each of the five models. It also shows the t-statistic for each coefficient and the associated significance level. Significance at the 5% level is shown by an *. The key points to note are the following:
- The sign of the coefficient. If the sign is negative the variable is shown to have improved CB in the league. If the sign is positive the variable has had an adverse effect on CB.
- The significance. As outlined in section 3.4 this study is testing the hypotheses that these coefficients are equal to 0. The significance gives the probability of obtaining a result at least as extreme as the one observed if the coefficient were truly 0. If the significance is less than 0.05 the relevant hypothesis will be rejected in favour of the alternative hypothesis that the coefficient is not equal to 0.
- The comparisons of the variables across the five models. If the signs of the coefficients or their significance levels differ between models, the variables behave differently depending on which measure of CB is being used.
Dependent | Explanatory | Coefficient | t-Statistic | Significance
SD | Constant | 0.067 * | 36.730 | .000
SD | Expansion Team | 0.018 * | 4.264 | .000
SD | Young Team | 0.009 * | 2.546 | .015
SD | Luxury Tax | -0.006 | -1.064 | .294
SD | Revenue Share | 0.008 | 1.383 | .175
NSD | Constant | 1.704 * | 36.730 | .000
NSD | Expansion Team | 0.463 * | 4.264 | .000
NSD | Young Team | 0.219 * | 2.546 | .015
NSD | Luxury Tax | -0.163 | -1.064 | .294
NSD | Revenue Share | 0.191 | 1.383 | .175
GC | Constant | 0.074 * | 36.991 | .000
GC | Expansion Team | 0.019 * | 4.120 | .000
GC | Young Team | 0.009 * | 2.413 | .021
GC | Luxury Tax | -0.008 | -1.165 | .251
GC | Revenue Share | 0.010 | 1.602 | .117
HHI | Constant | 0.042 * | 47.857 | .000
HHI | Expansion Team | 0.000 | 0.112 | .912
HHI | Young Team | 0.003 | 1.816 | .077
HHI | Luxury Tax | 0.001 | 0.266 | .792
HHI | Revenue Share | -0.009 * | -3.316 | .002
FCCR | Constant | 0.241 * | 49.296 | .000
FCCR | Expansion Team | 0.008 | 0.658 | .514
FCCR | Young Team | 0.018 | 1.988 | .054
FCCR | Luxury Tax | 0.002 | 0.140 | .890
FCCR | Revenue Share | -0.044 * | -3.026 | .004