Competitive balance measures in US professional sport: an empirical comparison




Table 6 – Sources of information for league expansions and initiatives

3.6.1 Summary of Data Included

Table 7 shows a summary of the data included in this study.



Sport              Data
American Football  1970 – 2008 excluding 1982 and 1986 due to strikes.
Baseball           1961 – 2008 excluding 1981, 1994 and 1995 due to strikes.
Basketball         1962 – 2008 excluding 1998 due to strike.
Ice Hockey         1967 – 2008 excluding 1994 and 2004 due to strikes.

Table 7 – Summary of Data

3.7 Ethical Considerations

A full ethical checklist was completed for this study.

There were no participants in the study other than the researcher. There were therefore no ethical considerations with regard to vulnerable groups, and no requirements to consider while gathering data, either for observing participants or for obtaining their consent. Furthermore, there were no ethical considerations regarding the type of data being gathered, such as taking body samples or using procedures of a potentially physically challenging nature.

The data used to produce the analysis is data available to the general public as it relates to the historical results of sports leagues and the policies that governed the leagues at that time. There are therefore no issues regarding data confidentiality or disposal of data once the study is complete.



3.8 Competitive Balance Measures

This study will examine five different measures of CB. Table 8 shows these:



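Measure  Definition
SD       Standard deviation of team win percentages about 0.5 (see section 3.8.1)
NSD      SD divided by the SD expected from an ideal league playing the same number of games (see section 3.8.2)
GC       Twice the area between the default values and the cumulative curve (see Diagram 1 below for pictorial representation)
HHI      Sum of squares of the percentages of games won by each team in the league (see section 3.8.4)
FCCR     Wins achieved by the best five clubs compared with the number of games in the league (see section 3.8.5)

Table 8 – Summary of the five measures of CB used in this study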


Diagram 1 – Pictorial representation of GC (labelled elements: Cumulative Curve, Default values, Area)



3.8.1 Standard Deviation

The 0.5 term in the SD calculation replaces the population mean in the normal calculation of standard deviation. This is because in competitive North American sports there is almost always a winner and a loser, and in the small number of games that are tied the result is recorded as half a win and half a loss. The mean win percentage across all teams is therefore 0.5.



3.8.2 Normalised Standard Deviation

To allow comparison across sports where different numbers of games are played, it is important to compare the SD observed as our first measure with the SD expected from an ideal league playing the same number of games. The ideal league would have an SD of 0.5/√n, where n is the number of games played by each team. The normalised measure is the ratio of the observed SD to this ideal SD, as in table 8.
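The two measures in sections 3.8.1 and 3.8.2 can be sketched in a few lines. This is only an illustration under stated assumptions: the sum of squared deviations is divided by the number of teams (the source does not say whether N or N − 1 is used), the ideal SD is taken as 0.5/√n, and the four-team league is hypothetical.

```python
import math


def sd_about_half(win_pcts):
    """Standard deviation of win percentages about the fixed mean of 0.5."""
    # Assumption: population form, dividing by the number of teams.
    return math.sqrt(sum((w - 0.5) ** 2 for w in win_pcts) / len(win_pcts))


def ideal_sd(games_per_team):
    """SD of an ideal league where every game is an even contest."""
    return 0.5 / math.sqrt(games_per_team)


def normalised_sd(win_pcts, games_per_team):
    """Observed SD relative to the ideal SD for the same number of games."""
    return sd_about_half(win_pcts) / ideal_sd(games_per_team)


# Hypothetical four-team league in which each team plays 100 games.
win_pcts = [0.60, 0.55, 0.45, 0.40]
sd = sd_about_half(win_pcts)
nsd = normalised_sd(win_pcts, 100)
print(round(sd, 4), round(nsd, 4))
```

An NSD of 1 would indicate a league as balanced as the ideal one; larger values indicate less balance.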



3.8.3 Gini Coefficient

The GC is calculated by placing the observations in ascending order of win percentage and plotting against the default values. In this study the default values are values where all teams have a win percentage of 50%. The measure is then defined as twice the area shown in Diagram 1 and takes a value between 0 and 1.
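One possible reading of this calculation is sketched below. It assumes the cumulative curve is the running share of total win percentage taken in ascending order, that the default values form the straight line produced when every team is at 50%, and that the area is measured with the trapezium rule. The function name and the sample league are hypothetical.

```python
def gini_coefficient(win_pcts):
    """Twice the area between the equal-share line and the cumulative curve."""
    ordered = sorted(win_pcts)          # ascending order of win percentage
    total = sum(ordered)
    n = len(ordered)
    cumulative = 0.0
    area_under_curve = 0.0
    for w in ordered:
        previous = cumulative
        cumulative += w / total
        # Trapezium between consecutive points, each step is 1/n wide.
        area_under_curve += (previous + cumulative) / 2 / n
    # The equal-share line encloses an area of 0.5 beneath it.
    return 2 * (0.5 - area_under_curve)


# Hypothetical four-team league.
gc = gini_coefficient([0.60, 0.55, 0.45, 0.40])
print(round(gc, 4))
```

A league in which every team wins half its games gives a value of 0 under this sketch.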



3.8.4 Herfindahl-Hirschman Index

This has been adapted from industry where it measures the competitiveness of a particular market. It is calculated from the sum of squares of the market shares of the firms in the market. When related to professional sports it is calculated from the sum of squares of the percentages of games won by each team in the league.
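As a rough sketch, the sports version can be computed by treating each team's share of all wins in the league as its "market share". That interpretation of "percentages of games won" is an assumption, and the win totals are hypothetical.

```python
def hhi(wins):
    """Sum of squared shares of league wins, one share per team."""
    total = sum(wins)
    return sum((w / total) ** 2 for w in wins)


# Hypothetical four-team league: wins per team over one season.
unbalanced = hhi([60, 55, 45, 40])
balanced = hhi([50, 50, 50, 50])
print(round(unbalanced, 5), round(balanced, 5))
```

Under this assumption a perfectly balanced league of N teams scores 1/N, so the value depends on league size as well as on balance.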



3.8.5 Five Club Concentration Ratio

The FCCR is a simplified HHI that focuses on the number of wins achieved by the best five clubs in the league and compares it to the number of games in the league. It is felt that it is a worthwhile measure as in the North American sports leagues 5 clubs represents 15-20% of the clubs. Whilst the choice to use the top 5 clubs as opposed to the top 3 or top 7 is an arbitrary one it is the one most used and will also be used here. Further studies may focus on whether an arbitrary number should be used or whether investigation as to the merits of different numbers or even percentages should be undertaken to determine an optimum value for this kind of measure.
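A minimal sketch of such a ratio follows. It assumes the denominator is the total number of games played in the league, which equals total wins when a tie counts as half a win. The `top` parameter and the ten-team league are hypothetical.

```python
def concentration_ratio(wins, top=5):
    """Share of all league wins taken by the `top` most successful clubs."""
    best = sorted(wins, reverse=True)[:top]
    return sum(best) / sum(wins)


# Hypothetical ten-team league: wins per team over one season.
wins = [60, 58, 55, 52, 50, 50, 48, 45, 42, 40]
fccr = concentration_ratio(wins)
print(fccr)
```

Keeping the number of clubs as a parameter makes it straightforward to compare the top 3, 5 or 7, which is the kind of further investigation the paragraph above suggests.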



3.9 Regression

To assess the impacts of each of the governance changes in each sport a linear regression equation will be constructed for each measure of CB. As there are five measures of CB and four sports this will result in twenty regression equations. For example the regression equation for Baseball for the HHI measure will be:

HHI_t = β_1 + β_2 X_{2t} + β_3 X_{3t} + β_4 X_{4t} + β_5 X_{5t} + ε_t

where:


HHI = Herfindahl-Hirschman Index

X2 = Luxury Tax dummy variable

X3 = Revenue Sharing dummy variable

X4 = Expansion Team Existence dummy variable

X5 = Young Team Existence dummy variable

For a more complete discussion of the regression technique with sports in mind please see Downward and Dawson (2000). For a full mathematical approach see Hair et al (1995). However, a brief summary is given below.

Regression analysis estimates relationships between a dependent variable (a variable that the analyst is interested in explaining) and a series of explanatory variables (that the analyst believes directly influence the dependent variable).

A linear regression model will be of the form:

Y_t = β_1 + β_2 X_{2t} + β_3 X_{3t} + β_4 X_{4t} + ε_t

Y_t is the dependent variable. In the Baseball example above it is the HHI.

X_2 to X_4 are the explanatory variables. In the Baseball example above there are four variables of this type (X_2 to X_5): the existence of a luxury tax, a revenue sharing agreement, an expansion team and a young team. As these explanatory variables are qualitative in nature, they are represented by dummy variables that take a value of 1 or 0 depending on whether they are “on” or not.

The β values are the coefficients and show the influence of the explanatory variables in the model. They are determined by the regression analysis and can have either positive or negative signs. A negative sign indicates an inverse relationship between the dependent and explanatory variables when they are scored ‘1’ rather than ‘0’.

Of particular interest is the coefficient β1, the constant. It represents the expected value of the dependent variable when all of the explanatory variables are equal to zero, which here means a season in which none of the dummy variables is “on”.

The other β values are called partial slope coefficients. They show the change in the dependent variable caused by a unit change in the relevant explanatory variable.

Linear regression analysis uses the principle of “least squares” to estimate the β coefficients. This is achieved by minimising the sum of the squared differences between the values of the dependent variable predicted by the estimated coefficients and its actual values. Squared values are used because the differences can be positive or negative and would otherwise cancel out.
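The least squares principle can be illustrated with a small sketch that solves the normal equations directly. This is not the SPSS procedure used in the study: the solver, the function names and the eight-season data set with two dummy variables are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(dummies, y):
    """Return [constant, slope_1, slope_2, ...] minimising squared residuals."""
    design = [[1.0] + [float(d) for d in row] for row in dummies]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: eight seasons, two dummy variables, one CB measure.
dummies = [(0, 0), (0, 0), (1, 0), (1, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
cb_measure = [0.072, 0.068, 0.092, 0.088, 0.062, 0.058, 0.082, 0.078]
constant, b_first, b_second = least_squares(dummies, cb_measure)
print(round(constant, 3), round(b_first, 3), round(b_second, 3))
```

In this made-up example the constant is the fitted value for seasons in which both dummies are 0, and each slope is the shift in the measure when the corresponding dummy is switched on.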

For each coefficient it is also necessary to test whether it is significantly different from zero, that is, whether the estimate could plausibly have arisen by chance in the sample when the true coefficient is zero. This is done by calculating the t-value for each coefficient. This allows the null hypothesis of β_n = 0 to be tested against the alternative hypothesis of β_n ≠ 0 using the test statistic

t = (b_n − β_n) / se(b_n)

where b_n is the estimate of the coefficient, β_n is its value under the null hypothesis and se(b_n) is the standard error of the coefficient. As β_n = 0 under the null hypothesis, the t-value is simply the estimate of the coefficient divided by its standard error.

Since the standard error should ideally be as small as possible, the larger this t-statistic is in absolute value, the more reliable the estimate of the coefficient. The coefficient estimators themselves are normally distributed, with a mean equal to the true value of the coefficient, but because their variance has to be estimated from sample data the t-distribution is used in place of the normal distribution. Statistical tables can then be used to assess their reliability. For example, the null hypothesis can be rejected at approximately the 5% significance level if the t-statistic is greater than 2 in absolute value. The significance level is the probability of rejecting the null hypothesis when it is in fact true.
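For a model with a constant and a single dummy variable the t-statistic has a simple closed form, which the sketch below uses. This is a simplification of the multi-variable models in this study, and the six observations are hypothetical.

```python
import math


def dummy_t_statistic(y, dummy):
    """Estimate, standard error and t-value for one dummy plus a constant."""
    group0 = [v for v, d in zip(y, dummy) if d == 0]
    group1 = [v for v, d in zip(y, dummy) if d == 1]
    b1 = sum(group0) / len(group0)        # constant: mean when dummy is 0
    b2 = sum(group1) / len(group1) - b1   # shift when dummy is 1
    residual_ss = sum((v - b1 - b2 * d) ** 2 for v, d in zip(y, dummy))
    # Two coefficients were estimated, so n - 2 degrees of freedom remain.
    s2 = residual_ss / (len(y) - 2)
    se_b2 = math.sqrt(s2 * (1 / len(group0) + 1 / len(group1)))
    return b2, se_b2, b2 / se_b2


# Hypothetical CB measure over six seasons; the dummy is on in the last two.
y = [0.06, 0.07, 0.08, 0.07, 0.09, 0.10]
dummy = [0, 0, 0, 0, 1, 1]
b2, se_b2, t_value = dummy_t_statistic(y, dummy)
print(round(b2, 3), round(se_b2, 5), round(t_value, 3))
print("reject at roughly 5%:", abs(t_value) > 2)
```

With so few observations the exact critical value is larger than 2, so the rule of thumb in the final line is only indicative here.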

The overall fit of the model can be tested using the R² value, where R² is defined as the Regression Sum of Squares / Total Sum of Squares. The sum of squares predicted by the model is compared to the total sum of squares. The higher this ratio the better the model, with an R² value of 1 being a model that explains all the variance.

The hypothesis that R² > 0 can then be tested (i.e. that the regression model explains more variation than the mean alone) by calculating the F test statistic as follows:

F = (SS_regression / df_regression) / (SS_residual / df_residual)

In the above equation SS_regression is the regression sum of squares and SS_residual is the residual sum of squares. The degrees of freedom df_regression is the number of estimated coefficients (including the constant) minus 1, and df_residual is the sample size minus the number of estimated coefficients (including the constant). In this study the overall strength of the model is not relevant. This study is examining the variables in the models and whether the measures of CB behave the same as each other.
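The R² and F calculations can be sketched as follows, assuming the fitted values come from a least squares model that includes a constant (so that the regression sum of squares equals the total minus the residual sum of squares). The observations and fitted values are hypothetical.

```python
def r_squared_and_f(y, fitted, n_coefficients):
    """R-squared and F statistic for a fitted model with a constant term."""
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_residual = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_regression = ss_total - ss_residual
    df_regression = n_coefficients - 1        # coefficients excluding constant
    df_residual = len(y) - n_coefficients     # sample size minus coefficients
    r_squared = ss_regression / ss_total
    f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_squared, f_stat


# Hypothetical six seasons; fitted values from a constant plus one dummy,
# which are simply the group means (0.07 when off, 0.095 when on).
y = [0.06, 0.07, 0.08, 0.07, 0.09, 0.10]
fitted = [0.07, 0.07, 0.07, 0.07, 0.095, 0.095]
r_squared, f_stat = r_squared_and_f(y, fitted, n_coefficients=2)
print(round(r_squared, 4), round(f_stat, 3))
```

With a single explanatory variable the F statistic equals the square of that variable's t-statistic, which gives a quick consistency check.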

The explanatory variables can be tested for collinearity. Collinear variables are ones where there is a linear relationship between them and they therefore explain the same variance in any model. This means that it is impossible to determine which variable is causing an observed effect.

To test whether variables are collinear a tolerance statistic for each variable is calculated as follows:

Tolerance_i = 1 − R_i²

where R_i² is the coefficient of determination of the regression of variable i on all other explanatory variables. If this tolerance statistic is below 0.1 it suggests that collinearity exists for that variable. This is relevant for this study and the analysis will cover this collinearity test.
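A sketch of the tolerance calculation is given below, taking tolerance as 1 − R² from regressing each explanatory variable on the others, which is the usual definition. The solver, the function names and the two dummy series are hypothetical, and the normal equations would fail for perfectly collinear inputs.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def tolerance(variables, index):
    """1 - R^2 from regressing variables[index] on all the other variables."""
    target = variables[index]
    others = [v for j, v in enumerate(variables) if j != index]
    design = [[1.0] + [float(v[t]) for v in others] for t in range(len(target))]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(design, target)) for i in range(k)]
    coefficients = solve_linear_system(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coefficients, row)) for row in design]
    mean = sum(target) / len(target)
    ss_total = sum((v - mean) ** 2 for v in target)
    ss_residual = sum((v - f) ** 2 for v, f in zip(target, fitted))
    r_squared = 1 - ss_residual / ss_total
    return 1 - r_squared


# Hypothetical dummies over eight seasons: two initiatives that overlap.
luxury_tax = [0, 0, 0, 0, 0, 0, 1, 1]
revenue_share = [0, 0, 0, 0, 0, 1, 1, 1]
tol_luxury = tolerance([luxury_tax, revenue_share], 0)
tol_revenue = tolerance([luxury_tax, revenue_share], 1)
print(round(tol_luxury, 3), round(tol_revenue, 3))
```

Both values in this made-up example are well above the 0.1 threshold, so the two overlapping dummies would not be flagged as collinear.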

3.10 Conclusion

The methodology chapter described how this study will attempt to answer the research questions posed. Sections 3.2 and 3.3 discussed the philosophical position of the study, concluding that the ontology is realist and the epistemology is hypothetico-deductivist.

Section 3.4 defined the hypotheses to be used and section 3.5 outlined the data available, listing the league initiatives and relevant league expansions.

Section 3.6 discussed data reliability with the conclusion that strike seasons should be excluded from the analysis as each strike season is different in length and would have to be modelled using its own dummy variable rather than having one dummy variable covering all strike seasons.

The ethical considerations of the study were discussed in section 3.7 with the conclusion that there aren’t any issues to consider.

Section 3.8 introduced and defined the five measures of CB to be used in this study.

Finally section 3.9 outlined the technique of regression. This section concluded with the recommendation that regression is the correct method to produce the analysis required to answer the research questions.

4 Results

4.1 Introduction

Chapter 4 will present the results of the study. Section 4.2 will give an overview of the data manipulation process and the outputs produced as a result. Section 4.3 will then present a series of graphs showing the values for the measures combined with indicators for when the initiative variables and other explanatory variables took place. These will be described to give an overview of the data before regression modelling.

Section 4.4 will present the bulk of the analysis and will focus on the regression models produced. The first part of section 4.4 will present the results in their entirety for Baseball and will talk through them step by step. The second part of section 4.4 will then extend this to present the results for the other sports in a summarised way.

Section 4.5 will present an overall summary of the results by drawing out the key findings and presenting these clearly and concisely. It will use an anomaly seen in the Basketball results to highlight a particular issue. Section 4.6 will identify the limitations of this study while Section 4.7 will tackle some of these and make recommendations for future research that may be undertaken as a result of the analysis produced. Section 4.8 will finally draw conclusions from the key findings.



4.2 Data Manipulation

The initial downloads from the source website specified in section 3.6 were MS Excel files containing win percentages for each club for all seasons, one file for each sport. The data from seasons not being used was removed. This included data from strike seasons leaving just the seasons listed in table 7 in section 3.6.1. Dummy variables for the league initiatives were created. Dummy variables were also created for league expansions and the existence of “young” teams taking care to ensure that if a team changed cities it was not classified as either.

The five measures of CB were then created for each season. These completed the data manipulation in MS Excel. The data files were then imported into the statistical software package SPSS where the relevant regression models were built and the relevant statistics produced.

4.3 Descriptive Statistics

4.3.1 Baseball

Diagram 2 – Baseball Data Graph and CB measures

Diagram 2 shows the measures and the variables for Baseball. The NSD is plotted on the secondary axis to allow the other measures to be seen more easily. Below the chart is a series of coloured bars showing when each initiative and other activity occurred.

The league initiatives of Luxury Tax and Revenue Share are recent whereas most of the expansions took place between 1960 and 1980. There are no young teams immediately after the expansion of 1993 because both the 1994 and 1995 seasons were lost to strikes.

Analysis of the diagram would suggest that the SD and GC measures are very similar and that the NSD measure shows the same pattern, which is unsurprising as it is derived from the SD measure. The other two measures, HHI and FCCR, are different and both are decreasing, suggesting that CB has improved. This will be discussed in more detail later.

Further analysis of the diagram would suggest that there are peaks in the measures that correspond with the expansions and young team variables. This would suggest that they have a negative impact on CB. This will be investigated fully in the regression models.



4.3.2 Basketball

Diagram 3 – Basketball Data graph and CB measures

Again the NSD is on the secondary axis. As before the SD, NSD and GC measures show very similar patterns and the HHI and FCCR are completely different.

Unlike with Baseball it is very difficult to see any clear correlations with expansion or young teams but it appears for the HHI and FCCR measures the presence of a salary cap has had a significant effect. This will be explored further.

Low Salary Cap is defined as less than $20m.

Medium Salary Cap is defined as between $20m and $40m.

High Salary Cap is defined as greater than $40m.

4.3.3 American Football

Diagram 4 – American Football Data graph and CB measures

The NSD line is shown on the secondary axis.

As with Baseball and Basketball the lines for SD, NSD and GC are very similar and are different to HHI and FCCR which are generally decreasing over time. It is very difficult to see any obvious correlations between any of the league initiatives or the other events such as expansion or the existence of young teams.

Low Salary Cap is defined as being below $100m.

High Salary Cap is defined as being above $100m.



4.3.4 Ice Hockey

Diagram 5 – Ice Hockey Data graph and CB measures

The NSD is shown on the secondary axis.

As with all other sports the SD, NSD and GC lines are similar to each other and the HHI and FCCR are different and decrease over time. The periods 1968-1976 and 1992-1993 seem to show peaks in the SD, NSD and GC measures. These appear to coincide with the existence of young teams in the league and to a lesser extent, expansions. The FCCR is lowest over the most recent 5 years when there has been a salary cap.



4.4 Regression Models

Section 4.4.1 will show the regression models for Baseball. The tables will be presented in full and the discussion will begin by describing the contents of the tables before analysing the results. Sections 4.4.2 – 4.4.4 will show the models for the other three sports. The tables will be presented in the same format so there will be no need to describe the table itself. A full analysis of the results shown in the tables will be given in each case. Baseball has been chosen as the first sport as it is the most commonly analysed in the literature.



4.4.1 MLB (Baseball)

Table 9 shows descriptive statistics relating to the dependent variables used in the models.



Dependent Variable  Mean   Standard Deviation
SD                  0.072  0.011
NSD                 1.820  0.280
GC                  0.079  0.012
HHI                 0.040  0.006
FCCR                0.234  0.031

Table 9 – Dependent Variables Used in Baseball Models

Table 10 shows the explanatory variables used in the models. Since all are dummy variables, the mean shows only the proportion of seasons in which each took a value of “1”. In 11% of seasons there was an Expansion team and in 20% a Young team. In 20% of seasons there was a Luxury Tax present and in 27% there was Revenue Sharing.



Explanatory Variable  Mean
Expansion Team        0.11
Young Team            0.20
Luxury Tax            0.20
Revenue Share         0.27

Table 10 – Explanatory Variables Used in Baseball Models

Table 11 shows the coefficients for each of the explanatory variables in each of the five models. It also shows the t-statistic for each coefficient and the associated significance level. Significance at the 5% level is shown by an *. The key points to note are the following:



  • The sign of the coefficient. If the sign is negative the variable is shown to have improved CB in the league. If the sign is positive the variable has had an adverse effect on CB.

  • The significance. As outlined in section 3.4 this study is testing the hypotheses that these coefficients are equal to 0. The significance gives the probability of obtaining a result at least as extreme as the one observed if the coefficient were truly 0. If the significance is less than 0.05 the relevant hypothesis will be rejected in favour of the alternative hypothesis that the coefficient is not equal to 0.

  • The comparison of the variables across the five models. If the signs of the coefficients or their significance levels differ between the models, the variables behave differently depending on which measure of CB is being used.



Dependent  Explanatory     Coefficient  t-Statistic  Significance
SD         Constant         0.067 *      36.730       .000
           Expansion Team   0.018 *       4.264       .000
           Young Team       0.009 *       2.546       .015
           Luxury Tax      -0.006        -1.064       .294
           Revenue Share    0.008         1.383       .175
NSD        Constant         1.704 *      36.730       .000
           Expansion Team   0.463 *       4.264       .000
           Young Team       0.219 *       2.546       .015
           Luxury Tax      -0.163        -1.064       .294
           Revenue Share    0.191         1.383       .175
GC         Constant         0.074 *      36.991       .000
           Expansion Team   0.019 *       4.120       .000
           Young Team       0.009 *       2.413       .021
           Luxury Tax      -0.008        -1.165       .251
           Revenue Share    0.010         1.602       .117
HHI        Constant         0.042 *      47.857       .000
           Expansion Team   0.000         0.112       .912
           Young Team       0.003         1.816       .077
           Luxury Tax       0.001         0.266       .792
           Revenue Share   -0.009 *      -3.316       .002
FCCR       Constant         0.241 *      49.296       .000
           Expansion Team   0.008         0.658       .514
           Young Team       0.018         1.988       .054
           Luxury Tax       0.002         0.140       .890
           Revenue Share   -0.044 *      -3.026       .004

