Senior Project Monte Carlo Methods



Adam Sites

Senior Project

Monte Carlo Methods

Dr. Frey and Dr. Volpert

Ranking Division 1-A College Football Teams

At the end of every regular season, the best teams in college football wait for the BCS rankings to learn which bowl game they will play in and who their opponent will be. For those unaware, BCS stands for Bowl Championship Series, and its rankings come from a complex formula designed to determine the top 25 college football teams over the course of the year. Over the course of this semester, we have learned how random walks generated in the program R can be used to model certain systems. As part of my senior project, I set out to create my own ranking system built on a series of random walks, following the paper Random Walker Ranking for NCAA Division I-A Football by Callaghan, Mucha, and Porter. After studying their system, I added my own twist to it in hopes of getting a more accurate ranking based on certain factors.

College football is a billion-dollar industry that brings in millions of dollars for schools across the country. Attendance is at an all-time high: a reported 49,670,895 people attended games across 639 NCAA schools, an increase of almost 3% from the previous year (Hootens). Along with the increase in attendance has come an increase in viewers watching from home. Fans turn out in record numbers for bowl games as well: 1,813,215 people attended the 35 bowl games in 2010, in addition to the 134 million viewers tuning in from home. It has been estimated that 1.6 billion dollars was netted from travel and tourism for those 35 bowl games. For colleges across the country, football is seen as a way to promote the school’s image as well as raise money. According to CNN, of all the teams in the top 6 conferences in Division 1 football, only one reported a monetary loss for that season. Three teams broke even, leaving the other 64 teams to report an average profit of around 15.8 million dollars over the 2010 season, which is over a million dollars a game. Money may not grow on trees, but colleges across the country can count on college football to bring it in.

Some people outside of the college football world might ask, “What is so important about ranking the top 25 teams correctly? What does it matter if the 2nd best team is technically labeled as the 3rd?” There are several answers. Schools that play in bowl games get their college or university recognized by millions of people at once, and this exposure alone will increase applications to that school. Partying and tailgating go hand in hand with college football, so the more successful the team is, the more the school’s reputation as a “party” school grows. As that reputation grows, there is a corresponding increase in applications from graduating high school seniors. There is also direct monetary value, because schools are paid to attend bowl games. Each bowl game’s sponsors pay the schools that play in their game. For example, this past year the Chick-Fil-A Bowl paid 3.25 million dollars to the teams participating in its game (O’Toole). Typically, the more important the bowl game, the higher the payout, which is why accuracy in ranking the teams is important. Along with that revenue, the school can also count on merchandise sales to increase. However, not every motivation is strictly about the money. Coaches recruiting high school players will be able to boast about their program’s history of winning and attending bowl games. This history tells the players that the program is successful, and with successful programs typically come successful athletes. These athletes in turn are scouted by professional football teams, so the more exposure the better. Therefore it benefits the recruiting process to attend and win bowl games.

Currently, the BCS uses a variety of polls and computer rankings to determine the official top 25 teams. We will start with the polls. The Associated Press (AP) Top 25 poll is compiled from reporters who each rank the top 25 teams in Division 1-A. The coaches of Division 1-A also have a poll, in which they rank the top 25 teams as well. After the last week of the regular season, the polls are taken and each team is assigned points for where it is ranked in each ballot. For example, if Coach A thinks Notre Dame is the best team, Notre Dame receives 25 points in Coach A’s rankings. If Coach B thinks Notre Dame is the 3rd best team, it receives 23 points in Coach B’s rankings, and so on. Each team’s points are totaled across all the voters in a poll and divided by the maximum possible points to give the team’s percentage for that poll. So if UCLA received 1,580 out of a possible 1,625 points from the writers, it is given a value of 97.2 percent. These poll values are combined with a computer average collected from 6 separate systems. Each computer system ranks the top 25 teams; for each team, the highest and lowest of its six rankings are dropped. The other four rankings are assigned points like the polls, and those four values are averaged to create the team’s computer value. The team with the highest combined value is ranked first.
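As a worked illustration of the arithmetic just described, here is a minimal R sketch. The 26 minus rank point rule follows the Coach A and Coach B example; the six computer ranks are made-up numbers, and the function names are hypothetical rather than part of any official BCS code.

```r
# Points for a poll rank: 1st place earns 25 points, 25th place earns 1.
rank_to_points <- function(rank) ifelse(rank >= 1 & rank <= 25, 26 - rank, 0)

# Share of the maximum possible points, expressed as a percentage.
poll_percentage <- function(points_earned, points_possible) {
  100 * points_earned / points_possible
}

# Computer component: drop the best and worst of the six ranks,
# then average the points of the remaining four.
computer_average <- function(ranks) {
  stopifnot(length(ranks) == 6)
  kept <- sort(ranks)[2:5]
  mean(rank_to_points(kept))
}

ucla_pct <- poll_percentage(1580, 1625)      # the UCLA example from the text
hypothetical_ranks <- c(3, 1, 2, 5, 2, 4)    # made-up ranks from six computer systems
comp_avg <- computer_average(hypothetical_ranks)
```

With these inputs, `ucla_pct` is about 97.2 and the trimmed ranks 2, 2, 3, 4 average to 23.25 points.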

Some people believe the current BCS system is flawed and does not accurately represent the best 25 teams, and they call for a new system to take its place. Some arguments against the system are that an undefeated team can be placed below a team with 1 or even 2 losses, and that subjective voting takes place. Coaches can rank their own team number 1 in their poll even if it does not deserve to be, and the same goes for reporters. Reporters in the AP poll with ties to schools or coaches can vote for those schools and are therefore biased. President Barack Obama was asked his thoughts on the current BCS system, and he told reporters that an 8 team playoff would be more fitting than the current system (Dufresne). Other criticisms are that the computer systems do not take lopsided victories or injuries into consideration. A star player hurt on a good team could have a huge impact on the outcome of future games, yet the computer systems fail to recognize that.

In a paper titled Random Walker Ranking for NCAA Division I-A Football, Thomas Callaghan, Peter Mucha, and Mason Porter propose a unique approach that differs from the BCS model. They suggest using a random walker to generate the rankings based on wins and losses. I decided to take their model and build it in R to determine how effective it was. To begin, I had to record the entire season’s schedule and outcomes for all of Division I-A. Luckily, James Howell has posted exactly that data on his website (Howell). I copied his data into an Excel spreadsheet and separated the columns into Team 1 name, Team 1 score, Team 2 name, and Team 2 score. I read that data into R and stored each column. Because the data covered 195 teams, I created 2 matrices with 195 rows each. The first matrix was created to store opponents and the second to store the outcomes of those games. To fill those matrices, I first needed to break down the columns of the Excel file. I created an alphabetical list of all the teams and assigned them numbers from 1 to 195. I then filled the opponent matrix using the corresponding team numbers. Each row in the matrix corresponded to a unique team, and that team’s opponents’ numbers were filled in across the row. Each column of the matrix corresponded to a week of college football. So if the opponent matrix read [2,6]=120, that meant that in the 6th week of the season team 2 played team 120. For the outcome matrix, I filled a matrix of the same shape with 1 and -1 values representing a win (1) or a loss (-1) in each of those games. So if team 2 beat team 120 in week 6, the outcome matrix would read [2,6]=1, while [120,6]=-1. Now I had the entire season’s schedule and results in 2 matrices.
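The matrix construction above can be sketched in R on a tiny made-up schedule. The four games, the column names, and the counter `g` are assumptions for illustration, and the columns here hold each team's k-th game rather than calendar weeks, which is a simplification of the week-by-week layout described in the text.

```r
# Hypothetical four-game results table in the spreadsheet layout described above.
games <- data.frame(
  team1  = c("Auburn", "Oregon", "Auburn", "TCU"),
  score1 = c(28, 35, 22, 31),
  team2  = c("Oregon", "TCU", "TCU", "Stanford"),
  score2 = c(21, 38, 19, 17),
  stringsAsFactors = FALSE
)

teams     <- sort(unique(c(games$team1, games$team2)))  # alphabetical team list
n_teams   <- length(teams)
max_games <- max(table(c(games$team1, games$team2)))    # most games played by any team

opponentmatrix <- matrix(NA_integer_, n_teams, max_games)
outcomematrix  <- matrix(NA_integer_, n_teams, max_games)
g <- integer(n_teams)   # number of games recorded so far for each team

for (k in seq_len(nrow(games))) {
  a <- match(games$team1[k], teams)   # team numbers from the alphabetical list
  b <- match(games$team2[k], teams)
  a_won <- games$score1[k] > games$score2[k]
  g[a] <- g[a] + 1L
  g[b] <- g[b] + 1L
  opponentmatrix[a, g[a]] <- b        # each game is entered once per team
  opponentmatrix[b, g[b]] <- a
  outcomematrix[a, g[a]] <- if (a_won) 1L else -1L
  outcomematrix[b, g[b]] <- if (a_won) -1L else 1L
}
```

Every game appears twice, once in each team's row, with opposite signs in the outcome matrix.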

Next I created the random walker loop that would eventually rank each team from 1 to 195. To set this up, I first needed to establish how many steps I wanted my random walker to take. I decided on 10 million steps, which would give him enough steps to traverse the entire division. For each step from 1 to 10 million, the walker starts at a team and selects one of that team’s opponents at random from the opponent matrix. After selecting that opponent, I check the outcome matrix to see which team won. Once that is determined, I generate a random number between 0 and 1 and compare it to the probability that the better team won the head-to-head matchup, which I call the p value. The p value must be greater than .5 but less than 1. If p equals one half, the team that won is considered no better than the team that lost, and wins and losses are basically a coin flip. If p is 1, the team that won would never lose to the team it beat if they played again, which is unlikely. Field conditions, injuries, and the nature of competition all point to the fact that a team has some chance of beating an opponent, regardless of how good that opponent is.

To better illustrate this walk, let’s take a look at an example. Suppose the walker is at team 2 and randomly selects team 120 as its opponent. Looking at the outcome matrix, we see team 2 won the regular season matchup between the teams. Now we generate a random number between 0 and 1. If it is less than the p value, the walker remains on team 2 and team 2’s count increases by 1. If the random number is greater than p, the walker leaves team 2 for team 120, and team 120’s count increases by 1. However, if team 2 had lost the head-to-head matchup, the random number would be compared to (1-p): if the number were less than (1-p) the walker would stay, and if it were greater than (1-p) he would leave.
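The stay-or-leave rule in this example can be written as a small R function. This is a sketch under the description above; the name `walker_stays` and the sample values of `u` are hypothetical.

```r
# Decide whether the walker stays at its current team for one step.
# outcome: 1 if the current team won the sampled game, -1 if it lost.
# u: a uniform random number between 0 and 1; p: the p value, between 0.5 and 1.
walker_stays <- function(outcome, u, p) {
  threshold <- if (outcome == 1) p else 1 - p
  u < threshold
}

p <- 0.75
walker_stays(1, 0.60, p)    # team 2 won and u < p: the walker stays
walker_stays(1, 0.90, p)    # team 2 won but u > p: the walker moves to team 120
walker_stays(-1, 0.60, p)   # team 2 lost and u > 1 - p: the walker moves
walker_stays(-1, 0.10, p)   # team 2 lost but u < 1 - p: the walker stays
```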

Now we do this walk for 10 million steps with the same p value throughout the entire process. We count the number of steps the walker has taken and the count each team has accumulated. The higher a team’s count, the higher the ranking it receives. Below are the results my program produced with their respective p values. As the p value gets closer to .5, the rankings begin to reflect less of the teams’ records and become more of a toss-up. Again, this is what we expected, because with a lower p value wins and losses become irrelevant. What is nice about this system is that it does in fact take strength of schedule into consideration. If a team plays better competition with better records, the walker will be funneled toward that team’s opponents more often than toward teams with fewer wins. As the walker spends time at the teams that win more often, he is also exposed to the teams that those teams play. So Wisconsin benefits from playing tougher opponents in the Big 10 conference, because as those teams accumulate victories, the walker gets more exposure to Wisconsin. Thus Wisconsin’s ranking should increase if it was able to beat that competition.
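To make the counting and ranking concrete, the following R sketch runs the walk on a made-up league of three teams. The schedule, the seed, and the reduced step count are assumptions chosen so the example runs quickly; it is not the program used for the results in the appendix.

```r
set.seed(1)  # fixed seed so the sketch is repeatable

# Toy league: team 1 beat teams 2 and 3, and team 2 beat team 3.
opponentmatrix <- rbind(c(2, 3), c(1, 3), c(1, 2))
outcomematrix  <- rbind(c(1, 1), c(-1, 1), c(-1, -1))
g <- c(2, 2, 2)            # games played by each team

p     <- 0.75
steps <- 20000             # far fewer than 10 million, enough for a toy example
count <- integer(3)        # visits accumulated by each team
pos   <- 1                 # the walker starts at team 1

for (i in seq_len(steps)) {
  whichopp  <- sample.int(g[pos], 1)            # pick one of the current team's games
  winlose   <- outcomematrix[pos, whichopp]
  u         <- runif(1)
  stay_prob <- if (winlose == 1) p else 1 - p   # p after a win, 1 - p after a loss
  if (u >= stay_prob) pos <- opponentmatrix[pos, whichopp]
  count[pos] <- count[pos] + 1L                 # one vote for wherever the walker is
}

ranking <- order(count, decreasing = TRUE)      # team numbers, most visits first
```

For this schedule the walker's long-run share of time works out to 0.60 at team 1, about 0.26 at team 2, and about 0.14 at team 3, so the ranking follows the records.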

What is interesting to note is that with the higher p values, a non-Division I-A team was actually ranked in the top 25. I believe this was caused by a “perfect storm” type scenario: early in the season Virginia Tech lost to James Madison University, a non-Division I-A opponent. Virginia Tech then went on to have a very good season, and its only other regular season loss was to a Boise State team that had 12 wins and only one loss. Because James Madison University played only one game against a Division I-A opponent, won that game, and beat an opponent that itself had a very good record, it ended up with a top 25 ranking. I imagine that the walker would funnel his way toward Boise State and Virginia Tech (due to their high numbers of wins) and eventually land on James Madison University through Virginia Tech. Once there, the only way he could leave was to walk back to Virginia Tech, but the odds were against him because James Madison University won that head-to-head matchup. This anomaly is therefore a flaw in the system and should be noted. To potentially correct it, one might modify the code so that only games against Division I-A opponents are candidates for the walker, and all other games are ignored. This might give a slight advantage to teams that played only Division I-A opponents, since they would have an extra game on their schedule and thus another opportunity for the walker to land on them, but that hypothesis was not examined in this experiment.
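The suggested correction could start with a filter like the one below. This is only a sketch of the idea, which was not tested in this project; the three-game schedule and the `division_1a` membership list are made up for illustration.

```r
# Hypothetical game list; James Madison stands in for a non-Division I-A opponent.
games <- data.frame(
  team1 = c("Virginia Tech", "Virginia Tech", "Boise State"),
  team2 = c("James Madison", "Boise State", "Nevada"),
  stringsAsFactors = FALSE
)
division_1a <- c("Virginia Tech", "Boise State", "Nevada")  # assumed membership list

# Keep only the games in which both teams are Division I-A members.
both_1a  <- games$team1 %in% division_1a & games$team2 %in% division_1a
games_1a <- games[both_1a, ]
```

The opponent and outcome matrices would then be built from `games_1a` instead of the full schedule.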

Looking back at the data, it is clear that the rankings differ noticeably from those of the BCS formula. It is also clear that the rankings differ when the p value is changed, which raises an obvious question: which p value is best? There is no official answer; it depends on how much value you put in a win. Win differential does not matter in this random walk, so a win by 1 is as valuable as a win by 35. It would therefore be up to a panel to determine which p value to use, so that a win is valued highly enough to give credit to teams with winning records, but not so highly that wins alone determine the ranking. The more “excuses” one can come up with for why a team lost, the lower the p value should technically be. Bad playing conditions, injuries, playing on a short week, and simply bad play execution are all reasons why a team might lose one game against an opponent yet still feel it was the better team. For this reason the p value needs to be examined more closely.

After thinking about the code, I decided to add my own twist to the random walk experiment to see how it changed the rankings. I decided to change the p value based on the location of the game. The reasoning is that better teams should win at home, and therefore a road win is more valuable than a win on a neutral playing field. I say this because college football teams playing at home should have an outstanding advantage over their opponent. The elite teams in college football call home stadiums that seat a hundred thousand people, who are constantly rooting and cheering for the home team. The home team also does not have to travel to play and therefore has the luxury of being at home. Sleeping in their own beds instead of in a hotel with teammates, eating quality food instead of whatever is at the hotel buffet, and simple familiarity with the area can all put players at ease before a big game. For a visiting team to overcome this disadvantage is quite an accomplishment, and I looked to reflect that in my p value calculation.

To show this in my code, I decided to create 3 p values: one for home games, one for neutral games, and one for away games. I kept the p value at a neutral field the same, increased the p value for away games, and decreased the p value for home games. I moved each p value halfway from the neutral value toward the corresponding endpoint of my interval, so the p values were defined as follows.

Code:

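pneut <- .75               #p value for a game at a neutral site
phome <- (pneut + .5)/2    #halfway between .5 and pneut, which is .625
paway <- (pneut + 1)/2     #halfway between pneut and 1, which is .875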

Next, I needed to record the location of each game and include it in my data. I modified my original Excel spreadsheet to include the location of each game. I followed the exact steps I used for the outcome and opponent matrices to create a home matrix that records whether the team was at home, away, or at a neutral location. A 0 represents a neutral location, -1 an away game, and 1 a home game. Each row represents one of the 195 possible teams, and each column represents a given week of the season. So if the home matrix reads [54,8] = 0, that means team 54 played a game at a neutral location in week 8.
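A small R sketch of this encoding is given below. The site labels ("team1", "team2", "neutral") and the 15 week columns are assumptions, since the spreadsheet's actual layout is not shown; the helper name `location_codes` is hypothetical.

```r
# Location codes for the two teams in one game, one code per team.
# site: "team1" if team 1 hosted, "team2" if team 2 hosted, "neutral" otherwise.
location_codes <- function(site) {
  switch(site,
         team1   = c(team1 = 1L,  team2 = -1L),
         team2   = c(team1 = -1L, team2 = 1L),
         neutral = c(team1 = 0L,  team2 = 0L),
         stop("unknown site label: ", site))
}

# Home matrix with one row per team; the number of week columns is assumed.
homematrix <- matrix(NA_integer_, nrow = 195, ncol = 15)

codes <- location_codes("neutral")
homematrix[54, 8] <- codes[["team1"]]   # team 54 played at a neutral site in week 8
```

A home game for one team is always an away game for the other, so the two codes for a game sum to zero.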

When I performed my random walk, I added an extra condition in my for loop to determine the location of the game and, from it, the p value. At the start of each step, I called on the home matrix to tell me the numerical identifier of the location of the game. Then I called on the outcome matrix to tell me who won, and finally on the random number to tell me whether I moved to my opponent’s location or not. The R code for the neutral-site case is as follows.

Code:

for(i in 1:steps){ #this is for the entire random walk. Steps represents the 10 million steps.
  whichopp<-sample(1:g[pos],1) #select one game at random from the current team's games (g is defined earlier in the program)
  winlose<-outcomematrix[pos,whichopp] #winlose represents the outcome of the game between the teams
  homeoraway<-homematrix[pos,whichopp] #this represents the location of the game
  u<-runif(1,0,1) #this is the random generated number
  if(homeoraway==0){ #neutral playing field game
    if(winlose==1){ #and the team the walker is currently at won
      if(u<pneut){ #if the random number is less than the p value...
        pos<-pos #...the walker stays where he is.
      } else { #if not...
        pos<-opponentmatrix[pos,whichopp] #...he leaves for the opponent.
      }
    }
    if(winlose==-1){ #if the team the walker is currently at lost
      if(u>(1-pneut)){ #then (1-p) is used. If the random number is greater...
        pos<-opponentmatrix[pos,whichopp] #...the walker leaves.
      } else { #if not...
        pos<-pos #...he stays.
      }
    }
  }
  #the home (phome) and away (paway) cases and the count update follow here
}

I will repeat this process for home and away games, and at the end of the function increase both the count for the location as well as the step. That way the walker knows he has one less step to take, and the team’s ranking gets increased by a vote. Once I completed the random walk, I generated the top 25 teams and compared them to the original random walk as well as the BCS’s rankings. The results can be found in the appendix.
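One way the three location cases and the count update might fit together is sketched below on made-up data. It assumes that the p value is chosen by the location code of the walker's current team, with the home and away branches mirroring the neutral branch exactly; the original home and away code is not shown, so this reading is an assumption. The toy matrices, seed, and step count are also hypothetical.

```r
set.seed(2)  # fixed seed so the sketch is repeatable

pneut <- 0.75
phome <- (pneut + 0.5) / 2
paway <- (pneut + 1) / 2

# Toy data: team 1 beat team 2 on the road and beat team 3 at a neutral site.
opponentmatrix <- rbind(c(2, 3), c(1, NA), c(1, NA))
outcomematrix  <- rbind(c(1, 1), c(-1, NA), c(-1, NA))
homematrix     <- rbind(c(-1, 0), c(1, NA), c(0, NA))
g <- c(2, 1, 1)            # games played by each team

steps <- 20000
count <- integer(3)
pos   <- 1

for (i in seq_len(steps)) {
  whichopp   <- sample.int(g[pos], 1)
  winlose    <- outcomematrix[pos, whichopp]
  homeoraway <- homematrix[pos, whichopp]
  u <- runif(1)

  # Assumption: p depends on where the walker's current team played this game.
  p <- if (homeoraway == 1) phome else if (homeoraway == -1) paway else pneut
  stay_prob <- if (winlose == 1) p else 1 - p

  if (u >= stay_prob) pos <- opponentmatrix[pos, whichopp]
  count[pos] <- count[pos] + 1L   # one vote for wherever the walker ends the step
}

ranking <- order(count, decreasing = TRUE)
```

Under this reading, a road win holds the walker at the winner more firmly (0.875) than a neutral-site win (0.75), which is the extra credit for road wins described above.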

We run into the same problem regarding James Madison University, mainly because they won on the road against a very good opponent. The increased p value for a road win makes it that much tougher for the walker to leave their location and move on to Virginia Tech. My results seem to be a mix of the original random walk rankings and the BCS rankings. There are a few teams that jump high or low in the rankings, notably Nevada, which jumps up 7 places, and Stanford, which drops 7 places. However, I am satisfied with my results. They show that the top 25 teams are consistent with both the BCS and the original system, and only the ordering is different. This is a good sign, because it means no surprise teams “snuck” their way into my rankings.

According to my system, the national championship game would be played between TCU and Auburn. The other 34 bowl games would then be seeded according to their specific rules, such as the Rose Bowl being decided by the Big Ten conference champion and the Pac-10 conference champion. Therefore it is my assumption that Oregon would then be part of the Rose Bowl, and would probably play Wisconsin (this is because Oregon and Wisconsin won their respective conferences and were not included in the national championship game). The rest of the bowl games would be decided in this fashion.

Going forward, I believe the original system created by Callaghan, Mucha, and Porter could be altered as I have done. My idea of changing the p value for location was a start, but it was not perfect. For example, if Penn State were to play UCLA at Villanova, it would technically be a neutral game, but it might give Penn State a slight advantage because Penn State would travel a shorter distance and would not have to change time zones. One change that would be fairly easy to make is to factor in margin of victory. A win by 50 points should not be equal to a win by 1 point. Perhaps increasing the p value based on the margin by which the game was won might be considered. Another idea might be to take into consideration how the teams are playing at the end of the season. A team that began with a slow start but was able to turn it around and win the rest of its games might in fact be the best team. However, this begins a debate over what the national championship game means. Should it reflect the teams with the best overall seasons, or the teams who are playing the best football at the end of the season? Hypothetical questions that have no correct answer make ranking teams that much harder.
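As one possible starting point for the margin of victory idea, the R sketch below maps a winning margin to a p value. Everything in it is hypothetical: the linear rule, the bounds 0.60 and 0.95, and the 28 point cap are illustrative choices, not values tested in this project.

```r
# Map a winning margin (in points) to a p value that stays between 0.5 and 1.
# p_min, p_max, and cap are made-up tuning values for illustration only.
p_from_margin <- function(margin, p_min = 0.60, p_max = 0.95, cap = 28) {
  capped <- pmin(margin, cap)                # margins beyond the cap earn no extra credit
  p_min + (p_max - p_min) * capped / cap
}

p_from_margin(1)    # a 1 point win gets a p value just above p_min
p_from_margin(50)   # a 50 point win is capped and gets p_max
```

Capping the margin is one way to avoid rewarding teams for running up the score.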

Regardless of how the teams are ranked, someone will always be unhappy. Some teams will argue that a system favors one team over another and call for a new system to take its place. I created a system that uses a random walk with 10 million steps to determine who would be ranked in the top 25. Perhaps my system will be added onto the BCS formula, or perhaps it will take over the BCS system entirely. However, I believe I created a fairly simple, yet complex, system that ranks teams based on wins and strength of schedule. It does have flaws, such as being biased toward teams that play more games; however, I believe it shows how powerful and creative a random walk can be.

Appendix

AP	USA	BCS
Auburn	Auburn	Auburn
TCU	Oregon	Oregon
Oregon	TCU	TCU
Stanford	Wisconsin	Stanford
Ohio State	Stanford	Wisconsin
Oklahoma	Ohio St.	Ohio State
Wisconsin	Michigan St.	Oklahoma
LSU	Arkansas	Arkansas
Boise State	Oklahoma	Michigan State
Alabama	Boise St.	Boise State
Nevada	LSU	LSU
Arkansas	Virginia Tech	Missouri
Oklahoma State	Nevada	Virginia Tech
Michigan State	Missouri	Oklahoma State
Mississippi State	Alabama	Nevada
Virginia Tech	Oklahoma St.	Alabama
Florida State	Nebraska	Texas A&M
Missouri	Texas A&M	Nebraska
Texas A&M	South Carolina	Utah
Nebraska	Utah	South Carolina
UCF	Mississippi St.	Mississippi State
South Carolina	West Virginia	West Virginia
Maryland	Florida St.	Florida State
Tulsa	Hawaii	Hawaii
North Carolina State	Connecticut	Central Florida

Appendix

Random Walk, p=.95	Random Walk, p=.90	Random Walk, p=.85	Random Walk, p=.80	Random Walk, p=.75
Auburn	Auburn	Auburn	Auburn	Auburn
TCU	TCU	TCU	TCU	TCU
Oregon	Oregon	Oregon	Oregon	Oregon
Ohio State	LSU	Stanford	Stanford	Stanford
LSU	Stanford	LSU	LSU	Oklahoma
Arkansas	Ohio St	Oklahoma	Oklahoma	LSU
Stanford	Arkansas	Ohio St	Arkansas	Arkansas
Nevada	Oklahoma	Arkansas	Ohio St	Boise St
Boise State	Nevada	Boise State	Boise St	Ohio St
Oklahoma	Boise State	Alabama	Alabama	Alabama
Alabama	Alabama	Nevada	Nevada	Nevada
Wisconsin	Wisconsin	Wisconsin	Oklahoma St	Oklahoma St
Michigan State*	Oklahoma St	Oklahoma State	Wisconsin	Virginia Tech
Oklahoma St	Michigan St	Missouri	Virginia Tech	Missouri
Missouri	Missouri	Michigan St	Missouri	Wisconsin
South Carolina	Virginia Tech	Virginia Tech	Michigan St	Michigan St
Virginia Tech	South Carolina	Texas A&M	Texas A&M	Nebraska
Texas A&M	Texas A&M	South Carolina	Nebraska	Texas A&M
Nebraska	Nebraska	Nebraska	South Carolina	South Carolina
Florida State	Florida St	Florida St	Florida St	Florida St
Mississippi State	Mississippi St	Utah	Utah	Utah
Hawaii	Utah*	Mississippi St	Mississippi	Mississippi St
Iowa	Iowa	Iowa	Iowa	North Carolina St
Utah	Hawaii	Hawaii	North Carolina State	Iowa
North Carolina State	North Carolina St	North Carolina St	Hawaii	Hawaii
10,000,000 steps	10,000,000 steps	10,000,000 steps	10,000,000 steps	10,000,000 steps

Appendix

Random Walk, p=.70	Random Walk, p=.65	Random Walk, p=.60	Random Walk, p=.55	Random Walk, p=.50
Auburn	Auburn	Auburn	Auburn	Central Florida
Oregon	TCU	Oklahoma	Oklahoma	Nevada
TCU	Oklahoma	Oregon	Nevada	Nebraska
Stanford	Oregon	Stanford	Virginia Tech	Hawaii
Oklahoma	Stanford	TCU	Stanford	Northern Illinois
LSU	LSU	LSU	Oregon	Auburn
Arkansas	Arkansas	Virginia Tech	TCU	Miami (Ohio)
Ohio St	Nevada	Nevada	Florida St	Oklahoma
Boise St	Alabama	Arkansas	LSU	Southern Methodist
Nevada	Ohio St	Alabama	Nebraska	South Carolina
Alabama	Virginia Tech	Ohio St	South Carolina	Virginia Tech
Virginia Tech	Boise St	Boise St	Boise State	Florida St
Oklahoma St	Oklahoma St	Oklahoma St	Alabama	Penn State
Missouri	Missouri	Florida St	Missouri	Toledo
Wisconsin	Wisconsin	Missouri	Arkansas	Ohio
Michigan St	Nebraska	Nebraska	Oklahoma St	Boise State
Nebraska	Michigan St	South Carolina	Ohio St	Iowa
Texas A&M	Texas A&M	Michigan St	Hawaii	Wisconsin
Florida St	Florida St	Texas A&M	Wisconsin	Southern Mississippi
South Carolina	South Carolina	Wisconsin	Michigan St	Idaho
Utah	Utah	Mississippi St	Texas A&M	Texas Tech
Mississippi St	Mississippi St	Utah	Utah	Michigan St
North Carolina St	Hawaii	Hawaii	Central Florida	Ohio St
Hawaii	North Carolina St	North Carolina St	USC	Michigan
USC	USC	USC	Mississippi St	Georgia
10,000,000 steps	10,000,000 steps	10,000,000 steps	10,000,000 steps	10,000,000 steps

Appendix

Team	Wins	Losses
Auburn	14	0
TCU	13	0
Oregon	12	1
Stanford	12	1
Ohio State	12	1
Oklahoma	12	2
Wisconsin	11	2
LSU	11	2
Boise State	12	1
Alabama	10	3
Nevada	13	1
Arkansas	10	3
Oklahoma State	11	2
Michigan State	11	2
Mississippi State	9	4
Virginia Tech	11	3
Florida State	10	4
Missouri	10	3
Texas A&M	9	4
Nebraska	10	4
UCF	11	3
South Carolina	9	5
Maryland	9	4
Tulsa	10	3
North Carolina State	9	4

Appendix

Random Walk, p=.75	My System, pneut = .75	BCS
Auburn	Auburn	Auburn
TCU	TCU	Oregon
Oregon	Oregon	TCU
Stanford	Ohio St	Stanford
Oklahoma	LSU	Wisconsin
LSU	Arkansas	Ohio State
Arkansas	Stanford	Oklahoma
Boise St	Nevada	Arkansas
Ohio St	Boise State	Michigan State
Alabama	Oklahoma	Boise State
Nevada	Alabama	LSU
Oklahoma St	Wisconsin	Missouri
Virginia Tech	James Madison*	Virginia Tech
Missouri	Michigan St	Oklahoma State
Wisconsin	Oklahoma St	Nevada
Michigan St	Missouri	Alabama
Nebraska	South Carolina	Texas A&M
Texas A&M	Virginia Tech	Nebraska
South Carolina	Texas A&M	Utah
Florida St	Nebraska	South Carolina
Utah	Florida St	Mississippi State
Mississippi St	Mississippi St	West Virginia
North Carolina St	Hawaii	Florida State
Iowa	Iowa	Hawaii
Hawaii	Utah	Central Florida
10,000,000 steps	10,000,000 steps	

Works Cited

Dufresne, Chris. "The BCS and Barack Obama ..." The Fabulous Forum. Los Angeles Times, 4 Nov. 2008. Web. 01 May 2011.

Hootens, Staff. "College Football Facts & Figures: Attendance, Viewership Keep Going up." Hootens.com. Hootens, 23 Mar. 2011. Web. 01 May 2011.

Howell, James. "James Howell's College Football Scores." Wisc.edu. James Howell, 1999. Web. 01 May 2011.

Mucha, Peter, Thomas Callaghan, and Mason Porter. "Random Walker Ranking for NCAA Division I-A Football." The American Mathematical Monthly 114 (2007): 761-77. Web.

O'Toole, Thomas. "$17M BCS Payouts Sound Great, but ..." USATODAY.com. USA Today, 06 Dec. 2006. Web. 01 May 2011.
