Thursday, October 25, 2012

Big 12 Spreadsheet FAQ

As of the time of this post, its title is a lie.  The Big 12 Spreadsheet is only a few days old.  Nobody has asked any questions about the Big 12 Spreadsheet.  The only people who know about the spreadsheet wish I would stop talking about it.  So I'm going to just guess what sorts of questions one might have about the Big 12 Spreadsheet, answer those, and add any other questions which are asked frequently as we traipse merrily into the future.

Q. What is the Big 12 Spreadsheet?

A. The Big 12 Spreadsheet is literally a spreadsheet, created by me in Microsoft Excel, into which I input game statistics of Big 12 football teams and out of which I pull some data from low-level analysis and charts.  I am hopeful that it will prove a useful tool for comparing the teams, analyzing matchups, and possibly even predicting game winners and scores.  The latter in particular is probably excessively optimistic.  At the very least it is a database of certain specific statistics which I cull primarily from the drive logs over on ESPN.

Q. What is the basis for this analysis?

A. As the saying goes, the final score is the only stat that matters.  What I am trying to do is distill the essence of how teams get to their final scores.  My thesis is that, disregarding almost everything else, the story of how teams get those scores is told by the drives.  My hypothesis is that if you can analyze how an offense drives the ball, and how a defense tries to stop a drive, you can see how a game is likely to play out.

Q. What stats are you specifically collecting?

A. In keeping with the above thesis, I am pulling information about each team's successive drives throughout their Big 12 conference play.  Basically I am taking each team's score and breaking down how they got it, whether offensively, defensively, or on special teams.  I further break down the offensive scoring by the length of the drive, and whether or not the drive was initiated by a turnover.

Q. So what are the final inputs that go into the spreadsheet?

A. I have 14 input cells per team per game:
  • 1-3. Offensive drives starting inside offense 20, starting between offense 20 - defense 35, starting inside defense 35.
  • 4-6. Offensive scoring drives starting inside offense 20, starting between offense 20 - defense 35, starting inside defense 35.
  • 7-9. Offensive points off scoring drives starting inside offense 20, starting between offense 20 - defense 35, starting inside defense 35.
  • 10-12. Offensive points off scoring drives starting after turnovers inside offense 20, starting between offense 20 - defense 35, starting inside defense 35.
  • 13. Defensive scoring.
  • 14. Special teams scoring.
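The 14 inputs listed above can be sketched as a small record type. This is only an illustration in Python (the actual spreadsheet lives in Excel); the class name, field names, and the sample numbers are all invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Zone order used in every tuple below:
# (inside offense 20, offense 20 to defense 35, inside defense 35)
Zones = Tuple[int, int, int]


@dataclass
class GameInputs:
    """Hypothetical container for one team's 14 input cells in one game."""
    drives: Zones                 # inputs 1-3
    scoring_drives: Zones         # inputs 4-6
    points: Zones                 # inputs 7-9
    points_after_turnover: Zones  # inputs 10-12
    defensive_points: int         # input 13
    special_teams_points: int     # input 14

    def cells(self) -> List[int]:
        """Flatten the record into 14 values, in the order of the list above."""
        return [
            *self.drives,
            *self.scoring_drives,
            *self.points,
            *self.points_after_turnover,
            self.defensive_points,
            self.special_teams_points,
        ]


# Made-up numbers, not taken from any real game.
example = GameInputs(
    drives=(3, 8, 2),
    scoring_drives=(1, 3, 2),
    points=(7, 17, 10),
    points_after_turnover=(0, 7, 3),
    defensive_points=7,
    special_teams_points=0,
)
print(len(example.cells()))
```

Keeping the three field zones in one fixed order makes it harder to mix up which count belongs to which starting position.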

Q. Why did you pick the offense 20 and defense 35 yard lines as being important?

A. As much as I could, I tried to stick with objective data.  This was a subjective thing I couldn't get rid of.  I picked the offense 20 yard line as the point to which balls are brought after touchbacks on punts and turnover recoveries.  That seemed as reasonable a place as any to divide drives of 'normal' length and drives of unusually 'long' length.  I debated whether or not to go with the 25 due to the new kickoff touchback rule, but decided against it.  I'm still not sold on that rule anyway, and pinning someone inside the 20 was and is viewed as a coup for punters.

The defense 35 yard line was a bit harder.  I waffled for a while, and finally decided it was a useful point to divide drives of 'normal' length and drives of unusually 'short' length.  If the offense were to get the ball at the 35 yard line and get absolutely stuffed at the line of scrimmage by the defense on three straight plays, they're looking at a long field goal attempt, but one within generally accepted range: about 52-ish yards.  It wouldn't be fair to blame a defense for yielding a scoring drive if the opponent started more or less within field goal range to begin with.  If you take the 40 yard line instead of the 35, you're looking at a 57 yard field goal, which is certainly doable for many college kickers but is starting to push the outer edges of their range.  If you take the 30 instead, that gives a field goal of less than 50 yards, which ought to be well within the range of someone getting a scholarship to put points up on the board three at a time.
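The field goal distances quoted above follow from simple arithmetic: the yard line plus 10 yards of end zone plus the distance from the snap to the spot of the kick. The sketch below assumes the usual 7 yards for that last piece, which is what makes the 35 come out to 52; the constant and function names are made up for the illustration.

```python
END_ZONE_DEPTH = 10  # yards from the goal line to the goal posts
SNAP_TO_HOLD = 7     # assumed typical distance from the line of scrimmage to the kick spot


def field_goal_distance(yard_line: int) -> int:
    """Length of a field goal attempt snapped from the given opponent yard line."""
    return yard_line + END_ZONE_DEPTH + SNAP_TO_HOLD


for yard_line in (30, 35, 40):
    print(yard_line, field_goal_distance(yard_line))
```

This prints 47, 52, and 57 yards for the 30, 35, and 40 yard lines respectively.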

Q. Are there any other subjective things that are reflected in the spreadsheet?

A. There are probably a bunch of subjective things, including exactly how I decided to crunch the numbers, and what data I picked to analyze to begin with.  But the other subjective thing that comes immediately to mind with respect to the data itself is whether or not non-scoring drives at the end of a half get counted towards a team's total.  For example, suppose a team is up by 30 points or so and gets the ball with 3 minutes to play.  They're not necessarily driving to score, they're driving to get a couple of first downs and run out the clock.  If the team that is down at the end of the half has the ball, that drive would seem to demand counting, but what if that team takes over with less than 20 seconds to play - is it quite fair to drop their fraction of scoring drives in that case by including this non-scoring drive?  So there is some subjectivity when it comes to which end-of-half drives get counted, and which ones don't.  I tend to err on the side of counting the drive if the team that is down has the ball, and not counting the drive if the converse holds true; but I also try to take into account how long the drive lasted, how much yardage was picked up, and even what sort of plays the offense was running in an attempt to figure out just what the offense was trying to do.

Q. What do you do with the data?

A. Basically I use this data to do as much separating of offensive and defensive performance as possible.  If the defense scored on a fumble recovery, for instance, I don't credit that score to the offense when assessing their performance.  If the offense drove inside the opponent 20 and then turned the ball over on the 5 yard line, I have no problem blaming the defense for the ensuing touchdown after the opponent's 95 yard drive.  If the offense turned the ball over on their own 5, and the defense then allowed a 5 yard scoring 'drive,' I can filter that out when analyzing the defense.  That's the reason I separated out the scoring drives in the first place.

Q. What results seem particularly illuminating?

A. The simple fractions of scoring drives and average points per scoring drive are interesting measures of offensive performance, and the converse for the defense (fractions of opponent drives stopped, and opponent points per scoring drive).  I also went through and had the spreadsheet calculate how many more average scoring drives a team would have needed to overcome the score differential in the case of a loss (or conversely, how many opponent scoring drives would have needed to be stopped) and add those to the team's total actual scoring drives as a measure of just how much better the offense would have needed to be to be undefeated.  Additionally, the spreadsheet calculated the minimum number of offensive scoring drives a team would have needed to win all their games - a metric that gives some useful information, but one I just don't like as much since it seems to be analogous to allowing an English major to take grade points from a literature class and apply them to an organic chemistry class.  Oklahoma, for instance, has scored more than enough points to win all of its conference games.  But a lot of those points came from OU kicking Texas while the Longhorns were down; you can't take those 'extra' points and apply them to the Kansas State game.
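The first few measures above can be sketched in a few lines of Python. The post does not spell out the exact formula behind "how many more average scoring drives," so the version here (loss margin divided by average points per scoring drive, left unrounded) is an assumption, as are the function names and the sample numbers.

```python
def scoring_fraction(scoring_drives: int, drives: int) -> float:
    """Fraction of drives that ended in points."""
    return scoring_drives / drives if drives else 0.0


def points_per_scoring_drive(points: int, scoring_drives: int) -> float:
    """Average points produced by each scoring drive."""
    return points / scoring_drives if scoring_drives else 0.0


def extra_scoring_drives_needed(points_for: int, points_against: int,
                                avg_points: float) -> float:
    """Average scoring drives needed to erase a loss margin (0.0 after a win).

    Assumed formula: margin / average points per scoring drive, not rounded.
    """
    margin = points_against - points_for
    if margin <= 0 or avg_points <= 0:
        return 0.0
    return margin / avg_points


# Made-up game: 12 drives, 5 of them scoring, 31 offensive points, lost 31-38.
fraction = scoring_fraction(5, 12)
average = points_per_scoring_drive(31, 5)
extra = extra_scoring_drives_needed(31, 38, average)
print(round(fraction, 3), round(average, 2), round(extra, 2))
```

Whether a fractional result should be rounded up to a whole drive, or whether erasing the margin exactly counts as "overcoming" it, is left open here.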

Beyond that, I tried to filter out scoring not directly attributable to the offense (e.g., punt return for touchdown), and opponent scoring that the defense couldn't be blamed for (e.g., turnover returned for touchdown) and re-assess the offensive and defensive performances in the light of the scoring changes after such filtration.
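A rough sketch of that filtration, in Python, might look like the following. The function names and numbers are invented, and treating only the opponent's post-turnover drives that started inside the defense 35 as "not the defense's fault" is an assumption about how the filter is applied, not a description of the actual spreadsheet formulas.

```python
def offense_only_points(total_points: int, defensive_points: int,
                        special_teams_points: int) -> int:
    """Points left once defensive and special teams scores are removed."""
    return total_points - defensive_points - special_teams_points


def points_charged_to_defense(opp_total_points: int,
                              opp_defensive_points: int,
                              opp_special_teams_points: int,
                              opp_short_field_turnover_points: int = 0) -> int:
    """Opponent points the defense can reasonably be blamed for.

    Removes opponent defensive and special teams scores, plus (by assumption)
    points from post-turnover drives that started inside the defense 35.
    """
    return (opp_total_points
            - opp_defensive_points
            - opp_special_teams_points
            - opp_short_field_turnover_points)


# Made-up game: a team scores 38 with one defensive and one special teams touchdown.
print(offense_only_points(38, 7, 7))
# Its opponent scores 31, including a pick-six and a short-field touchdown after a turnover.
print(points_charged_to_defense(31, 7, 0, 7))
```

In this invented example the offense is credited with 24 of its team's 38 points, and the defense is charged with 17 of the opponent's 31.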

Q. Could you do this for other conferences?

A. Theoretically, but it wouldn't be as easy (hah).  At this particular point in history this project is easier than it has ever been for the Big 12, since every conference member plays every other conference member, so a master template can be created into which game data can be input for one team's offensive analysis and concomitantly linked to the opponent's defensive analysis for that game.  Creating such a spreadsheet for the PAC-12, B1G, SEC, ACC, or other similarly large conference would be complicated by the divisional structure of those conferences.  It could be done, but there would have to be some redundancy in the creation of a master template to account for year-to-year scheduling fluctuations, and the variation between the teams' cross-divisional opponents might limit the extrapolation of such data (I say might; I have no idea how useful this data will be for the Big 12 itself, yet).  For that (to paraphrase Voltaire) agglomeration which was called and which still calls itself the Big East football conference, but is neither Big nor East nor a football conference, I suppose it could be done as easily; likewise for the Mountain West, Sun Belt, and current incarnation of the WAC.
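To see why the round-robin schedule makes a single master template workable, here is a small Python sketch. The team list is the ten-member 2012 Big 12 (not listed in this post), and the row layout is a hypothetical stand-in for the spreadsheet's linked cells.

```python
from itertools import combinations

# The ten 2012 Big 12 members; the list itself does not appear in the post.
TEAMS = [
    "Baylor", "Iowa State", "Kansas", "Kansas State", "Oklahoma",
    "Oklahoma State", "TCU", "Texas", "Texas Tech", "West Virginia",
]

# In a full round robin, every pair of teams meets exactly once.
games = list(combinations(TEAMS, 2))

# Each game yields two (offense, defense) rows. One team's offensive inputs
# double as the opponent's defensive inputs, so they are entered only once.
rows = [(a, b) for a, b in games] + [(b, a) for a, b in games]

offense_rows = [r for r in rows if r[0] == "Texas"]
defense_rows = [r for r in rows if r[1] == "Texas"]
print(len(games), len(offense_rows), len(defense_rows))
```

With ten teams this gives 45 games, and every team gets the same shape of template: nine offensive rows and nine defensive rows. A divisional conference would not have that symmetry, which is the complication described above.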

Q. YOUR QUESTION HERE!
