Profiling 2019 NFL Offenses with nflscrapR Data and Clustering
Project Overview:
I used nflscrapR 2019 season data to organize teams' playcalling in normalized relative proportions, then used a clustering algorithm to categorize each team into one of five groups.
Analysis and Report by Kevin Kraege found at @kevgk2 on Twitter
Cluster numbers were generated arbitrarily, and not according to specific order of clustering.
It may seem odd to refer to teams by their cities instead of team names, but I tried to remain consistent between my analysis and how the data is charted and graphed.
For this project, I gloss over the details of the mathematics and methods used, such as principal components, agglomerative clustering, the silhouette method, and EPA. I advise you to research these yourself and find someone more qualified to explain them. Contact me on Twitter if you would like some help doing so.
Data Specifications:
Each recorded play fits into one of 12 groups. These groups are combinations of three types of runs (end, guard, tackle) or three types of passes (deep, short, middle) with two types of formations (shotgun, under center). I understand that NFL plays have much deeper categorizations than that, and that the intention of a play may not match the recorded result, but given data limitations and for simplicity, this is what I used.
The first reason I used only these 12 categories is that this was the data available from nflscrapR, the R package that collects the recorded outcome of every play from each season. It does not record the personnel groupings on the field, route specifications, or formations beyond shotgun versus under center, and it does not even distinguish between pistol and shotgun. It does record a left or right outcome for each play, but I decided that was unnecessary for analyzing teams, as the direction of a play likely had more to do with defensive formation, injuries, match-ups, etc., and was not worth splitting plays into 24 groups instead of 12. This was an arbitrary decision though, and I risked a potentially worse analysis by simplifying the data.
Short and deep are mutually exclusive groups, but neither is mutually exclusive with middle in the unedited nflscrapR data. I instead assigned middle to every pass that was recorded as middle in the nflscrapR data, regardless of whether it was deep or short, and assigned deep or short to every pass that was not middle. This created three mutually exclusive pass types.
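The relabeling rule above can be sketched in a few lines. This is only an illustration in Python (the analysis itself was done in R), and it assumes each play is a dictionary keyed by what I understand to be nflscrapR's play-by-play column names (`play_type`, `shotgun`, `pass_length`, `pass_location`, `run_gap`); the function names and the label format are hypothetical.

```python
def pass_category(pass_length, pass_location):
    """Return 'middle', 'deep', or 'short' for one pass attempt.

    Middle takes precedence over depth, which is what makes the three
    labels mutually exclusive.
    """
    if pass_location == "middle":
        return "middle"
    return pass_length  # 'short' or 'deep' for passes thrown left or right


def play_category(play):
    """Map one play (a dict with assumed nflscrapR-style keys) to one of 12 groups."""
    formation = "shotgun" if play["shotgun"] == 1 else "under_center"
    if play["play_type"] == "pass":
        kind = "pass_" + pass_category(play["pass_length"], play["pass_location"])
    else:
        kind = "run_" + play["run_gap"]  # 'end', 'guard', or 'tackle'
    return formation + "_" + kind


example = {"play_type": "pass", "shotgun": 1,
           "pass_length": "deep", "pass_location": "middle"}
print(play_category(example))  # shotgun_pass_middle
```

A deep pass over the middle lands in the middle group, not the deep group, which is the trade-off described above.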
This split has some merit: of the 14,483 short passes considered, only 3,489 were also recorded as middle, roughly 24%. Likewise, of the 3,296 deep passes, only 712 were also middle passes, roughly 22%. I felt this distinction was worth adding a third pass type that includes all middle passes regardless of depth. There can be strategic merit too, since passes in the middle come from route concepts like slants, posts, seams, ins, and crossers, especially from the slot. Passes not in the middle come from outs, flats, corners, gos, and comebacks, especially from the boundary. This distinction seems consistent enough with route distinctions to be used.
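As a quick check of the arithmetic, the counts quoted above can be turned into shares and into the sizes of the three resulting pass groups. The variable names are only for illustration; the four input numbers are the ones stated in the text.

```python
short_total, short_middle = 14_483, 3_489
deep_total, deep_middle = 3_296, 712

# Share of each depth group that was also recorded as middle.
short_share = short_middle / short_total
deep_share = deep_middle / deep_total
print(f"{short_share:.1%}")  # 24.1%
print(f"{deep_share:.1%}")   # 21.6%

# Sizes of the three mutually exclusive groups after relabeling.
middle = short_middle + deep_middle      # 4,201
short_only = short_total - short_middle  # 10,994
deep_only = deep_total - deep_middle     # 2,584
print(middle, short_only, deep_only)
```

The three groups still add up to the original 17,779 pass attempts, so no pass is dropped or counted twice by the relabeling.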
A very important caveat for this analysis is that it does not include sacks. This is a limitation of the data, because we cannot know where the QB would have tried to throw the ball. The same is true of fumbles that occurred while the QB was dropping back. This will bias the counts of passes by intended target depth from the playcalling perspective, since the QB is more likely to be sacked the longer he waits to throw. Thankfully, incomplete passes and interceptions are included, along with the corresponding EPA, because the location of the pass attempt was recorded.
The data was filtered to exclude a few situations where teams simplify their strategies, particularly situations with very high or very low chances of winning. In the former case, teams tend to run the ball more, more often from under center, and run inside more for a lower chance of negative yards. In the latter case, teams tend to pass more, out of shotgun, and aim deeper. Ironically, including this data would likely make playcalling tendencies converge, since teams that run the ball often tend to start losing and then pass to compensate, while teams that pass more tend to do the opposite. Instead, I wanted to find and highlight the differences between teams in neutral situations.
The constraints are:
No play was included where one of the teams had a calculated win probability above 95% (or below 5%)
I realize this seems less limiting than what others decide to use, but it is a somewhat arbitrary decision. I find that something like an 80% win probability starts excluding plays where teams are up by two scores early in the second quarter, a situation I do not believe changes playcalling tendencies enough to exclude.
No plays with a score differential greater than or equal to 30 points
I cannot think of a situation where this would not be completely redundant with the previous constraint, but it is here as a safeguard in case it is not.
No plays with a score differential greater than or equal to 14 with less than a quarter remaining
This situation is likely redundant with the first constraint as well, but if not, it is another situation I believe impacts playcalling enough to be worth excluding.
No plays on 3rd or 4th down with over 3 yards to go
Any play on 3rd or 4th down with more than 3 yards to go is likely to be a pass, and likely to be classified as a 'short' one, since nflscrapR labels any pass under 15 yards as short. Even if the yards to go is greater than 15, teams often take checkdowns to set up a field goal or lower the risk of a turnover. This is another situation where I am not trying to find teams that often get stuck in 3rd or 4th and long. Even 3 yards may be too biased in favor of passes, but this is an arbitrary decision and not likely to be conclusive.
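Taken together, the four constraints amount to one yes/no test per play. The sketch below is an illustration in Python rather than the R code used for the analysis. It assumes nflscrapR-style column names (`wp`, `score_differential`, `game_seconds_remaining`, `down`, `ydstogo`) and reads "less than a quarter remaining" as fewer than 900 seconds left in the game; both are assumptions, and `is_neutral` is a hypothetical name.

```python
def is_neutral(play):
    """Return True if a play passes all four neutral-situation constraints."""
    wp = play["wp"]  # win probability of the team with the ball
    margin = abs(play["score_differential"])

    # 1. Neither team above 95% win probability (so the other is below 5%).
    if wp > 0.95 or wp < 0.05:
        return False
    # 2. No score differential of 30 points or more.
    if margin >= 30:
        return False
    # 3. No two-score (14+) margin with less than a quarter remaining.
    if margin >= 14 and play["game_seconds_remaining"] < 900:
        return False
    # 4. No 3rd or 4th down with more than 3 yards to go.
    if play["down"] in (3, 4) and play["ydstogo"] > 3:
        return False
    return True


first_and_ten = {"wp": 0.55, "score_differential": 3,
                 "game_seconds_remaining": 2400, "down": 1, "ydstogo": 10}
third_and_long = dict(first_and_ten, down=3, ydstogo=8)
print(is_neutral(first_and_ten), is_neutral(third_and_long))  # True False
```

Writing the constraints as separate checks makes it easy to loosen or tighten one threshold, such as the 95% win probability, without touching the others.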
Clustering:
I used an agglomerative hierarchical clustering algorithm, which measures how similar each team is to every other team and repeatedly merges the closest teams and clusters until a single cluster remains. The advantage of this is that it allows anybody to decide how many clusters they prefer.
The above graph is a dendrogram, which allows someone to make horizontal slices to separate teams into clusters. It may seem that certain teams are close together, especially at the bottom, but this is not necessarily the case. Teams that are adjacent but joined higher up actually separate earlier than those joined farther down. While it may seem Indianapolis and Philadelphia are close, as are New England and New Orleans, these pairs would actually split sooner than most other ends of the dendrogram.
The first cut from the top (about 2.5) separates a cluster that includes Baltimore, Arizona, and Kansas City from every other team. The next cut (2.0) creates a cluster that only includes Minnesota from the previous two clusters. A third cut (about 1.6) separates the largest cluster into two: one from Atlanta to New Orleans, the other from Buffalo to Philadelphia. A fourth cut (1.4) separates Baltimore from Arizona and Kansas City, bringing us to the five clusters I decided to investigate.
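For readers who want to see the mechanics of merging and cutting, here is a small Python sketch on made-up two-dimensional points. It is not the code behind the dendrogram above: it assumes Euclidean distance and average linkage, neither of which is specified in this write-up, and the labels A through E are toy data, not teams.

```python
import math


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of labels."""
    total = sum(math.dist(points[x], points[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def agglomerate(points):
    """Merge the two closest clusters until one remains.

    Returns the merges in order as (height, cluster_a, cluster_b).
    """
    clusters = [(label,) for label in points]
    merges = []
    while len(clusters) > 1:
        height, i, j = min(
            (average_linkage(clusters[i], clusters[j], points), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((height, clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


def cut(points, merges, height):
    """Clusters that remain when only merges at or below `height` are kept."""
    group = {label: {label} for label in points}
    for merge_height, a, b in merges:
        if merge_height <= height:
            joined = group[a[0]] | group[b[0]]
            for label in joined:
                group[label] = joined
    return sorted({tuple(sorted(g)) for g in group.values()})


toy = {"A": (0, 0), "B": (0, 1), "C": (5, 0), "D": (5, 1), "E": (10, 10)}
merges = agglomerate(toy)
print(cut(toy, merges, 2.0))  # [('A', 'B'), ('C', 'D'), ('E',)]
print(cut(toy, merges, 6.0))  # [('A', 'B', 'C', 'D'), ('E',)]
```

Raising the cut height keeps more merges and so yields fewer, larger clusters, which is the same idea as sliding a horizontal slice up or down the dendrogram.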
How many clusters would be ideal? The obvious endpoints are either one large cluster or 32 separate clusters, but neither leads to interesting results. One tool is the silhouette method, which measures how effectively the objects are clustered and how consistent the clusters are internally. Here a graph shows that the optimal number of clusters according to this method is two: those determined by the first cut along the dendrogram above. This does not seem interesting enough to report, so I chose five clusters instead, which is the second-best number according to the silhouette method.
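The silhouette idea can be illustrated with a short Python function. This is a sketch of the standard definition on toy points, not the routine used to produce the graph: for each point, `a` is the mean distance to its own cluster, `b` is the mean distance to the nearest other cluster, and the width is `(b - a) / max(a, b)`. Euclidean distance is an assumption here.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width over all points (requires at least two clusters)."""
    widths = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if own is None:
            widths.append(0.0)  # a one-member cluster scores 0 by convention
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


toy_points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # tight, well separated
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # clusters mixed together
print(round(good, 3), round(bad, 3))  # 0.802 -0.39
```

Values near 1 mean points sit much closer to their own cluster than to any other. Because a one-member cluster scores 0 under this convention, solutions that isolate single teams are not rewarded for doing so.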
A common way to visualize clusters is a simple scatterplot. A problem though is how to graph a scatterplot with data of twelve dimensions. Instead, the axes are replaced with Principal Components, which find the directions to project the data onto in two dimensions that lead to the scatterplot with the most spread. This still leaves it difficult to understand the difference between teams' formational and play calling tendencies, but makes it significantly easier by allowing us to visualize it at all. It will always be difficult, and that is why clustering is used to begin with.
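To make the projection concrete, here is a dependency-free Python sketch that finds the two directions of greatest spread and projects each row onto them. It uses power iteration with deflation, which is a simple stand-in chosen for illustration; a real analysis would normally call a library routine, and nothing here is the code behind the scatterplot. The toy rows are made up.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(matrix, iterations=500):
    """Leading eigenvalue and unit eigenvector by power iteration."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-uniform start
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return dot(v, [dot(row, v) for row in matrix]), v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    l1, v1 = top_component(cov)
    # Deflation: remove the first component, then find the next one.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(len(cov))]
                for i in range(len(cov))]
    _, v2 = top_component(deflated)
    return [(dot(r, v1), dot(r, v2)) for r in centered], (v1, v2)


toy_rows = [[0, 0, 0.0], [1, 1, 0.1], [2, 2, -0.1], [3, 3, 0.1], [4, 4, -0.1]]
coords, (v1, v2) = pca_2d(toy_rows)
print([round(abs(x), 2) for x in v1])  # [0.71, 0.71, 0.01]
```

In the toy data the first two columns move together, so the first component points almost entirely along their shared direction. The sign of a component is arbitrary, which is one reason positions on such a plot should be read relative to each other and not against the axes.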
Here it can easily be seen that Minnesota and Baltimore are quite a distance away from any other team, while Arizona and Kansas City sit together away from the two central clusters. It is important to remember that this data is in twelve dimensions, not three, so while Miami appears to be in the left central cluster (4), it is actually just sitting behind or in front of the cluster, depending on orientation. It is also important to remember that distances in general cannot be interpreted well with this graph. On the dendrogram, Green Bay separates from a cluster of Cincinnati, Seattle, Philadelphia, and Indianapolis (bottom right clusters) much later than it separates from Jacksonville, yet on the scatterplot it appears closer to Jacksonville (bottom corner of the leftmost central cluster), because this simplified perspective does not sit along a particular axis.
Here are the same plots, colored by team color.
I want to thank @benbbaldwin, @friscojosh, @Stat_Ron, and especially @LeeSharpeNFL for their wonderful work with nflscrapR and awesome guides!
Interpreting Results:
Here you can see the play types used and the league-wide Expected Points Added per play.
The highest EPA when passing out of shotgun is in the middle of the field. The highest EPA when passing from under center is deep, very likely driven by play action moving linebackers and safeties. In fact, passing from under center is more effective in each depth category than passing out of shotgun, likely because of play action as well. Running from under center has a higher EPA for end runs (barely) and tackle runs, but a lower EPA for runs between the guards. It could be that runs between the guards have a higher EPA out of shotgun because of RPOs and draw plays.
Next are several charts showing the tendencies per cluster and EPA per play (grouped by teams before averaging unless specified).
The top chart is the proportions between playcalls grouped by cluster.
The second chart is the proportions of plays for each cluster categorized, along with some descriptive stats for their season effectiveness (both EPA averaged before grouping clusters and after).
The third chart is the EPA for each cluster split between play type.
Here is a chart that shows teams' proportions of playcalls, also grouped by formation and play type.
This is a chart of the playcalls normalized -- what percentile they were in.
Here is the EPA for each playcall including grouped by formation and play type.
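The two quantities behind these charts, playcall proportions and percentiles across teams, can be sketched as follows. This is a Python illustration with made-up counts and placeholder team names, not values from the analysis. "Percentile" has several conventions; this version assumes the share of teams at or below a given value, which may differ from the one used in the chart.

```python
def proportions(counts):
    """Convert one team's play counts per category into shares that sum to 1."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def percentile_ranks(values):
    """For each team, the fraction of teams whose value is at or below its own."""
    n = len(values)
    return {
        team: sum(1 for other in values.values() if other <= value) / n
        for team, value in values.items()
    }


# Made-up counts for one hypothetical team.
shares = proportions({"shotgun_pass_short": 30, "under_center_run_guard": 10})
print(shares["shotgun_pass_short"])  # 0.75

# Made-up shares of a single play type across four placeholder teams.
ranks = percentile_ranks({"Team A": 0.10, "Team B": 0.25,
                          "Team C": 0.15, "Team D": 0.40})
print(ranks)  # {'Team A': 0.25, 'Team B': 0.75, 'Team C': 0.5, 'Team D': 1.0}
```

Ranking within each category puts rare and common play types on the same scale, so a small but unusual share stands out as much as a large one.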
First Cluster Cut:
The first cut yielded a cluster with Arizona, Kansas City, and Baltimore. This should immediately raise suspicions for anyone wondering why the teams that passed the most in the league are grouped so closely with the team that ran the most. The answer lies in formation usage: the clustering did not weight formation any lower than play type differences. Baltimore was in the shotgun formation on an astounding 95% of neutral-situation plays. The next closest teams in shotgun usage were Arizona at 89% and five teams in the 70-79% range, including Kansas City. This is a stark contrast with Minnesota, who played almost entirely under center, yet both Baltimore and Minnesota ran more than anybody else in the league (except Seattle). Baltimore was first in shotgun rushes between the guards or tackles, and tied for 3rd in shotgun end runs. Arizona led the league in shotgun end rush percentage, which is ironic since they also led the league in passing percentage. Baltimore was 3rd in the league in shotgun passes in the middle, yet near the league average in shotgun passing not in the middle, while Kansas City and Arizona were near average in shotgun passes in the middle, yet were first and second respectively in shotgun passes outside. This cluster is mostly driven by shotgun formation, and thus Baltimore is severed from this cluster by the third cut.
Second Cluster Cut:
This helps explain why Minnesota was the first team removed from any cluster: they barely played any shotgun in neutral game scenarios. 84% of their plays were under center, and 46% of their plays were under center runs specifically. The second highest usage of under center was the Los Angeles Rams, at 67%, while almost every other team was in the 30-60% range. Minnesota rushed 48% of the time, tied with Seattle and second only to Baltimore for most in the league. In their relative playcalling, they led the league in every type of rushing under center, and in passes across the middle under center as well. They were either last or second to last in every play type from shotgun formation. Since they rarely did anything unconventional with their play design, unlike Baltimore, they have a strong case to be the second most unique offense in the league by virtue of playcalling tendencies. Arizona and Kansas City likely still beat them out in play design uniqueness, but it would be hard to keep Minnesota out of the top 5 most unique offenses of 2019 at least.
Third Cluster Cut:
Not much needs to be said here that wasn't said already: Baltimore separates from the Kansas City and Arizona cluster because the team that rushed the most in the league is different enough from the teams that led the league in passing percentage.
Fourth Cluster Cut:
We finally have two remaining clusters:
Cluster 2: Atlanta, Los Angeles Rams, Denver, Detroit, Oakland, San Francisco, Tennessee, Tampa Bay, Dallas, Miami, Washington, New England, New Orleans
Cluster 4: Buffalo, Cleveland, New York Jets, Jacksonville, Chicago, Houston, Pittsburgh, Los Angeles Chargers, New York Giants, Carolina, Seattle, Cincinnati, Green Bay, Indianapolis, Philadelphia
What seems to be the primary separation between these is cluster 2 playing 45% shotgun and 55% under center, versus cluster 4 playing 64% shotgun and 36% under center. Despite this, they have the same split of passing and rushing percentages. Ironically, despite shotgun being considered the most common formation for passing, cluster 2, playing more under center, has a higher passing EPA for each play type, rushing EPA for each play type, average wins, EPA per play and EPA per play per team, and EPA for either plays under center or in shotgun.
Cluster 2's best teams were San Francisco, Tennessee, New England, and New Orleans, while cluster 4's best teams were Buffalo, Houston, Seattle, and Green Bay. To me, there is a discrepancy between the best teams in each cluster, in favor of cluster 2. Despite this discrepancy, the average wins per cluster were very similar, suggesting that cluster 2's strategy is more of a boom-or-bust offensive style than cluster 4's. Cluster 2 playing more under center despite passing the same percentage, along with league-wide results showing that passing from under center is more effective than passing out of shotgun, leads to the implication that play action, or at least passing out of run formations, is how the best passing teams in the league find success.
The last visualization for this project is some spider plots or radar charts. These draw quite the ire from the analytics and data visualization crowd, since interpreting circular space is unintuitive, and makes it hard to compare values between different teams.
Despite this, they were fun to make and look cool, so the nerds need to relax.
Truth be told, trying to find intricacies between the individual values of individual playcalls seems frivolous when a lot of playcalling is also driven by opponents, game situations, injuries, etc. What I was interested in was the clustering, to see which teams were similar, as well as the primary strategies for each team.
I would argue it can be just as difficult to reduce twelve dimensions of data for 32 teams in any other form. An advantage of polar plotting is that there is more space to see the categories in which teams are above the median, while the categories in which they are below the median shrink and thus are not the focus of the graph.
Another advantage of the polar plotting is one can easily separate the graph into quarters and halves to better interpret categories that aren't mutually exclusive:
Below you can see that 12, 1, and 2 o'clock are shotgun rushes. 3, 4, and 5 o'clock are under center rushes. 6, 7, and 8 o'clock are shotgun passes. 9, 10, and 11 o'clock are passes under center.
12 through 5 o'clock are rush plays, and 6 through 11 o'clock are passing plays.
12 through 2 o'clock and 6 through 8 o'clock are shotgun formation. 3 through 5 o'clock and 9 through 11 o'clock are under center.
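The clock layout described above reduces to a small lookup, sketched here in Python. The function name is hypothetical, and because the order of the three run or pass types within each quarter is not stated, the sketch returns only the formation and the play type for a given hour.

```python
def clock_sector(hour):
    """Return (formation, play_type) for an hour position from 1 to 12."""
    position = hour % 12  # treat 12 o'clock as position 0
    play_type = "rush" if position < 6 else "pass"
    formation = "shotgun" if position % 6 < 3 else "under center"
    return formation, play_type


for hour in (12, 4, 7, 10):
    print(hour, clock_sector(hour))
# 12 ('shotgun', 'rush')
# 4 ('under center', 'rush')
# 7 ('shotgun', 'pass')
# 10 ('under center', 'pass')
```

Reading a chart this way, the right half (12 through 5) is the run game and the left half is the passing game, with each half split again by formation.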
First are team plots categorized by division and colored by cluster.
DVOA and Pythagorean Wins data from footballoutsiders.com, game data from nflscrapR package in R
https://gist.github.com/guga31bb/5634562c5a2a7b1e9961ac9b6c568701
https://github.com/leesharpe/nfldata/blob/master/RSTUDIO-INTRO.md
https://github.com/leesharpe/nfldata/blob/master/WPCHARTS.md
https://github.com/leesharpe/nfldata/blob/master/COLORS_IN_R.md
https://github.com/maksimhorowitz/nflscrapR/blob/master/R/ep_wp_calculator.R
https://github.com/maksimhorowitz/nflscrapR
https://www.dropbox.com/s/5k2wbnroyn8i1ux/Getting%20Started%20With%20R%20for%20NFL.pdf?dl=0