We’re trying something a little new here at BBallBreakdown. In recent years, a statistical revolution has been sweeping the NBA. The proliferation of new metrics and measurements can leave your head spinning. As much as I, and people like me, enjoy nerding out to the minutiae of data analysis and arguing over the finer points of “replacement level” (don’t ask, but if you must, #AskBBALL), it can be more than a little daunting if one simply wants to learn a little more and become a smarter fan without needing to immerse oneself totally.
Some of the stats are actually quite simple, recombining familiar box score information in ways that better reflect players’ impact on a game or season. Others are just as complicated as one might imagine. Still, even for the most complex metric, a basic understanding of how it works (at a very abstract level), combined with the knowledge of what it does well and what it might miss, will at least allow for following the conversation as it continues. And if you don’t like stats, no big deal; there are plenty of ways to enjoy the game, and the numbers are just one of them.
To that end, the main goal of this feature is to help demystify, which brings us to our first question:
— K Denise Lewis (@loreesdaughter) July 22, 2015
Though we’re probably stuck with it for the time being (it is and will probably remain the Sloan Sports Analytics Conference for the foreseeable future), the terms “analytics” or even “advanced stats” aren’t particularly helpful. Many of the “advanced” stats rely much more on the logic of how things fit together than fancy mathematical technique, and some of the most powerful tools to increase your understanding of the game come from basic algebra.
For example, the three-pointer has never been more in vogue than it is today. In the NBA, the three-point line is about a mile-and-a-quarter from the hoop, so naturally the more shots from way out there, the lower the percentage. But when they go in, they are worth a point more. Factoring that extra point is important since the object of the game is to score more points, not make more baskets. So to that end, there is Effective Field Goal Percentage (commonly cited as eFG%) which accounts for that extra point by making each made three worth 1.5 times a “normal,” two-point, bucket. With that subtle change, you can compare a post player and a jump shooter. Factor in free throws (almost always a highly efficient way to score) and you have “True Shooting Percentage” (TS%).
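Both stats are simple enough to work out from box score numbers. Here’s a quick sketch in Python; the player totals are made up for illustration, and the 0.44 coefficient in TS% is the conventional approximation for the share of free throw attempts that end a possession:

```python
def efg_pct(fgm, fga, fg3m):
    """Effective FG%: each made three counts as 1.5 made field goals."""
    return (fgm + 0.5 * fg3m) / fga

def ts_pct(pts, fga, fta):
    """True Shooting %: folds free throws into shooting efficiency.
    0.44 * FTA approximates free throw trips that use a possession."""
    return pts / (2 * (fga + 0.44 * fta))

# Hypothetical season: 500 FGM (150 of them threes) on 1100 FGA,
# plus 300 points at the line on 350 FTA.
pts = 2 * (500 - 150) + 3 * 150 + 300  # = 1450
print(round(efg_pct(500, 1100, 150), 3))  # 0.523
print(round(ts_pct(pts, 1100, 350), 3))   # 0.578
```

Note how the made threes lift this player’s eFG% above his raw FG% (500/1100 ≈ .455), and the efficient trips to the line lift TS% higher still.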
I got away from your question a little bit there, but it’s not a fad, it’s just information. In the same way watching film is information, “analytics” are just a better set of tools to keep track of what you see on film and compare it to what happens in all the games you can’t watch. This point, that game tape, scouting reports and statistical analysis are all forms of the same thing, is often lost when analytics are described in public. Most often this is because the people doing the defining are opposed to the endeavor in the first place. There is a largely false perception that metrics are coming for your spot, and that just isn’t the case. It isn’t “either/or”; it’s “yes/and.”
There were almost 200,000 shots taken in the NBA last season. Nobody has the time to watch them all, and even if they did, nobody could accurately separate the good shooters from the bad after doing so, because it’s simply too much information for the human brain to process at once. Part of why it seems like, as Denise said, “mumbo-jumbo” is the way it’s presented: full of jargon and terminology, and often short on answers to “what does this mean?” If I were to say, “eFG% is just a way to account for three-pointers in shooting percentage,” you’d understand readily, even if I hadn’t shown the work of how to get there.
— jake (@jakeyner) July 22, 2015
Again, I wish we could reserve “advanced stats” for numbers which require more than algebra and Excel to compute, but that’s a losing fight.
In terms of some of the easier ones to understand, I talked about eFG% and TS% above. Those stats are great in conjunction with “Usage Rate” (USG). Usage is simply a measure of how often a player shoots, goes to the line or turns the ball over; in other words, how often he uses an offensive possession. This is important for two reasons. First, there is only one ball: only one player can shoot it on each trip down the floor, and if I shoot, you can’t. This matters most when considering what a player will bring to a new team.
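For the curious, usage can be computed from the box score too. This sketch uses the standard Basketball-Reference-style formulation (all the inputs below are invented for illustration, not any real player’s line):

```python
def usage_pct(fga, fta, tov, mp, tm_fga, tm_fta, tm_tov, tm_mp):
    """Usage rate: estimated share of team possessions a player 'uses'
    (shots, free throw trips, turnovers) while he is on the floor.
    0.44 is the usual free-throw possession coefficient."""
    player_poss = fga + 0.44 * fta + tov
    team_poss = tm_fga + 0.44 * tm_fta + tm_tov
    # tm_mp / 5 converts team minutes to minutes of game time
    return 100 * (player_poss * (tm_mp / 5)) / (mp * team_poss)

# A hypothetical high-usage star: 1400 FGA, 600 FTA, 290 TOV in 2300 MP,
# on a team with 7000 FGA, 2200 FTA, 1150 TOV over 19680 total minutes.
print(round(usage_pct(1400, 600, 290, 2300, 7000, 2200, 1150, 19680), 1))  # 36.7
```

A number like 36.7 percent would put a player in the rarefied Westbrook-style territory described above; most starters land in the high teens to mid 20s.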
It was completely predictable that at least one of LeBron James, Kevin Love or Kyrie Irving would see a big drop in their scoring between 2013/14 and 2014/15, because their combined usage playing together was almost certainly going to drop, especially as Love and Irving transitioned from being unquestioned lead dogs to being members of LeBron’s team. There simply weren’t enough shots to keep feeding all three at the same levels, which is largely how it played out, as Irving and especially Love saw declines in their usage rates from 2013/14 to 2014/15.
The average usage is right around 20 percent, with most primary scorers hitting the high 20s and low 30s while only a select few ever reach above 35 percent. Russell Westbrook’s ridiculous offensive load saw him carry a usage of 38.4 percent last season, second highest since the introduction of the three-pointer. This hints at the second reason usage is important, as it tells us a lot about how skilled an offensive player someone is. It’s relatively easy to be an efficient player if all you shoot are dunks, layups and wide open threes. That will get you to a usage in the low teens. We’ll discuss it more in a moment, but sometimes those good shots aren’t freely available, and so a player has to get the best “bad” shot they can. Guys who can score efficiently on tough shots like Chris Paul and Steph Curry are superstars because they can both shoot a lot and shoot well while doing so. Broadly speaking, players fall into “usage-efficiency” categories as shown below:
Of course, there is a lot of variation within those categories, and players can cross from one to another depending on role. On one team a player might be a chucker, but turn into a high-efficiency specialist on a better team (and yes, I have just described J.R. Smith’s 2014/15 season). Still, looking at usage and efficiency is a great first pass at evaluating a player’s role and his effectiveness in that role. As a final note, average eFG% has been right around .500 for the last several years, while an average TS% is usually somewhere between 53 and 54 percent.
On the team level, the best stats to learn about are Dean Oliver’s Four Factors. Essentially, good basketball is about making shots, getting to the free throw line, grabbing rebounds and not turning the ball over, and then preventing the opposition from doing all those things on defense. Oliver (recently parted from the Sacramento Kings front office after having worked at ESPN for a number of years prior to that) discusses this (and many other insights) in his book Basketball on Paper, which, even though over a decade old, remains valuable and is largely accessible even to the statistical neophyte.
If you want a quick reference for how good or bad a team is on offense, defense or overall, Offensive Rating, Defensive Rating and Net Rating (simply ORTG – DRTG) are great. These stats are simply points/possessions * 100. Without getting too deep into the statistical weeds, while TEAM O and D ratings are great, individual offensive and defensive rating metrics are extremely problematic and should be avoided until you more fully understand what a given formulation is and isn’t measuring.
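The arithmetic really is that simple. Here’s a sketch; the possession estimate below is one common approximation (teams’ internal possession counts differ slightly), and the season totals are made up:

```python
def possessions(fga, fta, orb, tov):
    """Rough possession estimate: shots that end possessions, minus
    offensive rebounds (which extend them), plus turnovers and
    possession-ending free throw trips."""
    return fga - orb + tov + 0.44 * fta

def rating(points, poss):
    """Points per 100 possessions."""
    return 100 * points / poss

# Hypothetical team season totals
poss = possessions(fga=7000, fta=2200, orb=900, tov=1150)
ortg = rating(8700, poss)   # offensive rating: points scored per 100
drtg = rating(8400, poss)   # defensive rating: points allowed per 100
net = ortg - drtg           # net rating
print(round(ortg, 1), round(drtg, 1), round(net, 1))  # 105.9 102.2 3.7
```

Using possessions rather than games is the whole point: it stops a fast-paced team from looking like a great offense (or a terrible defense) just because it plays more possessions per game.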
— Dane Despres (@DaneDLion) July 24, 2015
This is a great question. For me, the biggest thing has been some of the SportVU data illustrating the difference between an “open” shot and an “NBA open” shot. Stand facing a wall, extend your arm and back up until your fingers are about a foot away from the wall. That distance is “NBA open.” Take one step back; that’s NBA “wide open.” In everyday experience playing basketball, those are contested shots, but not so for top pros. Even with a defender running and jumping at the shooter, that degree of defensive proximity doesn’t bother NBA-level shooters that much. Basically this means a lot of what look like “contested” shots on TV are actually relatively open shots with the defense closing out from too far away and arriving too late. It’s easy to get seduced into thinking a team is doing a good job of contesting shots when in actuality, the offense might just be missing a greater than usual number of open or semi-contested looks.
— Jacob Bikshorn (@OldManBikshorn) July 22, 2015
Player height is a tricky place to start because what you are really asking about is size, and there are all kinds of size – head height, wingspan, standing reach, reach after maximum vertical leap, accounting for strength and bulk, etc. A simpler way to answer the question is to say the “bigger” a team plays in terms of positions, the better defensively and worse offensively it will play in general. Obviously, the Warriors played great defense playing a lot of small ball this past year, and perhaps more teams will move that way. But typically bigs add defensive value while smalls help on offense. Even within those broad averages, the specific players matter far more than their nominal position. Oklahoma City’s defense will almost certainly not be “better” when they play big with Kanter at the 5 and Ibaka at the 4 this year than when they play smaller with Durant at the 4 and Ibaka at the 5 because Kanter is bad at defense despite being a center.
As an initial point, I’d caution against saying “stats people do X.” Many complaints or problems the general public has with certain stats or metrics are mirrored by long-running disagreements over the same issues in the statistical community. The value of shot creation has long been such a topic for debate, and the “shot creation doesn’t matter at all” position implied in the question is at this point as much on one fringe as “yay points!” is on the other.
With a 24-second shot clock, somebody has to shoot, and having guys who can shoot well even in difficult, late-clock situations has value. As a team you want to avoid those situations, of course, but they occur regularly: just over 12 percent of all shots were taken with four seconds or less on the shot clock, and even the go-go Warriors found themselves in very late clock scenarios 7.3 percent of the time. Raising late-clock efficiency by even five percentage points would meaningfully improve the offense of every team in the league.
It’s hard to say which players are best specifically at late clock shot creation, as the sample sizes are so small. Only 12 players in the entire league attempted at least 100 shots in the last five seconds of the shot clock after holding the ball at least 2.5 seconds per SportVU data (2.5 seconds of “touch time” prior to a shot is a useful stand-in for “getting your own shot” without having to look at and code all 200,000 shots individually). Broadening the question out a little to consider all such “self-created” shots, the most effective scorers are probably not names that will surprise. Among the 51 players with at least 250 self-created shots last year, here are the top and bottom ten in terms of eFG%:
— Griffin Connolly (@GriffinConnolly) July 24, 2015
Aldridge is a fairly controversial player, analytically speaking, simply because of his underwhelming efficiency. A few things to note: first, looking just at shooting slightly understates his offensive efficiency, as he is an exceptionally low turnover player. Inaccurate shots are still massively better than no shots at all from a team standpoint. In Portland, it often fell to Aldridge to create offense late in the shot clock when nothing else happened. Below are the proportions of shots the Blazers’ starters took early, middle and late in the shot clock:
So Aldridge ended up taking much more than his share of the “somebody shoot, please” looks at the end of the clock and was slightly above average in terms of accuracy on those late clock attempts. By my fairly simple measure, Aldridge took the 12th toughest mix of shots last season among players with 500 or more attempts, while he had the single toughest mix the season before. He shoots a fair amount better than the average player would on those attempts. While he could certainly improve his shot selection some (he takes a few too many one-dribble contested pull-ups for my taste), Aldridge being decent at these tougher shots has probably been a big reason why Portland’s offense has been significantly better with him on the floor than off and a large portion of why they have been an elite offense (2nd in 2013/14 in ORTG and 8th in 2014/15) when healthy the last few years.
The short answer is: sort of. Some studies have shown players have a tendency to become better shooters as their careers go on, but there may be a “survivor bias” here. It could be that if you’re a marginal player you either learn to shoot or you are out of the league, and the older players tend to be the ones who have survived that bit of roster culling. So the full response is that players can become better shooters. “Will they?” depends on hard work, smart work and maybe a little bit of luck to be in a situation where they are allowed to show improvement rather than remain pigeon-holed as a non-shooter.
Where is the best source for finding defensive impact numbers and opponent fg% when guarded by certain players? #AskBBall
— Art Vandelay (@pretzeltheory) July 23, 2015
There isn’t a good public source for this info yet, largely because “guarded by” is a tricky concept. Public SportVU data does capture the “closest defender” to a shot on release, but for a number of reasons even the best individual defense metrics are prone to being highly misleading. This isn’t to say we should give up and go home, but the line between “forcing a miss” and “standing near a guy when he happens to miss” is a pretty thin one. The public sphere is much further along in terms of measuring team defense, but sussing out individual contributions from a purely analytical perspective is going to take time, and almost certainly better, more inclusive raw data.
— Titop222 (@titop_7) July 23, 2015
I don’t want to say never, but contested three-pointers (with a defender four feet or closer to the shooter) and wide open mid-range shots (with a defender six feet or more from the shooter) had similar efficiency last year – roughly 45 percent eFG%. That said, three-pointers don’t just happen. They are often the end result of a process that leaves a shooter with enough time and space to catch and shoot from that distance. So generating more open threes will require something else to happen first, such as collapsing the defense by either penetration or posting up. Many teams could increase their three-point attempts significantly, however, by slightly altering the way they run their sets.
— Peyton Fine (@peyton_fine) July 24, 2015
The full SportVU data can provide this quite readily, but that dataset is hard to come by if you don’t work for or with a team. It can be gleaned from public data, but it’s kind of a pain. Daren Willman, Justin Willard and Mike Beuoy, among others, have all done some interesting work on this type of data, but as of yet it’s not readily available publicly in one place.
— Paul James (@PaulBrady63) July 22, 2015
Well, let me back up. There were a few variations of this question asked this week, so I’ll try to summarize a little here.
Yes, there are many metrics that purport to do this. Some of them are even pretty good. In my opinion, none of them are accurate enough to be used as “rankings”. There are two primary types of these metrics: box-score based and lineup-based. Box score metrics have all the problems of box score stats – they reward accumulation, largely ignore defense, systematically over- or underrate certain stats and so on. Player Efficiency Rating (PER) is probably the best known of these, and it’s fine for what it is. If you use it as a quick first pass of who matters, you won’t go too far wrong. But all the problems listed above are in full force with PER. There are other, more involved box-score metrics, all of which share similar problems to one degree or another (and no, I don’t want to argue why any given metric is better or worse, thanks!)
Lineup-based metrics are from the lineage of Adjusted Plus/Minus (APM). APM was created with the realization that measuring players simply by comparing the team’s performance with a player on the court versus off the court is hugely influenced by the four guys a player is playing with and the five he’s playing against at any given time. To solve this, lineups were thrown into a bunch of statistical regression techniques and voila, APM. The adjustment in APM alters a player’s rating based on who else was on the floor for both teams. For a number of reasons we don’t need to go into here, this was a noisy measurement, and so APM has been reworked and improved upon with various, more advanced, statistical techniques. Probably the best current version of this sort of model is Regularized Adjusted Plus/Minus (RAPM). Just trust me, it’s better/more accurate. In any event, RAPM spits out a number measuring the estimated impact of the player in terms of points per 100 possessions above or below what a theoretical “average” player would provide. Anything above around +3/100 is All-Star level, anything below -2/100 is a guy who probably shouldn’t be on the court, with the vast bulk of players in between.
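To make the regression idea concrete, here is a toy sketch of the APM/RAPM machinery for an imaginary 2-on-2 league. Every row is a “stint” (a stretch with fixed lineups): +1 if a player was on the home side, -1 if on the away side, 0 if off the floor, and the target is the home scoring margin per 100 possessions. All numbers are invented; real implementations weight by possessions, use far more data, add priors, and pick the ridge penalty by cross-validation:

```python
import numpy as np

# Toy stint matrix: 4 stints x 4 players.
X = np.array([
    [ 1,  1, -1, -1],
    [ 1, -1,  1, -1],
    [-1,  1,  1, -1],
    [ 1, -1, -1,  1],
], dtype=float)
# Home margin per 100 possessions in each stint (made up).
y = np.array([6.0, -2.0, 1.0, 3.0])

lam = 1.0  # ridge penalty: shrinks noisy estimates toward 0 (the "R" in RAPM)
# Closed-form ridge regression: beta = (X'X + lam*I)^-1 X'y
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
# Each coefficient is the player's estimated points/100 impact
# relative to average, given who he played with and against.
print(np.round(beta, 2))
```

The regularization is what separates RAPM from raw APM: without the `lam * np.eye(...)` term, players who always share the floor produce wildly unstable coefficients, because the regression can’t tell their contributions apart.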
So what’s the problem? There are several, really. None of these issues are to suggest one-number metrics are worthless. Rather, they have limitations which need to be acknowledged, while also making them unsuitable for use as a pure rating/ranking mechanism.
1. Inexact Science – It is almost inherent in how the results of these metrics are presented that there be some confusion as to the precision achieved. If it says this guy is +2.57/100 as a player, that seems pretty exact. However, for a number of reasons that number, no matter how many decimal places you take it to, remains an estimate. An estimate influenced by small sample sizes, statistical errors and randomness. Over the short-to-medium term, how well your opponents and/or teammates shoot on open threes has a huge component of luck to it. If you are measuring the success/failure of a lineup by score margins, making and missing shots fairly obviously has a huge impact, and that’s only one aspect among many that can cause short-term peaks and valleys on the scoreboard. Now, the statistical methodology controls for this variance to a degree, but in actuality, when a metric says Player A is .5/100 “better” than player B, it’s really saying something more like it’s 60 percent likely the first player is better than the second. Which sounds like a lot until you remember that 50/50 is a coin flip. When you see people talking about confidence intervals and margins of error, this is what they are referring to.
2. Is vs. Did. – Players play better or worse sometimes. Some days they are tired, hurt, stressed, or distracted. Some days their bodies are feeling great and their focus laser-like. These things can last for a while. Maybe a player is feuding with a teammate, or in the coach’s doghouse. Maybe he’s being showcased to drive up his trade value. One-number metrics measure only what the player actually did; they can’t know or control for the off-court or extraneous reasons why that might be. “Who had the most on-court impact over the last month” is a very different question from “who would you pick first for a game tomorrow.”
3. Context, context, context – By far the biggest problem is this: players have individual skillsets. For any player, if you put them in a position to do only the things they are good at, and hide the areas in which they struggle, they will look good. If you do the opposite, they will look bad. Many “RAPM-superstar” players benefit from being placed in roles in which they excel. Matt Bonner was long a classic example in that Gregg Popovich was excellent at using him only at times when his defensive shortcomings wouldn’t be much of an issue, and his ability to stretch the floor from the four or five would be valuable. If Pop had played him 35 minutes per night, Bonner would almost certainly have been exposed and ended up looking much worse on these metrics. He wouldn’t be any different of a player, but he would have been forced into a far less favorable context. This gets to the underlying problem with any one-number metric: they have become quite good at describing “what” and “how much” a player is doing while having very little insight into “why” and “how.” As basketball is a collaborative game, those hows and whys matter a great deal.
To reiterate, none of these issues render these metrics useless, and a lot of good, precise work has gone into creating them. If you are using them as a broad overview of who has played well or not, and to get some insight into players who are better (Amir Johnson) or worse (Enes Kanter) than their box score stats, peruse away. But when I see people say “Draymond Green and Khris Middleton (both players I love, FWIW) are top 10 players in this league because RPM says so,” I get a little stabby.
(BTW, if you are interested in a very long-winded discussion on one-number metrics, me and the good folks at Nylon Calculus had a long, somewhat dense, discussion of the pros and cons earlier this year.)
That’s all for this week. Apologies if I didn’t get to your submission. Some of the ones I didn’t answer were very similar to those I answered above, so hopefully I addressed those. We’re not sure how often we’ll run this feature, but if the response to this one is any indication, the answer is “regularly.” We’d also like to include a “do-it-yourself” tutorial, so “how do I…?” questions are more than welcome.