Feature Extraction for NBA MVPs: What makes an NBA player the Most Valuable?

Viren Gadkari
15 min read · Mar 8, 2022


The Most Valuable Player: a player who is recognized for outstanding effort during the regular season and is regarded as one of the league’s best performers. The NBA allows the fans and a panel of top broadcasters and sportswriters to determine who is awarded the MVP. Some natural questions arise:

  • What makes a player the “most valuable”?
  • What features determine a viable MVP winner or candidate?
  • Can we find significant statistics that determine a potential MVP winner or candidate?

The Data

I attempted to answer these three questions by analyzing NBA MVP voting data from the 1955 season to the 2020 season. I scraped this data from BasketballReference.com and applied the necessary preprocessing to prepare it for analysis. As a disclaimer, my analysis only considered the seasons from 1979–2020, because the 1979 season was when the 3-point line was introduced. The data used for the analysis consists of the NBA MVP candidates from 1979–2020, and contains voting statistics, per-game statistics, and a few advanced metrics. After preprocessing, the variables of interest were:

  • Age
  • Pts Won- MVP voting points awarded to the player
  • PTS- Points per game
  • TRB- Total Rebounds per game
  • AST- Assists per game
  • STL- Steals per game
  • BLK- Blocks per game
  • FG%- Field Goal percentage
  • 3P%- 3-Point percentage
  • FT%- Free Throw percentage
  • WS- Win Shares
  • WS/48- Win Shares per 48 Minutes

The Analysis Approaches

Initially, I wanted to see if there were any patterns or shifts in the features associated with MVPs. The NBA has changed a lot over time, and I wanted to see whether the positions of the players who won the MVP had changed with it. I was interested in seeing if there were any trends in the player position associated with MVPs and if there were any deviations from such a trend. Ultimately, I wanted to conclude the analysis with a subset of variables associated with MVPs, so I also applied various statistical techniques to achieve this. My approaches for analyzing the data can be summarised with the following tasks:

  • MVP Feature Breakdown with Principal Component Analysis
  • A Travel Through Time: Exploring patterns and shifts in MVP Winners through Time
  • Feature Ranking & Selection for MVP Candidates with Regularized Regression and Decision Trees

MVP Feature Breakdown

Out of all the statistics in the box score, only a few determine a player’s value to a team. This dataset consisted of a large number of features for each player, and my first plan of attack was to explore the variability in these features. While this dataset wasn’t considered “high dimensional”, I did notice that some stats could be correlated with others. Having correlated predictors is a recipe for disaster when it comes to statistical modeling, so I used PCA (Principal Component Analysis) to decompose these player statistics into a few uncorrelated principal components that could help describe the variability in the data. Before we move on, I will give a brief overview of what Principal Component Analysis is, and why it was used.

Principal Component Analysis (PCA)

Principal Component Analysis, also known as PCA, is an unsupervised “machine learning” algorithm that helps decompose and understand very noisy and high dimensional data, which is why it is also known as a tool for “dimensionality reduction”. The term “high dimensional” refers to data where we have more columns than rows in our dataset. Another use of PCA is to help with the problem of multicollinearity, which arises when many of your columns are correlated with one another. PCA allows us to compute “principal components” from our original dataset, which are essentially transformations of our original predictors into a few uncorrelated predictors. The magic of PCA is that it allows us to decompose a large number of correlated predictors into a small set of uncorrelated predictors, all while retaining most of the variability of the data! For you Linear Algebra fans, this involves finding the eigenvalues and their associated eigenvectors of the covariance matrix. The eigenvector with the largest associated eigenvalue represents the principal component which captures the most variance.
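
To make the eigenvector description concrete, here is a minimal sketch in plain Python. It is not the code used for this analysis: the four rows are made-up toy numbers, the function names are my own, and power iteration is just one simple way to find the leading eigenvector of the covariance matrix.

```python
import math

def center(rows):
    """Subtract each column's mean so every column has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1) of the columns."""
    x = center(rows)
    n, p = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenpair(matrix, iterations=200):
    """Power iteration: repeatedly multiply and renormalise a vector.

    Assumes the start vector is not orthogonal to the leading eigenvector
    and that the largest eigenvalue is strictly larger than the rest.
    """
    p = len(matrix)
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v[i] * mv[i] for i in range(p))  # Rayleigh quotient
    return eigenvalue, v

# Toy data only: two positively correlated columns, NOT real player stats.
toy = [[2.0, 2.0], [-2.0, -2.0], [1.0, -1.0], [-1.0, 1.0]]

cov = covariance_matrix(toy)
eigenvalue, loadings = leading_eigenpair(cov)

# Share of total variance captured by the first principal component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance
print(loadings, explained)
```

On this toy input the first component weights both columns equally and captures 80% of the variance. In practice the columns would usually be standardized first, since the stats sit on very different scales.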

MVP Player “archetypes”

A visualization of the first three principal components

As we can see here, we have decomposed our original 12 predictors, into a set of three uncorrelated principal components which describe 66% of the variability of our data. Each component has associated features from the original dataset which contribute to it, as well as the associated contribution “loading” for each feature. The plot has been adjusted to show the relative magnitude of the contribution of each feature, hence why features labeled as “negative” have a positive value associated with them. Upon looking at the features associated with each principal component, one could try and form a grouping for the types of MVPs.

PC 1: Tenacious Rebounders and Rim Protectors

When looking at the top two features in component 1, we see blocks and total rebounds as the main contributors. A natural interpretation could be that MVPs mostly associated with this principal component are players who are more “defensive” minded. These could be forwards or centers who are known for their work on the glass and rim protection. Of course, we could also consider scrappy guards who are known for their defensive presence and get their fair share of rebounds as well.

PC 2: Offensive High Impact Scoring Machines

In principal component 2, we see the features with the highest contribution are Win Shares per 48 minutes, Points, as well as Free Throw and 3-Point shooting percentages. The types of players that come to mind here are high-impact players who are high-volume scorers. MVPs associated with this component could be slashers who know how to get to the free-throw line and sharpshooters who can get on a hot scoring streak quickly. The Win Shares per 48 minutes statistic measures the relative impact of a player and their contribution to a team’s wins. It is a key statistic for measuring player impact while adjusting for the time spent on the court.
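
As a quick illustration of the “adjusting for time on the court” idea, WS/48 rescales a player’s Win Shares to a 48-minute basis: Win Shares divided by minutes played, times 48. The sketch below uses a hypothetical player line (the numbers are made up, not taken from the dataset), and the function name is my own.

```python
def win_shares_per_48(win_shares, minutes_played):
    """Rescale season Win Shares to a per-48-minutes rate."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return 48.0 * win_shares / minutes_played

# Hypothetical season: 12 Win Shares over 2,880 minutes.
example_ws48 = win_shares_per_48(12.0, 2880.0)
print(round(example_ws48, 3))  # 0.2
```

Two players with the same Win Shares can therefore have very different WS/48 values if one of them needed far fewer minutes to produce them.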

PC 3: Experienced Veterans of the Professional Game

In principal component 3, we observe Age as one of the main features contributing to the overall variance captured by this component. The types of players represented here could be veterans in the NBA who have played enough to know the ropes of the professional game. Their talent combined with their age makes them a great addition to a team that may need senior leadership. Such players make great leaders on teams with young talent, as they serve as mentors and can help a team mature quickly.

A Travel Through Time

An interesting discussion that has come up many times is how the professional game has changed throughout the years. Many have claimed that the game has changed drastically, from interior post play in the 80s and 90s to more perimeter play and high-volume three-point shooting. In this section of my analysis I wanted to see if this trend carried over to MVP winners. The main question I wanted to answer was: Do we see a shift in the positions of MVP winners through time?

Clustering with Principal Components

Now that we have our principal components resembling the various MVP winner “archetypes”, I wanted to cluster the data to see if certain players were more associated with one principal component than another. Since the first two principal components capture most of the variance of the data, we will focus on clustering the MVP candidates using PC1 and PC2. The goal is to see how the frequency of the positions of MVP winners shifted through time. The following visualizations show scatterplots, where each data point shaded in red represents an MVP candidate, and the points shaded in black represent MVP winners. The darker the shade of red, the more points the MVP candidate won in the given year. We will do a decade-by-decade breakdown from 1979–2020 to test the claim that time has an impact on MVP winners.
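
Placing each candidate on the PC1/PC2 plane amounts to projecting their centered stats onto the component loadings. The sketch below shows only that projection step, under loud assumptions: the rows are toy numbers that are already centered, and the two loading vectors are hypothetical values chosen for illustration, not the loadings from this analysis.

```python
import math

def project(rows, components):
    """Return one score per component for every (already centered) row."""
    return [[sum(value * weight for value, weight in zip(row, component))
             for component in components]
            for row in rows]

# Toy, already-centered data (NOT real player stats).
toy_rows = [[2.0, 2.0], [-2.0, -2.0], [1.0, -1.0], [-1.0, 1.0]]

# Hypothetical unit-length loadings for "PC1" and "PC2".
s = 1.0 / math.sqrt(2.0)
toy_components = [[s, s], [s, -s]]

scores = project(toy_rows, toy_components)
for row, (pc1, pc2) in zip(toy_rows, scores):
    print(row, round(pc1, 3), round(pc2, 3))
```

Each pair of scores is one point on the scatterplot. A row that lines up with the first loading vector lands far along the PC1 axis and near zero on PC2.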

A Travel Through Time: 1979–1989 & 1990–2000

The two plots show scatterplots of MVP candidates, where we cluster them according to PC1 and PC2. A player who sits far along either component is strongly associated with the features that make up that component. As we can see from the first plot, a few MVP winners were the standout guards, such as Michael Jordan and Magic Johnson. Interestingly enough, we see Michael Jordan most associated with PC1, and this would make sense since Jordan won a Defensive Player of the Year Award in 1988 with the Chicago Bulls. Players like Dr. J, Magic Johnson, and Larry Bird are in the center of the plot, which makes sense since these players had more of a well-rounded game.

From 1990–2000, we see a high frequency of MVP winners most associated with defensive characteristics, such as TRB and BLK, and it’s no coincidence that these are the dominant big men of the era. Aside from our outlier of Michael Jordan, most of the MVP winners in the pre-2000s era were defensive-minded forwards and centers. In summary, the NBA before the 2000s had MVP winners that reflected interior post play, which makes sense as this was how the game was played in the 80s and 90s.

A large presence of interior post players as MVP winners

A Travel Through Time: 2000–2010 & 2010–2020

Moving to the post-2000s, we see a shift in the MVP winners: an immediate explosion of guards, most associated with the offensive, high-impact characteristics of PC2, such as PTS, 3P%, FG%, and WS/48. From 2000–2010, we see some forwards and centers that keep the interior post play era going, such as Kevin Garnett and Tim Duncan. It is in this era that we see the changepoint from interior post play to perimeter play, as the era of the big man is dying down and quick, offensive-minded guards are coming onto the scene. KG and Tim Duncan are some of the only MVP winners in this era who were considered “traditional” post players. Allen Iverson wowed Sixers fans with his lightning-fast speed and wizard-like ball handling, and Steve Nash pleased Suns fans with his lights-out shooting and pass-first mindset, as he embodied the idea of a “true” point guard.

MVP Winners in the 2000–2010 era most associated with PC2

However, when we look at the MVP winners from 2010–2020, we see more of a presence of athletic guards and sharpshooters. Players like Stephen Curry, Derrick Rose, and Russell Westbrook are some of the prominent MVP winners of this era, which is a stark contrast to the type of MVP winners we observed in the 90s. Derrick Rose and Russell Westbrook gave rise to a new trend of “high flying” point guards with freakish athleticism, whereas Stephen Curry left defenders breathless in 2015 with his limitless shooting range. We also see some MVP winners who played more of a “point forward” role, such as Giannis and LeBron James. Another thing to keep in mind is that since many of the MVP winners in this era are associated with PC2, these players were associated with the WS/48 statistic. Could we argue that MVP winners in this era contributed relatively more to their teams’ wins through offensive characteristics than pre-2000s MVP winners did through defensive characteristics? It’s a bold claim to make, and we would have to do more analysis of what goes into calculating the WS/48 statistic, but from the data it seems that MVPs in the post-2000s era provide more value to their teams’ wins through offensive characteristics than through defensive characteristics.

A huge influx of perimeter playing guards as MVP Winners from 2010–2020

After taking this travel through time, we clearly see how much time dependence there is in the features associated with MVP winners. This is something to take into account when doing our feature selection, since the features associated with MVP winners clearly changed over 20 years. We saw that early MVP winners were associated with defensive characteristics, and were represented by interior post players. However, once we looked past the 2000s, we saw a shift toward MVP winners who were more offensive-minded, high-impact players, represented by athletic perimeter players. It’s important to keep this time dependence in mind when we assess the uncertainty of our models later on.

Feature Selection & Feature Ranking for MVP Candidates

We now get to the question we want to answer: What are the characteristics of MVP candidates? Before we go into the analysis, I will briefly explain the two statistical methods I used, and how they helped with the results.

Regularized Regression and Decision Trees for MVP Feature Selection

Regularized Regression

In regression, a problem analysts often run into is including predictors in the model that are not associated with the response variable. The process of removing unnecessary predictors from a model is known as “Feature Selection”, and this is the method that was used to pick the characteristics associated with MVP candidates. When we have lots of predictors in our model, we can apply Regularization to our coefficient estimates. Regularization involves adding a “penalty” on the size of the coefficients to the model’s loss function, which shrinks the coefficient estimates of some predictors toward, or exactly to, zero. Regularized Regression bakes in the concept of feature selection: the coefficients of weak predictors are shrunk the most, and the predictors whose coefficients remain clearly away from zero become the “selected” features of our model. The two regularized regression methods used in this analysis were Ridge Regression and Lasso Regression. Ridge Regression allows for natural feature ranking, as it applies the penalty to all the predictors and shrinks the coefficient estimates for unnecessary predictors close to zero, but not exactly to zero. In Lasso Regression, we get feature selection, as the penalty is again applied to all predictors, but a subset of the predictors have their coefficients shrunk completely to zero.
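
The difference between the two penalties is easiest to see on a tiny example. The following is a sketch in plain Python, not the tidymodels code behind this article: the design matrix and response are made-up toy values, the function names are my own, and coordinate descent is simply one convenient way to fit both models.

```python
def soft_threshold(value, penalty):
    """Move value toward zero by penalty; values inside the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def coordinate_descent(X, y, penalty, kind, sweeps=100):
    """Fit a penalised linear model with no intercept (data assumed centered).

    lasso minimises (1/(2n)) * RSS + penalty * sum(|b_j|)
    ridge minimises (1/(2n)) * RSS + (penalty / 2) * sum(b_j ** 2)
    """
    if kind not in ("lasso", "ridge"):
        raise ValueError("kind must be 'lasso' or 'ridge'")
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for row, target in zip(X, y):
                fitted_without_j = sum(coefs[k] * row[k]
                                       for k in range(p) if k != j)
                rho += row[j] * (target - fitted_without_j)
            rho /= n
            scale = sum(row[j] ** 2 for row in X) / n
            if kind == "lasso":
                coefs[j] = soft_threshold(rho, penalty) / scale
            else:
                coefs[j] = rho / (scale + penalty)
    return coefs

# Toy data only: three centered, mutually orthogonal columns (NOT player stats).
x0 = [1, 1, 1, 1, -1, -1, -1, -1]
x1 = [1, 1, -1, -1, 1, 1, -1, -1]
x2 = [1, -1, 1, -1, 1, -1, 1, -1]
X = [list(row) for row in zip(x0, x1, x2)]
# Strong effect for x0, moderate for x1, nearly none for x2, plus a small
# disturbance that is orthogonal to all three columns.
y = [3.0 * a + 1.0 * b + 0.1 * c + 0.5 * a * b for a, b, c in zip(x0, x1, x2)]

lasso_coefs = coordinate_descent(X, y, penalty=0.5, kind="lasso")
ridge_coefs = coordinate_descent(X, y, penalty=0.5, kind="ridge")
print("lasso:", lasso_coefs)
print("ridge:", ridge_coefs)
```

On this toy input the Lasso sets the coefficient of the weak third column exactly to zero, while Ridge only shrinks it to a small positive value. That is the selection-versus-ranking distinction described above.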

Decision Trees

Decision Trees are another class of statistical models which allow for a more flexible approach to modeling, while still retaining interpretability. Decision Trees can capture non-linear relationships between the response variable and the predictors that Linear Regression may fail to capture. In addition, the output of such models displays the features associated with the response in a tree-like structure, which makes them easy to interpret. Decision Trees partition the predictor space into regions, and use an algorithm known as “Recursive Binary Splitting” to fit the tree. When we take a look at the output of decision trees in the context of our problem, we will see which predictors in our dataset are most associated with MVP candidates in an easy-to-interpret tree diagram.
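
Here is a minimal sketch of recursive binary splitting for a regression tree, in plain Python. It is deliberately simplified and is not the model fitted in this article: it searches a single predictor only (a real tree searches every predictor at every node), the data are made-up toy values, and the names `best_split` and `grow` are my own.

```python
def rss(values):
    """Residual sum of squares around the mean of values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Try midpoints between sorted unique x values; keep the lowest total RSS."""
    unique = sorted(set(xs))
    best = None
    for low, high in zip(unique, unique[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        total = rss(left) + rss(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

def grow(xs, ys, depth=0, max_depth=2, min_size=2):
    """Split recursively; each leaf predicts the mean response of its rows."""
    can_split = depth < max_depth and len(ys) >= 2 * min_size
    split = best_split(xs, ys) if can_split else None
    if split is None:
        return {"predict": sum(ys) / len(ys), "n": len(ys)}
    threshold = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    return {
        "threshold": threshold,
        "left": grow([x for x, _ in left], [y for _, y in left],
                     depth + 1, max_depth, min_size),
        "right": grow([x for x, _ in right], [y for _, y in right],
                      depth + 1, max_depth, min_size),
    }

# Toy data only: a WS/48-like predictor and a points-won-like response.
toy_x = [0.10, 0.15, 0.20, 0.26, 0.30, 0.32]
toy_y = [10.0, 20.0, 15.0, 500.0, 900.0, 950.0]

tree = grow(toy_x, toy_y)
print(tree)
```

Each leaf’s prediction is the average response of the rows that fall into it, which is the quantity a fitted tree diagram typically prints inside each node.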

Regularized Regression for MVP Feature Selection

In both of the regression models, we treated “Pts Won”, the points awarded to the MVP candidate in that year, as our response variable, and the player’s defensive and offensive statistics as the predictors. The plots below show the coefficient estimates for each of the predictors, for the Lasso and Ridge Regression models.

Features Selected from Lasso Regression Model

From this graph we can see that our Lasso Regression model identified the subset of features most associated with MVP candidates to be a good dose of offensive statistics combined with the player’s contributions to wins. Our model found that players who were more offensive-minded and contributed to their teams’ wins had the characteristics associated with winning more points in the MVP race. These players also had a presence on the glass, as their hard work in the paint made them more valuable in the eyes of MVP voters. Combined with their ball movement, these features really represented MVP candidates that were true team players.

Features Selected from Ridge Regression Model

In the case of Ridge Regression, we see that our model produced similar results to our previous model, as it ranked similar characteristics associated with offensive-minded team players. In addition, our Ridge Regression model gave blocks a positive coefficient, indicating that, holding all other predictors equal, an increase in the number of blocks is associated with an average increase in the number of points won in the MVP race. This is interesting, as our Ridge Regression model expands the type of player that our MVP candidates are, in that they should be more of a two-way player. In the same context, Age also has a positive coefficient, so we could make the case that our MVP candidate is someone who has had more experience playing in the league. However, because the coefficient is small, Age doesn’t seem to have as much of an impact on points won as the other features, and this makes sense since there have been many NBA MVPs who were younger. The addition of BLK as an important feature could also suggest that MVP candidates are more than just offensive-minded players who make an impact through high-volume scoring; rather, they are two-way players who have a defensive presence as well.

Features Selected from the Decision Tree Model

In interpreting the Decision Tree, the hierarchy of the tree’s split nodes reflects the most important feature at each level. For example, the most important feature associated with the amount of MVP points won is WS/48. At each split there is a “decision” that is made, and the data is partitioned according to whether the criterion is met. For example, if the given MVP candidate had a WS/48 less than 0.24, the player was partitioned into the “yes” side, whereas players at or above 0.24 were partitioned into the right (“no”) side. This process repeats as a specific % of observations are sectioned off based on each criterion. At each tree node, the number above the % represents the predicted amount of MVP points won for the players in that node. Based on this, we can see that players who were partitioned into the bottom right portion of the tree had 874 predicted MVP points won, and made up 3% of the data. This indicates that these players were most likely our MVP winners, and we can work our way up the tree to figure out which features were associated with such players. When we look at the top partition, most of our high-value MVP candidates had a WS/48 greater than 0.24. From here we can make two observations:

  1. MVP candidates with the highest number of MVP votes had WS/48 greater than 0.28.
  2. MVP candidates with the second highest number of MVP votes had WS/48 less than 0.28, but had high assist totals and shot less than 81% from the free throw line. This indicates that even though the players that met these criteria weren’t extremely accurate from the free throw line, they were better team players and moved the ball around a lot more.

From these two observations, the Decision Tree model tells us that high-value MVP candidates are players who move the ball around and are high-impact players on the floor, contributing a lot to their teams’ wins.

Conclusions and Final Thoughts

Through multiple analysis approaches, we made the following observations:

  • There has been a shift in the player type of MVPs, starting out with Centers and Forwards with defensive characteristics, and moving into a more Guard-dominated league with offensive characteristics
  • Win Shares per 48 minutes, a statistic measuring the impact of a player and their contribution to their team’s wins, is a big factor in determining MVP winners
  • The NBA MVP is a two-way player who moves the ball around and can score at will when he needs to

GMs and the Front Office can get an idea of which players in free agency could have potential superstar MVP talent by looking at high-impact statistics like WS/48, as well as a mix of offensive and defensive characteristics. Optimal MVP candidates will be high scorers who are two-way players and are high contributors to their teams’ wins while they are on the floor. However, there is a caveat to this approach.

We noticed that the features associated with MVPs varied drastically across time, and this time dependence is something to consider. We saw how much of a shift there was in the type of game being played in the NBA from the 1990s to the 2010s, and we can’t guarantee that perimeter guard play will be the type of game being played 10 years from now! In general, it is hard to extrapolate with statistical models, especially when there is time dependence. The data does give GMs a good idea of a set of features associated with MVP candidates, but this should be balanced against current trends in the NBA and any shifts in how the game is played. Who knows, the interior post play of the 90s could make a return and be a driving force in future seasons.

Thanks for taking the time to read this article! Addressing interesting sports questions is always fascinating for me, and I appreciate any feedback on my writing or technical details.

Software: R, Python

Packages: ggplot2, dplyr, tidymodels, BeautifulSoup4 & Requests

Dataset Scraped From: BasketballReference.com

Connect With Me!: https://www.linkedin.com/in/viren-gadkari-13a287191/


