The Evolution of the 3 point shot

By: Ryan Warner and Eric Zhang

The 3 pointer is the most deadly method of scoring in basketball, offering the most potential points per shot attempt. However, it was not always like this. Basketball during the Michael Jordan era (and even earlier eras) relied mainly on team-based play, and 3 pointers were far less common. This matters for the NBA because the three point shot has been growing in popularity in recent years. Over the past 5 or so years, the Golden State Warriors revolutionized the game by shooting a high volume of three pointers, and other teams have started to copy their strategy because it was so successful. In a sense, the Warriors have turned the 3 pointer into a weapon and a style of play. However, there have recently been discussions about whether this is actually a successful strategy, or whether these other teams and players are becoming worse by shooting too many three pointers.

There are two main approaches that we took to analyzing the effectiveness of the 3 point line for individual players and teams:

For players we will be asking the question: Will shooting a higher 3 point percentage translate to greater player success (a higher salary)? If it does, should future basketball players focus their training solely on 3 point shooting (since it would mean they get paid more)? Or does a player need to be more well-rounded to earn a higher salary and value? In addition, we wanted to see if we could predict a player's salary and market value given their 3 point shooting statistics. These questions have implications not only for current and future basketball players, but also for NBA owners and general managers. A model of this kind is useful for assessing the market value of an NBA player. If NBA owners and general managers can predict the market values of players, they can approach player negotiations better. The same goes for players approaching contract negotiations.

For teams we will be analyzing the following: Do 3 point attempts and 3 pointers made have an impact on whether the team wins or loses? This question matters a great deal to NBA front offices. It can potentially affect the decisions that they make and the players that they acquire. It is also relevant to sports betting and setting odds on a game. If it can be accurately predicted whether a team will win or lose based on 3 point shooting, it may make sense to bet more on the higher 3 point shooting teams in the NBA.

In [17]:
import warnings
warnings.filterwarnings("ignore")

from bs4 import BeautifulSoup
import requests
import pandas as pd
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
import seaborn as sns
from sklearn.model_selection import KFold
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import ttest_ind
from sklearn import metrics
from scipy import stats

url1 = 'https://www.basketball-reference.com/leagues/NBA_2020_per_game.html'
url2 = "https://hoopshype.com/salaries/players/2019-2020/"
headers = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0"}

1. Player Data

1.A Data Collection and Cleaning

To proceed with the player question, we must first gather data about player statistics and salaries. Here we scrape 2 separate sources: Basketball Reference (the primary data source for all basketball statistics) and HoopsHype (for the salaries).

In [18]:
# Import the data set of all players stats from 2020
r1 = requests.get(url1, headers = headers)
root1 = BeautifulSoup(r1.content)
lnks1 = root1.find('table')
pretty1 = lnks1.prettify()
table1 = pd.read_html(pretty1)
stats = table1[0]
stats.head()
Out[18]:
Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
0 1 Steven Adams C 26 OKC 63 63 26.7 4.5 7.6 ... .582 3.3 6.0 9.3 2.3 0.8 1.1 1.5 1.9 10.9
1 2 Bam Adebayo PF 22 MIA 72 72 33.6 6.1 11.0 ... .691 2.4 7.8 10.2 5.1 1.1 1.3 2.8 2.5 15.9
2 3 LaMarcus Aldridge C 34 SAS 53 53 33.1 7.4 15.0 ... .827 1.9 5.5 7.4 2.4 0.7 1.6 1.4 2.4 18.9
3 4 Kyle Alexander C 23 MIA 2 0 6.5 0.5 1.0 ... NaN 1.0 0.5 1.5 0.0 0.0 0.0 0.5 0.5 1.0
4 5 Nickeil Alexander-Walker SG 21 NOP 47 1 12.6 2.1 5.7 ... .676 0.2 1.6 1.8 1.9 0.4 0.2 1.1 1.2 5.7

5 rows × 30 columns

In [19]:
stats.drop(stats[stats.Rk == 'Rk'].index, inplace=True)
# This data set repeats the headings in the data set, so those rows are dropped

counter = Counter(stats['Player'])
for player in counter:
    if counter[player] > 1:
        stats.drop(stats[(stats['Player'] == player) & (stats['Tm'] != 'TOT')].index, inplace=True)
        # Players who appear for multiple teams have all stats except their total stats dropped
stats.head()
Out[19]:
Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
0 1 Steven Adams C 26 OKC 63 63 26.7 4.5 7.6 ... .582 3.3 6.0 9.3 2.3 0.8 1.1 1.5 1.9 10.9
1 2 Bam Adebayo PF 22 MIA 72 72 33.6 6.1 11.0 ... .691 2.4 7.8 10.2 5.1 1.1 1.3 2.8 2.5 15.9
2 3 LaMarcus Aldridge C 34 SAS 53 53 33.1 7.4 15.0 ... .827 1.9 5.5 7.4 2.4 0.7 1.6 1.4 2.4 18.9
3 4 Kyle Alexander C 23 MIA 2 0 6.5 0.5 1.0 ... NaN 1.0 0.5 1.5 0.0 0.0 0.0 0.5 0.5 1.0
4 5 Nickeil Alexander-Walker SG 21 NOP 47 1 12.6 2.1 5.7 ... .676 0.2 1.6 1.8 1.9 0.4 0.2 1.1 1.2 5.7

5 rows × 30 columns

In [20]:
# Import the salary information for all players in 2020
r2 = requests.get(url2, headers = headers)
root2 = BeautifulSoup(r2.content)
table2 = root2.find("table")
pretty2 = table2.prettify()
pd_table2 = pd.read_html(pretty2)
pand2 = pd_table2[0]
pand2 = pand2.drop(columns=["Unnamed: 0"])
nba_salaries = pand2.drop(columns=["2019/20(*)"])
nba_salaries.columns = ['Player', 'Salary']
nba_salaries.head()
Out[20]:
Player Salary
0 Stephen Curry $40,231,758
1 Russell Westbrook $38,506,482
2 Chris Paul $38,506,482
3 James Harden $38,199,000
4 John Wall $38,199,000
In [21]:
final_dataset = pd.merge(stats, nba_salaries, how = 'inner', on = 'Player')
final_dataset = final_dataset[final_dataset['3P%'].notna()]
# Salary and stats data sets are merged and players who did not play enough for their percentage to exist are dropped.

final_dataset['3P'] = pd.to_numeric(final_dataset['3P'])
final_dataset['3PA'] = pd.to_numeric(final_dataset['3PA'])
final_dataset['3P%'] = pd.to_numeric(final_dataset['3P%'])

final_dataset['Salary'] = (
    final_dataset['Salary']
    .str.replace('$', '', regex=False)
    .str.replace(',', '', regex=False)
    .astype(int)
)
# Salary column is cleaned (dollar sign and commas removed) so it can be treated as an int

reg = linear_model.LinearRegression()
x = []
y = []
final_dataset.head()
Out[21]:
Rk Player Pos Age Tm G GS MP FG FGA ... ORB DRB TRB AST STL BLK TOV PF PTS Salary
0 1 Steven Adams C 26 OKC 63 63 26.7 4.5 7.6 ... 3.3 6.0 9.3 2.3 0.8 1.1 1.5 1.9 10.9 25842697
1 2 Bam Adebayo PF 22 MIA 72 72 33.6 6.1 11.0 ... 2.4 7.8 10.2 5.1 1.1 1.3 2.8 2.5 15.9 3454080
2 3 LaMarcus Aldridge C 34 SAS 53 53 33.1 7.4 15.0 ... 1.9 5.5 7.4 2.4 0.7 1.6 1.4 2.4 18.9 26000000
4 5 Nickeil Alexander-Walker SG 21 NOP 47 1 12.6 2.1 5.7 ... 0.2 1.6 1.8 1.9 0.4 0.2 1.1 1.2 5.7 2964840
5 6 Grayson Allen SG 24 MEM 38 0 18.9 3.1 6.6 ... 0.2 2.0 2.2 1.4 0.3 0.1 0.9 1.4 8.7 2429400

5 rows × 31 columns

After gathering the two datasets, we merged them into one large dataset containing player statistics and salaries from the 2020 season. We faced a couple of challenges: we had to get rid of duplicate players (a player traded to another team mid-season appears once per team) and convert the salary and other variables into numeric values for easier analysis. Above is a view of the final dataframe that we will use for the player analysis.
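The cleaning steps described above can be sketched in plain Python, without a DataFrame, so that the logic is easy to follow. The rows below are made up for illustration, and `parse_salary` and `keep_season_totals` are hypothetical helper names, not functions used in the notebook.

```python
# Hypothetical rows mimicking the two scraped tables; all values are made up.
stats_rows = [
    {"Player": "Player A", "Tm": "TOT", "3PA": 4.0},
    {"Player": "Player A", "Tm": "AAA", "3PA": 3.5},
    {"Player": "Player A", "Tm": "BBB", "3PA": 4.6},
    {"Player": "Player B", "Tm": "CCC", "3PA": 1.2},
]
salary_rows = [
    {"Player": "Player A", "Salary": "$12,345,678"},
    {"Player": "Player B", "Salary": "$1,500,000"},
]


def parse_salary(text):
    """Turn a string such as '$12,345,678' into the integer 12345678."""
    return int(text.replace("$", "").replace(",", ""))


def keep_season_totals(rows):
    """Keep one row per player: the 'TOT' row if the player changed teams."""
    counts = {}
    for row in rows:
        counts[row["Player"]] = counts.get(row["Player"], 0) + 1
    return [r for r in rows if counts[r["Player"]] == 1 or r["Tm"] == "TOT"]


salaries = {r["Player"]: parse_salary(r["Salary"]) for r in salary_rows}

# Inner join: only players present in both tables survive.
merged = [
    {**row, "Salary": salaries[row["Player"]]}
    for row in keep_season_totals(stats_rows)
    if row["Player"] in salaries
]
print(merged)
```

Each player ends up with a single row and an integer salary, which is the shape the regression later relies on.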

1.B Data Exploration and Analysis

In [22]:
plt.plot(final_dataset['3PA'], final_dataset['3P%'], 'o')
plt.xlabel('Three Point Attempts per Game')
plt.ylabel('Three Point Percentage')
plt.title('Three point Percentage vs Attempts per Game')
plt.ylim([0, 0.7])
# Allows the graph to be seen better, cuts off a couple outliers of people who show 100% on almost zero attempts
plt.show()

This graph shows that players who take more attempts tend to make a higher percentage of their shots. This makes sense, as players who don't shoot as well will likely try to score in other ways rather than shooting three pointers. Additionally, the graph shows that once attempts rise above roughly 6 per game, the percentage stops increasing and may actually decrease. We think this is because the players taking the most three pointers per game are probably taking some of their shots while well defended, which lowers their chance of making them compared to shooting only when wide open. For example, James Harden, the player who took over 12 three point attempts per game, made only 35.5% of them. If he passed up a few of his lowest percentage shots each game, his attempts would drop noticeably while his makes would drop by much less. His percentage would therefore likely rise, which suggests that by shooting more he is lowering his three point percentage.
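The shot-selection argument can be checked with a little arithmetic. The sketch below uses round, hypothetical numbers (12 attempts per game at 35.5%, with two of those attempts assumed to go in only 20% of the time); none of these per-shot figures come from the dataset.

```python
# Hypothetical numbers for illustration only (not taken from the dataset).
attempts = 12.0            # three point attempts per game
percentage = 0.355         # overall three point percentage
makes = attempts * percentage

dropped_attempts = 2.0     # assumed: the two hardest shots per game
dropped_percentage = 0.20  # assumed success rate on those shots
dropped_makes = dropped_attempts * dropped_percentage

# Remove the hardest shots and recompute the percentage on what is left.
new_attempts = attempts - dropped_attempts
new_makes = makes - dropped_makes
new_percentage = new_makes / new_attempts

print(f"before: {percentage:.3f}, after: {new_percentage:.3f}")
```

Under these assumptions the percentage rises even though total makes fall, which is the trade-off the paragraph above describes.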

In [23]:
plt.plot(final_dataset['3PA'], final_dataset['Salary'], 'o')
plt.xlabel('Three Point Attempts per Game')
plt.ylabel('Salary')
plt.title('Salary vs Three Point Attempts per Game')
plt.show()
In [24]:
plt.plot(final_dataset['3P%'], final_dataset['Salary'], 'o')
plt.xlabel('Three Point Percentage')
plt.ylabel('Salary')
plt.title('Salary vs Three Point Percentage')
plt.xlim([0, 0.7])
plt.show()

The first graph seems to show a positive correlation between three point attempts and salary. Intuitively, this makes sense, as players who make more money likely play and shoot more, leading to more attempts per game. However, the second graph shows less of a relationship between three point percentage and salary, probably because a player playing a lot of minutes per game and shooting 35% on many attempts is much more valuable than a player playing very little and shooting 35% on very few attempts.

1.C Regression Analysis

In [25]:
i = 0
while i < len(final_dataset):
    salary = final_dataset.iat[i,30]
    three_attempt = final_dataset.iat[i,12]
    three_made = final_dataset.iat[i,11]
    y.append(salary)
    x.append([three_attempt, three_made])
    i += 1
reg.fit(x, y)
# Regression trying to predict salary based on three point attempts and makes

print("Coefficient are " + str(reg.coef_))
print("Y intercept is " + str(reg.intercept_))
print("R^2 of model is " + str(reg.score(x, y)))

i = 0
sum_res = 0

while i < len(final_dataset):
    act_salary = final_dataset.iat[i,30]
    three_attempt = final_dataset.iat[i,12]
    three_made = final_dataset.iat[i,11]
    prediction = reg.intercept_ + ((three_attempt * reg.coef_[0]) + (three_made * reg.coef_[1]))
    residual = prediction - act_salary
    square_res = residual ** 2
    sum_res = sum_res + square_res
    i += 1
mean_square_error = sum_res / len(final_dataset)
print("Mean Square Error " + str(mean_square_error))
Coefficient are [ 3121860.55702893 -3033573.29786537]
Y intercept is 1010825.6758958511
R^2 of model is 0.23098869206640604
Mean Square Error 62127373869472.79

We ran a multiple linear regression using 3 pointers attempted and 3 pointers made to try to predict salary. The model yielded an R squared of 0.23 and a large Mean Square Error. The MSE looks enormous, but it is measured in squared dollars: NBA salaries are often in the millions, so even a moderate prediction error produces a very large squared value.
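To put the error back on the dollar scale, one option is to take the square root of the MSE (the root mean square error). This is only a rescaling of the value printed above, not an additional result.

```python
import math

mse = 62127373869472.79  # Mean Square Error printed above, in dollars squared
rmse = math.sqrt(mse)    # root mean square error, back in dollars

print(f"RMSE is roughly ${rmse:,.0f}")
```

An RMSE in the millions of dollars is easier to compare against typical salaries than the raw MSE.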

In [26]:
reg = linear_model.LinearRegression()
x = []
y = []
i = 0
while i < len(final_dataset):
    salary = final_dataset.iat[i,30]
    three_attempt = final_dataset.iat[i,12]
    y.append(salary)
    x.append([three_attempt])
    i += 1
reg.fit(x, y)
# Regression trying to predict salary based on three point attempts

print("Coefficient are " + str(reg.coef_))
print("Y intercept is " + str(reg.intercept_))
print("R^2 of model is " + str(reg.score(x, y)))

i = 0
sum_res = 0

while i < len(final_dataset):
    act_salary = final_dataset.iat[i,30]
    three_attempt = final_dataset.iat[i,12]
    prediction = reg.predict(np.array([[three_attempt]]))
    residual = prediction - act_salary
    square_res = residual ** 2
    sum_res = sum_res + square_res
    i += 1
mean_square_error = sum_res / len(final_dataset)
print("Mean Square Error " + str(mean_square_error[0]))
Coefficient are [1973753.80968125]
Y intercept is 1264385.877945791
R^2 of model is 0.22747366009920622
Mean Square Error 62411348504093.17

After viewing the graphs of the separate variables plotted against Salary, we decided to drop the 3 pointers made variable. Based on the graphs, 3 point attempts seemed to have the strongest linear correlation with Salary. This made intuitive sense: if you shoot more 3 pointers, you are more likely to be a successful scorer ("You miss 100% of the shots you don't take" - Wayne Gretzky - Michael Scott) and thus would probably be compensated with a higher salary.

As it turns out, the single-variable regression model did not change the predictions much: the Mean Square Error remained large, and the R^2 dropped slightly (from about 0.231 to 0.227).

One thing to note is that we did not include 3 point percentage in our model. Since 3 point percentage is just 3 pointers made divided by 3 pointers attempted, it is largely redundant with the variables already in the model. We will take the same approach for the team win model.

Now let's move on to analyzing the effect of 3 point attempts and 3 pointers made on team wins:

2. Team Data

2.A Data Collection and Cleaning

In [27]:
url3 = 'https://www.basketball-reference.com/teams/MIL/2020/gamelog/'
# Game stats for all games played by MIL

r3 = requests.get(url3, headers = headers)
root3 = BeautifulSoup(r3.content)
lnks3 = root3.find('table')
pretty3 = lnks3.prettify()
table3 = pd.read_html(pretty3)
log = table3[0]
log.columns = ['Rk', 'G', 'Date', '@', 'Opp', 'W/L', 'Tm', 'Op', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%', 'FT', 'FTA', 'FT%', 'ORB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'null', 'FG1', 'FGA1', 'FG%1', '3P1', '3PA1', '3P%1', 'FT1', 'FTA1', 'FT%1', 'ORB1', 'TRB1', 'AST1', 'STL1', 'BLK1', 'TOV1', 'PF1']
log = log.drop(columns = ['@', 'Op', 'null', 'FG1', 'FGA1', 'FG%1', '3P1', '3PA1', '3P%1', 'FT1', 'FTA1', 'FT%1', 'ORB1', 'TRB1', 'AST1', 'STL1', 'BLK1', 'TOV1', 'PF1'])
log = log[log['Rk'].notna()]
log.drop(log[log.Rk == 'Rk'].index, inplace=True)
# Data set is cleaned up


teams = set(log['Opp'])
# All other teams were opponents in this data set, so this gives all teams
for team in teams:
    url3 = url3[:43] + team + url3[46:]
    # String replacement is used to scrape the data of the other teams
    r3 = requests.get(url3, headers = headers)
    root3 = BeautifulSoup(r3.content)
    lnks3 = root3.find('table')
    pretty3 = lnks3.prettify()
    table3 = pd.read_html(pretty3)
    temp = table3[0]
    temp.columns = ['Rk', 'G', 'Date', '@', 'Opp', 'W/L', 'Tm', 'Op', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%', 'FT', 'FTA', 'FT%', 'ORB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'null', 'FG1', 'FGA1', 'FG%1', '3P1', '3PA1', '3P%1', 'FT1', 'FTA1', 'FT%1', 'ORB1', 'TRB1', 'AST1', 'STL1', 'BLK1', 'TOV1', 'PF1']
    temp = temp.drop(columns = ['@', 'Op', 'null', 'FG1', 'FGA1', 'FG%1', '3P1', '3PA1', '3P%1', 'FT1', 'FTA1', 'FT%1', 'ORB1', 'TRB1', 'AST1', 'STL1', 'BLK1', 'TOV1', 'PF1'])
    temp = temp[temp['Rk'].notna()]
    temp.drop(temp[temp.Rk == 'Rk'].index, inplace=True)
    log = pd.concat([log, temp])
    # The data for all other teams is concatenated to the end of the data frame.

log = log.reset_index()
log = log.drop(columns = ['index'])
log.head()
Out[27]:
Rk G Date Opp W/L Tm FG FGA FG% 3P ... FT FTA FT% ORB TRB AST STL BLK TOV PF
0 1 1 2019-10-24 HOU W 117 46 99 .465 16 ... 9 18 .500 6 53 31 7 10 11 27
1 2 2 2019-10-26 MIA L 126 41 94 .436 17 ... 27 35 .771 5 47 22 8 4 18 32
2 3 3 2019-10-28 CLE W 129 48 92 .522 17 ... 16 24 .667 8 50 29 9 7 12 15
3 4 4 2019-10-30 BOS L 105 38 82 .463 14 ... 15 24 .625 5 45 21 3 4 15 21
4 5 5 2019-11-01 ORL W 123 47 93 .505 17 ... 12 18 .667 11 58 24 11 4 13 19

5 rows × 22 columns

The Basketball Reference database only has links to specific teams and their entire season of games (including overall game statistics and whether the team won or lost each game). Instead of navigating to every team's page by hand, we realized that the url can be manipulated to point to a specific team. For example, the Milwaukee Bucks url is https://www.basketball-reference.com/teams/MIL/2020/gamelog/, with the MIL section of the url being the abbreviation for Milwaukee.

To pull the entire dataset, we first pulled the Bucks dataset from the page. It came with the Bucks data as well as the opponent's statistics from each game. Since we were planning on pulling the data for all teams, including the opponent's data would be redundant, so this portion of the dataset was dropped.

Next we noted that the Bucks played all of the other NBA teams last season, and that each opponent's abbreviation appears in the Opp column. We used a set to collect the unique team abbreviations and, with a little string manipulation, constructed the url for each team's data. We performed the same data cleaning as for the Bucks dataset and concatenated all of the resulting datasets into the final dataset.

The final dataset, consisting of per-game team stats and the win or loss, is shown above.
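As an alternative to slicing the url by character position, the team url could be built from a template. This is a minimal sketch: `gamelog_url` is a hypothetical helper, the opponent list is a small sample taken from the first rows shown above, and no requests are made.

```python
BASE = "https://www.basketball-reference.com/teams/{team}/{season}/gamelog/"


def gamelog_url(team, season=2020):
    """Build the game-log url for a three-letter team abbreviation."""
    return BASE.format(team=team, season=season)


# Sample of abbreviations as they appear in the Bucks' Opp column.
opponents = ["HOU", "MIA", "CLE", "BOS", "ORL", "MIA"]

# A set removes repeated opponents; MIL is added back since it never
# appears as its own opponent.
for team in sorted(set(opponents) | {"MIL"}):
    print(gamelog_url(team))
```

A template avoids depending on the exact character offsets of the abbreviation within the url.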

In [28]:
log = log.reset_index()
log = log.drop(columns = ['index', 'Rk', 'G', 'Date', 'Opp', 'Tm'])
# Unneeded columns are dropped

log['W/L'] = (log['W/L'] == 'W').astype(int)
# A win is encoded to a 1 and a loss is encoded to a 0
    
log.head()
Out[28]:
W/L FG FGA FG% 3P 3PA 3P% FT FTA FT% ORB TRB AST STL BLK TOV PF
0 1 46 99 .465 16 46 .348 9 18 .500 6 53 31 7 10 11 27
1 0 41 94 .436 17 54 .315 27 35 .771 5 47 22 8 4 18 32
2 1 48 92 .522 17 38 .447 16 24 .667 8 50 29 9 7 12 15
3 0 38 82 .463 14 45 .311 15 24 .625 5 45 21 3 4 15 21
4 1 47 93 .505 17 47 .362 12 18 .667 11 58 24 11 4 13 19

Before running the model, we first had to convert the 'W/L' column to numeric values: 1 representing a win and 0 representing a loss.

2.B Data Exploration and Analysis

In [31]:
log['W/L'] = pd.to_numeric(log['W/L'])
log['3P%'] = pd.to_numeric(log['3P%'])
sns.set(rc={'figure.figsize':(8,5)})
# Figure size is set before plotting so that it applies to this plot
ax = sns.violinplot(x='W/L', y='3P%', data=log)
ax.set_xticklabels(['L', 'W'])
plt.show()
# Violin plot showing three point percentage in wins vs losses

This plot shows that teams who win tend to make a higher percentage of three pointers than teams that lose. Since there are over 2000 points in the plot and the center and shape of the three point percentage distributions look different, it stands to reason that three point percentage is correlated with whether a team won or lost the game. Furthermore, it appears that teams are more likely to win if they make a higher percentage of three pointers, which makes sense as this helps them score more points.

2.C Classification Model and Analysis

In [14]:
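from scipy import stats
# The name `stats` was reused for the player DataFrame in section 1.A,
# so it is re-imported here to refer to scipy.stats again

log['3P'] = pd.to_numeric(log['3P'])
log['3PA'] = pd.to_numeric(log['3PA'])
log['W/L'] = pd.to_numeric(log['W/L'])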


x = log.drop(columns = ['W/L','FG','FGA','FG%','3P%','FT','FTA', 'FT%','ORB','TRB','AST','STL','BLK','TOV','PF']).values
y = log['W/L'].values
kf = KFold(n_splits=10)
decision_error = 0
random_error = 0


for train_index, test_index in kf.split(x):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]


    ##Training Model
    DTC_CLF = tree.DecisionTreeClassifier().fit(x_train,y_train)
    RF_CLF =  RandomForestClassifier(n_estimators=10).fit(x_train,y_train)
    
    ##Predicting
    DTC_Y = DTC_CLF.predict(x_test)
    RF_Y = RF_CLF.predict(x_test)

    
    ##Computing Error Estimate
    decision_error += stats.sem(np.round(DTC_Y - y_test))
    random_error += stats.sem(np.round(RF_Y - y_test))
    
    print(stats.ttest_rel(np.round(DTC_Y), y_test))
    print(stats.ttest_rel(np.round(RF_Y), y_test))

decision_error /= 10
random_error /= 10
print("Decision Error " + str(decision_error) )
print("Random Error " + str(random_error) )
Ttest_relResult(statistic=-1.909165688891133, pvalue=0.05759746081368803)
Ttest_relResult(statistic=-1.0427869671381735, pvalue=0.29824025339565974)
Ttest_relResult(statistic=0.4520601215666959, pvalue=0.6516900441718162)
Ttest_relResult(statistic=1.6333659226105048, pvalue=0.10388332250781605)
Ttest_relResult(statistic=0.6617101118144355, pvalue=0.5088790047200347)
Ttest_relResult(statistic=2.5897226612354083, pvalue=0.010274359163943161)
Ttest_relResult(statistic=-0.7412095333622579, pvalue=0.45939052753279275)
Ttest_relResult(statistic=0.8522522555304847, pvalue=0.39504040020494036)
Ttest_relResult(statistic=-1.900488906069808, pvalue=0.05873219241043083)
Ttest_relResult(statistic=-0.5546476254819726, pvalue=0.579723145172127)
Ttest_relResult(statistic=0.6461071007033186, pvalue=0.5189120867532302)
Ttest_relResult(statistic=1.900488906069808, pvalue=0.05873219241043083)
Ttest_relResult(statistic=-0.8186504933874156, pvalue=0.4139094623808859)
Ttest_relResult(statistic=1.485699915922903, pvalue=0.13885107517742773)
Ttest_relResult(statistic=-3.405610237564174, pvalue=0.0007902327951535493)
Ttest_relResult(statistic=-2.1018855696906296, pvalue=0.03674964198282493)
Ttest_relResult(statistic=-1.2811469025908921, pvalue=0.20155500958101524)
Ttest_relResult(statistic=-0.6461028735788326, pvalue=0.5189181591544052)
Ttest_relResult(statistic=0.7412057823208493, pvalue=0.4593967173711675)
Ttest_relResult(statistic=0.8723771561685671, pvalue=0.38399900186205593)
Decision Error 0.043194616231266546
Random Error 0.043306954612639635

We tried to classify a game as a win or a loss based on three point attempts and makes, using both a decision tree classifier and a random forest classifier. We evaluated them with 10 fold cross validation and ran a paired t-test on each fold comparing the predicted and actual results. Most of the p values are greater than the standard threshold of 0.05, so in most folds the predicted win rate does not differ significantly from the actual win rate. Note that this test compares averages rather than game-by-game correctness, so it does not show that the classifiers are accurate. On this evidence, we cannot conclude that three point attempts and makes are enough information to confidently predict whether a team won or lost.
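One caveat about this evaluation is worth illustrating: a paired t-test compares the average prediction with the average outcome, so it can report no difference even when the individual predictions are wrong. The toy example below uses made-up values (not our data) in which every prediction is incorrect, and computes both the paired t statistic and the plain accuracy.

```python
import math
import statistics

# Made-up outcomes: the "classifier" is wrong on every single game.
actual = [1, 0, 1, 0, 1, 0, 1, 0]
predicted = [0, 1, 0, 1, 0, 1, 0, 1]

# Paired t statistic: mean of the differences over its standard error.
diffs = [p - a for p, a in zip(predicted, actual)]
mean_diff = statistics.mean(diffs)
t_stat = mean_diff / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Accuracy: share of games where the prediction matches the outcome.
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)

print("t statistic:", t_stat)
print("accuracy:", accuracy)
```

Because the over-predictions and under-predictions cancel out, the t statistic is zero while the accuracy is also zero. A per-game measure such as accuracy would therefore be a more direct way to judge the classifiers.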

Conclusion

Overall, it seems that on average, teams tend to make a higher percentage of three pointers in wins than losses. This makes sense as games are often close and making one or two more shots can affect the outcome of the game. However, when looking at the results of all NBA games last year, we found it difficult to predict whether a team won or lost based on three point makes and attempts. We believe that this may have to do with the different styles teams choose to play. For example, a team that doesn’t focus on shooting three pointers can have a good game when shooting and making a low volume of three pointers, but a team that focuses on shooting three pointers will likely struggle if they shoot a low number of three pointers.

Additionally, we found it difficult to predict the salary of an individual player based on three point attempts and makes. While the graphs suggested a stronger correlation between three point attempts and salary than between three point percentage and salary, predicting the salary was still very difficult. On average it makes sense that better players will play and shoot more, but due to the differences in play style and positions, this information wasn't enough to accurately predict salary. For example, a highly paid center may choose to almost never shoot from three if he is a better scorer closer to the basket, while a shooting guard may take many attempts from three but struggle in other aspects of the game, resulting in a lower salary.

While the three point shot is undoubtedly an important aspect in basketball today, due to the several different strategies and play styles, we found it hard to predict either player success (in terms of salary) or team success (in terms of wins) based on how the player or team shot three pointers. As the game continues to evolve, it will be interesting to see whether the three point trend remains as popular as it is today and what strategies teams will come up with to reduce its effectiveness.