Estimated reading time: 201 minutes

Comment on the predictive strenght of interactions: no more thn two, otherwise the brain capsizes.

This notebook would study the possibilty of preempting customer churn so as to allow for the opportunity to intervene. As an analyst you have to understand who is leaving and why. Customers who have left in the last month are known as customers that have churned. This is thus the target you want to investigate, and you want to understand the characteristics associated with this target. All you do is a simple prediction excercise with a tree, understandable variables and feature importance measures for single and interaction variables. Another interesting feature is a wordcloud of important predictors. Here is an example of some wordclouds, https://nicholastsmith.wordpress.com/2017/08/16/using-random-forests-and-wordclouds-to-visualize-feature-importance-in-document-classification//

An important part now is the actuate on the knowledge that we have gathered. This is easy and can be done by formulating simple decision rules to identify the customers most at risk. This is done by identifying the important predictors and isolating those for which a large group of predictors point in a similat direction. I would recommend isolating no more than five variables and playing around with them as inputs to the model keeping all other variables stable at median or mean - good question as to which would be the most stable and then identify the liklihood to churn. What you can do is split the features by median in five pieces and then slowsly list the top ten most likley groups to fail. That is all you really need, I think it is a roundabout way of understand, but a useful method none the less.

import pandas as  pd

df = pd.read_csv("data/telco_churn.csv")

df.shape

(7043, 21)

How you treat your categorical variables depend on the cardinality. tenure, MonthlyCharges and TotalCharges are not cateogical variables, hence they show large cardinality. The rest of the variables are indeed categorical, in this problem set, one hot encoding would work fine because of the low cardinality present. In applied machine learning you do not want to exploit leakages, you want to fix them.

df["SeniorCitizen"] = df["SeniorCitizen"].astype("object")

pd.options.display.max_rows = None

df["TotalChargesConv"] = pd.to_numeric(df["TotalCharges"], errors='coerce')

df["TotalChargesConv"].isnull().sum()

TotalCharges is completely missing. From further analysis, I am convinced that they are missed at random and that the fact that they are missing does not contain any value. So I would go ahead and just fill the values with the mean

df.head()

	customerID	gender	Partner	Dependents	tenure	PhoneService	MultipleLines	InternetService	OnlineSecurity	...	TechSupport	StreamingTV	StreamingMovies	Contract	PaperlessBilling	PaymentMethod	MonthlyCharges	TotalCharges	Churn	TotalChargesConv
0	7590-VHVEG	Female	Yes	No	1	No	No phone service	DSL	No	...	No	No	No	Month-to-month	Yes	Electronic check	29.85	29.85	No	29.85
1	5575-GNVDE	Male	No	No	34	Yes	No	DSL	Yes	...	No	No	No	One year	No	Mailed check	56.95	1889.5	No	1889.50
2	3668-QPYBK	Male	No	No	2	Yes	No	DSL	Yes	...	No	No	No	Month-to-month	Yes	Mailed check	53.85	108.15	Yes	108.15
3	7795-CFOCW	Male	No	No	45	No	No phone service	DSL	Yes	...	Yes	No	No	One year	No	Bank transfer (automatic)	42.30	1840.75	No	1840.75
4	9237-HQITU	Female	No	No	2	Yes	No	Fiber optic	No	...	No	No	No	Month-to-month	Yes	Electronic check	70.70	151.65	Yes	151.65

5 rows × 22 columns

Here I would now also seek to see if there are more missing feature present.

def missing_data(data):
    total = data.isnull().sum().sort_values(ascending = False)
    percent = (data.isnull().sum()/data.isnull().count()*100).sort_values(ascending = False)
    return pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])

missing_data(df)

	Total	Percent
TotalChargesConv	11	0.156183
Churn	0	0.000000
gender	0	0.000000
SeniorCitizen	0	0.000000
Partner	0	0.000000
Dependents	0	0.000000
tenure	0	0.000000
PhoneService	0	0.000000
MultipleLines	0	0.000000
InternetService	0	0.000000
OnlineSecurity	0	0.000000
OnlineBackup	0	0.000000
DeviceProtection	0	0.000000
TechSupport	0	0.000000
StreamingTV	0	0.000000
StreamingMovies	0	0.000000
Contract	0	0.000000
PaperlessBilling	0	0.000000
PaymentMethod	0	0.000000
MonthlyCharges	0	0.000000
TotalCharges	0	0.000000
customerID	0	0.000000

df["TotalChargesConv"] = df["TotalChargesConv"].fillna(df["TotalChargesConv"].mean())

del df["TotalCharges"]

import numpy as np
# Keeping 15% for the validation set, thats about 1000 instances to test on 
msk = np.random.rand(len(df)) < 0.15
df["is_test"] = msk

cols_drop = ["customerID","Churn", "is_test"]
id_target = df[cols_drop]

df = pd.get_dummies(df.drop(cols_drop, axis=1))
df = pd.concat((id_target, df),axis=1)

df["Churn"] = df["Churn"].apply(lambda x: 0 if x=="No" else 1)

## Immediately Remove High Correlates
corr_matrix = df.corr().abs()

# Select upper triangle of correlation matrix
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(np.bool))

# Find index of feature columns with correlation greater than 0.90
to_drop = [column for column in upper.columns if any(upper[column] > 0.999999)]

to_drop

['gender_Male',
 'SeniorCitizen_1',
 'Partner_Yes',
 'Dependents_Yes',
 'PhoneService_Yes',
 'MultipleLines_No phone service',
 'OnlineSecurity_No internet service',
 'OnlineBackup_No internet service',
 'DeviceProtection_No internet service',
 'TechSupport_No internet service',
 'StreamingTV_No internet service',
 'StreamingMovies_No internet service',
 'PaperlessBilling_Yes']

df = df.drop(to_drop, axis=1)

train = df[df["is_test"]==False]

X_train = df[df["is_test"]==False].drop(cols_drop,axis=1)
y_train = df[df["is_test"]==False]["Churn"]

from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score
from lightgbm import LGBMClassifier
import gc
import random

params = {
         'boosting_type': 'gbdt',
          'max_depth': 6,
          'objective': 'binary',
          'n_estimators': 5000, 
          'nthread': 12,
          'num_leaves': 32,
          'learning_rate': 0.02,
          'subsample': 0.85,
          'colsample_bytree': 0.95,
          'reg_alpha': 0.5,
          'reg_lambda': 0.8,
          'min_split_gain': 0.05,
          'min_child_weight': 40,
          'min_child_samples': 5,
          'metric': 'auc'}



def pars(n_estimator_bottom, n_estimator_top):
    return random.sample(range(n_estimator_bottom, n_estimator_bottom+ n_estimator_top), 8) + random.sample(range(n_estimator_bottom, int(n_estimator_bottom + n_estimator_top/np.log(n_estimator_top))), 6) + random.sample(range(n_estimator_bottom, int(n_estimator_bottom+(n_estimator_top/np.log(n_estimator_top))/np.log(n_estimator_top))), 4)

    #return random.sample(range(n_estimator_bottom, n_estimator_bottom+ n_estimator_top), 4) + random.sample(range(n_estimator_bottom, int(n_estimator_bottom + n_estimator_top/np.log(n_estimator_top))), 3) + random.sample(range(n_estimator_bottom, int(n_estimator_bottom+(n_estimator_top/np.log(n_estimator_top))/np.log(n_estimator_top))), 2)



mdl = LGBMClassifier(
    **params
)

gridParams = {
    'learning_rate': [0.02, 0.05],
    'n_estimators': pars(10,15000),
    'num_leaves': pars(20,100),
    'boosting_type' : ['gbdt'],
    'random_state' : [405], # Updated from 'seed'
    'colsample_bytree' : [round(random.uniform(0.7,1),2) for _ in range(18)] ,
    'subsample' : [round(random.uniform(.2,.95),2) for _ in range(18)] ,
    'reg_alpha' : [round(random.uniform(0.1,4),2) for _ in range(18)],
    'reg_lambda' : [round(random.uniform(0.1,4),2) for _ in range(18)],
    }

from sklearn.model_selection import GridSearchCV from sklearn.model_selection import RandomizedSearchCV

Create the grid

LightGBM should be jobs 1

grid = RandomizedSearchCV(mdl, gridParams, cv=3, n_jobs=1, verbose=10) #grid = GridSearchCV(mdl, gridParams, cv=4, n_jobs=1, verbose=10)

Run the grid

grid.fit(X_train, y_train )

Took like a few minutes so worth it.

print(grid.best_params_) print(grid.best_score_)

Print the best parameters found

print(grid.best_params_) print(grid.best_score_)

params[‘subsample’] = grid.best_params_[‘subsample’] params[‘n_estimators’] = grid.best_params_[‘n_estimators’] params[‘colsample_bytree’] = grid.best_params_[‘colsample_bytree’] params[‘learning_rate’] = grid.best_params_[‘learning_rate’] params[‘num_leaves’] = grid.best_params_[‘num_leaves’] params[‘reg_alpha’] = grid.best_params_[‘reg_alpha’] params[‘reg_lambda’] = grid.best_params_[‘reg_lambda’]

params[‘subsample_for_bin’] = grid.best_params_[‘subsample_for_bin’]

print(‘Fitting with params: ‘) print(params)

X_test = df.loc[df.is_test, :].drop(cols_drop, axis=1)
y_test = df.loc[df.is_test, :]["Churn"]

from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.metrics import roc_auc_score

n_splits = 5
cvv = StratifiedKFold(n_splits=n_splits, random_state=42)

oof_preds = np.zeros(X_train.shape[0])

sub = df.loc[df.is_test, :][cols_drop]
sub["Churn_Predict"] = 0
feature_importances = pd.DataFrame()
avg_iter = 0
for i, (fit_idx, val_idx) in enumerate(cvv.split(X_train, y_train)):
    
    X_fit = X_train.iloc[fit_idx]
    y_fit = y_train.iloc[fit_idx]
    X_val = X_train.iloc[val_idx]
    y_val = y_train.iloc[val_idx]
    
    model = LGBMClassifier(
**params
    )

    model.fit(
        X_fit,
        y_fit,
        eval_set=[(X_fit, y_fit), (X_val, y_val)],
        eval_names=('fit', 'val'),
        eval_metric='auc',
        early_stopping_rounds=150,
        verbose=False
    )
    
    print("itteration: ", model.best_iteration_)
    avg_iter += model.best_iteration_
    oof_preds[val_idx] = model.predict_proba(X_val, num_iteration=model.best_iteration_)[:, 1]
    sub['Churn_Predict'] += model.predict_proba(X_test, num_iteration=model.best_iteration_)[:,1]
    
    fi = pd.DataFrame()
    fi["feature"] = X_train.columns
    fi["importance"] = model.feature_importances_
    fi["fold"] = (i+1)
    
    feature_importances = pd.concat([feature_importances, fi], axis=0)
    
    print("Fold {} AUC: {:.8f}".format(i+1, roc_auc_score(y_val, oof_preds[val_idx])))
    
print('Full AUC score %.8f' % roc_auc_score(y_train, oof_preds)) 
print('Out of sample AUC: ', roc_auc_score(y_test, sub['Churn_Predict']/n_splits))
print("avg iteration: ",(avg_iter/n_splits))

itteration:  177
Fold 1 AUC: 0.86939524
itteration:  314
Fold 2 AUC: 0.85728010
itteration:  230
Fold 3 AUC: 0.84829261
itteration:  495
Fold 4 AUC: 0.84521372
itteration:  377
Fold 5 AUC: 0.85197786
Full AUC score 0.85256117
Out of sample AUC:  0.8286286199168862
avg iteration:  318.6

## The splits seems to work well.
## You can take the average iteration above or use the process below.
## You can retrain below on an ensemble of different seeds to produce the final model

## This will work if ou change parameters to get the new iteration score. 

from lightgbm import cv
from lightgbm import Dataset


def get_score(X, y, usecols, params,seeder,depther):  
     dtrain = Dataset(X[usecols], y)
     params["max_depth"] = depther
     ## These are forced for variable importance
     params["feature_fraction"] = 0.9
     params["bagging_fraction"] = 0.9

     eval =  cv(params,
             dtrain,
             nfold=5,
             stratified=True,
             num_boost_round=20000,
             early_stopping_rounds=160, ## After it stopped how long should go on. 
             verbose_eval=20,
             #verbose_eval=-1,
             seed = seeder,
             show_stdv=True)
     return max(eval['auc-mean']), eval["auc-mean"].index(max(eval["auc-mean"]))
  
seed_dict = {}
for seeder, depther in zip([5, 1, 9 ,12, 20], [6,6,6,6,6]):  ## Here is where you add parameter adaptions
    score, seed_best = get_score(X_train,y_train ,list(X_train.columns), params, seeder,depther)
    seed_dict[seeder] = seed_best

## Lets see if a full dataset prediction validation outperforms.

[20]	cv_agg's auc: 0.843784 + 0.00821279
[40]	cv_agg's auc: 0.846575 + 0.00947819
[60]	cv_agg's auc: 0.848162 + 0.00912504
[80]	cv_agg's auc: 0.849335 + 0.00902077
[100]	cv_agg's auc: 0.850259 + 0.00888681
[120]	cv_agg's auc: 0.851368 + 0.008769
[140]	cv_agg's auc: 0.852178 + 0.00865427
[160]	cv_agg's auc: 0.852431 + 0.00848813
[180]	cv_agg's auc: 0.852657 + 0.00825845
[200]	cv_agg's auc: 0.852751 + 0.00810582
[220]	cv_agg's auc: 0.853085 + 0.00787604
[240]	cv_agg's auc: 0.853163 + 0.00771073
[260]	cv_agg's auc: 0.853307 + 0.00762619
[280]	cv_agg's auc: 0.853409 + 0.00761826
[300]	cv_agg's auc: 0.853207 + 0.00768979
[320]	cv_agg's auc: 0.853189 + 0.00778502
[340]	cv_agg's auc: 0.853249 + 0.00784857
[360]	cv_agg's auc: 0.853228 + 0.00789094
[380]	cv_agg's auc: 0.853026 + 0.00807676
[400]	cv_agg's auc: 0.852888 + 0.00812161
[420]	cv_agg's auc: 0.852822 + 0.0081956
[20]	cv_agg's auc: 0.844505 + 0.0153633
[40]	cv_agg's auc: 0.846352 + 0.0156829
[60]	cv_agg's auc: 0.847766 + 0.0153002
[80]	cv_agg's auc: 0.848652 + 0.0151338
[100]	cv_agg's auc: 0.849505 + 0.0150432
[120]	cv_agg's auc: 0.850304 + 0.0151445
[140]	cv_agg's auc: 0.850746 + 0.0153762
[160]	cv_agg's auc: 0.851053 + 0.0153237
[180]	cv_agg's auc: 0.851171 + 0.0152782
[200]	cv_agg's auc: 0.851375 + 0.015367
[220]	cv_agg's auc: 0.851382 + 0.0151171
[240]	cv_agg's auc: 0.851239 + 0.0151573
[260]	cv_agg's auc: 0.851301 + 0.0152773
[280]	cv_agg's auc: 0.851341 + 0.0151455
[300]	cv_agg's auc: 0.851295 + 0.0150807
[320]	cv_agg's auc: 0.851225 + 0.0149669
[340]	cv_agg's auc: 0.851117 + 0.0148903
[360]	cv_agg's auc: 0.851039 + 0.0148297
[20]	cv_agg's auc: 0.846313 + 0.0114744
[40]	cv_agg's auc: 0.847691 + 0.011438
[60]	cv_agg's auc: 0.848695 + 0.0115246
[80]	cv_agg's auc: 0.849907 + 0.0117519
[100]	cv_agg's auc: 0.850889 + 0.0117958
[120]	cv_agg's auc: 0.85157 + 0.0117482
[140]	cv_agg's auc: 0.852051 + 0.011485
[160]	cv_agg's auc: 0.85236 + 0.0112426
[180]	cv_agg's auc: 0.852455 + 0.011207
[200]	cv_agg's auc: 0.852387 + 0.0112711
[220]	cv_agg's auc: 0.852347 + 0.0113187
[240]	cv_agg's auc: 0.852387 + 0.0113198
[260]	cv_agg's auc: 0.85245 + 0.0111506
[280]	cv_agg's auc: 0.852548 + 0.011055
[300]	cv_agg's auc: 0.852749 + 0.0110681
[320]	cv_agg's auc: 0.852543 + 0.0111498
[340]	cv_agg's auc: 0.852551 + 0.0112671
[360]	cv_agg's auc: 0.852435 + 0.0113013
[380]	cv_agg's auc: 0.852323 + 0.0113198
[400]	cv_agg's auc: 0.852281 + 0.0114486
[420]	cv_agg's auc: 0.852147 + 0.0115187
[440]	cv_agg's auc: 0.852138 + 0.0114774
[20]	cv_agg's auc: 0.844818 + 0.00752731
[40]	cv_agg's auc: 0.8469 + 0.00812337
[60]	cv_agg's auc: 0.848418 + 0.00853306
[80]	cv_agg's auc: 0.849986 + 0.00962428
[100]	cv_agg's auc: 0.850866 + 0.00988992
[120]	cv_agg's auc: 0.851671 + 0.00982287
[140]	cv_agg's auc: 0.852286 + 0.00978718
[160]	cv_agg's auc: 0.85262 + 0.00990508
[180]	cv_agg's auc: 0.852703 + 0.010033
[200]	cv_agg's auc: 0.852879 + 0.0100981
[220]	cv_agg's auc: 0.852956 + 0.0100493
[240]	cv_agg's auc: 0.853011 + 0.00976151
[260]	cv_agg's auc: 0.852986 + 0.00949627
[280]	cv_agg's auc: 0.852963 + 0.0095649
[300]	cv_agg's auc: 0.852869 + 0.00948038
[320]	cv_agg's auc: 0.852862 + 0.00946797
[340]	cv_agg's auc: 0.852779 + 0.00925175
[360]	cv_agg's auc: 0.852641 + 0.00918829
[380]	cv_agg's auc: 0.852628 + 0.00909059
[20]	cv_agg's auc: 0.846441 + 0.0161064
[40]	cv_agg's auc: 0.847855 + 0.0163468
[60]	cv_agg's auc: 0.849537 + 0.0167314
[80]	cv_agg's auc: 0.850486 + 0.0159733
[100]	cv_agg's auc: 0.851169 + 0.0152941
[120]	cv_agg's auc: 0.852089 + 0.0150693
[140]	cv_agg's auc: 0.852565 + 0.0147605
[160]	cv_agg's auc: 0.852684 + 0.0145832
[180]	cv_agg's auc: 0.852811 + 0.0142475
[200]	cv_agg's auc: 0.852997 + 0.0139145
[220]	cv_agg's auc: 0.853024 + 0.0136548
[240]	cv_agg's auc: 0.853132 + 0.0135377
[260]	cv_agg's auc: 0.853081 + 0.0133628
[280]	cv_agg's auc: 0.852963 + 0.0130755
[300]	cv_agg's auc: 0.852855 + 0.012968
[320]	cv_agg's auc: 0.852727 + 0.0127505
[340]	cv_agg's auc: 0.852658 + 0.0125208
[360]	cv_agg's auc: 0.852502 + 0.0123709
[380]	cv_agg's auc: 0.852345 + 0.0121535
[400]	cv_agg's auc: 0.852298 + 0.012038
[420]	cv_agg's auc: 0.852294 + 0.0118129

## This process gives you 15% extra data, but you have no validation set. 
# This can of course be improved by saving the best iterations, which is what I would do
dip  = {}
dip[1] = [6, 5]
dip[2] = [6, 1]
dip[3] = [6, 9]
dip[4] = [6, 12]

dip[5] = [6, 20]

n_splits = 5
cvv = StratifiedKFold(n_splits=n_splits, random_state=42)

sub = df.loc[df.is_test, :][cols_drop]
sub["Churn_Predict"] = 0

feature_importances = pd.DataFrame()

ba= 0 
for i, r in enumerate(range(n_splits)):
    ba = ba + 1
    
    params["num_boost_round"] = seed_dict[dip[ba][1]]  
    params["seed"] = dip[ba][1]
    params["max_depth"] = dip[ba][0] # if you want to customise uncomment
    
    model = LGBMClassifier(
**params
    )

    model.fit(
        X_train,
        y_train,
        eval_metric='auc',
        verbose=False
    )
    
    preds = model.predict_proba(X_test, num_iteration=seed_dict[dip[ba][1]] )[:,1]
    sub['Churn_Predict'] += preds
    
    fi = pd.DataFrame()
    fi["feature"] = X_train.columns
    fi["importance"] = model.feature_importances_
    fi["fold"] = (i+1)
    
    feature_importances = pd.concat([feature_importances, fi], axis=0)
    print("Fold {} AUC: {:.8f}".format(i+1, roc_auc_score(y_test, preds)))
  

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 1 AUC: 0.82761689

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 2 AUC: 0.82700115

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 3 AUC: 0.82692065

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 4 AUC: 0.82756032

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 5 AUC: 0.82757338

print('Out of sample AUC: ', roc_auc_score(y_test, sub['Churn_Predict']/n_splits))

Out of sample AUC:  0.8246883227083832

feat_imp = feature_importances.groupby("feature").mean().sort_values("importance", ascending=False)

feat_imp.head(10)

# if is_test shows significance it would point to leakage issue 

	importance	fold
feature
MonthlyCharges	694.6	3.0
TotalChargesConv	593.6	3.0
tenure	571.4	3.0
MultipleLines_No	130.0	3.0
PaymentMethod_Electronic check	123.6	3.0
OnlineSecurity_No	118.4	3.0
PaperlessBilling_No	113.6	3.0
Contract_Month-to-month	112.6	3.0
Contract_One year	108.0	3.0
TechSupport_No	85.8	3.0

## Often Times the cross-validation splits can just be completely ignored.

#180

import numpy as np
l = [x for x in seed_dict.values()]
best_iterations = np.mean(l)

best_iterations

252.8

### best_iterations  394.6 for inst
### 252 -  litle subsample

import lightgbm as lgb
d_train = lgb.Dataset(X_train, label=y_train)
d_test = lgb.Dataset(X_test, label=y_test)
# 1.1 for no leaks, 0.7 used because best model feature importance 
# this slightly outperforms enemble but it leaks
params["num_boost_round"] = int(best_iterations*1.1) 
params["num_boost_round"] = int(best_iterations*1.1)


## Here you have to force feature fractions for predictor analysis

import shap

params["num_boost_round"] = int(best_iterations*1.1)
splits = 5 
f = -1
for seeds in [15,1,6,7,2]:
    f = f +1
    print(seeds)
    params["feature_fraction_seed"] = seeds
    
    params["random_seed"] = seeds + 1
    
    model = lgb.train(params, d_train, verbose_eval=1000)

    shap_values = shap.TreeExplainer(model).shap_values(df.drop(cols_drop, axis=1))

    shap_fram = pd.DataFrame(shap_values[:,:-1], columns=list(X_train.columns))

    shap_new = shap_fram.sum().sort_values().to_frame()

    shap_new.columns = ["SHAP"]

    shap_new["Largest_Effect"] = shap_new["SHAP"].apply(lambda x: "-" if x<0else "+")

    shap_new["SHAP_abs"] = shap_new["SHAP"].abs()

    direction = df.corr()["Churn"].to_frame()
    direction.columns = ["corr"]
    shap_new = pd.merge(shap_new, direction, left_index=True, right_index=True, how="left")

    shap_new["direction"] = shap_new["corr"].apply(lambda x: "-" if x<0else "+")
    shap_new["abs_corr"] = shap_new["corr"].abs()
    
    if f==0:
        shap_fin = shap_new
    else:
        shap_fin = shap_fin + shap_new

15


/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))


1
6
7
2

from lightgbm import plotting

plotting.create_tree_digraph(model, tree_index=1)

svg

plotting.plot_tree(model)

<matplotlib.axes._subplots.AxesSubplot at 0x1a410be278>

png

model.plot_metric()

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-162-48ad7f4179a8> in <module>()
----> 1 model.plot_metric()

AttributeError: 'Booster' object has no attribute 'plot_metric'

shap_fin.loc[:,["SHAP","SHAP_abs","corr","abs_corr"]] = shap_fin.loc[:,["SHAP","SHAP_abs","corr","abs_corr"]]/5

shap_fin["SHAP_corr_corrrected"] = 0

shap_fin.head()

	SHAP	Largest_Effect	SHAP_abs	corr	direction	abs_corr
Contract_Month-to-month	906.785271	+++++	906.785271	0.405103	+++++	0.405103
Contract_One year	-130.124222	-----	130.124222	-0.177820	-----	0.177820
Contract_Two year	110.094202	+++++	110.094202	-0.302253	-----	0.302253
Dependents_No	-5.010731	-----	5.010731	0.164221	+++++	0.164221
DeviceProtection_No	-0.006343	--+++	0.732087	0.252481	+++++	0.252481

b = "Contract_Month-to-month"
das  = df.corr()[df.corr().index.isin(list(shap_fin.index))].abs()

shap_fin.head()

	SHAP	Largest_Effect	SHAP_abs	corr	direction	abs_corr
Contract_Month-to-month	906.785271	+++++	906.785271	0.405103	+++++	0.405103
Contract_One year	-130.124222	-----	130.124222	-0.177820	-----	0.177820
Contract_Two year	110.094202	+++++	110.094202	-0.302253	-----	0.302253
Dependents_No	-5.010731	-----	5.010731	0.164221	+++++	0.164221
DeviceProtection_No	-0.006343	--+++	0.732087	0.252481	+++++	0.252481

for b in das.iloc[:,2:].columns:
    cora = df.corr()[df.corr().index.isin(list(shap_fin.index))][b].abs()
    for c, v in zip(cora.index, cora.values):
        shap_fin["SHAP_corr_corrrected"].loc[c] = shap_fin["SHAP_corr_corrrected"].loc[c] + (shap_fin["SHAP_abs"].loc[b]/shap_fin["SHAP_abs"].max())*v

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/pandas/core/indexing.py:194: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)

shap_fin["SHAP_corr"] = shap_fin["SHAP_corr_corrrected"].copy()

shap_fin["SHAP_corr"] = shap_fin["SHAP_corr"]/(shap_fin["SHAP_corr"].max()/shap_fin["SHAP_abs"].max())

shap_fin["SHAP_corr_corrrected"] = shap_fin["SHAP_corr_corrrected"]/(shap_fin["SHAP_corr_corrrected"].max()/shap_fin["SHAP_abs"].max())

shap_fin["SHAP_corr_corrrected"] = shap_fin["SHAP_corr_corrrected"] + shap_fin["SHAP_abs"]

shap_fin["SHAP_corr_corrrected"] = shap_fin["SHAP_corr_corrrected"]/(shap_fin["SHAP_corr_corrrected"].max()/shap_fin["SHAP_abs"].max())

shap_fin.sort_values('SHAP_corr', ascending=False).head(10)

	SHAP	Largest_Effect	SHAP_abs	corr	direction	abs_corr	SHAP_corr_corrrected	SHAP_corr
Contract_Month-to-month	906.785271	+++++	906.785271	0.405103	+++++	0.405103	906.785271	906.785271
tenure	-308.250063	-----	308.250063	-0.352229	-----	0.352229	516.718340	725.186618
Contract_Two year	110.094202	+++++	110.094202	-0.302253	-----	0.302253	386.615172	663.136143
TotalChargesConv	-58.927057	-----	58.927057	-0.199428	-----	0.199428	311.159987	563.392916
TechSupport_No	-170.060887	-----	170.060887	0.337281	+++++	0.337281	349.015912	527.970937
Contract_One year	-130.124222	-----	130.124222	-0.177820	-----	0.177820	316.545191	502.966159
OnlineSecurity_No	-68.364262	-----	68.364262	0.342637	+++++	0.342637	270.712949	473.061636
DeviceProtection_No	-0.006343	--+++	0.732087	0.252481	+++++	0.252481	215.028381	429.324675
PaymentMethod_Electronic check	-88.616541	-----	88.616541	0.301919	+++++	0.301919	244.125871	399.635200
OnlineBackup_No	-14.979056	-----	14.979056	0.268005	+++++	0.268005	204.751625	394.524194

dans = pd.read_csv("amalgamated.csv")

dans.sort_values('SHAP_corr', ascending=False).head(10)

	Unnamed: 0	SHAP	Largest_Effect	SHAP_abs	corr	direction	abs_corr	SHAP_corr_corrrected	SHAP_corr
0	Contract_Month-to-month	89.722288	+++++	89.722288	0.405103	+++++	0.405103	89.722288	89.722288
1	tenure	-32.415159	-----	32.415159	-0.352229	-----	0.352229	55.615119	78.815078
2	Contract_Two year	56.824406	+++++	56.824406	-0.302253	-----	0.302253	67.805145	78.785884
3	TotalChargesConv	-2.957160	-+--+	11.577356	-0.199428	-----	0.199428	41.444353	71.311350
4	TechSupport_No	-3.145848	-+++-	10.992171	0.337281	+++++	0.337281	35.971348	60.950524
5	OnlineSecurity_No	6.738696	++++-	12.082303	0.342637	+++++	0.342637	35.709987	59.337670
6	InternetService_No	4.387260	++-+-	8.929218	-0.227890	-----	0.227890	32.621494	56.313771
7	InternetService_Fiber optic	8.872516	++-+-	13.110938	0.308020	+++++	0.308020	32.963081	52.815223
8	DeviceProtection_No	-3.105107	-+---	4.129762	0.252481	+++++	0.252481	27.780575	51.431389
9	OnlineBackup_No	-14.044804	----+	15.399742	0.268005	+++++	0.268005	33.136412	50.873081

### There is a way to prove that it works - using all. 
#shap_fin.sort_values('SHAP_corr', ascending=False).to_csv("amalgamated.csv")

"SHAP values are sensitive to high correlations among different features. When features are correlated, their impact on the model score can be split among them in an infinite number of ways. This means that the SHAP values will be lower than if all but one of the correlated feature(s) had been removed from the model. "

My recommendation based on the SHAP_corr (adjusted shapely correlation values)

Only target customers who don’t have internet or are less likley to get internet
Those who have paid more in the past are locked in by sunk cost and stay on for longer
Potentially lower the amount to get more clients
The longer a client has stayed on, the longer they are expected to stay on, the lindy effect, therefore get clients to stay on for longer, even if you trade at a loss at first.
Sign customers on for yearly and not month to month agreements
If you do target internet enabled customers, focus on those who do not have fiber.
Online Security and Tech support are markers for ineptness and clients who require this would less likley churn.
If they stream movies and tv they are more likley to churn, so these are two important questions to ask, I believe these proxy for competence and age i.e. liklihood to churn.

Here we try out the global feature importance calcuations that come with XGBoost. Note that they all contradict each other, which motivates the use of SHAP values since they come with consistency gaurentees (meaning they will order the features correctly).

shap_fin.sort_values("SHAP_corr_corrrected",ascending=False).head(20)

	SHAP	Largest_Effect	SHAP_abs	corr	direction	abs_corr	SHAP_corr_corrrected	SHAP_corr
Contract_Month-to-month	89.722288	+++++	89.722288	0.405103	+++++	0.405103	89.722288	77.623205
Contract_Two year	56.824406	+++++	56.824406	-0.302253	-----	0.302253	66.017732	66.308543
tenure	-32.415159	-----	32.415159	-0.352229	-----	0.352229	60.448009	80.329423
MonthlyCharges	-23.233877	---+-	23.831915	0.193356	+++++	0.193356	59.466914	87.082779
TotalChargesConv	-2.957160	-+--+	11.577356	-0.199428	-----	0.199428	53.569346	88.337491
InternetService_No	4.387260	++-+-	8.929218	-0.227890	-----	0.227890	52.892006	89.722288
InternetService_Fiber optic	8.872516	++-+-	13.110938	0.308020	+++++	0.308020	46.719933	74.028728
OnlineSecurity_No	6.738696	++++-	12.082303	0.342637	+++++	0.342637	44.640472	71.178857
TechSupport_No	-3.145848	-+++-	10.992171	0.337281	+++++	0.337281	43.851507	70.797451
OnlineBackup_No	-14.044804	----+	15.399742	0.268005	+++++	0.268005	40.939261	60.958105
StreamingTV_Yes	-5.488447	--+--	8.484568	0.063228	+++++	0.063228	39.678901	65.522520
PaymentMethod_Electronic check	-14.921700	-----	14.921700	0.301919	+++++	0.301919	39.614285	58.964869
StreamingMovies_Yes	-2.688222	+----	4.784227	0.061382	+++++	0.061382	38.097282	66.272904
DeviceProtection_No	-3.105107	-+---	4.129762	0.252481	+++++	0.252481	36.274619	63.527831
DeviceProtection_Yes	0.877864	++-+-	3.441347	-0.066160	-----	0.066160	35.175298	62.165847
TechSupport_Yes	-3.379816	--+-+	7.888294	-0.164674	-----	0.164674	34.985053	57.364064
MultipleLines_Yes	-6.962591	-----	6.962591	0.040102	+++++	0.040102	34.215635	56.854687
OnlineSecurity_Yes	3.851610	-++++	7.781800	-0.171226	-----	0.171226	33.883269	55.415566
OnlineBackup_Yes	-0.167503	++++-	4.217049	-0.082255	-----	0.082255	33.378034	58.037978
StreamingMovies_No	4.124048	+++++	4.124048	0.130845	+++++	0.130845	32.164288	55.867162

## I would argue that SHAP_corr are more causal in manner.
## Other shit is correlated with SHAP_corr, so if it falls away not too much issues.
shap_fin.sort_values("SHAP_corr",ascending=False).head(20)

## New Number I want to TRY
shap_fin = shap_fin.sort_values("SHAP_abs",ascending=False)

shap_new.to_csv("imp1.csv")

preds = model.predict(X_test, num_iteration=model.best_iteration)

print('Out of sample AUC: ', roc_auc_score(y_test, preds))

# To be truthfull the prediction excerse is not that interesting, what is more interesting
# fitting the data to the response, i.e. training the model and looking at the feature
# interactions it proposes. Feature importance is only interesting to the extent it is
# different to correlation plot, in that regard, higher dimensional interactions become
# important

## To get direction use general df
## df.corr()["Churn"]
shap.summary_plot(shap_values, df.drop(cols_drop, axis=1))

png

shap.TreeExplainer(xgb).shap_interaction_values(df.drop(cols_drop, axis=1))

---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-147-092cf5c20b25> in <module>()
      2 
      3 
----> 4 shap.TreeExplainer(xgb).shap_interaction_values(df.drop(cols_drop, axis=1))


NameError: name 'xgb' is not defined

shap_fram = pd.DataFrame(shap_values[:,:-1], columns=list(X_train.columns))

shap_new = shap_fram.sum().sort_values().to_frame()

shap_new.columns = ["SHAP"]

shap_new["Largest_Effect"] = shap_new["SHAP"].apply(lambda x: "-" if x<0else "+")

shap_new["SHAP_abs"] = shap_new["SHAP"].abs()

df.drop(cols_drop, axis=1).shape

(7043, 33)

print('Out of sample AUC: ', roc_auc_score(y_test, preds))

Out of sample AUC:  0.8275733774286896

## Here at this point, you migth want to investigate individual mistakes from the validation
## set
## You can simply focus on train on validation instead of train on test.
## However I doubt any leakage would occur here as there is not much to be gained.

sub["Churn_Predict"] = sub['Churn_Predict']/n_splits
sub["Churn"] = y_test

sub["Error"] = sub["Churn"] - sub["Churn_Predict"]

## Worse false positive
sub.sort_values("Error").head(2)

	customerID	Churn	is_test	Churn_Predict	Error
5140	7577-SWIFR	0	True	0.886304	-0.886304
5307	1941-HOSAM	0	True	0.880972	-0.880972

[5140, 5307, 3735, 532, 1550, 3969, 5071, 2493, 85, 1724, 3582, 684, 727, 2453, 6362, 5839, 1022, 4173, 3822, 5144, 5846, 4150, 1482, 5755, 1446, 3337, 2998, 4124, 1719, 2530, 1297, 2173, 4094, 1282, 4648, 4473, 5658, 7029, 1829, 2364, 6345, 5716, 5187, 1628, 2398, 702, 5907, 3469, 4478, 2652, 6265, 5993, 4641, 376, 2046, 5707, 4335, 491, 5189, 2866, 2966, 4016, 1079, 4940, 3048, 2236, 5748, 6669, 390, 5631, 3554, 3221, 2171, 5119, 5699, 5188, 3545, 1160, 4986, 5514, 3791, 2343, 3481, 50, 4514, 2385, 5298, 2065, 2107, 6109, 3798, 1833, 4338, 5656, 5604, 4869, 3492, 6668, 1811, 1972]

## All worse false positives:
## Contract Month to Months seems to be the culprit, it would be good idea to look at its distribution 
## before and after the 

sub
shap.force_plot(shap_values[list(sub.sort_values("Error").head(100).index),:], df.drop(cols_drop, axis=1).iloc[list(sub.sort_values("Error").head(100).index),:], link="logit")

Visualization omitted, Javascript library not loaded!
Have you run `initjs()` in this notebook? If this notebook was from another user you must also trust this notebook (File -> Trust notebook). If you are viewing this notebook on github the Javascript has been stripped for security.

## The distributions are not different, so this is not a concern

X_train["Contract_Month-to-month"].sum()/len(X_train)

0.550175908862456

X_test["Contract_Month-to-month"].sum()/len(X_test)

0.5502793296089385

## Different model giving different amounts, run initial model for shap values.
shap.initjs()

shap.force_plot(shap_values[5354,:], df.drop(cols_drop, axis=1).iloc[5354,:], link="logit")

shap.force_plot(shap_values[345,:], df.drop(cols_drop, axis=1).iloc[345,:], link="logit")

sub["Error_Abs"] = sub["Error"].abs()

## Best True Negative
sub[sub["Churn"]==0].sort_values("Error_Abs").head(2)

shap.force_plot(shap_values[5804,:], df.drop(cols_drop, axis=1).iloc[5804,:], link="logit")

shap_ordered = shap_new.sort_values("SHAP_abs",ascending=False)

# Remember this direction says the average effect on the overall outcome.
# The direction is generally not too informative as it is not the same
# As the direction of the variable relationship with response, for that
# you would have to look at the chart. 
shap_ordered[shap_ordered["Direction"]=="+"].head()

## All this says is that low tenure is more informative than high tenure. 
shap_ordered[shap_ordered["Direction"]=="-"].head()

df["CurrentRateTotal"] = df["tenure"] * df["TotalChargesConv"]
df["RateDiff"] = df["CurrentRateTotal"] - df["TotalChargesConv"] 

df[df.index.isin([4484, 5804])]

## Worse false negative
sub.sort_values("Error").tail(2)

shap.force_plot(shap_values[4272,:], df.drop(cols_drop, axis=1).iloc[4272,:], link="logit")

## Best True Positive
sub[sub["Churn"]==1].sort_values("Error_Abs").head(2)

sub_full = pd.merge(sub,df.drop(cols_drop, axis=1),left_index=True, right_index=True, how="left" )

sub_full.shape

sub_full.head()

Arbitary Decision Rules:

Churn

Tenure < 20
Contract_Month-to-month ==1
Monthly_Charges > 50

No Churn

Tenure > 20
Contract_Month-to-month ==0 {Implies Yearly or Two-yearly contracts}
Monthly_Charges < 50

The predicted and actual outcomes probabilities are very similar. The prediction model seems to be working well.

# Churn - Predict
sub_full[(sub_full["tenure"]<20) &(sub_full["Contract_Month-to-month"]==1)&(sub_full["MonthlyCharges"]>50) ]["Churn_Predict"].mean()

0.5948143191306948

# Churn - Predict
sub_full[(sub_full["tenure"]<20) &(sub_full["Contract_Month-to-month"]==1)&(sub_full["MonthlyCharges"]>50) ]["Churn"].mean()

0.6164383561643836

# No Chrun - Predict
sub_full[(sub_full["tenure"]>20) &(sub_full["Contract_Month-to-month"]==0)&(sub_full["MonthlyCharges"]<50) ]["Churn_Predict"].mean()

0.03017844932166786

# No Churn - Actual
sub_full[(sub_full["tenure"]>20) &(sub_full["Contract_Month-to-month"]==0)&(sub_full["MonthlyCharges"]<50) ]["Churn"].mean()

0.013986013986013986

df.head()

	customerID	Churn	is_test	tenure	MonthlyCharges	TotalChargesConv	gender_Female	gender_Male	SeniorCitizen_0	...	Contract_One year	PaperlessBilling_No	PaperlessBilling_Yes	PaymentMethod_Bank transfer (automatic)	PaymentMethod_Electronic check	PaymentMethod_Mailed check	CurrentRateTotal	RateDiff
0	7590-VHVEG	0	False	1	29.85	29.85	1	0	1	...	0	0	1	0	1	0	29.85	0.00
1	5575-GNVDE	0	False	34	56.95	1889.50	0	1	1	...	1	1	0	0	0	1	64243.00	62353.50
2	3668-QPYBK	1	False	2	53.85	108.15	0	1	1	...	0	0	1	0	0	1	216.30	108.15
3	7795-CFOCW	0	True	45	42.30	1840.75	0	1	1	...	1	1	0	1	0	0	82833.75	80993.00
4	9237-HQITU	1	False	2	70.70	151.65	1	0	1	...	0	0	1	0	1	0	303.30	151.65

5 rows × 51 columns

## New Features - Lets See Effect

X_train = df[df["is_test"]==False].drop(cols_drop,axis=1)
y_train = df[df["is_test"]==False]["Churn"]

X_test = df.loc[df.is_test, :].drop(cols_drop, axis=1)
y_test = df.loc[df.is_test, :]["Churn"]

## This process gives you 15% extra data, but you have no validation set. 
# This can of course be improved by saving the best iterations, which is what I would do
dip  = {}
dip[1] = [6, 5]
dip[2] = [6, 1]
dip[3] = [6, 9]
dip[4] = [6, 12]
dip[5] = [6, 20]

n_splits = 5
cvv = StratifiedKFold(n_splits=n_splits, random_state=42)

sub = df.loc[df.is_test, :][cols_drop]
sub["Churn_Predict"] = 0

feature_importances = pd.DataFrame()

ba= 0 
for i, r in enumerate(range(n_splits)):
    ba = ba + 1
    
    params["num_boost_round"] = seed_dict[dip[ba][1]]  
    params["seed"] = dip[ba][1]
    params["max_depth"] = dip[ba][0] # if you want to customise uncomment
    
    model = LGBMClassifier(
**params
    )

    model.fit(
        X_train,
        y_train,
        eval_metric='auc',
        verbose=False
    )
    
    preds = model.predict_proba(X_test, num_iteration=seed_dict[dip[ba][1]] )[:,1]
    sub['Churn_Predict'] += preds
    
    fi = pd.DataFrame()
    fi["feature"] = X_train.columns
    fi["importance"] = model.feature_importances_
    fi["fold"] = (i+1)
    
    feature_importances = pd.concat([feature_importances, fi], axis=0)
    print("Fold {} AUC: {:.8f}".format(i+1, roc_auc_score(y_test, preds)))
  
print('Out of sample AUC: ', roc_auc_score(y_test, sub['Churn_Predict']/n_splits))

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 1 AUC: 0.86425604

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 2 AUC: 0.86441416

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 3 AUC: 0.86525830

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 4 AUC: 0.86397384

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/lightgbm/engine.py:99: UserWarning: Found `num_boost_round` in params. Will use it instead of argument
  warnings.warn("Found `{}` in params. Will use it instead of argument".format(alias))

Fold 5 AUC: 0.86469149
Out of sample AUC:  0.8647741979429193

## It has led to worse performance 
Out of sample AUC:  0.8648277170687089

#Start here for XGBoost two interactions:
import xgboost

d_train = xgboost.DMatrix(X_train, label=y_train)
d_test = xgboost.DMatrix(X_test, label=y_test)
    
params = {
    "eta": 0.01,
    "objective": "binary:logistic",
    "colsample_bytree":0.2,
    "colsample_bylevel":0.2,
    "subsample": 0.5,
    "base_score": np.mean(y_train),
    "eval_metric": "logloss"
}
xgb = xgboost.train(params, d_train, 15000, evals = [(d_test, "test")], verbose_eval=100, early_stopping_rounds=200)

[0]	test-logloss:0.583503
Will train until test-logloss hasn't improved in 200 rounds.
[100]	test-logloss:0.511794
[200]	test-logloss:0.473851
[300]	test-logloss:0.451492
[400]	test-logloss:0.438492
[500]	test-logloss:0.430542
[600]	test-logloss:0.424635
[700]	test-logloss:0.420811
[800]	test-logloss:0.418328
[900]	test-logloss:0.415762
[1000]	test-logloss:0.414202
[1100]	test-logloss:0.413115
[1200]	test-logloss:0.412331
[1300]	test-logloss:0.411409
[1400]	test-logloss:0.410732
[1500]	test-logloss:0.410264
[1600]	test-logloss:0.409837
[1700]	test-logloss:0.409436
[1800]	test-logloss:0.408982
[1900]	test-logloss:0.408614
[2000]	test-logloss:0.408551
[2100]	test-logloss:0.408201
[2200]	test-logloss:0.408031
[2300]	test-logloss:0.40796
[2400]	test-logloss:0.407775
[2500]	test-logloss:0.407447
[2600]	test-logloss:0.407408
[2700]	test-logloss:0.407488
Stopping. Best iteration:
[2576]	test-logloss:0.407333

import _pickle as pickle
#pickle.dump(shap_inter, open("shap_inter.p", "wb"))  # save it into a file 
shap_inter = pickle.load(open("shap_inter.p", "rb")) # open pickle

import shap shap_inter = shap.TreeExplainer(xgb).shap_interaction_values(df.drop(cols_drop, axis=1))

shap_inter[1,1,:]

array([-0.00034301, -0.11787798,  0.00363676,  0.00428383,  0.00157433,
       -0.00140101, -0.0018202 ,  0.000298  ,  0.0005892 ,  0.0013317 ,
        0.00247201, -0.00043364, -0.00027109,  0.00078189,  0.00436718,
       -0.00215778,  0.00173602,  0.00467855,  0.00454791, -0.0008688 ,
       -0.00289185,  0.00525794,  0.00300427,  0.00142787, -0.00099375,
        0.00400078,  0.0022267 , -0.00055282,  0.00487384,  0.00060737,
        0.00056997, -0.00150263,  0.00884111,  0.        ], dtype=float32)

import shap

shap.summary_plot(shap_inter, df.drop(cols_drop, axis=1))

png

## This is a very important interaction that can't be over-emphesised.

Here you can see something very interesting, only if you sign or regularly sign two-year contracts does your tenure matter. Thus if you have already signed two year contracts, you are more likley to sign contracts in the future. On the other hand if you have been tenured for very long withu

df.head()

	customerID	Churn	is_test	tenure	MonthlyCharges	TotalChargesConv	gender_Female	SeniorCitizen_0	Partner_No	Dependents_No	...	StreamingMovies_No	Contract_Month-to-month	Contract_One year	PaperlessBilling_No	PaymentMethod_Bank transfer (automatic)	PaymentMethod_Electronic check	PaymentMethod_Mailed check
0	7590-VHVEG	0	False	1	29.85	29.85	1	1	0	1	...	1	1	0	0	0	1	0
1	5575-GNVDE	0	False	34	56.95	1889.50	0	1	1	1	...	1	0	1	1	0	0	1
2	3668-QPYBK	1	False	2	53.85	108.15	0	1	1	1	...	1	1	0	0	0	0	1
3	7795-CFOCW	0	False	45	42.30	1840.75	0	1	1	1	...	1	0	1	1	1	0	0
4	9237-HQITU	1	False	2	70.70	151.65	1	1	1	1	...	1	1	0	0	0	1	0

5 rows × 36 columns

Females are less willing to change providers if they have been with the company for a long time, males are more likley to churn and do not show as much loyalty as females.

shap.dependence_plot(
    ("tenure", "gender_Female"),
    shap_inter, df.drop(cols_drop, axis=1),
    display_features=df.drop(cols_drop, axis=1)
)

png

Young people are more likley to churn if they have been with the company for a long time as opposed to old people

shap.dependence_plot(
    ("tenure", "SeniorCitizen_0"),
    shap_inter, df.drop(cols_drop, axis=1),
    display_features=df.drop(cols_drop, axis=1)
)

png

Customers are a lot less likley to churn if they have spent exhorborant amounts on fees and have been with the company for a long time. It is likley that these companies have gained favour due to be long term customers, so this might have less to do with their total charges than their current rates.

shap.dependence_plot(
    ("tenure", "TotalChargesConv"),
    shap_inter, df.drop(cols_drop, axis=1),
    display_features=df.drop(cols_drop, axis=1)
)

png

df.head()

	customerID	Churn	is_test	tenure	MonthlyCharges	TotalChargesConv	gender_Female	SeniorCitizen_0	Partner_No	Dependents_No	...	StreamingMovies_No	Contract_Month-to-month	Contract_One year	PaperlessBilling_No	PaymentMethod_Bank transfer (automatic)	PaymentMethod_Electronic check	PaymentMethod_Mailed check
0	7590-VHVEG	0	False	1	29.85	29.85	1	1	0	1	...	1	1	0	0	0	1	0
1	5575-GNVDE	0	False	34	56.95	1889.50	0	1	1	1	...	1	0	1	1	0	0	1
2	3668-QPYBK	1	False	2	53.85	108.15	0	1	1	1	...	1	1	0	0	0	0	1
3	7795-CFOCW	0	False	45	42.30	1840.75	0	1	1	1	...	1	0	1	1	1	0	0
4	9237-HQITU	1	False	2	70.70	151.65	1	1	1	1	...	1	1	0	0	0	1	0

5 rows × 36 columns

If your total charges are high but you are on month to month, you are less likley to jump ship, because you do not feel constrained. So this might be a ptential method to get more total slaes, convert to monthly

shap.dependence_plot(
    ("TotalChargesConv","Contract_Month-to-month"),
    shap_inter, df.drop(cols_drop, axis=1),
    display_features=df.drop(cols_drop, axis=1)
)

png

This can help you understand certain price points. Similar to shelley points in game theory. There is certain values that customers just flat out do not like. When the price increases from 40 to 50 customers are unhappy, it migth be pshychological or the company is infact losing out at that price point against the market. From price point 60 - 90 the firm seems to be doing reasonably well in retaining customer etc.

shap.dependence_plot(
    ("MonthlyCharges","Contract_Two year"),
    shap_inter, df.drop(cols_drop, axis=1),
    display_features=df.drop(cols_drop, axis=1)
)

png

These partial dependence plots and correlation plots will 99% of the tme show the same relationship. The only issue is that with the corerlation plot you have to create groupby’s

This chart shows something concerning,

shap.dependence_plot(
    ("TotalChargesConv","MonthlyCharges"),
    shap_inter, df.drop(cols_drop, axis=1),
    display_features=df.drop(cols_drop, axis=1)
)

png

cols_drop

[‘customerID’, ‘Churn’, ‘is_test’]

[‘Contract_Month-to-month’, ‘tenure’, ‘Contract_Two year’, ‘TotalChargesConv’, ‘TechSupport_No’, ‘Contract_One year’, ‘OnlineSecurity_No’, ‘DeviceProtection_No’, ‘PaymentMethod_Electronic check’, ‘OnlineBackup_No’]

kt

df[kt].head()

	Contract_Month-to-month	tenure	TotalChargesConv	TechSupport_No	Contract_One year	OnlineSecurity_No	DeviceProtection_No	PaymentMethod_Electronic check	OnlineBackup_No	Churn
0	1	1	29.85	1	0	1	1	1	0	0
1	0	34	1889.50	1	1	0	0	0	1	0
2	1	2	108.15	1	0	0	1	0	0	1
3	0	45	1840.75	0	1	0	0	0	1	0
4	1	2	151.65	1	0	1	1	1	1	1

df["Churn"].value_counts()

0    5174
1    1869
Name: Churn, dtype: int64

import seaborn as sns

kt = list(shap_fin.sort_values("SHAP_corr",ascending=False).index[:4])
kt = []
kt.append("Churn")
kt.append("MonthlyCharges")
kt.append("TotalChargesConv")
g = sns.pairplot(df[kt], hue='Churn', palette = 'seismic',size=3,diag_kind = 'kde',diag_kws=dict(shade=True),plot_kws=dict(s=10) )
g.set(xticklabels=[])

/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/statsmodels/nonparametric/kde.py:494: RuntimeWarning: invalid value encountered in true_divide
  binned = fast_linbin(X,a,b,gridsize)/(delta*nobs)
/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/statsmodels/nonparametric/kdetools.py:34: RuntimeWarning: invalid value encountered in double_scalars
  FAC1 = 2*(np.pi*bw/RANGE)**2
/Users/dereksnow/anaconda/envs/py36/lib/python3.6/site-packages/numpy/core/_methods.py:26: RuntimeWarning: invalid value encountered in reduce
  return umr_maximum(a, axis, None, out, keepdims)

<seaborn.axisgrid.PairGrid at 0x1a3feeda58>

png

## You are more likely to churn with high monthly charges and low total cost.
## You are not as likley to churn with high monthly charges and high monthly costs.
## See it shows the same relationship
## What it does unfortunately not show is the probability of being this and that as it is single points. 
## For that you might have to do a nearest neighbour transofmration or a strategised groupby.
## and that is simply what the tree provides, see it as a groupby function, in out example it is just not 
## entirely overfitted, so some information might be lost, but not necessarily if generalisability is important. 
## With scatter plots you have to do it with actual or expected values. 
## I have neverv seen expected value plots because to create it you would have to create some groupbys or knn's
## so inface your LightGBM is the expected value amounts - nothing can actually get as close as it, hence why it
## is used. And the expected value plot gives you a slightly better picture than colored scatter plots for categorical.
## However, for regression plots, it is going to show a very similar plot. - You can at a a later stage try that out 
## for regression plots. So I would recommend using the shap dependence plots when it comes to classification
## and maybe switching  to scater plots when it comes to regression. There is also an argument to just use the
## the scatter plots as the relationship might become moot as a result of another correlated vairable.