Predicting High Risk Credit Card Customers using Linear Support Vector Machine Model - Paolo G. Hilado
Situationer
This case presents a model that financial institutions can use to predict high-risk credit card customers from lifestyle variables. A linear support vector machine classifies whether an applicant is considered a high-risk credit card customer or not.
Load Data
#load kernlab package
library(kernlab)
#Open file
data <- read.table("credit_card_data.txt")
# Check data frame structure
str(data)
'data.frame': 654 obs. of 11 variables:
$ V1 : int 1 0 0 1 1 1 1 0 1 1 ...
$ V2 : num 30.8 58.7 24.5 27.8 20.2 ...
$ V3 : num 0 4.46 0.5 1.54 5.62 ...
$ V4 : num 1.25 3.04 1.5 3.75 1.71 ...
$ V5 : int 1 1 1 1 1 1 1 1 1 1 ...
$ V6 : int 0 0 1 0 1 1 1 1 1 1 ...
$ V7 : int 1 6 0 5 0 0 0 0 0 0 ...
$ V8 : int 1 1 1 0 1 0 0 1 1 0 ...
$ V9 : int 202 43 280 100 120 360 164 80 180 52 ...
$ V10: int 0 560 824 3 0 0 31285 1349 314 1442 ...
$ V11: int 1 1 1 1 1 1 1 1 1 1 ...
# Setup categorical variables properly
data[,c(1,5,6,7,8,11)] <- lapply(data[,c(1,5,6,7,8,11)], factor)
str(data)
'data.frame': 654 obs. of 11 variables:
$ V1 : Factor w/ 2 levels "0","1": 2 1 1 2 2 2 2 1 2 2 ...
$ V2 : num 30.8 58.7 24.5 27.8 20.2 ...
$ V3 : num 0 4.46 0.5 1.54 5.62 ...
$ V4 : num 1.25 3.04 1.5 3.75 1.71 ...
$ V5 : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ V6 : Factor w/ 2 levels "0","1": 1 1 2 1 2 2 2 2 2 2 ...
$ V7 : Factor w/ 23 levels "0","1","2","3",..: 2 7 1 6 1 1 1 1 1 1 ...
$ V8 : Factor w/ 2 levels "0","1": 2 2 2 1 2 1 1 2 2 1 ...
$ V9 : int 202 43 280 100 120 360 164 80 180 52 ...
$ V10: int 0 560 824 3 0 0 31285 1349 314 1442 ...
$ V11: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
Preprocess Continuous Variables via Normalization
library(caret)
nvar <- preProcess(data[,-11], method = c("center", "scale"))
# Factor variables are skipped by this step
nvar
Created from 654 samples and 10 variables
Pre-processing:
- centered (5)
- ignored (5)
- scaled (5)
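Note that preProcess() only learns the centering and scaling parameters; it does not transform data itself. A minimal sketch of applying the transformation with predict() follows, though it is not strictly needed here because the ksvm call below is run with scaled = TRUE and rescales internally:
#Sketch (not run in this analysis): apply the learned transformation;
#factor columns pass through unchanged
data_scaled <- predict(nvar, data[,-11])
data_scaled$V11 <- data$V11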
Data Partition; 70% Training and 30% Testing
# Note: createDataPartition() samples randomly; set a seed before this
# call if the split needs to be reproducible
inTrain <- createDataPartition(data$V11, p=.70, list=FALSE)
set.seed(2); train <- data[inTrain,]
set.seed(3); test <- data[-inTrain,]
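Because createDataPartition() stratifies on the outcome, the class proportions of V11 should be roughly preserved in both sets. A quick optional check:
#Compare class proportions across the two sets
prop.table(table(train$V11))
prop.table(table(test$V11))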
Applying a Scaled Linear Support Vector Machine
#Apply ksvm model with scale
set.seed(789); Smod <- ksvm(V11~., data = train, type = "C-svc", kernel = "vanilladot", C = 100, scaled = TRUE)
Setting default kernel parameters
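The value C = 100 is taken as given here. If one wanted to sanity-check that choice, kernlab can estimate a k-fold cross-validation error directly through ksvm's cross argument. A minimal sketch, not part of the original run:
set.seed(789)
Smod_cv <- ksvm(V11~., data = train, type = "C-svc", kernel = "vanilladot",
                C = 100, scaled = TRUE, cross = 10)
#Accessor for the 10-fold cross-validation error estimate
cross(Smod_cv)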
Checking the Coefficients
#This code lists the model's attributes, including the coefficients. Since
#the output is very long (about 13 pages), the chunk is set not to print
#its results; feel free to run it on your own machine.
attributes(Smod)
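If you only need specific pieces of the model rather than the full attribute dump, kernlab also exposes narrower accessor functions; for example:
nSV(Smod)              #number of support vectors
b(Smod)                #the intercept term b
head(alpha(Smod)[[1]]) #first few support vector coefficients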
Recovering the Linear Kernel Equation a*scaled(x) + a0
#The coefficient vector a of the equation a*scaled(x) + a0 is:
scaled_a <- colSums(Smod@xmatrix[[1]] * Smod@coef[[1]])
#The constant a0 is the negative of the model intercept b:
scaled_a0 <- -Smod@b
#Inspect the coefficients a; dummy-variable names such as V51, V71, V72, ...
#come from the levels of the factor variables
scaled_a
scaled_a
V10 V11 V2 V3 V4 V51
0.002379988 -0.002379988 -0.007717582 -0.004641372 0.013255107 2.011036840
V61 V71 V72 V73 V74 V75
0.010451678 0.027300780 0.014492811 0.022311870 0.032874312 0.021441293
V76 V77 V78 V79 V710 V711
0.025629810 0.007092351 0.032509436 0.026753560 0.010881315 0.033839039
V712 V713 V714 V715 V716 V717
0.022215982 0.000000000 0.000000000 0.000000000 0.000000000 0.005494827
V719 V720 V723 V740 V767 V81
0.000000000 -0.319768354 0.000000000 0.000000000 0.026479287 -0.003445462
V9 V10.1
-0.001186389 0.126561046
scaled_a0
[1] -1.008793
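As a quick consistency check (a sketch using the same slots as above, not part of the original run), the decision values for the support vectors can be computed straight from the recovered hyperplane, since xmatrix holds the scaled support vectors. The sign of a decision value indicates which side of the hyperplane a point falls on; which sign maps to which class depends on kernlab's internal label coding.
#Decision values f(x) = a*scaled(x) + a0 for the support vectors
dec <- as.vector(Smod@xmatrix[[1]] %*% scaled_a + scaled_a0)
head(dec)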
#Obtain predicted values via the model we have fitted
set.seed(156); pred <- predict(Smod, test[,1:10])
#Here are the model's predicted classes for the test observations
pred
[1] 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
[75] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[149] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[186] 0 0 0 0 0 0 0 0 0 0
Levels: 0 1
Check Model Accuracy
#see what fraction of the model's predictions match the
#actual classification (Check model accuracy)
acc <- sum(pred == test[,11])/nrow(test)
round(acc*100, 2)
[1] 87.69
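Beyond raw accuracy, caret's confusionMatrix() (caret is already loaded above) breaks the results down by class, including sensitivity and specificity:
confusionMatrix(pred, test[,11])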
Our linear SVM classifies about 87.7% of the held-out test cases correctly, which is reasonably good performance for this task. Time to save our model for future use.
saveRDS(Smod, "HiRiskMod.rds")
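Later, the saved model can be reloaded to score new applicants. A minimal sketch, where new_applicants is a hypothetical data frame with the same V1-V10 columns and factor coding as the training data:
#Reload the fitted model from disk
Smod2 <- readRDS("HiRiskMod.rds")
#new_applicants is hypothetical; it must match the training columns and coding
predict(Smod2, new_applicants)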