Machine Learning for Risk Management: Classifying Credit Card Default

Hi, today I am going to show you how to classify credit default using the Classification Learner app. First, we are going to import the data into the MATLAB workspace.

It’s a sample credit card dataset. In this dataset, we have customer ID, credit limit, gender, education level, age, number of delayed payments, and historical billing amounts for a period of six months. Since we want to create a machine learning model to predict whether the client is going to default in the next 30 days, our response variable is default, where 1 means default and 0 means no default. Then, let’s go to the Apps tab and select the Classification Learner app. When we click New Session here, the app automatically detects the variables in the MATLAB workspace.

Then, we can select the dataset that we want to use. After that, the app decides whether each variable is a predictor or a response based on its data type. However, we can also change a variable’s role or remove the variable. Let’s deselect the ID, as it is not relevant to default risk at all. To protect against problems like overfitting, I am going to choose holdout validation with 25% of the data held out. It means that we are going to use 75% of the data for training and the remaining 25% for validating the model performance.
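The 75/25 holdout split described above can be sketched outside the app. This is a minimal illustration in plain Python, not what Classification Learner runs internally. The function name `holdout_split`, the fixed seed, and the 1,000 placeholder customer rows are all assumptions made for the example.

```python
import random


def holdout_split(rows, holdout_fraction=0.25, seed=0):
    """Shuffle rows and split them into (training, validation) lists."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * holdout_fraction)
    # The first n_validation shuffled rows are held out; the rest train the model.
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical stand-in for the customer table: 1,000 row indices.
customers = list(range(1000))
train_rows, validation_rows = holdout_split(customers, holdout_fraction=0.25)
print(len(train_rows), len(validation_rows))
```

The model never sees the held-out rows during training, so its accuracy on them is a fairer estimate of how it will behave on new customers.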

Once we import the data into the app, we can visualize the relationship between the predictor and response variables using the scatter plot here. Next, we are going to apply machine learning algorithms to train models on the data. To evaluate default risk in practice, you may easily have hundreds of predictors. In such a case, you may want to apply principal component analysis, or PCA, too. PCA is a useful technique for dimensionality reduction. However, we are going to leave this setting unchanged, as we don’t have too many predictors in this dataset.

And it’s time to apply classification algorithms. There are many classifiers available for you to choose from in Classification Learner, for example, decision trees, nearest neighbors, and support vector machines. Here you can select a classifier and click Train, one at a time. Or you can select a group of classifiers to train.

And when we want to apply multiple algorithms at the same time, it is a good idea to use parallel computing to speed up the training process. In this example, logistic regression has the highest accuracy, 79.6%. In addition to the accuracy, there are other diagnostic tools that can provide more details about the performance of each classifier. To me, the confusion matrix is one of the most intuitive and easiest tools for evaluating classification performance. The diagonal elements in green represent the validation samples that are correctly classified.

On the other hand, the off-diagonal elements in red represent the validation samples that are misclassified. You can use the options here to view other types of confusion matrix. When you complete the classification process, you can choose to export the model directly back to the MATLAB workspace or generate MATLAB code. Let me choose this option. Now, we can also see the details of the classification workflow.
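To make the confusion matrix concrete, here is a small sketch in plain Python of how the diagonal and off-diagonal counts relate to accuracy. The ten labels are made up for illustration and are not taken from the video’s dataset. The helper names `confusion_matrix` and `accuracy` are hypothetical and are not the app’s generated code.

```python
def confusion_matrix(actual, predicted):
    """Count outcomes: rows are the true class, columns the predicted class (0 or 1)."""
    matrix = [[0, 0], [0, 0]]
    for truth, guess in zip(actual, predicted):
        matrix[truth][guess] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Illustrative labels only: 1 = default, 0 = no default.
actual = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

cm = confusion_matrix(actual, predicted)
print(cm)            # [[5, 1], [2, 2]]
print(accuracy(cm))  # 0.7
```

In this toy case the accuracy is 70%, yet two of the four actual defaults are missed. That is the kind of detail the confusion matrix shows and a single accuracy number hides.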

We can always customize the code and integrate it into other applications. In this case, we may use this function to predict whether a client will default on the credit card in the next month. If the model predicts that the client is likely to default, the bank may use that prediction to limit additional credit risk. For example, the bank may decline a credit limit increase or even lower the existing credit limit. Here, we can see the performance of the trained model on the test dataset. To learn more about machine learning algorithms and their applications, you can visit this webpage here. There are examples and videos related to machine learning available here.

Thank you for watching this video.