Extrapolation of Loan Default using Predictive Analytics: A Case of Business Analysis

Authors

  •  Riktesh Srivastava, City University College of Ajman, Ajman

DOI:

https://doi.org/10.53739/samvad/2021/v23/166261

Keywords:

Adaboost, Decision Tree, κ-Nearest Neighbors (κ-NN), Logistic Regression, Naïve Bayes, Neural Network, Non-Banking Financial Companies (NBFC), Support Vector Machine (SVM), Random Forest.

Abstract

This research assesses a customer's suitability for a loan using predictive modeling, a machine learning approach. Banks and Non-Banking Financial Companies (NBFCs) are at risk of significant Non-Performing Assets (NPAs) when customers fail to repay loans. The data for this study came from Kaggle, and eight prediction models were used to determine whether a borrower would be able to repay the loan: Adaboost, κ-Nearest Neighbors (κ-NN), Logistic Regression, Support Vector Machine (SVM), Decision Tree, Naïve Bayes, Neural Network, and Random Forest (RF). The aim is to support decisions with factual evidence rather than subjective judgment. Four performance measures are used to evaluate the results: Classification Accuracy, Precision, Recall, and F1 score. The dataset is split into training and test sets in a 70:30 ratio. The analysis proceeds in two phases: the models are first trained on the 70% training set and then evaluated on the 30% test set. The study examines how objective characteristics influence borrowers to default on loans, identifies the most common reasons for default, and predicts which customers will default. Two evaluations were carried out. First, we used the full training set to make predictions with each model. The Adaboost model delivered the best results, with a recall rate of 0.384, a classification accuracy of 59.2 percent, and a true-positive rate of 69.74 percent. Second, we performed feature selection and found that Credit History, at 31 percent, had the greatest impact on loan default detection. Partitioning the dataset by Credit_History (1 and 0), we found that Credit_History 1 produces better results, with a rate of 0.444, a classification accuracy of 60.5 percent, and a true-positive rate of 68.7 percent.
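
The evaluation steps named in the abstract (a 70/30 split, the four performance measures, and a partition on Credit_History) can be illustrated with a minimal Python sketch. This is not the author's code: the helper names (`train_test_split`, `classification_metrics`, `partition_by`), the `default` field, and all toy records and predictions are assumptions made for illustration, and the eight models themselves are not fitted here.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and split them into (train, test) by test_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class."""
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def partition_by(rows, key):
    """Group rows (dicts) by the value of one field, e.g. Credit_History."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


# Toy records; the "default" field and all values are invented.
rows = [{"Credit_History": i % 2, "default": int(i % 5 == 0)}
        for i in range(20)]
train, test = train_test_split(rows)           # 70% train, 30% test
by_history = partition_by(train, "Credit_History")

# Hypothetical true labels and model predictions for six test borrowers.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In practice each of the eight classifiers would be fitted on `train` and scored on `test` with the same metric function, once on the full data and once per Credit_History group, so that the models and partitions are compared on equal terms.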

Published

2022-01-21

How to Cite

Srivastava, R. Extrapolation of Loan Default Using Predictive Analytics: A Case of Business Analysis. samvad 2022, 23, 37-49.

Section

Articles
