Extrapolation of Loan Default using Predictive Analytics: A Case of Business Analysis
DOI: https://doi.org/10.53739/samvad/2021/v23/166261
Keywords: Adaboost, Decision Tree, k-Nearest Neighbors (k-NN), Logistic Regression, Naïve Bayes, Neural Network, Non-Banking Financial Companies (NBFC), Support Vector Machine (SVM), Random Forest.
Abstract
The research assesses whether a customer is suitable for a loan using a machine learning approach known as predictive modeling. Banks and Non-Banking Financial Companies (NBFCs) are at risk of accumulating significant Non-Performing Assets (NPAs) when customers fail to repay their loans. The data for this study came from Kaggle, and eight prediction models were employed to determine whether a borrower would be able to repay the loan: Adaboost, k-Nearest Neighbors (k-NN), Logistic Regression, Support Vector Machines (SVM), Decision Tree, Naive Bayes, Neural Networks, and Random Forest (RF). The purpose is to support lending decisions with factual evidence rather than subjective reasoning. Four performance parameters are used to evaluate the results: Classification Accuracy, Precision, Recall, and F1 score. The dataset is split into training and test sets in a 70:30 ratio, and the analysis proceeds in two phases: each model is first trained on the 70% training set and then evaluated on the 30% test set. The study aims to determine how objective borrower characteristics influence loan default, to identify the most common reasons for default, and to predict which customers will default. Two evaluations were carried out. First, we trained the models on the full training set and made predictions; the Adaboost model delivered the best results, with a recall rate of 0.384, a classification accuracy of 59.2 percent, and a true-positive rate of 69.74 percent. Second, we performed feature selection and found that Credit History, at 31 percent, had the greatest impact on loan default detection.
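The evaluation protocol described above (a 70:30 train/test split scored with accuracy, precision, recall and F1) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the fixed shuffle seed, and the convention that label 1 marks a defaulting borrower are assumptions. In practice a library such as scikit-learn would normally supply equivalent helpers.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and return (train, test); default is a 70/30 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumption) for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` (assumed 1 = default) as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented for illustration only (not the study's data).
scores = classification_scores([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
print(scores)  # all four scores equal 0.75 for these toy labels
```

Computing the scores for the positive (default) class matters here: with an imbalanced loan dataset, a model can reach reasonable accuracy while missing many defaulters, which only recall and F1 reveal.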
By partitioning the dataset into Credit_History 1 and 0 subsets, we found that the Credit_History 1 subset produces superior results, with a recall rate of 0.444, a classification accuracy of 60.5 percent, and a true-positive rate of 68.7 percent.
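The partitioning step can be illustrated with a small sketch. It assumes each applicant is a dict carrying the `Credit_History` field named in the abstract plus two hypothetical fields, `actual` and `predicted` (with 1 = default, also an assumption). It shows only the grouping and per-subset scoring; training a separate model on each subset is left out, and the toy records are invented for illustration rather than taken from the study.

```python
from collections import defaultdict


def partition_by(records, key):
    """Group dict records by the value of one field, e.g. Credit_History 1 vs 0."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


def accuracy_and_recall(rows, positive=1):
    """Accuracy and recall from rows with hypothetical 'actual'/'predicted' labels."""
    correct = sum(1 for r in rows if r["actual"] == r["predicted"])
    positives = [r for r in rows if r["actual"] == positive]
    hits = sum(1 for r in positives if r["predicted"] == positive)
    recall = hits / len(positives) if positives else 0.0
    return correct / len(rows), recall


# Invented toy records for illustration only.
records = [
    {"Credit_History": 1, "actual": 1, "predicted": 1},
    {"Credit_History": 1, "actual": 0, "predicted": 0},
    {"Credit_History": 1, "actual": 1, "predicted": 0},
    {"Credit_History": 1, "actual": 0, "predicted": 0},
    {"Credit_History": 0, "actual": 1, "predicted": 1},
    {"Credit_History": 0, "actual": 0, "predicted": 1},
]

for value, rows in sorted(partition_by(records, "Credit_History").items()):
    acc, rec = accuracy_and_recall(rows)
    print(f"Credit_History={value}: accuracy={acc:.2f}, recall={rec:.2f}")
```

Scoring each subset separately, rather than reporting one pooled figure, is what allows the two Credit_History groups to be compared directly.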