
2024 | OriginalPaper | Book chapter

HPO-LGBM-DRI: Dynamic Recognition Interval Estimation for Imbalanced Fraud Call via HPO-LGBM

Authors: Xiliang Liu, Xiaoying Zhi, Qiang Mei, Peng Wang, Haoru Su, Jiayi Wang

Published in: Spatial Data and Intelligence

Publisher: Springer Nature Singapore


Abstract

The prevention of and crackdown on fraud calls have attracted increasing attention from both industry and academia. Most current machine-learning-based studies ignore the imbalanced data distribution between normal and fraudulent call users, and their outputs neglect the range over which the predicted probability of a suspected fraudulent call may fluctuate. To overcome these limitations, we first construct user behavioral feature vectors using a random forest method. Second, we propose a novel hierarchical sampling method to address the class imbalance problem. Third, we propose a novel fraud call recognition method based on HPO-LGBM, which combines Bayesian hyperparameter optimization based on random forest with the Light Gradient Boosting Machine (LightGBM). Finally, we further evaluate the method's performance with a dynamic recognition interval (DRI) model. Experimental results on public datasets show that HPO-LGBM achieves an F1 score of 92.90%, an AUC of 91.90%, a G-mean of 92.92%, and an MCC of 92.37% in fraud call recognition. In addition, HPO-LGBM can provide a dynamic recognition interval for its output and is more robust than the other models compared (LR, RF, MLP, GBDT, XGBoost, and LightGBM).
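The abstract reports four metrics that are generally preferred over plain accuracy on imbalanced data, because a classifier that labels every caller as normal can still reach high accuracy when fraud is rare. As a minimal sketch, the snippet below computes them from their standard textbook definitions. It assumes binary labels with 1 marking a fraudulent caller and a decision threshold of 0.5; the function names and toy data are hypothetical and are not taken from the chapter, whose evaluation code is not shown here. AUC is computed by pairwise comparison (the Mann-Whitney formulation), which is quadratic in the number of samples and only meant for small illustrative inputs.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN), treating label 1 as the fraud (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    # Geometric mean of sensitivity (recall on fraud) and specificity (recall on normal users).
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def mcc(tp, fp, tn, fn):
    # Matthews correlation coefficient; returns 0.0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    # Probability that a random fraud case outscores a random normal case (ties count 1/2).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 3 fraudulent callers (label 1) and 5 normal callers (label 0).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    print(f"F1={f1_score(tp, fp, fn):.3f} G-mean={g_mean(tp, fp, tn, fn):.3f} "
          f"MCC={mcc(tp, fp, tn, fn):.3f} AUC={roc_auc(y_true, scores):.3f}")
    # prints: F1=0.667 G-mean=0.730 MCC=0.467 AUC=0.867
```

Note that F1, G-mean, and MCC depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-free. On real evaluation sets one would normally use library implementations that follow the same definitions, such as scikit-learn's `f1_score`, `roc_auc_score`, and `matthews_corrcoef`, and imbalanced-learn's `geometric_mean_score`.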


Metadata
DOI
https://doi.org/10.1007/978-981-97-2966-1_24
