Addressing Class Imbalance in IoT: A Comparative Analysis of Resampling Techniques

Authors

  • Yousef Qawqzeha University of Fujairah, 48CP+J4P - E89, Mraisheed, Fujairah, United Arab Emirates

DOI:

https://doi.org/10.54536/ajsts.v4i1.2912

Keywords:

Multi-Class Classification, Resampling Techniques, Class Imbalance, Hyperparameter Tuning, Fraud Detection

Abstract

Automated task processing and sophisticated algorithm design are important tools for extracting insights and practical solutions from data. Data-driven machine learning models have produced outputs of reasonable quality when the input datasets were balanced, whereas an uneven distribution of classes in the input data leaves the dataset imbalanced. Class imbalance has been a significant challenge in machine learning applications, particularly with the highly skewed distributions found in Internet of Things (IoT) datasets. This study addressed the class imbalance issue in IoT data by comparing several resampling strategies, with the aim of finding efficient ways to rebalance class distributions and improve the performance of machine learning models deployed in IoT systems. A predictive model built on an imbalanced dataset appeared to have high accuracy, but it struggled to generalise to new data from the minority class. Resampling techniques, including over-sampling, under-sampling, SMOTE (Synthetic Minority Over-Sampling Technique), and ADASYN (Adaptive Synthetic Sampling), were evaluated on a wide variety of IoT datasets spanning different classes and domains. The effectiveness of each technique was assessed using performance metrics such as the area under the curve (AUC), F1-score, precision, and recall. This study advanced the understanding of class imbalance mitigation in IoT data processing by providing insights into building more robust and trustworthy models for IoT scenarios.

CCS Concepts: Class Imbalance • Applied Computing • Machine Learning • Internet of Things (IoT)
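The abstract's observation that high accuracy can hide poor minority-class performance, and the two simplest resampling strategies it names, can be illustrated with a minimal sketch. This is not the study's code: the toy data, the class labels `normal` and `attack`, and the helper names `random_oversample`, `random_undersample`, and `precision_recall_f1` are all assumptions made for illustration. SMOTE and ADASYN, which synthesise new minority samples rather than duplicating existing ones, are not shown.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Randomly keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy dataset: 95 majority-class and 5 minority-class samples.
y = ["normal"] * 95 + ["attack"] * 5
X = [[float(i)] for i in range(100)]

# A trivial baseline that always predicts the majority class.
y_pred = ["normal"] * len(y)
accuracy = sum(t == p for t, p in zip(y, y_pred)) / len(y)
precision, recall, f1 = precision_recall_f1(y, y_pred, positive="attack")

# Rebalance the class distribution in both directions.
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print("accuracy:", accuracy, "minority recall:", recall, "minority F1:", f1)
print("after over-sampling:", Counter(y_over))
print("after under-sampling:", Counter(y_under))
```

In this toy setting the baseline is right on 95 of 100 samples yet never detects the minority class, which is why per-class precision, recall, and F1 are more informative than accuracy alone. In practice, resampling would normally be applied to the training split only, so that evaluation data keeps its original distribution.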

Published

2025-02-18

How to Cite

Qawqzeha, Y. (2025). Addressing Class Imbalance in IoT: A Comparative Analysis of Resampling Techniques. American Journal of Smart Technology and Solutions, 4(1), 16–24. https://doi.org/10.54536/ajsts.v4i1.2912