Risk Prediction of Thalassemia Using Data Mining Classifiers
Keywords: Medical Data Mining, Thalassemia, J48 Decision Tree, Naïve Bayesian Network, Multilayer Perceptron Neural Network
Abstract
Medical data mining applies predictive techniques to extract hidden patterns from clinical data for specific purposes. Thalassemia is one of the most common inherited blood disorders, and this paper applies data mining classification techniques to predict thalassemia risk with high accuracy. The dataset was collected from the National Institute of Blood Diseases (NIBD), a well-known institute and hospital for blood diseases in Karachi, Pakistan, which provided 301 records of CBC (complete blood count) test reports labeled positive or negative for thalassemia trait. Of the many attributes in each report, six were used in this study: Gender, MCV, HGB, HCT, MCHC, and RDW. The dataset was divided into training and test data using the WEKA tool. Four classification algorithms, namely J48 Decision Tree, Naïve Bayesian Network, SMO, and Multilayer Perceptron Neural Network, were trained in WEKA to distinguish patients with thalassemia traits from normal persons. Of the four algorithms, Naïve Bayes achieved the highest accuracy, 99%.
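As a rough illustration of the classification step described in the abstract, the following is a minimal sketch of a Gaussian naive Bayes classifier in plain Python. It is not the WEKA workflow used in the paper. The records, the choice of only two attributes (MCV and HGB), and the names `TRAIN`, `TEST`, `fit`, `predict`, and `accuracy` are all assumptions made for illustration, and the numeric values are invented toy data, not the NIBD dataset.

```python
import math
from collections import defaultdict

# Hypothetical toy records: (MCV in fL, HGB in g/dL) -> label.
# These values are invented for illustration; they are NOT the NIBD data.
TRAIN = [
    ((62.0, 10.1), "trait"),
    ((65.5, 10.8), "trait"),
    ((60.3, 9.7), "trait"),
    ((67.1, 11.0), "trait"),
    ((88.0, 14.2), "normal"),
    ((91.5, 13.6), "normal"),
    ((85.2, 15.0), "normal"),
    ((90.1, 14.8), "normal"),
]
TEST = [((63.0, 10.4), "trait"), ((89.0, 14.0), "normal")]


def fit(records):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for features, label in records:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (len(rows) / len(records), stats)
    return model


def predict(model, features):
    """Return the class with the highest log-posterior under the model."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(features, stats):
            # Log of the Gaussian density for this feature value.
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, records):
    """Fraction of records whose predicted label matches the true label."""
    hits = sum(predict(model, features) == label for features, label in records)
    return hits / len(records)


model = fit(TRAIN)
print(accuracy(model, TEST))
```

On real data, accuracy would be measured on a held-out test split, as the abstract describes, and a categorical attribute such as Gender would need a frequency-based likelihood rather than a Gaussian one.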
Copyright (c) 2023 Khizra Ali, Muhammad Saqib

This work is licensed under a Creative Commons Attribution 4.0 International License.