Leakage-Free Early Prediction of Issue Resolution Time in Agile Software Projects: A Comparative Study on Dataset Quality

Authors

  • Subash Kunwar, Nepal Open University, Manbhawan, Lalitpur, Nepal
  • Bhoj Raj Ghimire, Nepal Open University, Manbhawan, Lalitpur, Nepal

DOI:

https://doi.org/10.54536/ajdsai.v2i1.7792

Keywords:

Agile Estimation, Issue Resolution Time, Leakage-Free Prediction, Random Forest, Temporal Validation, XGBoost

Abstract

Inaccurate estimation of Issue Resolution Time (IRT) remains a persistent challenge in agile software development, leading to sprint overflow, misallocated resources, and frequent Service Level Agreement (SLA) violations. Advances in machine learning and predictive analytics have the potential to address this challenge by using historical issue-tracking data to generate data-driven predictions as soon as an issue is created, which in turn supports sprint planning, risk mitigation, and SLA compliance. This study proposes a leakage-free prediction framework that restricts features strictly to information available at issue creation time and evaluates Random Forest (RF) and XGBoost under temporal split validation on two datasets: TAWOS (a tracker-native, multi-project dataset) and a Kaggle Jira dataset. A systematic review of 30 papers contextualizes the experimental design by identifying three common themes: (i) creation-time metadata strongly predicts resolution duration; (ii) tree ensembles provide dependable baselines; and (iii) temporal and cross-project validation yield more realistic performance estimates than random splits. Experimental results show that XGBoost achieves the best holdout performance on the Kaggle dataset (MAE = 2.278 hours, RMSE = 2.877 hours), while both models perform comparably on TAWOS (MAE ≈ 6.4–6.6 hours). Feature stability analysis across 5-fold temporal cross-validation confirms that creation-time features, particularly Weekday, Hour, Priority, Project_ID, and Assignee_Past_Issues, are consistently the most important predictors, validating the leakage-free design. Residual analysis reveals that XGBoost produces tighter error distributions (Std = 1.93 hours on Kaggle) than RF (Std = 5.73 hours), indicating more consistent predictions.
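The evaluation protocol described in the abstract (creation-time features only, a temporal holdout, and 5-fold temporal cross-validation scored with MAE and RMSE) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the record layout, the lowercase field names, the 80/20 cut, and the expanding-window fold scheme are all assumptions made for illustration, since the abstract does not specify them.

```python
from datetime import datetime
from math import sqrt


def creation_time_features(issue):
    """Keep only fields known at creation time (hypothetical field names)."""
    created = issue["created"]
    return {
        "weekday": created.weekday(),
        "hour": created.hour,
        "priority": issue["priority"],
        "project_id": issue["project_id"],
        # Assumed to be counted from issues created strictly before this one.
        "assignee_past_issues": issue["assignee_past_issues"],
    }


def temporal_holdout_split(issues, train_fraction=0.8):
    """Sort by creation time; the earliest issues train, the latest test."""
    ordered = sorted(issues, key=lambda issue: issue["created"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def temporal_cv_folds(n_samples, n_folds=5):
    """Expanding-window folds over indices that are already time-ordered.

    Each fold trains on everything before its test block, so no fold
    ever trains on issues created after the ones it is tested on.
    Assumes n_samples >= n_folds + 1.
    """
    block = n_samples // (n_folds + 1)
    folds = []
    for k in range(1, n_folds + 1):
        test_end = (k + 1) * block if k < n_folds else n_samples
        train_idx = list(range(0, k * block))
        test_idx = list(range(k * block, test_end))
        folds.append((train_idx, test_idx))
    return folds


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (hours here)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Any regressor, such as a Random Forest or XGBoost model, could then be fitted on the training portion of each split and scored on the later portion. Fields that are only known after creation, such as the resolution date, final status, or comment count, are deliberately never read by `creation_time_features`.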

Author Biographies

  • Subash Kunwar, Nepal Open University, Manbhawan, Lalitpur, Nepal

    Student at Nepal Open University

  • Bhoj Raj Ghimire, Nepal Open University, Manbhawan, Lalitpur, Nepal

    Assistant Professor at Nepal Open University


Published

2026-06-13

How to Cite

Kunwar, S., & Ghimire, B. R. (2026). Leakage-Free Early Prediction of Issue Resolution Time in Agile Software Projects: A Comparative Study on Dataset Quality. American Journal of Data Science and Artificial Intelligence, 2(1), 58–68. https://doi.org/10.54536/ajdsai.v2i1.7792
