Deep Learning for Predicting University Academic Fees in a Semi-Urban Area

ABSTRACT


INTRODUCTION
The Université de l'Assomption au Congo (UAC) comprises various offices, including the management committee, the budget administration, the general academic secretariat and the general administrative secretariat. Of these, the finance administration office is the focus of this research. In a university, this office is typically responsible for establishing the annual academic budget and for setting the tuition fees to be paid per promotion according to defined criteria (Scott, 2018). Academic fees are an annual amount that students pay in return for their education, and they can change from year to year due to various factors (Wahyuddin et al., 2019). Higher education is an investment in human capital, and one that pays off handsomely. Greater responsibility on the part of students, a better perception of the true cost of their studies and greater flexibility for universities can lead to a better allocation of resources. This makes it possible to maintain and consolidate a high-quality university system without undermining its accessibility (Sulistiyo & Soegoto, 2018). The purpose of higher education in students' lives is their own personal development, the development of the community in which they live and preparation for professional employment, leading to a better quality of life (Sailaja et al., 2022).
Some South African universities, including the University of KwaZulu-Natal, have repeatedly witnessed student protests in response to annual tuition fee increases. In 2015-2016, the announcement of a 10.5% increase in academic fees sparked intensified student protests (Chiliza et al., 2022). Around 61% of parents in the Democratic Republic of Congo (DRC) believe that academic fees are unaffordable, given that 81% of households in the DRC earn $1100 a year in salaries (Mpia et al., 2023). Tuition fees are the first concern when it comes to university studies, and they can also influence the choice of university. This is the case for students in the city of Butembo, where young people tend to choose universities on the basis of tuition fees. Higher tuition fees and the commercialization of universities are reducing education to an essentially pragmatic commodity, increasingly expensive but less rich in content. The setting of tuition fees can give rise to complaints, or even to students failing to register or re-register (Yustanti et al., 2017). Moreover, tuition fees vary from year to year depending on a number of factors: some drive fees upward, while others keep them moderate.
In this study, the authors first analyzed the factors that predict the setting of these fees, which form part of the reputation of an institution or university, and then proposed a predictive model that uses these factors as predictors, based on secondary data collected within the UAC (Deschenaux & Tardif, 2016). UAC is a private university that operates under the moral responsibility of the Augustinians of the Assumption. It comprises the Faculty of Letters and Humanities, the Faculty of Economics and Management, the Faculty of Development Sciences, the Faculty of Psychology and Educational Sciences, the Faculty of Science and the Faculty of Applied Sciences (Kasambya et al., 2023).
Am. J. Educ. Technol. 3(1) 9-17, 2024
The main objective of this study was to implement a Deep Learning model that better predicts the amount of tuition fees to be paid per promotion each year, using secondary data collected within the UAC. Specifically, this research had three objectives: (i) to identify the factors that influence the setting of tuition fees, contextualizing them in the case of the UAC, (ii) to develop two Deep Learning regression models and select the one that best predicts university tuition fees, and (iii) to deploy the validated model in a web-based technology that is flexible for end users.

LITERATURE REVIEW
Tuition fees are an amount of money charged to a person by the educational institution in which he or she enrolls, based on the number of course credits taken and the school budget for each period of study. In other words, they are an amount of money that supports teaching and various other activities within an institution (Iskandar, 2019). Higher education is characterized by significant private returns, both monetary and non-monetary. One of the reasons for legitimizing these fees lies in the need to finance higher education (Flacher et al., 2012). Today, thanks to these fees, two things have changed: (i) the university has returned to the heart of the knowledge society sought by governments, and (ii) a new, private, for-profit sector has emerged, growing rapidly in recent years by virtue of not weighing on public spending (Bietenbeck et al., 2022).

Artificial Intelligence
The term artificial intelligence (AI) was coined by John McCarthy in 1956. It refers to any system capable of adapting itself to respond appropriately to its environment (Virginia, 2014). AI is a science that seeks to solve complex problems logically or by using different, well-adapted algorithms. At present, AI is used in various fields, including robotics, military services, education, organizations, banking systems and medicine. AI is also used in video games and in industrial computing, for example to optimize truck routes (Jungwirth & Haluza, 2023). AI has affected many domains of our lives, especially the field of education (Mohammed, 2023).

Data Mining
Data Mining is the process of discovering insightful and predictive models from massive data. It is the art of extracting useful information from large quantities of data. Data mining has been applied with considerable success in business, retail, telecommunications, intrusion detection, biological data analysis, healthcare and other fields (Sadiku et al., 2015). Owing to its growing application in all fields, data mining is now used for knowledge discovery in databases. Using the data mining process, the factors needed to predict outcomes can therefore be discovered from the data and applied (Nagaraj et al., 2020).

Deep Learning
Deep learning has become a buzzword of late. The existing literature offers no unified definition of deep learning, but it is generally regarded as a sub-domain of Machine Learning (Zhang et al., 2018; Dereje et al., 2022). The concept of Machine Learning dates back to the middle of the 20th century. In the 1950s, the British mathematician Alan Turing imagined a machine capable of learning, a Learning Machine. Over the following decades, various Machine Learning techniques were developed to create algorithms capable of autonomous learning and improvement. Among these techniques are artificial neural networks. Deep Learning is based on these algorithms, as are technologies such as image recognition and robotic vision. Deep Learning is a more recent field of Machine Learning research, introduced to bring Machine Learning closer to its main objective, which is AI (Taye, 2023).

Artificial Neural Networks
Artificial neural networks are made up of a set of neurons, or nodes, connected to each other by links that allow signals to be propagated from one neuron to another. They are used to discover relationships between many variables without external intervention, thanks to their ability to learn. This is why they are used in classification, estimation and forecasting problems such as stock market and sales forecasting (Sulistiyo & Soegoto, 2018). Neural networks are subdivided into layers: the input layer, the hidden layer(s) and the output layer. The hidden layers constitute the abstraction layer required to move from the input layer to the output layer. The number of hidden layers defines the type of system: a shallow learning system (with one to three hidden layers) or a deep learning system (with more than three hidden layers) (Badillo et al., 2020). Thus, starting from the three main parts of an artificial neural network (input layer, hidden layer and output layer), a network can have a multitude of layers at each level, in which case we speak of Deep Learning (Kumar et al., 2023). In this research, a neural network was used to train the model that predicts the amount of academic fees to be paid per promotion.
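To make the layer structure described above concrete, the following is a minimal sketch of how a signal propagates from an input layer through several hidden layers to a single output. It is an illustration only and not the authors' model: the layer sizes, the ReLU activation, the random weights and the helper names (`dense`, `init_layer`, `forward`) are all assumptions chosen for the example.

```python
import random

def relu(x):
    """Rectified linear activation: negative signals are set to zero."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each output is activation(w . x + b)."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Propagate x through the hidden layers (ReLU) and a linear output layer."""
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(x, weights, biases)

rng = random.Random(0)
# Assumed sizes: 6 inputs, three hidden layers of 8 units, 1 regression output.
sizes = [6, 8, 8, 8, 1]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
prediction = forward([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], layers)
```

Adding one more entry to `sizes` adds one more hidden layer, which is the only structural difference between a three-hidden-layer and a four-hidden-layer network in this sketch.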

Empirical Literature Review
Research on the prediction of school fees has already been undertaken by several researchers from different fields of study, each approaching it from a different perspective. This study builds on that earlier work and presents its results here in order to establish its own foundation.
Yustanti et al. (2017) developed a classification model to predict the tuition fees a student would pay, in order to help academic authorities decide on each student's customized tuition fee class. These authors collected data from the registers of new students for the years 2016 to 2017 at a public university. The variables they used included parental salary, home electricity, household size and parental education level. Their model initially achieved an accuracy of 66.52%. After correlation-based feature selection (CFS) was used to select the variables useful in determining costs, accuracy rose to 81.78%. The combination of support vector machine (SVM) and CFS gave the better result. These authors observed that their model could be used to solve other similar problems (Yustanti et al., 2017).
Budiharjo et al. (2018) proposed an artificial neural network model to predict the factors causing tuition payment problems at a private university, namely Akademi Manajemen Informatik. Six variables were identified: parental income, parental occupation, number of siblings, residence status, misuse of money and external factors. The model achieved an accuracy of 80% and could be used to predict the factors behind late payment of tuition fees in other higher education institutions (Budiharjo et al., 2018).
Rohmayani (2020) carried out an analysis to predict tuition payment reliability using the Naïve Bayes algorithm combined with Particle Swarm Optimization (PSO). The aim of the research was to determine the classification model that best detects the most influential indicators of tuition payment delays. The variables used included study programs, parents' occupations, parents' monthly income, number of dependents, parents' monthly expenses and monthly allowances. Data were collected by questionnaire, and the target population was students at Politeknik Tedc Bandung. The results showed that applying PSO with the Naïve Bayes algorithm achieved the best accuracy, 78.50%, compared with 65.30% without PSO. Using only Naïve Bayes and the most influential attribute, accuracy was 73.62%. The author concluded that optimizing the Naïve Bayes method with PSO can help predict late payments at Politeknik Tedc Bandung with better accuracy, and can also uncover the most influential attributes. These were parents' income, parents' number of dependents, revenue per month, financial services, academic services, study programs and tuition payment methods (Rohmayani, 2020).

METHODS AND MATERIALS

Research Design
This research was based on secondary quantitative data. The quantitative method consists of collecting and relating information and facts that can be quantified and measured, or social facts that can be converted into figures, statistics and graphical data (Giordano & Jolibert, 2016). The authors began by collecting secondary data recorded in the academic fee payment registers of the UAC. Only information on the academic fees paid per promotion was selected to constitute a CSV file of the raw data. In a modeling project such as classification or regression, raw data generally cannot be used directly: some machine learning algorithms place requirements on the data, statistical noise and errors may need to be corrected, and complex non-linear relationships may need to be extracted from the data (Kivuyirwa et al., 2023).
This justifies the data pre-processing phase in this study, which checked for possible errors such as missing or null values and outliers. During this check, the authors noticed that the dataset contained both numerical and categorical values, so the categorical values were converted into numerical values. From the processed data, the authors proceeded with Exploratory Data Analysis (EDA), which is the process of sifting through the data and trying to make sense of individual columns and the relationships between them. Although this is a time-consuming activity, it has important benefits, as a good understanding of the data leads to better-performing models and to an understanding of why predictions are made (Matt & Theodore, 2020). After EDA, the dataset was split into training and test sets: the training set contained 75% of the data and the test set 25%. The authors then created models using artificial neural networks. After the models were evaluated, the best one was retained, saved using the pickle library and deployed as a web application using Flask. Flask is a Python micro-framework. Being a micro-framework does not make it any less functional; rather, it is very simple and highly extensible. This gives developers the freedom to choose the configuration they want, making it easier to write applications or plugins (Kunal, 2018). The evaluation metrics used were mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and the coefficient of determination (R2). A use case scenario was used to evaluate the performance of the deployed model in the real world. Figure 1 illustrates the process followed to deploy the proposed model.
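The 75%/25% split and the pickle step described above can be sketched as follows. This is a simplified illustration using only the Python standard library: the records, the seed and the helper `train_test_split_75_25` are hypothetical, and the "model" is a placeholder dictionary rather than a trained network. In practice, scikit-learn's `train_test_split` with `test_size=0.25` would typically perform the split.

```python
import pickle
import random

def train_test_split_75_25(rows, seed=42):
    """Shuffle rows reproducibly and return (train, test) with a 75/25 split."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records: (encoded promotion, tuition fee, academic fee).
rows = [(i % 4, 200 + 5 * i, 340 + 10 * i) for i in range(20)]
train, test = train_test_split_75_25(rows)

# Placeholder standing in for the trained model; any picklable object works.
model = {"kind": "placeholder", "n_train": len(train)}

# Serialize the object to bytes and restore it, as pickle.dump/load do with files.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

Fixing the seed keeps the split reproducible, so that two candidate models are evaluated on exactly the same test rows.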

Target Population
Taking courses at a university is conditional on paying tuition fees, which enable managers to establish and operate the institution properly (Abbas et al., 2021). Since this research concerns the prediction of university tuition fees, the target population was UAC students. The units of analysis included all information related to academic fees over a six-year period, from the 2017-2018 academic year to 2022-2023.

Data Collection Procedure
The authors collected data from the UAC registers and documents, using the quantitative method based on secondary data. Secondary data is information that has already been collected or produced by an organization for a purpose other than that of the study being conducted, and which is available for a second use (Joannidès & Berland, 2008). Such data already exist, and their use has many advantages because they are at least structured (Ellram & Tate, 2016). The data came from the documents of UAC's finance office. The collection process was carried out quasi-manually: the authors obtained an Excel file from which they selected the relevant data. Once the data was collected, the authors compiled a dataset, which was saved in a .csv file.

Characteristics of the Collected Data
Prior to the processing phase, the data collected at UAC comprised 17 variables, namely academic year, promotion, national insurance company (SONAS) fees, tuition fees, unit fees, workshop and lab fees, thesis supervision fees, library fees, fees for university projects management, fees for supporting visiting lecturers, fees for university websites management, fees for acquiring lab equipment, culture and sports fees, internet fees, student card fees, Ishango edition fees, and academic fees. The collected data was in French. As the algorithms selected in this research do not recognize certain special characters and cannot process categorical data, the authors renamed the variables in terms the algorithms could handle and converted the categorical values into numerical ones using Python functions such as map() (Mpia et al., 2022). After data conversion, the size of the dataset remained the same. Table 1 lists the variables and shows how their categorical values were encoded. One row of that table is:
Frais_etude: fees to pay for units taken per year; values 215, 225, 305, 275, 380, 315, 420, 365, 220, 230, 310, 280, 260, 250, 320, 205, 235, 255, 210, 285, 340, 245, 360, 270, 300, 355
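The conversion of categorical values described above can be sketched with Python's built-in `map()`. The codes here are partly assumptions: the use case later in the paper states that 2022-2023 is encoded as 5 and L2SDM as 1, which is consistent with numbering the six academic years from 0, but the labels `PROMO_A` and `PROMO_B` and the helper `encode` are hypothetical placeholders. With pandas, `Series.map` accepts such a dictionary directly.

```python
# Academic years covered by the study, numbered from 0 (assumed scheme).
years = ["2017-2018", "2018-2019", "2019-2020",
         "2020-2021", "2021-2022", "2022-2023"]
year_codes = {year: code for code, year in enumerate(years)}

# Only "L2SDM" -> 1 comes from the paper; the other labels are placeholders.
promotion_codes = {"PROMO_A": 0, "L2SDM": 1, "PROMO_B": 2}

def encode(values, codes):
    """Replace each category label by its integer code using built-in map()."""
    return list(map(codes.__getitem__, values))

raw_years = ["2022-2023", "2017-2018", "2022-2023"]
raw_promotions = ["L2SDM", "PROMO_A", "PROMO_B"]

encoded_years = encode(raw_years, year_codes)
encoded_promotions = encode(raw_promotions, promotion_codes)
```

Looking labels up with `codes.__getitem__` raises a `KeyError` on an unknown label, which surfaces spelling variants in the raw data instead of silently producing missing values.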

Data Processing and Analysis
The technique used to identify the best variables was Feature Selection. Feature Selection refers to obtaining a set of the best features from the research variables according to different criteria for selecting relevant variables from the dataset. This technique reduces the data by removing redundant and irrelevant variables (Cai et al., 2018). It is also important when there are many input features, because it reduces the dimension of the original problem, which sometimes leads to improved model performance (Remeseiro & Bolon-Canedo, 2019).
To process and analyze the collected data, the study used tools such as pandas, Matplotlib, NumPy, Seaborn, scikit-learn and Flask. The authors analyzed and processed the data by checking for null or missing values, verifying variable types, converting categorical variables to numerical ones and deleting columns of less importance to the dataset, among other operations.

RESULTS AND DISCUSSION
This section first analyzes the data using descriptive statistics, such as data distributions and correlations between variables, then presents the results obtained, and finally discusses them with reference to earlier studies.

Descriptive Statistics
Descriptive statistics were used to describe and summarize the data. They were used to illustrate the data with graphs, plots, histograms and other graphics, and to summarize it by describing the relationships between variables, the types of variables, and how the data are organized or dispersed (Kaur et al., 2018). This study was conducted to determine the variables that best predict the amount of academic fees to be paid per promotion. The authors first checked the distribution of the Promotion column, shown in Figure 2, and then explored the correlation between the target variable (Frais_academiques) and the Promotion variable.
In order to determine the fees to be paid per promotion, nine predictor variables were identified. As part of the descriptive statistics for these variables, the authors studied the correlation between the target variable (Frais_academiques) and the Promotion variable, as summarized in Figure 3.
Values recovered from a table row whose label was lost (14): 350, 500, 435, 600, 540, 700, 345, 360, 305, 440, 605, 390, 525, 450, 630, 505, 365, 385, 340, 495, 430, 595, 480, 650, 355, 520, 445, 625, 380, 545, 475, 535, 485, 575, 490, 510, 474, 524, 339, 344, 429, 544, 354, 479, 434, 384, 499, 444, 574
From the results in Figure 3, the authors observe that academic fees are correlated with promotion. This means that as a student moves up the promotion ladder, the academic fees to be paid increase.
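One way to quantify the relationship between promotion level and academic fees is a correlation coefficient. The sketch below computes the Pearson coefficient by hand; it assumes Pearson correlation, which the paper does not specify, and the promotion codes and fee values are hypothetical numbers invented for the example, not UAC data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical encoded promotion levels and matching academic fees (USD).
promotion = [0, 1, 2, 3, 0, 1, 2, 3]
fees = [340, 385, 440, 500, 350, 390, 445, 510]

r = pearson(promotion, fees)
```

A value of `r` close to 1 corresponds to the pattern reported in the text: fees rise as the promotion level rises.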

RESULTS
This section focuses on the presentation of the different results obtained according to the predefined methodology and specific objectives of this research.

Results to Achieve the First Research Objective
The first objective of this study was to identify the variables that best predict academic fees. To achieve this objective, the researchers used the Feature Selection technique, which is suited to determining the factors that best predict a target (Soledad, 2022). The variables identified are shown in Figure 4. Under the filter method, and the Mutual Information technique in particular, a feature whose score is equal to 0 carries no information about the target and therefore does not predict it; there is no dependence between that feature and the target variable. Conversely, a score greater than 0 indicates an association between the feature and the target, so that the feature helps to predict the target (Soledad, 2022). From Figure 4, the variables that predict academic fees and have values greater than 1 are Annee_Acad, Promotion, Frais_etude, TFC_memoire, Enseignants and Equipement. These are the six features retained for the construction of the Deep Learning models, following the application of Feature Selection to the data collected during this investigation. The descriptions of these six variables are presented in Table 1.
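The reading of mutual information scores given above (zero means no dependence, positive means association) can be illustrated with a small sketch for discrete variables, using the definition I(X;Y) = sum over (x, y) of p(x,y) log(p(x,y) / (p(x) p(y))). The data and variable names are hypothetical, and this is not necessarily the estimator the authors used: scikit-learn's `mutual_info_regression`, for instance, relies on a nearest-neighbor estimate for continuous targets.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        # p_xy / (p_x * p_y), written with raw counts
        ratio = p_xy * n * n / (count_x[x] * count_y[y])
        mi += p_xy * math.log(ratio)
    return mi

# Hypothetical data: fee band is fully determined by promotion,
# while the "unrelated" feature is independent of the fee band.
promotion = [0, 0, 1, 1, 2, 2, 3, 3]
fee_band = [0, 0, 1, 1, 2, 2, 3, 3]
unrelated = [0, 1, 0, 1, 0, 1, 0, 1]

mi_promotion = mutual_information(promotion, fee_band)
mi_unrelated = mutual_information(unrelated, fee_band)
```

In this toy case the informative feature scores log(4), about 1.39 nats, and the independent feature scores 0, so a filter that keeps only features above a threshold would retain the first and drop the second.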

Results to Achieve the Second Research Objective
The second objective of this study was to develop two Deep Learning regression models and to select the one that best predicts academic fees. To achieve this objective, the study developed two multi-layer neural network models and evaluated their performance.
The researchers developed one model with three hidden layers (RNA3C) and another with four hidden layers (RNA4C).
The test set consisted of 25% of the data. Table 2 shows the evaluation results for these two models:
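For reference, the four evaluation metrics named in the methodology (MAE, MSE, RMSE and R2) can be computed as in the sketch below. The true and predicted fee values are hypothetical and are not the results of RNA3C or RNA4C; the function name `regression_metrics` is likewise an assumption. scikit-learn offers equivalent functions such as `mean_absolute_error`, `mean_squared_error` and `r2_score`.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical academic fees (USD) and model predictions.
y_true = [350.0, 400.0, 450.0, 500.0]
y_pred = [360.0, 390.0, 450.0, 520.0]

metrics = regression_metrics(y_true, y_pred)
```

Lower MAE, MSE and RMSE and an R2 closer to 1 indicate the better model, which is the basis on which two candidate networks can be compared on the same test set.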

CONCLUSIONS
This research used Deep Learning to predict the amount of academic fees within universities. UAC served as the sample university, from which the authors collected the secondary data used in the training and testing phases of the models developed in this study. The authors' main objective was to develop and deploy a Deep Learning regression model that best predicts academic fees. Using quantitative methods, the authors collected secondary data from the UAC database and followed a number of steps to arrive at the final results. The target population was UAC students. The analysis used Python tools such as pandas, scikit-learn, NumPy and Matplotlib.
In terms of contributions, compared to previous studies, this research has added web deployment through the use of Flask, giving end users a user-friendly graphical interface for making predictions. Moreover, the variables used in previous studies were found to have no direct influence on the way academic fees are set in DRC universities, especially at UAC. The authors propose to continue improving this research. This study used only the multi-layer neural network; future researchers are encouraged to build models with other algorithms and to increase the size of the training and test data in order to build a more robust model.

Figure 1: Flowchart of the methodology used in the study

Figure 2: Distribution of Promotion variable

Figure 3: Correlation between academic fees and promotion

Figure 4: Selection of variables that best predict academic fees

Table 1: Description of the data used

Table 2: Performance measurement results for the models developed
Makaya completes the form with the information shown in Figure 6. In the first text field, Makaya enters 2022-2023 as the academic year (encoded as 5 in Table 1); the second text field receives the promotion L2SDM, encoded as 1. The tuition fees are set at 225 US dollars, the dissertation supervision fees are set at 60 US dollars, the travel management deposit for visiting teachers is set at 20 and the laboratory equipment maintenance fees are set