Application of Three Convolutional Neural Network Algorithms for Occluded Face Identification and Recognition for System Security

Deep learning techniques in computer vision have become indispensable elements of biometric systems, especially face recognition. Facial recognition can be reliably used as an identification and authentication tool for premises or network access security. Mask wearing, one form of facial concealment, has become part of our habits for preventing COVID-19 and obstructs facial recognition. Occluded face recognition is one of the most challenging problems biometrics deals with. This paper presents convolutional neural network algorithms for occluded face recognition. Our study presents a robust method using the ResNet-50, VGG-19, and DenseNet-201 algorithms to contribute to occluded face recognition. Various parameters are used for this experiment, such as cross-entropy as the loss function and optimization algorithms adapted to deep learning: the SGD, Adam, and RMSProp optimizers. The convolutional neural network algorithms were evaluated on the AR database. The experiment gave accuracies ranging from 94.81 to 99.81% for SGD, from 0 to 96.92% for Adam, and from 0 to 96.92% for RMSProp. The DenseNet-201 algorithm using the SGD optimizer obtained the best score with 99.81%, and all the performance metrics used, such as accuracy, MSE, F-score, recall, and MCC, confirm this good performance.


INTRODUCTION
The significant increase in network security breaches, data breaches, and identity theft requires the design of robust security systems, including biometrics. To circumvent biometric face authentication, some fraudsters are turning to face occlusion. Because biological and physical characteristics are unique to each individual, biometric security consists of measuring these characteristics before granting access to an environment or a computer tool. Face recognition is a key research issue in computer vision. In recent years, researchers have proposed many algorithms; most previous biometric-based research exploiting physiological and behavioral characteristics, including human emotion signals and expression, has achieved satisfactory recognition performance under uniform lighting conditions with frontal face images (Jiang et al., 2020). However, face recognition methods are still affected by illumination, facial expression, pose, and occlusion. The development of research in facial recognition has led to a high level of performance in many applications. It remains a very challenging field of study because images of the same person can differ due to several phenomena, including occlusion (Wu et al., 2019). In addition to existing security methods, face biometrics, especially under occlusion, can be used to protect cyberspace from hackers and malicious users of networks, the internet, connected devices, etc. Among the various problems a face recognition system faces, occlusion management is one of the most difficult to solve. Objects such as sunglasses, scarves, or masks make the occlusion problem prominent. One of the most recent causes is the wearing of masks recommended by the health measures against coronavirus disease. Occluded face images severely degrade the performance of face recognition systems, so a robust occluded face recognition system is necessary for real-world applications.
From this perspective, we use new approaches based on machine learning, more precisely on deep learning, which extracts the hierarchical and semantic structures present in images: convolutional neural networks. Convolutional neural networks, which are multilayer perceptrons coupled with convolutional layers, are part of deep learning approaches and have become indispensable for detection and recognition in computer vision (Siegmund et al., 2021). They can extract landmarks by themselves. Pre-trained convolutional neural network algorithms allow for transfer learning, which transfers the skill learned on one dataset and adapts it to a new dataset (Arnia et al., 2021). The algorithms used here were trained on the ImageNet image database. This study contributes to recognizing occluded faces by comparing three pre-trained convolutional neural network algorithms. These algorithms are applied to the recognition of occluded faces from the reference AR face database, and their performances are compared by varying the training parameters (optimizer and batch size) in order to select the algorithm with the best accuracy on the reference data used. The challenges of face occlusion and deep learning could bring innovation to the security of computer networks and cybersecurity. This article is organized as follows: Section 2 presents some previous work. Section 3 presents convolutional neural networks and details our proposed methodology and hardware. Section 4 presents the analysis of our experiments and results, evaluates and compares the algorithms used, and discusses the obtained results against other works; finally, we conclude.

LITERATURE REVIEW
In this section, we review some work on face recognition with occlusion. Methods addressing face recognition with occlusion include finding features or classifiers that tolerate corruption. For example, Aleix M. Martinez proposed a probabilistic approach that can compensate for partially occluded faces (Martinez, 2002). Tsai et al. proposed a deep convolutional neural network architecture for efficient multi-person and multi-angle face recognition, which establishes identity confidence by applying a classifier to the extracted features; their experimental results showed that identity recognition accuracy could reach 90.61% (Tsai et al., 2018). Mantoro et al. proposed a face recognition method using a hybrid process that combines the Haar Cascades and Eigenface methods, which can detect multiple faces (55 faces) in a single detection process with an accuracy of 91.67% (Mantoro et al., 2018). Zhang et al. proposed a partial-occlusion face recognition algorithm based on a recurrent neural network that yields results ranging from 88.49 to 98.45% (Zhang et al., 2020), and Wu et al. proposed a deep-learning-based method for occluded face recognition with a result reaching 98.6% (Wu et al., 2019).

MATERIALS AND METHOD
Machine learning is a branch of artificial intelligence (AI) that uses algorithms to enable computer systems to infer patterns from data. It has many applications, including bioinformatics, fraud detection, finance, human resource and risk management, market analysis, image recognition, and natural language processing (Praseetha et al., 2019). New face recognition methods extract the best features from images and tend to learn these features using deep convolution neural network architectures (Idelette Kambi Beli & Guo, 2017). This has led to the extraordinary success of famous convolutional architectures such as VGGNet, GoogleNet, ResNet, etc. (Zhou et al., 2018).
Deep learning is one of the most widely used machine learning techniques and has been hugely successful in applications such as anomaly detection, image detection, pattern recognition, and natural language processing (Praseetha et al., 2019). However, training deeper neural networks is challenging due to the vanishing-gradient and degradation problems (Reddy & Juliet, 2019). There are four significant families of deep learning algorithms: deep neural networks, convolutional neural networks, recurrent neural networks, and deep belief networks (Singh et al., 2020). Our study uses the ResNet-50, VGG-19, and DenseNet-201 convolutional neural networks.

ResNet-50 model
ResNet-50 is a convolutional neural network 50 layers deep; Microsoft built and trained it in 2015 (He et al., 2015). The model was trained on more than one million images from the ImageNet database, can classify up to 1,000 objects, and was trained on 224×224 pixel color images. It contains 33,623,012 parameters.

VGG-19 model
VGG-19 is a convolutional neural network 19 layers deep; it was developed by the Visual Geometry Group of the Department of Engineering Science at Oxford University. The model was trained on more than one million images from the ImageNet database, can classify up to 1,000 objects, and was trained on 224×224 pixel color images. It contains 21,560,484 parameters.

DenseNet-201 model
DenseNet-201 is a convolutional neural network 201 layers deep; it was introduced by Huang et al. (Siegmund et al., 2021). The model was trained on more than one million images from the ImageNet database, can classify up to 1,000 objects, and was trained on 224×224 pixel color images. It contains 21,202,084 parameters.

AR faces database
The database used is the AR face database (AR Face Database Webpage, s. d.). It contains more than 4,000 color face images of 126 persons, namely 70 men and 56 women. Our method proceeds as follows:

Step 1:
• Preprocessing
• Splitting data

Step 2:
• Loading of models
• Collecting extraction features
• Flattening data
• Activation function
• Model training

Step 3:
• Classification
• Face recognition

Preprocessing
• In this step, we retrieve each image from our database and add it to one list and each label to another.
• Then, we get the number of categories in our database to transform our list of labels into a matrix whose size corresponds to the number of classes.
• Finally, we normalize our images to the interval [0, 1].
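The steps above can be sketched as follows. This is an illustrative NumPy version (not the paper's code); the `preprocess` function name and the 8-bit input range are assumptions.

```python
import numpy as np

# Hypothetical sketch of the preprocessing step: pixel values are
# normalized to [0, 1] and integer labels are one-hot encoded into a
# matrix whose width equals the number of classes.
def preprocess(images, labels, num_classes):
    # Normalize pixel intensities from [0, 255] to [0, 1].
    images = np.asarray(images, dtype=np.float32) / 255.0
    # One-hot encode the labels (same effect as keras.utils.to_categorical).
    one_hot = np.zeros((len(labels), num_classes), dtype=np.float32)
    one_hot[np.arange(len(labels)), labels] = 1.0
    return images, one_hot
```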

Splitting data
We divide our dataset into two subsets:
• The first subset, the training dataset, is used to allow our model to learn.
• The second subset, the test dataset, is used to evaluate the model's learning by comparing the obtained results with the expected results.
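A minimal sketch of this split is shown below; the 80/20 ratio and the fixed random seed are assumptions for illustration, as the paper does not state them.

```python
import numpy as np

# Shuffle the indices, then carve off a test set of the requested size.
def split_data(images, labels, test_ratio=0.2, seed=42):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(images))          # shuffle before splitting
    n_test = int(len(images) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (images[train_idx], labels[train_idx],
            images[test_idx], labels[test_idx])
```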

Loading models
We load our model, passing the size of our images as a parameter, without the fully connected layers of the model.

Collection of extraction features
We use the already-trained convolution and pooling layers of our model to extract the features of our images.
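The model-loading and feature-extraction steps above can be sketched with the Keras applications API; VGG19 stands in for any of the three models. Note that `weights=None` is used here only to keep the sketch self-contained; the study uses ImageNet pre-trained weights (`weights="imagenet"`), which Keras downloads on first use.

```python
from tensorflow.keras.applications import VGG19

# Load the network without its fully connected layers, passing the
# image size as a parameter, and freeze it so that the pre-trained
# convolution and pooling layers act as a fixed feature extractor.
base = VGG19(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False
# features = base.predict(images)  # VGG-19 yields (n, 7, 7, 512) maps
```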

Flatten data
We reduce the dimensionality of our data by flattening the extracted feature maps to match the input dimensions of the classification layer.
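Concretely, this step reshapes each multi-dimensional feature map into a vector; the feature shape below assumes VGG-19's 7×7×512 output for illustration.

```python
import numpy as np

# Flatten 4-D feature maps (n, height, width, channels) into 2-D
# vectors (n, height * width * channels), one row per image.
features = np.zeros((10, 7, 7, 512), dtype=np.float32)  # e.g. VGG-19 output
flat = features.reshape(len(features), -1)
```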

Activation function
We used the softmax activation function because we have multiple categories, and softmax is efficient for multiclass classification. The mathematical representation of the softmax activation is:

softmax(z)_i = exp(z_i) / Σ_{j=1}^{K} exp(z_j), for i = 1, …, K,

where z is the vector of logits and K is the number of classes.
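A minimal NumPy implementation of the softmax activation (illustrative, not the paper's code):

```python
import numpy as np

# softmax(z)_i = exp(z_i) / sum_j exp(z_j); subtracting the maximum
# logit before exponentiating is a standard numerical-stability trick.
def softmax(z):
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()
```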

Model training
We train the model for 15 epochs with different optimizers and batch sizes of 4, 8, and 16.

Classification
We classify our test data according to the categories. Our data contains 100 categories, ranging from 0 to 99. The loss function used in our work is the cross-entropy, which evaluates the loss during classification; its equation is as follows:

L = −Σ_x test(x) · log(pred(x)),

where test(x) is the vector containing the values of the labels to be predicted and pred(x) is the vector containing the label values provided by our softmax activation function. We use accuracy to evaluate the model; its formula is as follows:

accuracy = (number of correct predictions) / (total number of predictions).
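The cross-entropy loss and accuracy described above can be sketched in NumPy as follows (illustrative definitions, not the Keras internals used in the experiments):

```python
import numpy as np

# Categorical cross-entropy between a one-hot label vector test(x)
# and a softmax output pred(x); pred is clipped to avoid log(0).
def cross_entropy(test, pred, eps=1e-12):
    pred = np.clip(pred, eps, 1.0)
    return -np.sum(np.asarray(test) * np.log(pred))

# Fraction of predicted class indices that match the true indices.
def accuracy(y_true, y_pred):
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))
```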

Face recognition
We proceed to recognize each image according to its classification in a category.

EXPERIMENTS AND RESULTS
We trained the models on a Windows 10 system with an Intel(R) Core™ i7-8650U processor, 16 GB of random-access memory (RAM), and an NVIDIA GeForce MX150 graphics processing unit (GPU). The models are configured in Python using the Keras version 2.4 API with the TensorFlow version 2.4 backend and CUDA/cuDNN dependencies for GPU acceleration (Artificial Neural Networks. Pt. 3, 2010).

Setting
We used batch sizes of 4, 8, and 16 for 15 epochs for each method. Our study uses cross-entropy as the loss function and optimization algorithms suitable for deep learning to train the chosen models. These algorithms directly affect the efficiency of the models in our study. The optimizers we use are:
• SGD
• Adam
• RMSProp

SGD
SGD implements the stochastic gradient descent optimizer with a learning rate and momentum. The stochastic gradient algorithm is a gradient descent method that minimizes an objective function written as a sum of differentiable functions. The learning rate was set to 0.0001 with a momentum of 0.9 (Team, s. d.).

Adam
Adam is a stochastic gradient descent method based on adaptive estimation of first- and second-order moments. Its implementation is quite simple and computationally efficient, its memory usage is low, and it is well suited to problems with large data volumes (Kingma & Ba, 2014).

RMSProp
Root Mean Squared Propagation, or RMSProp, is an extension of gradient descent that uses a decaying average of partial gradients to adapt the step size for each parameter. Using a decaying moving average allows the algorithm to discard early gradient history and focus on the most recently observed gradients.
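Single-parameter update steps for SGD with momentum and RMSProp can be sketched as below. The learning rate and momentum match the paper's SGD setting (0.0001 and 0.9); the RMSProp decay of 0.9 is an assumed default, not taken from the paper.

```python
import numpy as np

# SGD with momentum: the velocity accumulates past gradients,
# smoothing the descent direction.
def sgd_momentum_step(w, grad, velocity, lr=1e-4, momentum=0.9):
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# RMSProp: a decaying moving average of squared gradients scales
# the step size per parameter.
def rmsprop_step(w, grad, avg_sq, lr=1e-4, decay=0.9, eps=1e-8):
    avg_sq = decay * avg_sq + (1 - decay) * grad ** 2
    return w - lr * grad / (np.sqrt(avg_sq) + eps), avg_sq
```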

Summary of results
The results of the different models are shown in the tables below. Table 1 shows the results obtained from our models with the different parameters; the DenseNet-201 model achieved the best results for the SGD optimizer. Table 2 shows that the ResNet-50 model scores better at batch sizes 4 and 8, while DenseNet-201 has the best result at batch size 16 for the Adam optimizer. Table 3 shows the stability of the ResNet-50 model, with the best score for all parameters used with the RMSProp optimizer.

Evaluation Metrics
To validate the performance of the pre-trained models in our study, we use the following metrics:
• Precision is intuitively the ability of the classifier not to label as positive a sample that is negative: precision = TP / (TP + FP).
• The mean square error (MSE) of an estimator measures the average of the squared errors, i.e., the mean squared difference between the estimated and actual values. It is a risk function corresponding to the expected value of the squared error loss. It is always non-negative, and values close to zero are better. The following equation defines it:
MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)²,
with Y_i the observed data and Ŷ_i the predicted values.
• Recall is the ability of a classifier to find all the actual positive samples: recall = TP / (TP + FN).
• The F1 score can be interpreted as a weighted average of precision and recall, where the F1 score reaches its best value at 1 and its worst at 0: F1 = 2 · (precision · recall) / (precision + recall).
• The Matthews correlation coefficient (MCC) is used in machine learning to measure the quality of classifications. Its value lies between −1 and +1: a coefficient of +1 represents a perfect prediction, 0 a random prediction, and −1 an inverse prediction. The statistic is also known as the phi coefficient.

Comparative results of occluded and non-occluded faces with the best model
We use the model that provides the best results to observe the effect of occlusion on the dataset images. We note here that the occlusion has impacted the result.
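The metrics above can be computed from the confusion-matrix counts, as in this binary-classification sketch written from the standard definitions (not the paper's implementation; multiclass evaluation would average these per class):

```python
import numpy as np

# Compute precision, recall, F1, MCC, and MSE from true/predicted
# binary labels via the confusion-matrix counts TP, TN, FP, FN.
def binary_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mse = np.mean((y_true - y_pred) ** 2)
    return precision, recall, f1, mcc, mse
```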

State-of-the-art comparison
Comparing our study with methods in the literature shows that we perform better using the SGD optimizer with a batch size of 4. The table below displays this comparison, and Figure 4 shows that the SGD optimizer is the most optimal for our study. Wu et al. proposed a POOA (positioning the optimal occlusion area) algorithm for solving the occluded face detection problem, using a robust principal component analysis method to obtain a result of 98.60% (Wu et al., 2019), while WAN and CHEN proposed a MaskNet coupled with a convolutional neural network to obtain a result of 93.8% on the AR face database [29] used in our study. Our study used three convolutional neural network methods for occluded face recognition and obtained a higher score.

CONCLUSION
From the results obtained in this study, we can conclude the following. First, the comparison between the different models showed that the most optimal result was obtained with DenseNet-201 using the SGD optimizer with a batch size of 4. We also find that VGG-19 failed to adapt to the Adam and RMSProp optimizers, given its poor results during the experiments. Finally, as a robust security tool, occluded face recognition can be improved by using pre-trained convolutional neural network models; our study produces better results than the studies cited in the discussion. In future work, we can use other convolutional neural network models together with data augmentation techniques to obtain more accurate results.