Autonomic Face Mask Detection with Deep Learning: an IoT Application

A new and deadly virus known as SARS-CoV-2, which is responsible for the coronavirus disease (COVID-19), is spreading rapidly around the world and has caused more than 4 million deaths. Hence, there is an urgent need to find new and innovative ways to reduce the likelihood of infection. One of the most common ways of catching the virus is contact with droplets expelled by an infected person. The risk can be reduced by wearing a face mask, as suggested by the World Health Organization (WHO), especially in closed environments such as classrooms, hospitals, and supermarkets. However, people hesitate to use a face mask, which increases the risk of spreading the disease; moreover, when a face mask is used, it is sometimes worn incorrectly. In this work, an autonomic face mask detection system based on deep learning and powered by the image tracking technique used in augmented reality development is proposed as a mechanism to require the correct use of face masks before granting people access to critical areas. To achieve this, a machine learning model based on Convolutional Neural Networks was built on top of an IoT framework to enforce the correct use of the face mask in required areas, as mandated by law in some regions.


Infectious diseases such as Influenza and Coronavirus Disease 2019 (COVID-19) cause millions of deaths around the world [1] [2]. The pathogens of such diseases are mainly spread by droplets or aerosols produced by coughing, sneezing, etc. [3]. Nowadays, due to the pandemic caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is an urgent need to limit airborne transmission of COVID-19. The goal is to develop and implement effective methods or mechanisms to reduce the number of particles, such as viruses, in the air.
Dissemination of infectious pathogens in crowded areas can be significant and, in many scenarios, mechanisms are required to protect people from exposure to pathogens [4]. One of the most popular mechanisms is the use of a face mask, which in some regions is required by law [5]. The World Health Organization (WHO) issued a guide to the use of face masks as a mechanism to reduce the risk of exposure to COVID-19 [6]. In the document, the WHO states: "Place the mask carefully, ensuring it covers the mouth and nose, and tie it securely to minimize any gaps between the face and the mask". The guide aims to help people understand the benefits of using a face mask and the risks associated with not wearing one or misusing it. Despite the requirements and regulations, people hesitate to wear a mask, or they wear it incorrectly.
The proliferation of SARS-CoV-2 has affected countries all over the world, and technology has an important role to play in this matter. Today's technology has enabled some areas, such as schools, to continue operating, but other areas or jobs still require face-to-face contact, for instance, hospitals. To reactivate the economy, a certain level of on-site or face-to-face activity is needed [7], always observing healthcare regulations such as wearing a face mask.
The IoT together with AI techniques could work to provide interesting solutions for the COVID-19 pandemic.
Internet of Things (IoT) techniques have been crucial against this pandemic, especially for detecting and tracking infected people. In [8], the authors proposed a system that uses IoT to collect vital signs from different users. With this system, important data can be collected and analyzed for a better understanding of the symptoms and of the virus. On the other hand, artificial intelligence (AI) has also been very important in fighting this pandemic. Examples of applications are algorithms that detect whether a person is infected with COVID-19. An image classification algorithm that uses deep learning to detect infections in X-ray images is proposed in [9]. With these algorithms, the images can be processed and enhanced to help doctors reach better diagnoses.
To keep track of the people wearing face masks, a surveillance camera could be used to detect in real time whether someone is wearing a mask; this is possible thanks to the development of AI. In [10], the authors proposed a method for detecting anomalies in surveillance videos using deep learning techniques. One of the advantages of using AI is that no person needs to watch the location at every moment. This paper presents the implementation of a face mask detection system that uses augmented reality as a tracking mechanism to trigger a screen projection on a mobile device, which is used to request access to critical areas where the correct use of a face mask is a requirement. To achieve this, a machine learning model based on Convolutional Neural Networks is built on top of an IoT framework to enforce the correct use of the face mask in required areas.

Cyber-Physical Systems
Cyber-Physical Systems (CPS) refers to the integration of the physical part of a system with its computations, with a focus on their interaction [11]. Although this integration is not new, as embedded systems have been around for a while [12], the term CPS is relatively recent: Helen Gill introduced it in 2006 and related it to the concept of cybernetics [11].
CPS is growing very fast, and its growth is closely related to that of other technologies such as the Internet of Things and cloud computing. The applications of this kind of system are very broad; important ones include health care, smart cities, industrial processes, and machine connectivity, to mention just a few.

Deep learning
Deep Learning is one of the main subfields of machine learning. Deep learning algorithms are composed of multiple layers that represent what is learned at different levels; this representation is inspired by biological neural networks [13]. Deep learning uses these Artificial Neural Networks (ANN) to feed a machine with information and generate knowledge without human interaction.
Over the last few years, Deep Learning has been a trend in AI and Machine Learning systems. It is widely used in applications such as speech recognition, object detection, natural language processing (NLP), image classification, and many more [14].
An important asset for Deep Learning is data; a large amount of data is needed to give the machine enough information to make good decisions. These algorithms use new information to adjust the internal parameters of the ANN for better future performance [14].
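As a rough illustration of how data adjusts the internal parameters of a network, the sketch below trains a single linear neuron with stochastic gradient descent. The data, the learning rate, and the function name `train_neuron` are illustrative assumptions and are not taken from the system described in this paper.

```python
def train_neuron(samples, lr=0.05, epochs=200):
    """Fit y ~ w * x + b by stochastic gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = (w * x + b) - y   # prediction error for this sample
            w -= lr * error * x       # gradient of 0.5 * error**2 w.r.t. w
            b -= lr * error           # gradient of 0.5 * error**2 w.r.t. b
    return w, b


# Hypothetical data generated from y = 2x + 1
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_neuron(data)
```

Each sample nudges `w` and `b` in the direction that reduces the error on that sample; a deep network applies the same idea to many more parameters through backpropagation.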

Convolutional Neural Network
Convolutional Neural Networks (CNN) have been widely used in recent years for real-time applications such as face detection [15]. This class of networks automatically extracts features from the input data by applying filters whose weights are learned during training; this takes place in the convolutional layers. Once the features are extracted, the pooling layers downsample them, which reduces the amount of data and the number of parameters that the following layers must handle [16].
A basic example of CNN is shown in Figure 1.
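The two operations described above can be sketched in plain Python. This is a minimal illustration, not the model used in this work: the image, the kernel values, and the function names are hypothetical, and a real CNN learns its kernel weights during training.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 5x5 grayscale patch with a vertical edge between columns 1 and 2
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to left-to-right intensity increases
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)   # 4x4 feature map
pooled = max_pool2d(features)      # 2x2 map after pooling
```

The convolution responds only where the edge is, and pooling keeps that response while shrinking the 4x4 feature map to 2x2.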

Image Classification
Image recognition and classification are difficult tasks for machines [17]. Deep learning methods process the images to obtain better data before performing the classification; this processing can include noise reduction, slight improvement, color correction, etc. Many images are needed to feed the algorithms and obtain better results. Some techniques improve the training data of an algorithm by increasing its quality and quantity so that the algorithm works better in different types of environments; this is called data augmentation [18].
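A minimal sketch of data augmentation is given below, assuming grayscale images stored as lists of pixel rows. The specific transformations (horizontal flip and brightness shift), their ranges, and the function names are illustrative assumptions; the augmentation actually used for the model in this paper is not specified here.

```python
import random


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, delta):
    """Add delta to every pixel and clamp the result to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def augment(image, rng):
    """Return a randomly perturbed copy of a grayscale image (list of rows)."""
    out = [row[:] for row in image]
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return adjust_brightness(out, rng.randint(-30, 30))


# Hypothetical 2x3 grayscale image
image = [[10, 20, 30],
         [40, 50, 250]]
rng = random.Random(0)   # fixed seed so the run is repeatable
augmented = [augment(image, rng) for _ in range(4)]
```

Each call produces a slightly different variant of the same picture, so one labeled image yields several training samples.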

Internet of Things
The Internet of Things (IoT), also known as the Internet of Objects or the Internet of Everything, refers to the interconnected network of all kinds of objects, which are often equipped with data processing technology [19]. Experts estimate that by the end of 2025 there will be approximately 75 billion devices connected to the internet [20].

MATERIALS AND METHODS
In this project, it is fundamental to integrate several technologies so that communication remains persistent and consistent from the sender to the receiver, that is, from the physical machine to the digital information visualization system. MQTT (Message Queuing Telemetry Transport) is a well-known lightweight messaging protocol for sensors and mobile devices in IoT systems [21], widely used to manage the transport of messages from publishers to subscribers. This protocol must be combined with other technologies to reach its full potential. This section describes each of the components that bridge the layers of the overall framework.
An algorithm explaining each of the steps followed in this project is presented in Figure 2 (algorithm of the system implementation); each step is described in the following subsections. Figure 3 shows the IoT framework for this project.
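To make the publish/subscribe idea concrete, the sketch below implements MQTT-style topic matching with the `+` (single-level) and `#` (multi-level) wildcards, together with a toy in-memory broker. It is an illustration only: the class `InMemoryBroker`, the topic layout `access/<door id>/request`, and the payloads are hypothetical, no network is involved, and the filter is not validated (for example, `#` is not checked to be the last level).

```python
def topic_matches(topic_filter, topic):
    """Check an MQTT topic against a filter with '+' and '#' wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":                 # multi-level wildcard: matches the rest
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class InMemoryBroker:
    """Toy stand-in for an MQTT broker; real deployments use a networked broker."""

    def __init__(self):
        self.subscriptions = []          # list of (topic_filter, callback)

    def subscribe(self, topic_filter, callback):
        self.subscriptions.append((topic_filter, callback))

    def publish(self, topic, payload):
        for topic_filter, callback in self.subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)


broker = InMemoryBroker()
received = []
# Hypothetical topic layout: access/<door id>/request
broker.subscribe("access/+/request", lambda t, p: received.append((t, p)))
broker.publish("access/door1/request", "user42")
broker.publish("access/door1/status", "locked")   # not delivered: filter does not match
```

The publisher never addresses the subscriber directly; the broker decides who receives each message from the topic alone, which is what lets the mobile app and the embedded device stay decoupled.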

Methodology
This framework is based on the architecture proposed and explained by the authors in [22] . In the center, it is

Face Mask Detection Algorithm
To detect in real time whether the person in front of the camera is wearing a mask, a detector model was built. The images used to train the model were taken from the Kaggle Face Mask Detection Dataset [23].
The model uses a CNN and deep learning to extract and process the data and produce a classification output. The CNN is designed using the Keras and TensorFlow libraries for Python and the MobileNetV2 architecture. This architecture shows acceptable performance with low computational power [14], which makes the model suitable for embedded devices. Once the model was trained, it was deployed to the Raspberry Pi and camera to start the real-time detection, as shown in Figure 5.
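One reason architectures of the MobileNet family run on modest hardware is their use of depthwise separable convolutions, which need far fewer weights than standard convolutions. The short calculation below compares the two for a single layer; the layer sizes are hypothetical and are not taken from the model trained in this work.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels
standard = standard_conv_params(3, 32, 64)     # 18432 weights
separable = separable_conv_params(3, 32, 64)   # 2336 weights
reduction = standard / separable               # roughly 7.9 times fewer weights
```

Fewer weights mean less memory and fewer multiplications per frame, which matters on a device such as the Raspberry Pi.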

System Modelling
Cyber-physical systems, like the one presented in this paper, can be modeled using state machines to represent their behavior. MATLAB's Simulink was used to design the dynamics of the access control system. Figure 6 shows the layout that describes its operation. The first state waits for an access attempt made from the mobile app. Once an access attempt is detected, the system moves to the second state, which checks the connection with the MQTT server. If the connection is successful, user information is sent to a database and the system moves to the third state; otherwise, it returns to the first state, and the user must retry until the connection succeeds. The third state checks whether the person who wants to enter is wearing a mask: if so, access is granted and the command to open the door lock is sent; if not, entry is denied.
The database subsystem receives the user's information, stores it, and transmits it to a dashboard designed in HTML where all the access attempts can be visualized. The door lock subsystem is responsible for controlling the servo motor or any other lock mechanism that may be selected.
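The three states described in this section can be sketched as a small state machine in Python. This is not the Simulink model of Figure 6: the state names, the `step` function, and the door commands `"open"` and `"deny"` are hypothetical, the database write is omitted, and the return to the first state after a decision is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_FOR_ATTEMPT = auto()
    CHECK_CONNECTION = auto()
    CHECK_MASK = auto()


def step(state, attempt=False, connected=False, mask_detected=False):
    """Return (next_state, door_command) for one transition.

    door_command is "open", "deny", or None when no decision has been made yet.
    """
    if state is State.WAIT_FOR_ATTEMPT:
        return (State.CHECK_CONNECTION if attempt else state), None
    if state is State.CHECK_CONNECTION:
        # On a failed connection the user has to retry from the first state.
        return (State.CHECK_MASK if connected else State.WAIT_FOR_ATTEMPT), None
    if state is State.CHECK_MASK:
        # Assumption: after a decision the system waits for the next attempt.
        return State.WAIT_FOR_ATTEMPT, ("open" if mask_detected else "deny")
    raise ValueError(f"unknown state: {state}")


# One successful pass: attempt detected, broker reachable, mask detected
state = State.WAIT_FOR_ATTEMPT
state, command = step(state, attempt=True)
state, command = step(state, connected=True)
state, command = step(state, mask_detected=True)
```

Keeping the transition logic in one pure function makes each path (failed connection, missing mask) easy to check in isolation before wiring it to the camera and the lock.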

Mobile Application
The Android-based mobile application runs the Image Tracking Detection, developed with the Unity 3D graphics engine and the Vuforia Engine SDK.
Unity is a popular game engine used to create video games and a wide range of interactive apps for many kinds of users and industries.
Vuforia Engine is a software development kit (SDK) that is straightforward to integrate and uses recent Computer Vision techniques to track or recognize images and objects for Augmented Reality applications [24]. It controls a camera sensor that captures frames and passes them to computer vision algorithms, which detect and track real-world objects and compare them with the targets registered in the Vuforia web-based developer portal [25].
This work applies the capabilities of the Vuforia Engine SDK and the Unity engine to track targets and display content on the handheld device.

Detection Model Performance
The metrics used to evaluate the performance of the detection model are Precision, Recall, F1-score, and Accuracy. Precision is the fraction of positive predictions that are correct, Recall is the fraction of actual positives that are detected, the F1-score is the harmonic mean of Precision and Recall, and Accuracy is the fraction of all predictions that are correct. The metrics obtained for the face mask detector model after training are presented in Figure 9. The behavior of the model over 20 epochs of training can be seen in Figure 10. As can be seen, the training loss decreases as the model is trained, while the accuracy of the model increases. The total training time was close to 40 minutes.
The Confusion Matrix presented in Figure 11 gives a better understanding of the model's results and shows where the model gets confused.
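The four metrics follow directly from the entries of a binary confusion matrix: Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1 = 2 · Precision · Recall / (Precision + Recall), and Accuracy = (TP + TN) / (TP + FP + FN + TN). The sketch below computes them; the counts are hypothetical and are not the values of Figure 11, and the function assumes that no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts, taking "mask" as the positive class
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

For an access control system, the choice of positive class matters: with "mask" as positive, a false positive means granting access to someone without a mask, so precision is the metric most closely tied to safety.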

Limitations and Future work
One limitation of the current model is that it was trained only on images of people wearing or not wearing a mask. Cases where the person wears the mask incorrectly were not taken into consideration, although the model usually classifies these cases as not wearing a mask (see Figure 5). A third class covering incorrectly worn masks could be added, which would help the detector perform better. [...] with False Positives, which can be interpreted as good, since the system will make fewer mistakes in granting access to people not wearing a mask.
The integration and connection between all the devices are made possible by IoT. Users request access from their mobile device through image tracking (with the Vuforia app); the request reaches a Mosquitto server over MQTT, which forwards it to the embedded device (Raspberry Pi 4) in charge of granting access using the camera and the face mask detection model.
This process takes between 2 and 2.5 seconds, which could be reduced by using a device with more graphics processing power.
The use of face masks is essential in times of pandemic, and measures must be taken to ensure that people who leave their homes always wear one when entering public places or places with a lot of contact with other people, which are high-risk conditions for COVID-19 infection. This project shows how technologies such as the IoT, artificial intelligence, and augmented reality can be integrated to help with this problem.
This system can also help foster a health-conscious culture in which the use of the mask is mandatory and essential to the "new normal" life.
The access system has the potential to be installed in different areas and adapted to the needs of the establishment. The results shown in this work reveal an efficient system to control access and collect information remotely, without the need for face-to-face monitoring.
A face mask detection system using artificial intelligence and powered by IoT technologies, like the one shown in this paper, has a wide application potential.
Everything seems to indicate that the use of face masks will be a measure that should be adopted in different work centers and crowded places. The experience with COVID-19 should be used for the next health contingencies that could potentially occur in the following years.