Introduction
Advances in computer vision (CV) aim at automating the detection of features in human faces for a myriad of applications. CV enables facial recognition for biometric device security5, detecting changes in facial characteristics caused by a developing disease, gauging a driver's level of alertness6, and predicting a pedestrian's intent to enter the road as interpreted by an autonomous vehicle7, among many other critical applications. These advances, however, come at a price: the users' privacy is put at risk. In an age where data is both valuable and potentially dangerous in the wrong hands, it is paramount that privacy concerns be addressed fully. Yet privacy is almost always an afterthought in CV solution design, addressed only at the end to close security gaps that should have been closed during the design phase. Between 1999 and 2014, the U.S. government fined companies more than USD 130 million for privacy violations1, a symptom suggesting that companies still lack the measures and processes needed to comply with privacy regulations.
We advocate that privacy be considered early on, as part of the system design requirements. This strategy enables:
- better integration of privacy measures to guide the development process and design decisions,
- adherence to the data minimization principle; that is, using only as much data as an application requires and nothing more,
- eliminating the need for computationally expensive privacy-by-encryption techniques by employing privacy-preserving methods customized to each application.
In this article, we discuss how to design for privacy preservation in a face detection framework. The design approach enables the extraction of facial features without compromising the user's identity.
Privacy Preservation Approach for Feature Detection
To preserve the privacy of users, some applications detect and blur faces in the pictures and videos they capture, for example, those taken by an autonomous vehicle. Blurring a face makes learning/training from it much more difficult, though not impossible, depending on the application. Our approach is instead to couple the photo data with the features learned by a face detection neural network. These features can be attached to a photo, much like geotags are embedded within photo metadata.
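The sketch below illustrates this idea. It assumes a Python pipeline with NumPy and Pillow; the file names and the `face_features` metadata key are illustrative choices, not part of any standard:

```python
import base64
import io

import numpy as np
from PIL import Image, PngImagePlugin

def attach_features(image_path, features, out_path):
    """Embed a learned feature-map array in PNG metadata, geotag-style."""
    buf = io.BytesIO()
    np.save(buf, features)  # serializes dtype and shape along with the data
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")

    info = PngImagePlugin.PngInfo()
    info.add_text("face_features", encoded)  # hypothetical metadata key

    Image.open(image_path).save(out_path, "PNG", pnginfo=info)

# Example: attach a dummy 3x3x32 feature block to a (blurred) photo.
features = np.random.rand(3, 3, 32).astype(np.float32)
attach_features("street.png", features, "street_tagged.png")
```

The original photo can then be blurred or discarded; only the metadata-carrying copy needs to be shared downstream.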
Figure 1. The architecture of the O-Net stage of the MTCNN.
For our discussion, we use a multitask cascaded convolutional neural network (MTCNN)3 to detect features of a human face. The MTCNN generates intermediary feature maps that provide an interesting perspective on what the MTCNN architecture sees and eventually learns. For example, after the first two convolution and pooling layers, the third cascade learns what closely resembles an eigenface2 (an average-like image of what a face should look like).
Figure 2. Examples of learned feature maps from the O-Net stage: (a), (b), and (c).
The MTCNN works as a cascade of three convolutional neural networks (CNNs). The first CNN, P-Net, proposes candidate facial regions; the second, R-Net, refines those proposals; and the last, the output network O-Net, determines the best bounding-box regressions for faces along with five facial attributes (landmark points) per face. As shown in Figure 1, these filters/feature maps can be passed along to scenarios where a binary face classification is needed; a sketch of how they can be extracted follows below. Not only does the MTCNN learn general face representations; later feature activations pulled from the O-Net can also be used to glean how it targets specific facial attributes such as the nose and the corners of the eyes and mouth.
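As a minimal sketch of how such activations can be captured, the following uses PyTorch forward hooks on the facenet-pytorch MTCNN implementation; the `onet` attribute name comes from that library, while `photo.png` and the choice to hook every convolution are illustrative assumptions:

```python
import torch
from PIL import Image
from facenet_pytorch import MTCNN  # one public MTCNN implementation

mtcnn = MTCNN(keep_all=True)
captured = {}

def save_activation(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()
    return hook

# Hook every O-Net convolution; which layer is most interpretable
# is an empirical choice.
for layer_name, module in mtcnn.onet.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        module.register_forward_hook(save_activation(layer_name))

# Running detection fills `captured` with intermediate O-Net feature
# maps that can be packaged as metadata instead of the raw face.
boxes, probs = mtcnn.detect(Image.open("photo.png"))
```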
Figure 3. Estimated facial attributes pulled from the last convolution in O-Net.
Unfortunately, because CNNs typically generate a hierarchy of features, the more complex the features become, the harder they are to recognize intuitively via their feature maps. Still, each subfigure of Figure 2 is reminiscent of the Haar-like features used in the Viola-Jones face detector4. Figure 2a resembles a feature map, or Haar-like feature, that could be read as the bridge of a nose, given the relatively bright activations in the middle and the starkly dark non-activations on each side. The activations also grow gradually darker, suggesting some kind of illumination or shading of the nose. Figure 2b resembles a feature map that could easily be associated with the corner of one of the key facial attributes, such as an eye or the mouth. Packaging all of these feature maps as metadata lets consumers leverage what has already been learned from a dataset, saving them precious compute time.
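On the consumer side, the embedded features can be recovered and used directly, for example to train a binary face/no-face classifier, without the consumer ever handling an identifiable face. Here is a minimal sketch, assuming the `face_features` metadata convention from the earlier example; the pooling scheme, file names, and scikit-learn classifier are illustrative choices:

```python
import base64
import io

import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

def load_features(image_path):
    """Recover the feature-map array embedded by attach_features()."""
    encoded = Image.open(image_path).text["face_features"]  # hypothetical key
    return np.load(io.BytesIO(base64.b64decode(encoded)))

def pooled(features):
    # Global average pooling: one value per channel, fixed-length vector.
    return features.mean(axis=(0, 1))

# Train on metadata alone; the photos themselves are never inspected.
X = np.stack([pooled(load_features(p)) for p in ["a.png", "b.png"]])
y = np.array([1, 0])  # illustrative face / no-face labels
clf = LogisticRegression().fit(X, y)
```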
Conclusions
In this article, we advocate the need for embedding privacy measures early in the design of CV methodologies. We show, with examples, a process for detecting features in human faces while preserving users' identities and, hence, their privacy. We believe that similar techniques can be applied to other machine learning algorithms, and we will pursue that path in our future research.
About the Authors
Cory Ilo is a computer vision engineer in the Automotive Solutions group at Intel. He helps prototype and research the feasibility of various computer vision solutions in relation to privacy, ethics, deep learning, and autonomous vehicles. In his spare time, Cory focuses on his passion for fitness, video games, and wanderlust, in addition to finding ways they tie into computer vision.
Iman Saleh is a research scientist with the Automotive Solutions group at Intel. She holds a Ph.D. from the Computer Science department at Virginia Tech, a master's degree in Computer Science from Alexandria University, Egypt, and a master's degree in Software Engineering from Virginia Tech. Dr. Saleh has 30+ technical publications in the areas of big data, formal data specification, service-oriented computing, and privacy-preserving data mining. Her research interests include machine learning, privacy-preserving solutions, software engineering, data modeling, web services, formal methods, and cryptography.
References
- Top 20 Government-imposed Data Privacy Fines Worldwide, 1999-2014.
- M. Turk and A. Pentland. Face Recognition Using Eigenfaces. Proc. IEEE Conference on Computer Vision and Pattern Recognition, 1991, pp. 586–591.
- K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks. IEEE Signal Processing Letters, vol. 23, no. 10, 2016.
- P. Viola and M. Jones. Robust Real-time Object Detection. International Journal of Computer Vision, 2001.
- iPhone X. Apple*.
- W. Zhang, et al. Driver Drowsiness Recognition Based on Computer Vision Technology. Tsinghua Science and Technology, vol. 17, no. 3, 2012, pp. 354–362.
- M. Garzon, D. G. Ramos, A. Barrientos, and J. Del Cerro. Pedestrian Trajectory Prediction in Large Infrastructures: A Long-Term Approach Based on Path Planning. 2016, pp. 381–389.