Face detection algorithm based on improved S3FD network

LI Yuhao; LÜ Xiaoqi; GU Yu; ZHANG Ming; LI Jing

doi:10.7510/jgjs.issn.1001-3806.2021.06.008

Volume 45 Issue 6

Nov. 2021

LASER TECHNOLOGY, 2021, 45(6): 722-728

1. Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, College of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
2. Institute of Information Engineering, Inner Mongolia University of Technology, Hohhot 010051, China
3. School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China

Corresponding author: LÜ Xiaoqi, lxiaoqi@imut.edu.cn
Received Date: 2020-12-22
Accepted Date: 2021-01-23

Abstract

In face detection, small faces carry little feature information and are often blurred, which makes them difficult to detect. To address this problem, an improved algorithm based on the single shot scale-invariant face detector (S3FD) was designed. The S3FD network combined with channel and spatial attention mechanisms was used as the backbone; the channel and spatial attention establish weight relationships between features, which strengthens feature extraction. The receptive field of each original S3FD output feature map was then expanded and the map was up-sampled, so that the output of the feature map of the previous layer also includes the features of the feature map of the next layer. The average precision (AP) of the algorithm on the easy, medium, and hard subsets of the WIDER FACE validation set is 95.0%, 93.7%, and 86.4%, respectively, which is 1.3, 1.2, and 0.5 percentage points higher than that of the original S3FD. The proposed algorithm therefore detects faces more accurately than the original S3FD.
Key words: image processing; face detection; small target; attention mechanism; deep learning

References

[1]	ROWLEY H A, BALUJA S, KANADE T. Neural network-based face detection[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 1998, 20(1): 23-38.
[2]	ROWLEY H A, BALUJA S, KANADE T. Rotation invariant neural network-based face detection[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 1998: 38-44.
[3]	VIOLA P, JONES M J. Robust real-time face detection[J]. International Journal of Computer Vision, 2004, 57(2): 137-154. doi: 10.1023/B:VISI.0000013087.49260.fb
[4]	MATHIAS M, BENENSON R, PEDERSOLI M, et al. Face detection without bells and whistles[C]// European Conference on Computer Vision. Zurich, Switzerland: ECCV, 2014: 720-735.
[5]	YAN J, LEI Z, WEN L, et al. The fastest deformable part model for object detection[C]//Computer Vision and Pattern Recognition. New York, USA: IEEE, 2014: 2497-2504.
[6]	ZHU X, RAMANAN D. Face detection, pose estimation, and landmark localization in the wild[C]//Computer Vision and Pattern Recognition. New York, USA: IEEE, 2012: 2879-2886.
[7]	KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. doi: 10.1145/3065386
[8]	LEI H L, ZHANG B H. Crowd count algorithm based on multi-model deep convolution network integration[J]. Laser Technology, 2019, 43(4): 476-481(in Chinese).
[9]	CHEN Q X, WU W Ch, ASKAR H. Detection algorithm based on multi-scale spotted target modeling[J]. Laser Technology, 2020, 44(4): 520-524(in Chinese).
[10]	LI H, LIN Z, SHEN X, et al. A convolutional neural network cascade for face detection[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2015: 5325-5334.
[11]	QIN H, YAN J, LI X, et al. Joint training of cascaded CNN for face detection[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2016: 3456-3465.
[12]	YANG S, LUO P, LOY C C, et al. From facial parts responses to face detection: A deep learning approach[C]//International Conference on Computer Vision. New York, USA: IEEE, 2015: 3676-3684.
[13]	ZHANG K, ZHANG Z, LI Z, et al. Joint face detection and alignment using multitask cascaded convolutional networks[J]. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503. doi: 10.1109/LSP.2016.2603342
[14]	JIANG H, LEARNED-MILLER E. Face detection with the faster R-CNN[C]//Automatic Face and Gesture Recognition. New York, USA: IEEE, 2017: 650-657.
[15]	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
[16]	ZHU C, ZHENG Y, LUU K, et al. CMS-RCNN: Contextual multi-scale region-based CNN for unconstrained face detection[EB/OL]. (2016-06-16)[2020-12-22]. https://arxiv.org/pdf/1606.05413.pdf.
[17]	WAN S, CHEN Z, ZHANG T, et al. Bootstrapping face detection with hard negative examples[EB/OL]. (2016-08-07)[2020-12-22]. https://arxiv.org/pdf/1608.02236.pdf.
[18]	ZHANG S, ZHU X, LEI Z, et al. S3FD: Single shot scale-invariant face detector[C]//International Conference on Computer Vision. New York, USA: IEEE, 2017: 192-201.
[19]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]// European Conference on Computer Vision. Amsterdam, The Netherlands: Springer International Publishing, 2016: 21-37.
[20]	ZHU C, TAO R, LUU K, et al. Seeing small faces from robust anchor's perspective[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2018: 5127-5136.
[21]	LI J, WANG Y, WANG C, et al. DSFD: Dual shot face detector[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2019: 5055-5064.
[22]	GU Y, LU X Q, YANG L D, et al. Automatic lung nodule detection using a 3D deep convolutional neural network combined with a multi-scale prediction strategy in chest CTs[J]. Computers in Biology and Medicine, 2018, 103: 220-231. doi: 10.1016/j.compbiomed.2018.10.011
[23]	LI Z, TANG X, HAN J, et al. PyramidBox++: High performance detector for finding tiny face[EB/OL]. (2019-08-07)[2020-12-22]. https://arxiv.org/pdf/1904.00386.pdf.
[24]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2020-12-22]. https://arxiv.org/pdf/1409.1556.pdf.
[25]	WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]// European Conference on Computer Vision. Munich, Germany: ECCV, 2018: 3-19.
[26]	LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Computer Vision and Pattern Recognition. New York, USA: IEEE, 2017: 936-944.
[27]	LIU S, HUANG D, WANG Y. Receptive field block net for accurate and fast object detection[J]. Lecture Notes in Computer Science, 2018, 11215: 404-419.
[28]	SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Computer Vision and Pattern Recognition. New York, USA: IEEE, 2014: 1-9.
[29]	BA J, MNIH V, KAVUKCUOGLU K. Multiple object recognition with visual attention[EB/OL]. (2015-04-23)[2020-12-22]. https://arxiv.org/pdf/1412.7755.pdf.
[30]	YANG S, LUO P, LOY C C, et al. Wider face: A face detection benchmark[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2016: 5525-5533.
[31]	ZHANG H Sh. Research on key algorithms of face detection and face recognition in video surveillance[D]. Chengdu: University of Electronic Science and Technology of China, 2019: 11-48(in Chinese).
[32]	WANG M, SU H S, LIU G H, et al. Classroom face detection algorithm based on convolutional neural network[J]. Laser & Optoelectronics Progress, 2019, 56(21): 211501(in Chinese).

Introduction

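Face detection is a major focus of computer vision and is widely used in face alignment, face analysis, face recognition, and face tracking. Given an image, the goal of face detection is to determine whether any face is present and, if so, to return the bounding box of each face. Early face detection relied on template matching and hand-crafted features; the representative work is the method proposed by ROWLEY et al.^[1-2]. ROWLEY's method achieves good accuracy, but its classifier design and dense sliding-window sampling make it too slow. In 2001, VIOLA and JONES^[3] designed the Viola-Jones (VJ) framework. Deformable part models (DPM) were later used in several works^[4-6] for the face detection task.

As machine learning and deep learning advanced in computer vision, and as convolutional neural networks (CNN) made progress on the ImageNet classification task^[7], object detection with neural networks gradually became the mainstream approach^[8-9]. The cascade network (CascadeCNN)^[10] continued the idea of VIOLA and JONES and achieved good detection results. QIN et al.^[11] proposed training CascadeCNN as a whole in order to optimize it. Faceness uses convolutional neural networks for facial attribute classification to detect partially occluded faces^[12]. Multi-task convolutional neural networks (MTCNN)^[13] further extended the idea of cascaded CNNs and also borrowed ideas from object detection, dividing the overall network into P-Net, R-Net, and O-Net. JIANG et al.^[14] applied faster R-CNN^[15] to face detection with good results. The contextual multi-scale region-based CNN (CMS-R-CNN)^[16] uses contextual information to assist face detection and thereby improves performance; the network is divided into an upper part that detects with a region proposal network (RPN) and a lower part that detects using information such as the width and height of the face in relation to the human body. WAN et al.^[17] combined faster R-CNN with hard negative mining and achieved good results.

Accuracy has reached a high level on large-scene images that contain few faces, but remains lower on small-scene images that contain many faces. For multi-scale face detection, in 2017 ZHANG et al.^[18] proposed the single shot scale-invariant face detector (S3FD), which combines the RPN of faster R-CNN with the anchor mechanism of the single shot multibox detector (SSD)^[19]. In 2018, ZHU et al.^[20] and LI et al.^[21] further improved face detection accuracy, and in the same year GU et al.^[22] achieved good results in 3-D with multi-scale object detection. In 2019, LI, TANG, et al.^[23] proposed PyramidBox++.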

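To address the difficulty of detecting faces that carry little feature information, this paper proposes a face detection algorithm based on S3FD. S3FD^[18], with the visual geometry group (VGG) network VGG16^[24] as its backbone, is used for feature extraction. A convolutional block attention module (CBAM)^[25] is inserted in the middle of the overall network; its spatial and channel attention compute weight coefficients for different feature channels and feature-map positions, which strengthens the features passed to the subsequent network structure. Finally, a structure in the form of a feature pyramid network (FPN)^[26] is used, in which the convolutional part of the FPN is replaced by a receptive field block (RFB). The receptive-field-enlarging effect of RFB^[27] is applied to feature maps at different scales, which reduces the loss of target feature information and completes face detection without adding a large number of extra parameters.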
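The channel and spatial weighting described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it follows the general CBAM recipe (channel attention from average- and max-pooled descriptors passed through a shared two-layer MLP, then spatial attention from a convolution over channel-wise average and max maps). The random weights stand in for learned parameters, and the tensor sizes, the reduction ratio `r = 4`, and the 7×7 kernel are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x, w1, w2):
    """Per-channel weights in (0, 1) for a feature map x of shape (C, H, W)."""
    avg = x.mean(axis=(1, 2))   # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))     # global max pooling     -> (C,)

    def mlp(v):
        # Shared two-layer MLP with a ReLU hidden layer.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))

def spatial_attention(x, kernel):
    """Per-position weights in (0, 1); kernel has shape (2, k, k)."""
    # Channel-wise average and max maps, stacked to shape (2, H, W).
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = x.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(x, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]

# Illustrative sizes and random stand-ins for learned weights (assumptions).
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
y = cbam(x, w1, w2, kernel)
```

Because both attention maps lie in (0, 1), the module only rescales the input features; it leaves the feature-map shape unchanged, which is what allows it to be inserted in the middle of an existing backbone.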

3. Conclusion

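To address the relatively poor detection accuracy for small faces, this paper studies a detection algorithm for small faces that uses the S3FD network structure as its backbone. To address the loss of target features during convolution and insufficient image clarity, the method combines S3FD, channel and spatial attention mechanisms, the RFB receptive-field-enlarging module, and a multi-scale feature pyramid, which reduces the loss of target features during convolution and raises the overall accuracy of the network. The model does not greatly increase the number of parameters, so it retains good speed during detection and, to a certain extent, meets the need for fast object detection. Detection speed and detection accuracy conflict in face detection, and how to improve accuracy while also improving speed has long been a focus of the field. As network structures continue to be optimized, hardware continues to improve, and the relevant theories develop and mature, faster and more accurate face detection can be expected in the near future. The model in this paper maintains good accuracy while preserving a certain speed, and may help advance faster and more accurate face detection.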
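The multi-scale pyramid idea summarized above, in which a deeper feature map is up-sampled and merged into the shallower one, can be sketched as follows. This is a hedged illustration rather than the paper's network: `expand` is a hypothetical placeholder for the receptive-field-enlarging block (identity here), the merge by element-wise addition and the nearest-neighbour 2× up-sampling are assumptions, and the toy feature maps are constant arrays so that the data flow is easy to follow.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_fuse(features, expand=lambda f: f):
    """Fuse feature maps ordered shallow -> deep, each half the size of the last.

    expand: hypothetical stand-in for a receptive-field-enlarging block;
            the identity is used here purely for illustration.
    """
    fused = [features[-1]]  # the deepest map has nothing above it
    for f in reversed(features[:-1]):
        # Enlarge the receptive field of the deeper map, up-sample it,
        # and add it to the shallower map (addition is an assumption).
        top = upsample2x(expand(fused[0]))
        fused.insert(0, f + top)
    return fused

# Toy pyramid: three maps with 4 channels at 8x8, 4x4, and 2x2 resolution.
maps = [np.ones((4, 8, 8)), 2 * np.ones((4, 4, 4)), 3 * np.ones((4, 2, 2))]
out = top_down_fuse(maps)
```

In this toy run the shallowest output accumulates contributions from every deeper level, which is the property that lets a high-resolution map used for small faces also carry deeper semantic features.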

methods	AP/% (easy)	AP/% (medium)	AP/% (hard)
S3FD	93.7	92.5	85.9
S3FD-CBAM	93.8	92.7	86.4
S3FD-double-stage	92.9	91.3	82.1
S3FD-double-stage-max	92.8	91.1	82.6
S3FD-CBAM-RFBNet	95.0	93.7	86.4
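
The gains quoted in the abstract can be checked directly against the table above. The short script below is only a worked arithmetic example; the AP values are copied from the table, and the variable names are illustrative.

```python
# AP/% on the easy, medium, and hard subsets, copied from the table above.
ablation = {
    "S3FD": (93.7, 92.5, 85.9),
    "S3FD-CBAM": (93.8, 92.7, 86.4),
    "S3FD-double-stage": (92.9, 91.3, 82.1),
    "S3FD-double-stage-max": (92.8, 91.1, 82.6),
    "S3FD-CBAM-RFBNet": (95.0, 93.7, 86.4),
}

baseline = ablation["S3FD"]

# Difference to the S3FD baseline in percentage points, rounded to one decimal.
gains = {
    name: tuple(round(ap - base, 1) for ap, base in zip(aps, baseline))
    for name, aps in ablation.items()
}

for name, delta in gains.items():
    print(f"{name}: easy {delta[0]:+.1f}, medium {delta[1]:+.1f}, hard {delta[2]:+.1f}")
```

For the full model the differences are 1.3, 1.2, and 0.5 percentage points, matching the abstract. The same arithmetic shows that the hard-subset AP of S3FD-CBAM already equals that of the full model, so in this table the additional RFB stage mainly shows up on the easy and medium subsets.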

methods	AP/% (easy)	AP/% (medium)	AP/% (hard)
S3FD^[18]	93.7	92.5	85.9
ZHU^[20]	94.9	93.3	86.1
IS3FD^[31]	94.4	93.5	87.9
IS3FD-fast^[31]	93.1	92.4	86.4
WANG^[32]	93.0	87.3	58.3
S3FD-CBAM-RFB(proposed method)	95.0	93.7	86.4

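Catalog
1.1. Network architecture design
1.2. RFB network module
1.3. CBAM network
2.1. Dataset
2.2. Experimental environment
2.3. Evaluation criteria
2.4. Experimental structure and result analysis
2.4.1. Experimental structure analysis
2.4.2. Experimental result analysis
2.5. Comparative analysis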