An improved infrared object detection algorithm based on YOLOv5

LIU Haojiao; LIU Lishuang; ZHANG Mingchun

doi:10.7510/jgjs.issn.1001-3806.2024.04.011

To address the low recognition accuracy caused by the sparse features and poor contrast of infrared images, several improvements to the original YOLOv5 model are proposed. First, an additional prediction feature layer is introduced to strengthen the detection of small objects in infrared images, and a coordinate attention mechanism is employed to improve the extraction of salient features from infrared targets, thereby improving detection accuracy. Second, the feature fusion network is replaced with a bidirectional feature pyramid network to improve the model's expressive power and reduce redundant computation. Finally, to tackle sample imbalance in bounding box regression, focal-EIOU is adopted as the loss function, which accelerates convergence and focuses the regression on high-quality anchor boxes. Experimental results show that the improved YOLOv5 achieves a mean average precision (MAP) of 85.3% on the FLIR dataset, 4.2 percentage points higher than the original network model. The improved model offers high detection accuracy and remains feasible to deploy on embedded devices.

0. Introduction

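Object recognition plays a key role in computer vision. It has improved the performance of intelligent surveillance systems, driven innovation in autonomous vehicles, and found practical use in industrial and smart-home settings. Because infrared imaging can penetrate fog, smoke, and low-light environments, it is very useful in military and security reconnaissance^[1] and in monitoring. However, infrared radiation has a relatively long wavelength, and under the same imaging conditions infrared images usually have lower spatial resolution. Unlike visible-light images, infrared images are formed by a detector that captures the thermal radiation emitted from object surfaces. This radiation is affected by many factors, including the external environment and the weather, which makes target detection and recognition more challenging.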

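Object detection algorithms fall into two broad classes: traditional methods and machine learning methods. Mainstream traditional methods for infrared object detection rely on techniques such as edge detection, template matching, and the Hough transform. Some algorithms detect objects from edges, contours^[2], and texture, for example pedestrian detection with histogram of oriented gradient (HOG)^[3] features. However, these traditional methods require hand-crafted image features, depend on prior knowledge, and have limited representational power, all of which limit accuracy.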

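In recent years, deep learning has achieved notable results in object detection. In 2014, GIRSHICK et al. proposed the first deep-learning-based object detection algorithm, the region-based convolutional neural network (R-CNN)^[4]. It splits the image into multiple regions of interest (ROI), runs a convolutional neural network on each ROI, and classifies with a support vector machine (SVM). To improve speed, GIRSHICK et al. proposed fast R-CNN in 2015^[5], which runs the convolutional neural network on the whole image and then processes each region of interest with ROI pooling. Compared with these two-stage detectors, one-stage methods such as the single shot multibox detector (SSD) and RetinaNet are considerably faster^[6]. In 2016, REDMON et al. proposed the you only look once (YOLO) algorithm^[7], which uses a single convolutional neural network to predict bounding boxes and classes simultaneously, processes the whole image at once, and runs in real time. Although the initial version of YOLO fell short in accuracy and in small-object detection^[8], improved versions are now widely used in industry^[9].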
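As a rough illustration of the ROI pooling step mentioned above, the following Python sketch max-pools one region of a single-channel feature map into a fixed-size grid. The function name `roi_max_pool`, the `(x0, y0, x1, y1)` cell convention, and the floor/ceil bin boundaries are assumptions made for this example; they are not taken from the fast R-CNN implementation.

```python
import math


def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool one ROI of a 2D feature map into an out_h x out_w grid.

    feature: 2D list indexed as feature[y][x].
    roi: (x0, y0, x1, y1) in feature-map cells, x1 and y1 exclusive
         (hypothetical convention for this sketch).
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin boundaries along y; floor/ceil keeps every bin non-empty.
        ys = y0 + math.floor(i * roi_h / out_h)
        ye = y0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            xs = x0 + math.floor(j * roi_w / out_w)
            xe = x0 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# A 4x4 feature map whose value at (y, x) is 4*y + x.
feature_map = [[4 * y + x for x in range(4)] for y in range(4)]
whole = roi_max_pool(feature_map, (0, 0, 4, 4), 2, 2)
```

Whatever the size of the ROI, the output has the same fixed shape, which is what lets a single shared convolutional pass feed a fixed-size classifier head.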

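Compared with region-proposal-based methods, YOLOv1 predicts bounding boxes and classes from global image information^[10], which removes the need for region proposals. YOLOv1 also uses a single network that learns detection and classification jointly in end-to-end training. YOLOv2 introduces batch normalization to reduce covariate shift^[11], uses anchor boxes to handle objects of different sizes, and detects on high-resolution feature maps to minimize localization error. YOLOv3 uses feature pyramid networks (FPN) for multi-scale object detection. FPN fuses features from the top layer down to the bottom layers, building a feature pyramid that spans large to small scales; on this pyramid, YOLOv3 applies fully convolutional prediction heads of uniform size to detect objects of different sizes. This design fuses features from different semantic levels and improves the detection of small objects. In addition, YOLOv3 augments the training data with operations such as random rotation and cropping to adapt to different scenes. YOLOv4 uses spatial pyramid pooling and a path aggregation network for multi-scale feature fusion^[12], replaces the original activation function with the Mish activation function to improve the model's nonlinear capacity and stability, and introduces a series of further improvements, including Mosaic data augmentation at the input and DropBlock regularization.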
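For reference, Mish is commonly defined as x · tanh(softplus(x)). The scalar sketch below follows that definition; the helper names `softplus` and `mish` are chosen for this example, and a real network would use a tensor-library implementation instead.

```python
import math


def softplus(x: float) -> float:
    # Numerically stable ln(1 + e^x): avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x)); smooth and non-monotonic near zero.
    return x * math.tanh(softplus(x))


samples = [mish(v) for v in (-2.0, 0.0, 1.0, 5.0)]
```

For large positive inputs Mish approaches the identity, and for large negative inputs it approaches zero from below, so small negative activations are not cut off as sharply as with ReLU.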

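This work aims to achieve high-performance infrared object detection with deep learning when image contrast is low, imaging is blurry, and targets are small, while keeping the number of model parameters under control so that the model can be deployed on embedded devices. To this end, YOLOv5s is taken as the baseline and trained with transfer learning. A coordinate attention (CA) module and an additional prediction layer are introduced to improve small-object detection, and the feature fusion network and the loss function are modified.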
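The CA module mentioned above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: batch normalization and biases are omitted, a hard-swish nonlinearity is assumed after the shared reduction, and the names `coordinate_attention`, `conv1x1`, `w_reduce`, `w_h`, and `w_w` are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def hswish(v):
    # Hard-swish; assumed here as the nonlinearity after the reduction.
    return v * min(max(v + 3.0, 0.0), 6.0) / 6.0


def conv1x1(weights, columns):
    """1x1 convolution: weights is [out_ch][in_ch], columns is [in_ch][length]."""
    length = len(columns[0])
    return [[sum(w * columns[c][p] for c, w in enumerate(row))
             for p in range(length)]
            for row in weights]


def coordinate_attention(x, w_reduce, w_h, w_w):
    """x: [C][H][W]; w_reduce: [C_mid][C]; w_h and w_w: [C][C_mid]."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Directional average pooling: one descriptor per row, one per column.
    pool_h = [[sum(x[c][i]) / W for i in range(H)] for c in range(C)]
    pool_w = [[sum(x[c][i][j] for i in range(H)) / H for j in range(W)]
              for c in range(C)]
    # Concatenate along the spatial axis and apply the shared reduction.
    joint = [pool_h[c] + pool_w[c] for c in range(C)]
    mid = [[hswish(v) for v in row] for row in conv1x1(w_reduce, joint)]
    # Split back into the height part and the width part.
    mid_h = [row[:H] for row in mid]
    mid_w = [row[H:] for row in mid]
    gate_h = [[sigmoid(v) for v in row] for row in conv1x1(w_h, mid_h)]
    gate_w = [[sigmoid(v) for v in row] for row in conv1x1(w_w, mid_w)]
    # Reweight every position by its row gate and its column gate.
    return [[[x[c][i][j] * gate_h[c][i] * gate_w[c][j] for j in range(W)]
             for i in range(H)]
            for c in range(C)]


x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
     [[0.0, -1.0, 2.0], [3.0, 1.0, 0.5]]]
out = coordinate_attention(x, [[0.5, -0.2]], [[1.0], [-1.0]], [[0.3], [0.7]])
```

Because the two gates are indexed by row and by column, the attention weight keeps positional information along both axes, which is the property that distinguishes CA from purely channel-wise attention.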

4. Conclusion

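To address the sparse features and large scale variation of infrared images^[21], an improved YOLOv5s network is proposed. First, an additional detection layer for small objects is added to improve the model's ability to detect small infrared targets, and a CA module is added to the backbone so that the model attends to important features and suppresses irrelevant channels. Next, BiFPN replaces the feature fusion network used in the original YOLOv5s, which reduces redundant computation and improves computational efficiency. Finally, the focal-EIOU loss function is adopted. Tested on the infrared image dataset, the improved model raises MAP by 4.9, 13.5, 4.2, and 3.4 percentage points over faster R-CNN, SSD, YOLOv5s, and YOLOv5s-p2, respectively. The proposed infrared object detection algorithm has strong practical value and robustness.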
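The focal-EIOU loss^[20] can be illustrated for a single box pair as below. The sketch assumes the commonly cited form, in which the EIOU loss adds center-distance, width, and height penalties (each normalized by the smallest enclosing box) to 1 − IOU, and the result is weighted by IOU^γ. The function name, the corner-format boxes, and the default γ = 0.5 are assumptions of this example, not settings confirmed by the text above.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIOU for one box pair; boxes are (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2

    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)
    eiou = 1.0 - iou + center_term + width_term + height_term

    # Focal weighting: boxes with higher IOU contribute more to the loss.
    return (iou ** gamma) * eiou


loss = focal_eiou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

The IOU^γ factor is what shifts the regression toward high-quality boxes: poorly overlapping boxes receive a small weight, and in this plain form a box with no overlap at all receives zero weight, so practical implementations may treat that case differently.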

Reference (21)

[1]	李其昌, 李兵伟, 王宏臣. 非制冷红外成像技术发展动态及其军事应用[J]. 军民两用技术与产品, 2016, 42(21): 54-57. doi: 10.3969/j.issn.1009-8119.2016.21.029	LI Q Ch, LI B W, WANG H Ch. Development trends and military applications of uncooled infrared imaging technology[J]. Dual Use Technologies & Products, 2016, 42(21): 54-57. doi: 10.3969/j.issn.1009-8119.2016.21.029
[2]	侯春萍, 张倩文, 王晓燕. 轮廓匹配的复杂背景中目标检测算法[J]. 哈尔滨工业大学学报, 2020, 52(5): 121-128.	HOU C P, ZHANG Q W, WANG X Y. Object detection algorithm in complex background based on contour matching[J]. Journal of Harbin Institute of Technology, 2020, 52(5): 121-128.
[3]	BILAL M, HANIF M S. Benchmark revision for HOG-SVM pedestrian detector through reinvigorated training and evaluation methodologies[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 16(52): 1277-1287.
[4]	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE Press, 2014: 580-587.
[5]	LI Y, PANG Y, CAO J. Improving single shot object detection with feature scale unmixing[J]. IEEE Transactions on Image Processing, 2021, 30(): 2708-2721. doi: 10.1109/TIP.2020.3048630
[6]	CHENG G, YUAN X, YAO X W. Towards large-scale small object detection: Survey and benchmarks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 23(76): 34-46.
[7]	REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE Press, 2016: 779-788.
[8]	张明淳, 牛春晖, 刘力双. 用于无人机探测系统的红外小目标检测算法[J]. 激光技术, 2024, 48(1): 114-120. doi: 10.7510/jgjs.issn.1001-3806.2024.01.018	ZHANG M Ch, NIU Ch H, LIU L Sh. Infrared small target detection algorithm for unmanned aerial vehicle detection system[J]. Laser Technology, 2024, 48(1): 114-120. doi: 10.7510/jgjs.issn.1001-3806.2024.01.018
[9]	王云杰, 王艳林, 夏润秋. 大视场红外告警系统中目标高精度方位提取[J]. 激光技术, 2023, 47(2): 200-204. doi: 10.7510/jgjs.issn.1001-3806.2023.02.007	WANG Y J, WANG Y L, XIA R Q. High precision azimuth extraction of targets in a large field of view infrared warning system[J]. Laser Technology, 2023, 47(2): 200-204. doi: 10.7510/jgjs.issn.1001-3806.2023.02.007
[10]	JIANG P, DAJI E, LIU F. A review of YOLO algorithm developments[J]. Procedia Computer Science, 2022, 199(): 1066-1073. doi: 10.1016/j.procs.2022.01.135
[11]	TERVEN R, CORDOVA-ESPARAZA D M. A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond[J]. arXiv Computer Science, 2023, 4(): 2304-.
[12]	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. Yolov4: Optimal speed and accuracy of object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 75(23): 2004-10934.
[13]	ZHANG Y, GUO Zh Y, WU J Q. Real-time vehicle detection based on improved YOLOv5[J]. Sustainability, 2022, 19(): 12274-15427.
[14]	FANGBO Z, ZHAO H L, NIE Z. Safety helmet detection based on YOLOv5[J]. IEEE International Conference on Power Electronics, Computer Applications, 2021, 34(56): 6-11.
[15]	ZHU X K, LYU Sh Ch, WANG X, et al. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios[C]//International Conference on Computer Vision. Québec, Canada: IEEE Press, 2021: 11539.
[16]	HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]//Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE Press, 2021: 13713-13722.
[17]	WOO S H, PARK J C, LEE J Y, et al. CBAM: Convolutional block attention module[C]//European Conference on Computer Vision. Munich, Germany: Springer Science Press, 2018: 3-9.
[18]	HU J, LI S, SUN G. Squeeze-and-excitation networks[C]//Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE Press, 2018: 7132-7141.
[19]	TAN M X, PANG R M, LE Q V. Efficientdet: Scalable and efficient object detection[C]//Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE Press, 2020: 10781-10790.
[20]	ZHANG Y F, REN W Q, ZHANG Z. Focal and efficient IOU loss for accurate bounding box regression[J]. Neurocomputing, 2022, 506(): 146-157.
[21]	陈旭, 彭冬亮, 谷雨. 基于改进YOLOv5s的无人机图像实时目标检测[J]. 光电工程, 2022, 49(3): 210372-.	CHEN X, PENG D L, GU Y. Real-time object detection for UAV images based on improved YOLOv5s[J]. Opto-Electronic Engineering, 2022, 49(3): 210372-.

name	configuration information
CPU (central processing unit)	Intel(R) Core i9-10900X
GPU (graphics processing unit)	NVIDIA RTX 3090 × 2
framework	PyTorch 1.12.1
environment	CUDA 11.6, cuDNN 8.3.2

model	+head	BiFPN	CA	EIOU	MAP/%
YOLOv5s					81.1
A	√				83.9
B	√	√			84.5
C	√	√	√		84.8
D	√	√	√	√	85.4
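The contribution of each added component can be read off the ablation table by differencing consecutive rows. The short sketch below does that arithmetic with the MAP values copied from the table; the variable and function names are illustrative only.

```python
# MAP/% per ablation row, copied from the table above.
ablation = [
    ("YOLOv5s", 81.1),  # baseline
    ("A", 83.9),        # + extra prediction head
    ("B", 84.5),        # + BiFPN
    ("C", 84.8),        # + CA
    ("D", 85.4),        # + EIOU
]


def incremental_gains(rows):
    """Return (model, MAP gain over the previous row) in percentage points."""
    return [(cur[0], round(cur[1] - prev[1], 1))
            for prev, cur in zip(rows, rows[1:])]


gains = incremental_gains(ablation)
total_gain = round(ablation[-1][1] - ablation[0][1], 1)
```

On these figures the extra prediction head accounts for the largest single step, with the remaining three components each adding less than one percentage point.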

model	P/%	R/%	MAP/%	parameters/10⁶	size/MB	speed/(frame·s⁻¹)	BFLOPs
faster R-CNN	63.9	53.7	80.4	99.2	330.6	33	440.3
SSD	71.8	34.7	71.8	91.7	182.2	64	190.7
YOLOv3-tiny	72.1	52.4	58.9	8.6	17.4	205	12.9
YOLOv4	79.3	66.5	74.9	9.1	18.7	101	20.6
YOLOv5s	82.6	71.0	81.1	7.0	14.4	116	15.8
YOLOv5s-p2	85.3	72.8	81.9	7.1	15.5	113	18.6
ours	86.9	74.4	85.3	7.2	15.8	106	19.0
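The trade-off between accuracy and cost in the comparison table can be summarized with a few lines of arithmetic. The sketch below copies the MAP, parameter count, and speed columns from the table; the dictionary layout and the `compare` helper are hypothetical names for this example.

```python
# (MAP/%, parameters/10^6, speed in frames per second), copied from the table.
results = {
    "faster R-CNN": (80.4, 99.2, 33),
    "SSD": (71.8, 91.7, 64),
    "YOLOv3-tiny": (58.9, 8.6, 205),
    "YOLOv4": (74.9, 9.1, 101),
    "YOLOv5s": (81.1, 7.0, 116),
    "YOLOv5s-p2": (81.9, 7.1, 113),
    "ours": (85.3, 7.2, 106),
}


def compare(ours, baseline):
    """Summarize the accuracy gain and the cost relative to a baseline."""
    o_map, o_params, o_fps = results[ours]
    b_map, b_params, b_fps = results[baseline]
    return {
        "map_gain": round(o_map - b_map, 1),            # percentage points
        "extra_params_m": round(o_params - b_params, 1),  # millions
        "fps_ratio": round(o_fps / b_fps, 2),
    }


vs_baseline = compare("ours", "YOLOv5s")
```

Against YOLOv5s, the improved model gains 4.2 percentage points of MAP for 0.2 million extra parameters while running at roughly 0.91 of the baseline frame rate.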

model	MAP (small)/%	MAP (medium)/%	MAP (large)/%
YOLOv5s	71.3	95.2	94.4
our method	79.8	96.3	95.2

Corresponding author: CHEN Bin, bchen63@163.com
