Abstract:
To address the low recognition accuracy, sparse image features, and poor contrast that hamper object detection in infrared images, several improvements to the original YOLOv5 model are proposed. First, an additional prediction feature layer is introduced to improve the detection of small objects in infrared images, and a coordinate attention mechanism is employed to strengthen feature extraction for infrared targets, thereby improving detection accuracy. Second, the feature fusion network is optimized with a bidirectional feature pyramid network (BiFPN) to improve the model's expressive power and reduce redundant computation. Finally, to tackle sample imbalance in localization and bounding box regression, Focal-EIoU is adopted as the loss function, which accelerates convergence and focuses the regression process on high-quality anchor boxes. Experimental results show that the improved YOLOv5 achieves an accuracy of 85.3% on the FLIR dataset, a 4.2% improvement over the original network model. The improved model not only delivers high detection accuracy but is also feasible to deploy on embedded devices.
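
As an illustration of the focal-EIoU loss named above, the following is a minimal single-box sketch based on the commonly published formulation: the EIoU loss adds center-distance, width, and height penalties (each normalized by the smallest enclosing box) to 1 - IoU, and the focal variant reweights the result by IoU raised to the power gamma. The function name `focal_eiou_loss`, the `(x1, y1, x2, y2)` box layout, and the default `gamma=0.5` are assumptions made for this example, not details taken from the model described here.

```python
def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-9):
    """Focal-EIoU loss for one pair of boxes given as (x1, y1, x2, y2).

    Illustrative sketch only: gamma and the box layout are assumed values.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and target boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the enclosing box.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, normalized by the enclosing box sides.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    eiou = 1.0 - iou + center_term + width_term + height_term

    # Focal reweighting: boxes with higher IoU contribute more to the loss.
    return (iou ** gamma) * eiou
```

In this form the weight `iou ** gamma` shrinks toward zero for poorly overlapping boxes, which is how the loss concentrates the regression on high-quality anchors; a training implementation would apply the same arithmetic to batched tensors.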