A Review of Object Detection Based on Convolutional Neural Networks and Deep Learning

MingYuan Wang
Watis Leelapatra

Abstract

Object detection, one of the three main tasks of computer vision, is of great importance to the future development of artificial intelligence. The rapid advancement of convolutional neural networks (CNNs) and deep learning has opened a broader arena for object detection. From traditional methods to state-of-the-art algorithms, numerous innovative techniques have been proposed. This paper reviews one-stage and two-stage object detection algorithms and compares their advantages and shortcomings from several perspectives. Real-world applications, such as self-driving vehicles and weeding robots, are also illustrated. Finally, current issues and future research directions are discussed.

Article Details

How to Cite
Wang, M., & Leelapatra, W. (2022). A Review of Object Detection Based on Convolutional Neural Networks and Deep Learning. INTERNATIONAL SCIENTIFIC JOURNAL OF ENGINEERING AND TECHNOLOGY (ISJET), 6(1), 1–7. Retrieved from https://ph02.tci-thaijo.org/index.php/isjet/article/view/245969
Section
Research Article
