An Adversarial Perturbation Technique against reCaptcha Image Attacks


Lawankorn Mookdarsanit
Pakpoom Mookdarsanit


Deep learning has achieved great success in object recognition accuracy since 2012. On the dark side, deep learning can also be misused, as the threat of reCaptcha attacks shows. A hacker has demonstrated AI-based bots that use a Convolutional Neural Network (CNN) to recognize reCaptcha images as a human would, and that are thereby authorized to access the business operations of an information system. This shows that an AI-based bot (a non-human) can easily break the challenge-response authentication protocol. In this paper, “CNN-based object recognition” meets “cyber security”. A defense against reCaptcha attacks is proposed that adds adversarial perturbation (noise) to the image. The perturbation fools AI-based bots into misclassifying the objects within reCaptcha images, so that the bots cannot access the system. In the adversarial perturbation test, one-stage detection is more robust than two-stage detection. Furthermore, ResNet outperforms the other architectures in overall score and can be used in either one-stage or two-stage detection.
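The idea of adding a small perturbation that pushes a classifier toward a wrong answer can be illustrated with a minimal sketch. The abstract does not state which perturbation method the authors use, so the example below assumes the well-known fast gradient sign method (FGSM) and replaces the CNN with a toy logistic classifier. The weights, the 4-pixel "image", and the function names are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, pixels):
    """Probability that the toy linear classifier assigns to class 1."""
    z = sum(w * x for w, x in zip(weights, pixels)) + bias
    return sigmoid(z)


def fgsm_perturb(weights, bias, pixels, label, epsilon):
    """Shift each pixel by +/- epsilon in the direction that increases the loss.

    For the logistic loss, d(loss)/d(pixel_i) = (p - label) * w_i.
    """
    p = predict(weights, bias, pixels)
    adversarial = []
    for w, x in zip(weights, pixels):
        grad = (p - label) * w
        step = epsilon * ((grad > 0) - (grad < 0))  # epsilon * sign(grad)
        adversarial.append(min(1.0, max(0.0, x + step)))  # stay in [0, 1]
    return adversarial


# Hypothetical 4-pixel "image" and hand-picked classifier weights (assumptions).
weights = [2.0, -1.5, 1.0, -0.5]
bias = -0.2
image = [0.6, 0.4, 0.5, 0.5]
label = 1  # the true class of the clean image

clean_prob = predict(weights, bias, image)
adv_image = fgsm_perturb(weights, bias, image, label, epsilon=0.2)
adv_prob = predict(weights, bias, adv_image)

print("clean prediction is class 1:", clean_prob > 0.5)
print("perturbed prediction is class 1:", adv_prob > 0.5)
```

In this toy setting, every pixel moves by at most `epsilon`, yet the classifier's decision crosses the 0.5 threshold. A real defense would compute the gradient through the attacker's (or a surrogate) CNN or detector rather than through a linear model.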
