Face Photo and Sketch Synthesis Using CycleGAN: The Role of Residual Blocks and Spectral Normalization

Nuntipat Phisutthangkoon
Patikorn Anchuen
Thiansiri Luangwilai
Phummipat Daungklang

Abstract

This study investigates the problem of face photo and sketch synthesis using CycleGAN, with a particular focus on architectural choices that affect model performance. A major challenge in this field is the limited availability of paired face photo-sketch datasets, which makes it difficult to develop and evaluate effective models. To address this limitation, this research demonstrates how to generate face sketches and photos from small datasets, offering a practical solution for expanding existing data collections in future applications, particularly in law enforcement and forensic sketch analysis. Four CycleGAN configurations were designed by varying the number of residual blocks in the generator and applying spectral normalization to the discriminator. Experiments were conducted on the CUHK Face Sketch Database, with quantitative evaluations using SSIM and PSNR, as well as qualitative assessments via human visual inspection. The results show that increasing generator depth and incorporating spectral normalization in the discriminator both improve image quality and training stability, although these benefits come at the cost of increased training time. Notably, the configuration with ResNet9 and spectral normalization consistently achieved the highest scores, with SSIM and PSNR values of 0.702 and 18.53 for sketch-to-photo translation, and 0.702 and 17.35 for photo-to-sketch translation, and produced the most visually convincing results in both tasks. These findings highlight the importance of architectural design choices for effective face photo-sketch translation and provide guidance for future work in related applications.
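The two ingredients the abstract credits for improved stability and quality are spectral normalization in the discriminator and standard image-quality metrics (SSIM/PSNR). As a minimal illustration, the sketch below shows the core of spectral normalization, estimating the largest singular value of a weight matrix via power iteration so the matrix can be rescaled to unit spectral norm, together with a plain PSNR computation. The function names `spectral_norm` and `psnr` are ours for illustration, not from the paper; practical implementations (e.g. `torch.nn.utils.spectral_norm`) run a single power-iteration step per training update and keep the `u` vector as a persistent buffer.

```python
import math

def spectral_norm(w, n_iters=50):
    """Estimate the largest singular value (spectral norm) of a 2-D
    weight matrix, given as a list of rows, via power iteration."""
    rows, cols = len(w), len(w[0])
    u = [1.0] * rows
    for _ in range(n_iters):
        # v = normalize(W^T u)
        v = [sum(w[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        nv = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / nv for x in v]
        # u = normalize(W v)
        u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        nu = math.sqrt(sum(x * x for x in u)) or 1.0
        u = [x / nu for x in u]
    # sigma = u^T W v
    return sum(u[i] * w[i][j] * v[j] for i in range(rows) for j in range(cols))

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-size images,
    passed as flat lists of pixel intensities."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    return 10.0 * math.log10(max_val ** 2 / mse)

# Dividing a weight matrix by its estimated sigma caps its spectral
# norm at 1, which is the Lipschitz constraint the discriminator needs.
w = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(w)
w_sn = [[x / sigma for x in row] for row in w]
```

SSIM is omitted here because it requires local windowed statistics (means, variances, covariance) and is better taken from a tested library; the SSIM/PSNR scores reported in the abstract should be reproduced with such an implementation rather than a hand-rolled one.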

Article Details

How to Cite
Phisutthangkoon, N., Anchuen, P., Luangwilai, T., & Daungklang, P. (2026). Face Photo and Sketch Synthesis Using CycleGAN: The Role of Residual Blocks and Spectral Normalization. INTERNATIONAL SCIENTIFIC JOURNAL OF ENGINEERING AND TECHNOLOGY (ISJET), 10(1), 84–95. Retrieved from https://ph02.tci-thaijo.org/index.php/isjet/article/view/260920
Section
Research Article

References

T. Valeeprakhon, K. Orkphol, and P. Chaihuadjaroen, “Deep convolutional neural networks based on VGG-16 transfer learning for abnormalities peeled shrimp classification,” Int. Sci. J. Eng. Technol., vol. 6, no. 2, pp. 13-23, Dec. 2022.

N. Hongcharoen, P. Sanguansat, and S. Marukatat, “Fluke eggs detection and classification using deep convolutional neural network,” Int. Sci. J. Eng. Technol., vol. 6, no. 2, pp. 40-53, Dec. 2022.

J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 2223-2232, https://doi.org/10.1109/ICCV.2017.244.

T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in Proc. Int. Conf. Learn. Representations (ICLR), 2018, pp. 1-4.

M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cogn. Neurosci., vol. 3, no. 1, pp. 71-86, 1991.

X. Wang and X. Tang, “Face photo–sketch synthesis and recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 11, pp. 1955-1967, Nov. 2009.

A. Irfan, R. Mudassar, S. Muhammad, K. Seifedine, and R. Seungmin, “Union is strength: Improving face sketch synthesis by fusing outcomes of fully convolutional networks and random sampling locality constraint,” Alexandria Eng. J., vol. 61, no. 12, pp. 10727-10741, Apr. 2022.

M. Anwesha, S. Alistair, B. Marija, and J. Hossein, “Rhi3DGen: Analyzing rhinophyma using 3D face models and synthetic data,” Intell.-Based Med., vol. 8, Art. no. 100124, Nov. 2023.

P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 1125-1134.

X. Gao, N. Wang, D. Tao, and W. Liu, “Face sketch-photo synthesis and retrieval using sparse representation,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 8, pp. 1213-1226, Aug. 2012.

L. Wang, V. Sindagi, and V. Patel, “High-quality facial photo-sketch synthesis using multi-adversarial networks,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit. (FG), 2018, pp. 83-90.

Y. Fu, W. Deng, J. Du, and J. Han, “Identity-aware CycleGAN for face photo-sketch synthesis and recognition,” Pattern Recognit., vol. 102, Art. no. 107249, Jun. 2020, https://doi.org/10.1016/j.patcog.2020.107249

B. Cao, N. Wang, J. Li, Q. Hu, and X. Gao, “Face photosketch synthesis via full-scale identity supervision,” Pattern Recognit., vol. 124, Art. no. 108446, Apr. 2022, https://doi.org/10.1016/j.patcog.2021.108446

C. Chen, W. Liu, X. Tan, and K.-Y. K. Wong, “Semi-supervised Cycle-GAN for face photo-sketch translation in the wild,” Comput. Vis. Image Understand., vol. 235, Art. no. 103775, Oct. 2023, https://doi.org/10.1016/j.cviu.2023.103775

Y. Peng, C. Zhang, H. Xu, T. Fukumoto, and K. Maeno, “DiffFaceSketch: High-fidelity face image synthesis with sketch-guided latent diffusion model,” arXiv, 2023, [Online]. Available: https://doi.org/10.48550/arXiv.2302.06908 [Accessed: Aug. 19, 2025].

T. Wang, Z. Hu, and Y. Guo, “An efficient lightweight network for image denoising using progressive residual and convolutional attention feature fusion,” Sci. Rep., vol. 14, Art. no. 9544, Apr. 2024, https://doi.org/10.1038/s41598-024-60139-x

Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600-612, Apr. 2004, https://doi.org/10.1109/TIP.2003.819861