Stroke Calibration and Completion for High-Quality Face Image Generation

Weihao Xia1      Yujiu Yang1      Jing-Hao Xue2


Image-to-image translation aims to translate an image from one domain to a corresponding image in another domain. When applied to specific tasks with strictly aligned training pairs, e.g., edge maps and their target images, supervised image-to-image translation methods can produce good results. However, when the input is a poorly drawn sketch created by a non-artist, the images generated by these methods are unacceptable. Such free-hand sketches are nevertheless expressive in conveying facial features and emotion, yet existing sketch-to-image generation methods fail to preserve this kind of information. In this paper, we propose Cali-Sketch, a method that generates face images from badly drawn sketches. It explicitly models stroke calibration and image generation with two components: a Stroke Calibration Network (SCN), which calibrates the strokes of facial features and enriches facial details while preserving the original drawing intent, and an Image Synthesis Network (ISN), which translates the calibrated and completed sketches into face photos. We thus decouple a difficult cross-domain translation problem into two easier steps. Extensive experiments show that, compared with state-of-the-art methods, the face photos generated by our method are both photo-realistic and faithful to the input sketches.


  • arxiv

  • github (coming)


We decompose this translation into two stages: 1) a Stroke Calibration Network (SCN) and 2) an Image Synthesis Network (ISN). Let G1 and D1 be the generator and discriminator of SCN, and G2 and D2 the generator and discriminator of ISN, respectively. As shown in the framework figure, the input sketch S is first fed into SCN to obtain the refined sketch R after stroke calibration and detail completion; R is then fed into ISN to generate a photo-realistic face image P. We first train the Stroke Calibration Network and the Image Synthesis Network separately until their losses plateau, and then train them jointly, end-to-end, until convergence.


Qualitative comparison with baselines. We compare our method with pix2pix [1], CycleGAN [2], DRIT [3] and MUNIT [4]. Our approach generates high-quality images: the synthesized human faces are more photo-realistic, and each generated image can easily be matched to its sketch from a batch of mixed sketches, which means the crucial components and drawing intent of the original sketches, such as facial contours and hair styles, are well preserved in the synthesized images. (Best viewed with zoom-in.)

Comparison with state-of-the-art methods.


@article{xia2021cali,
    author  = {Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao},
    title   = {Cali-Sketch: Stroke Calibration and Completion for High-Quality Face Image Generation from Poorly-Drawn Sketches},
    journal = {Neurocomputing},
    year    = {2021}
}


  1. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros. "Image-to-Image Translation with Conditional Adversarial Networks". CVPR, 2017.
  2. Jun-Yan Zhu*, Taesung Park*, Phillip Isola, Alexei A. Efros. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks". ICCV, 2017.
  3. Hsin-Ying Lee*, Hung-Yu Tseng*, Jia-Bin Huang, Maneesh Kumar Singh, Ming-Hsuan Yang. "Diverse Image-to-Image Translation via Disentangled Representations". ECCV, 2018.
  4. Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz. "Multimodal Unsupervised Image-to-Image Translation". ECCV, 2018.