Controllable Continuous Gaze Redirection

Weihao Xia1      Yujiu Yang1      Jing-Hao Xue2      Wensen Feng3


In this work, we present interpGaze, a novel framework for controllable gaze redirection that achieves both precise redirection and continuous interpolation. Given two gaze images with different attributes, our goal is to redirect the eye gaze of one person into any gaze direction depicted in the reference image or to generate continuous intermediate results. To accomplish this, we design a model including three cooperative components: an encoder, a controller and a decoder. The encoder maps images into a well-disentangled and hierarchically-organized latent space. The controller adjusts the magnitudes of latent vectors to the desired strength of corresponding attributes by altering a control vector. The decoder converts the desired representations from the attribute space to the image space. To facilitate covering the full space of gaze directions, we introduce a high-quality gaze image dataset with a large range of directions, which also benefits researchers in related areas. Extensive experimental validation and comparisons to several baseline methods show that the proposed interpGaze outperforms state-of-the-art methods in terms of image quality and redirection precision.


  • arxiv

  • github (coming)


Our proposed model contains (a) an Encoder $\boldsymbol{E}$, (b) a Controller $\boldsymbol{\mathcal{C}}$ and (c) a Decoder $\boldsymbol{G}$. The Encoder $\boldsymbol{E}$ maps the images $\boldsymbol{x}_s$ and $\boldsymbol{x}_t$ into the feature space as $F_{s}=\boldsymbol{E}\left(\boldsymbol{x}_{s}\right)$ and $F_{t}=\boldsymbol{E}\left(\boldsymbol{x}_{t}\right)$. The feature difference $F_{t}-F_{s}$ is then fed into the four branches of the Controller $\boldsymbol{\mathcal{C}}$ to produce a morphing result of the two samples, $\mathcal{C}_{\boldsymbol{v}}(F_{s},F_{t}) =F_{s}+\sum_{k=1}^{c+1}{\boldsymbol{v}}^{k} \mathcal{T}^{k}(F_{t}-F_{s})$, where $\mathcal{T}^{k}$ denotes the $k$-th branch. The abbreviations P, V, H and O stand for head pose (P), vertical gaze direction (pitch, V), horizontal gaze direction (yaw, H) and other miscellaneous attributes (O). The "O" branch is designed for secondary attributes such as glasses, eyebrows, skin color, hair and illumination. The control vector $\boldsymbol{v} \in[0,1]^{(c+1) \times 1}$ adjusts the strength of each attribute, where $c=3$ in the current setting, giving one entry per branch. The Decoder $\boldsymbol{G}$ maps the latent features back to the image space.
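The controller formula above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the branches $\mathcal{T}^{k}$ in the actual model are part of the trained network, whereas here each branch is replaced by a hypothetical index mask (`make_mask_branch`) that keeps one feature dimension, so that the four masks together sum to the identity. The names `control`, `make_mask_branch` and the toy 4-dimensional features are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Branch = Callable[[Vector], Vector]


def make_mask_branch(indices: Sequence[int]) -> Branch:
    """Hypothetical stand-in for a branch T^k: keeps only the given indices."""
    keep = set(indices)

    def branch(diff: Vector) -> Vector:
        return [d if i in keep else 0.0 for i, d in enumerate(diff)]

    return branch


def control(f_s: Vector, f_t: Vector, v: Sequence[float],
            branches: Sequence[Branch]) -> Vector:
    """Compute F_s + sum_k v[k] * T^k(F_t - F_s)."""
    if len(v) != len(branches):
        raise ValueError("need one control value per branch")
    diff = [t - s for s, t in zip(f_s, f_t)]
    out = list(f_s)
    for strength, branch in zip(v, branches):
        for i, delta in enumerate(branch(diff)):
            out[i] += strength * delta
    return out


# Toy features with one dimension per branch, ordered (P, V, H, O).
BRANCHES = [make_mask_branch([k]) for k in range(4)]
f_s = [0.0, 0.0, 0.0, 0.0]
f_t = [1.0, 2.0, 4.0, 8.0]

no_change = control(f_s, f_t, [0, 0, 0, 0], BRANCHES)  # stays at F_s
full = control(f_s, f_t, [1, 1, 1, 1], BRANCHES)       # reaches F_t
half_yaw = control(f_s, f_t, [0, 0, 0.5, 0], BRANCHES) # only H moves, halfway
```

Under the mask assumption, setting every entry of the control vector to 0 returns the source features and setting every entry to 1 returns the target features, while changing a single entry moves only the corresponding attribute. With learned branches these endpoint properties would depend on how the model is trained.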


This figure illustrates interpolation between two given samples (green and blue). Other attributes such as eyebrows, glasses, hair and skin color are well preserved in the redirected gaze images, which indicates that our model works consistently well in generating person-specific gaze images. Furthermore, the encoder unfolds the natural image manifold, yielding a flat and smooth latent space that allows interpolation and even extrapolation, as shown in the last column.
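As a rough sketch of what moving along such a flat latent space means, the snippet below walks a straight line between two toy feature vectors. It assumes that extrapolation corresponds to a step fraction larger than 1, which this page does not state explicitly; the function name `morph` and the numeric values are hypothetical, and in the actual model each point on the path would still have to pass through the Decoder to become an image.

```python
from typing import List

Vector = List[float]


def morph(f_s: Vector, f_t: Vector, alpha: float) -> Vector:
    """Move from f_s toward f_t by a fraction alpha (alpha > 1 extrapolates)."""
    return [s + alpha * (t - s) for s, t in zip(f_s, f_t)]


f_s = [0.0, 2.0]  # toy source features
f_t = [4.0, 6.0]  # toy target features

# Evenly spaced steps; the last one goes past the target.
alphas = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]
path = [morph(f_s, f_t, a) for a in alphas]
```

Here `path[0]` equals the source features, `path[4]` equals the target features, and `path[5]` lies beyond the target on the same line, which is the latent-space analogue of the extrapolated last column.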


If you find our work, code or pre-trained models helpful for your research, please consider citing:

  title={Controllable Continuous Gaze Redirection},
  author={Xia, Weihao and Yang, Yujiu and Xue, Jing-Hao and Feng, Wensen},
  booktitle={ACM MM},