- Comparison with reference-based methods: Table 1 of the main paper.
- Comparison with no-reference methods: Table 2 of the main paper.
Face restoration has achieved remarkable advancements over years of development. However, ensuring that restored facial images exhibit high fidelity, preserve authentic features, and avoid introducing artifacts or biases remains a significant challenge. This highlights the need for models that are more "honest" in their reconstructions from low-quality inputs, accurately reflecting the original characteristics. In this work, we propose HonestFace, a novel approach designed to restore faces with a strong emphasis on such honesty, particularly concerning identity consistency and texture realism. To this end, HonestFace incorporates several key components. First, we propose an identity embedder that effectively captures and preserves crucial identity features from both the low-quality input and multiple reference faces. Second, we present a masked face alignment method to enhance fine-grained details and textural authenticity, preventing the generation of patterned or overly synthetic textures and improving overall clarity. Furthermore, we introduce a new landmark-based evaluation metric. Built on affine transformation principles, it measures facial feature alignment more accurately than conventional L2 distance calculations. Leveraging these contributions within a one-step diffusion model framework, HonestFace delivers exceptional restoration results in terms of facial fidelity and realism. Extensive experiments demonstrate that our approach surpasses existing methods, achieving superior performance in both visual quality and quantitative assessments.
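The landmark metric can be illustrated with a small sketch: fit a least-squares affine map from predicted to ground-truth landmarks, then measure the residual distance. This removes global shifts, scales, and shears that plain L2 distance would penalize. The function below is an illustrative assumption of how such an affine-based metric might work, not the paper's exact definition; all names here are hypothetical.

```python
import numpy as np

def affine_landmark_error(pred, gt):
    """Mean landmark distance after removing a best-fit affine transform.

    pred, gt: (N, 2) arrays of facial landmark coordinates. A least-squares
    affine map from pred to gt is fitted first, so landmark sets that differ
    only by an affine warp score (near) zero, unlike raw L2 distance.
    Illustrative sketch only, not the paper's exact formulation.
    """
    A = np.hstack([pred, np.ones((len(pred), 1))])  # homogeneous coordinates
    M, *_ = np.linalg.lstsq(A, gt, rcond=None)      # fit 2D affine map (6 dof)
    aligned = A @ M                                  # apply fitted transform
    return float(np.mean(np.linalg.norm(aligned - gt, axis=1)))

# Landmarks that differ only by an affine warp yield ~zero error,
# while raw L2 distance between them would be large.
pts = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
warped = pts @ np.array([[1.1, 0.2], [-0.1, 0.9]]) + np.array([2.0, -1.0])
print(affine_landmark_error(pts, warped))  # ~0 up to floating-point error
```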
We propose HonestFace, a novel one-step diffusion model for face restoration. First, the LQ input \(x_L\) is encoded into \(z_L\) by the VAE encoder \(E_\phi\). Meanwhile, \(x_L\) and the HQ references \(R = \{r_i\}\) pass through the IDE and VRE, and their outputs are fused to form the prompt embedding \(p\). Next, the UNet predicts the noise \(\varepsilon_\theta\), from which \(\hat{z}_H\) is estimated. Finally, the VAE decoder \(D_\phi\) reconstructs the output \(\hat{x}_H\). The generator and discriminator are trained alternately.
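The data flow above can be sketched in a few lines. The snippet below is a toy skeleton of the inference path, assuming stand-in random linear maps for the trained networks; the dimensions, the embedding fusion, and the simple one-step update \(\hat{z}_H = z_L - \varepsilon\) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "networks": random linear maps standing in for trained
# modules. Shapes are arbitrary toy choices, not the paper's.
def make_linear(d_in, d_out):
    W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    return lambda x: x @ W

d_img, d_lat, d_emb = 64, 16, 8
E_phi = make_linear(d_img, d_lat)              # VAE encoder E_phi
D_phi = make_linear(d_lat, d_img)              # VAE decoder D_phi
IDE   = make_linear(d_img, d_emb)              # identity embedder
VRE   = make_linear(d_img, d_emb)              # reference embedder
eps_theta = make_linear(d_lat + d_emb, d_lat)  # UNet noise predictor

def restore(x_L, refs):
    z_L = E_phi(x_L)                                      # encode LQ input
    p = IDE(x_L) + sum(VRE(r) for r in refs) / len(refs)  # fuse embeddings (assumed: averaging)
    eps = eps_theta(np.concatenate([z_L, p]))             # predict noise from latent + prompt
    z_H = z_L - eps                                       # one-step latent estimate (simplified)
    return D_phi(z_H)                                     # decode to HQ image

x_L = rng.standard_normal(d_img)                    # toy LQ face
refs = [rng.standard_normal(d_img) for _ in range(3)]  # toy HQ references
x_H = restore(x_L, refs)
print(x_H.shape)  # (64,)
```

The adversarial part of training (alternating generator and discriminator updates) is omitted here; the sketch only traces a single forward restoration pass.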
@article{wang2025honestface,
  title={HonestFace: Towards Honest Face Restoration with One-Step Diffusion Model},
  author={Wang, Jingkai and Miao, Wu and Gong, Jue and Chen, Zheng and Liu, Xing and Gu, Hong and Liu, Yutong and Zhang, Yulun},
  journal={arXiv preprint arXiv:2505.18469},
  year={2025}
}