This paper [1] proposes a method for inpainting high-resolution images with large missing areas. The authors argue that the main difficulty of the task is that conventional convolutional neural networks lack a sufficiently large receptive field. The paper claims three contributions, but I find fast Fourier convolution (FFC) [2] to be the most essential component: according to the paper, it gives the network an image-wide receptive field. The experimental results are very good. Interestingly, the FFC paper came out in 2020 but has received only a few citations.
Summary
Goal: develop a modern image inpainting system
Challenges:
- large missing areas
- complex geometric structures
- high-resolution images
Reason: lack of an effective receptive field in both the inpainting network and the loss function
Solutions:
- fast Fourier convolution
- high receptive field perceptual loss
- large training masks
Methodology
Fast Fourier Convolutions [2]
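To make the image-wide receptive field concrete, here is a minimal NumPy sketch of the global (spectral) branch of an FFC layer: a real 2D FFT, a pointwise channel mix on the stacked real and imaginary parts followed by a ReLU, then an inverse FFT. This is an illustration under simplifying assumptions, not the authors' implementation: the local convolutional branch, the exchange between the two branches, and batch normalization are omitted, and the names `spectral_transform` and `weight` are hypothetical.

```python
import numpy as np


def spectral_transform(x, weight):
    """Sketch of the global (Fourier) branch only.

    x:      real array of shape (C_in, H, W)
    weight: real array of shape (2 * C_out, 2 * C_in); plays the role of a
            1x1 convolution over stacked real/imaginary channels
            (hypothetical parameter, randomly initialised below)
    """
    _, h, w = x.shape
    # Real 2D FFT over the spatial axes: (C_in, H, W // 2 + 1), complex.
    freq = np.fft.rfft2(x, norm="ortho")
    # Stack real and imaginary parts along the channel axis.
    stacked = np.concatenate([freq.real, freq.imag], axis=0)
    # Pointwise channel mixing in the frequency domain, then ReLU.
    mixed = np.maximum(np.einsum("oc,chw->ohw", weight, stacked), 0.0)
    # Split back into real/imaginary halves and return to the spatial domain.
    c_out = mixed.shape[0] // 2
    out_freq = mixed[:c_out] + 1j * mixed[c_out:]
    return np.fft.irfft2(out_freq, s=(h, w), norm="ortho")


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16, 16))
weight = rng.standard_normal((4, 4))

# Perturb a single input pixel and measure how much of the output changes.
x_perturbed = x.copy()
x_perturbed[0, 0, 0] += 1.0

diff = np.abs(spectral_transform(x_perturbed, weight) - spectral_transform(x, weight))
changed_fraction = float((diff > 1e-9).mean())
print(f"fraction of output pixels affected: {changed_fraction:.2f}")
```

Because every frequency coefficient depends on every input pixel, changing one pixel affects output positions across the whole image after a single layer. A 3x3 spatial convolution would only affect the 9 output positions around the perturbed pixel.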
Loss Functions
Final loss: \(\mathcal{L}_{final}=\kappa \mathcal{L}_{Adv}+\alpha \mathcal{L}_{HRFPL} + \beta \mathcal{L}_{DiscPL} + \gamma R_1\)
- $\mathcal{L}_{Adv}$: adversarial loss
- $\mathcal{L}_{HRFPL}$: high receptive field perceptual loss,
  \(\mathcal{L}_{HRFPL}(x,\hat{x})=\mathcal{M}([\phi_{HRF}(x)-\phi_{HRF}(\hat{x})]^2)\),
  where $x$ is the target image, $\hat{x}$ is the inpainted result, $\phi_{HRF}$ is a pre-trained network, the square is taken element-wise, and $\mathcal{M}$ is a sequential two-stage mean operation (the inter-layer mean of intra-layer means).
- $\mathcal{L}_{DiscPL}$: discriminator-based perceptual loss (or feature matching loss) [4]
- $R_1$: gradient penalty [3]
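The two-stage mean in the high receptive field perceptual loss and the weighted sum in the final loss can be sketched as follows. This is a minimal sketch under assumptions: the feature maps are taken as given lists of NumPy arrays (one per layer of the pre-trained network), the function names are hypothetical, and the weights in the usage line are arbitrary placeholders, not the values used in the paper.

```python
import numpy as np


def hrf_perceptual_loss(feats_x, feats_x_hat):
    """Inter-layer mean of intra-layer means of squared feature differences.

    feats_x, feats_x_hat: lists of arrays, one per layer of the pre-trained
    feature network; shapes may differ between layers.
    """
    # Stage 1: mean of the element-wise squared difference within each layer.
    per_layer = [np.mean((fx - fxh) ** 2) for fx, fxh in zip(feats_x, feats_x_hat)]
    # Stage 2: mean across layers, so every layer counts equally.
    return float(np.mean(per_layer))


def final_loss(l_adv, l_hrf, l_disc_pl, r1, kappa, alpha, beta, gamma):
    """Weighted sum of the four terms of the final loss."""
    return kappa * l_adv + alpha * l_hrf + beta * l_disc_pl + gamma * r1


# Toy features: two layers with different sizes.
feats_x = [np.zeros((2, 4, 4)), np.zeros((3, 2, 2))]
feats_x_hat = [np.ones((2, 4, 4)), 2.0 * np.ones((3, 2, 2))]

l_hrf = hrf_perceptual_loss(feats_x, feats_x_hat)

# Placeholder values for the other terms and weights (illustrative only).
total = final_loss(l_adv=1.0, l_hrf=l_hrf, l_disc_pl=1.0, r1=1.0,
                   kappa=1.0, alpha=1.0, beta=1.0, gamma=1.0)
```

In the toy example the per-layer means are 1 and 4, so the two-stage mean is 2.5. Pooling all elements into a single mean would instead weight the larger first layer more heavily, which is why the order of the two means matters.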
Generation of Masks
Experiments
Ablation Study on Fast Fourier Convolutions
Ablation Study on Losses
Ablation Study on Masks
1. Suvorov, Roman, et al. “Resolution-robust Large Mask Inpainting with Fourier Convolutions.” arXiv preprint arXiv:2109.07161 (2021).
2. Chi, Lu, Borui Jiang, and Yadong Mu. “Fast Fourier Convolution.” Advances in Neural Information Processing Systems 33 (2020).
3. Mescheder, Lars, Andreas Geiger, and Sebastian Nowozin. “Which Training Methods for GANs do actually Converge?” International Conference on Machine Learning. PMLR, 2018.
4. Wang, Ting-Chun, et al. “High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.