In this work, we want to generate a 3D scene from data. The neural renderer will draw on two lines of prior work surveyed below: image inpainting and image-based rendering.

Image inpainting

Deep learning techniques

Context encoder

  • [30] trained an encoder-decoder model to fill in a central square hole in an image, using a combination of L2 regression on pixel values and an adversarial loss (a loss sketch follows this list).

    • [17] adds a local discriminator loss to the original global discriminator loss; the local discriminator focuses on the realism of the synthesized content, while the global discriminator encourages global semantic coherence.
    • [48] improves further by introducing a coarse-to-fine approach: instead of one encoder-decoder there are two, and the second learns the locations in the unmasked regions from which the model should borrow texture patches (see the patch-matching sketch below).
  • [45] minimizes the difference between nearest-neighbor activation patches in deep layers of a pretrained ImageNet classification network, for improved synthesis of highly textured content.
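
To make the combined objective of [30] and [17] concrete, here is a minimal PyTorch-style sketch, assuming a generator output pred, a ground-truth target, and two discriminator modules. All names and the adv_weight value are illustrative, not taken from the cited papers.

    import torch
    import torch.nn.functional as F

    def inpainting_loss(pred, target, hole_box, d_global, d_local, adv_weight=1e-3):
        """Combined inpainting objective: pixel L2 + global/local adversarial terms.

        pred, target: (B, 3, H, W) images; hole_box: (y, x, h, w) of the masked square;
        d_global / d_local: discriminator modules returning per-image logits.
        """
        rec = F.mse_loss(pred, target)                 # L2 regression on pixel values
        y, x, h, w = hole_box
        pred_local = pred[:, :, y:y + h, x:x + w]      # crop the synthesized hole
        logits_g = d_global(pred)                      # realism of the whole image
        logits_l = d_local(pred_local)                 # realism of the filled region
        adv = (F.binary_cross_entropy_with_logits(logits_g, torch.ones_like(logits_g))
               + F.binary_cross_entropy_with_logits(logits_l, torch.ones_like(logits_l)))
        return rec + adv_weight * adv

[30] corresponds to the L2 term plus the global adversarial term only; the local term is the addition of [17].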

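The texture borrowing in [48] and the nearest-neighbor feature matching in [45] share one mechanism: match each patch of the synthesized region against patches of the unmasked (or reference) region by cosine similarity. A hedged sketch, using F.unfold for patch extraction; the function and argument names are hypothetical:

    import torch
    import torch.nn.functional as F

    def nn_patch_loss(feat_pred, feat_ref, patch=3):
        """Match each predicted feature patch to its nearest reference patch.

        feat_pred, feat_ref: (B, C, H, W) activations from a pretrained network.
        """
        p_pred = F.unfold(feat_pred, patch)            # (B, C*p*p, N_pred) patches
        p_ref = F.unfold(feat_ref, patch)              # (B, C*p*p, N_ref) patches
        sim = torch.einsum('bcn,bcm->bnm',             # cosine similarity of patches
                           F.normalize(p_pred, dim=1),
                           F.normalize(p_ref, dim=1))
        idx = sim.argmax(dim=2)                        # nearest reference patch index
        matched = torch.gather(                        # gather the matched patches
            p_ref, 2, idx.unsqueeze(1).expand(-1, p_ref.size(1), -1))
        return F.mse_loss(p_pred, matched)             # penalize patch differences

Roughly speaking, [45] uses the matched patches as a feature-space regression target, while [48] turns the similarity scores into attention weights for copying background texture into the hole.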
Image-based rendering

The idea of directly re-using the pixels from available images to generate new views has been popular in computer graphics.

While these methods yield high-quality novel views, they do so by compositing the corresponding input-image rays for each output pixel, and can therefore only generate already seen content (e.g. they cannot create the rear view of a car from available frontal and side-view images).
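
A minimal sketch of that pixel re-use, assuming known target-view depth and relative pose (backward warping); the function name, argument conventions, and single-source setup are assumptions for illustration, not a specific method from the literature:

    import torch
    import torch.nn.functional as F

    def render_from_source(src_img, depth_tgt, K, T_src_tgt):
        """Backward-warp a source view into a target camera with known depth.

        src_img: (B, 3, H, W); depth_tgt: (B, 1, H, W) target-view depth;
        K: (B, 3, 3) intrinsics; T_src_tgt: (B, 4, 4) target-to-source transform.
        """
        B, _, H, W = src_img.shape
        ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                                torch.arange(W, dtype=torch.float32), indexing='ij')
        pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1).expand(B, -1, -1)
        # Unproject target pixels to 3D, then move them into the source camera frame.
        pts = torch.linalg.inv(K) @ pix * depth_tgt.reshape(B, 1, -1)
        pts = torch.cat([pts, torch.ones(B, 1, H * W)], 1)
        pts_src = (T_src_tgt @ pts)[:, :3]
        # Project into the source image and normalize coordinates to [-1, 1].
        uv = K @ pts_src
        uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)
        grid = uv.permute(0, 2, 1).reshape(B, H, W, 2)
        grid = 2 * grid / torch.tensor([W - 1, H - 1], dtype=torch.float32) - 1
        # Rays that leave the source view sample zeros: the method can only
        # re-use already seen content, as noted above.
        return F.grid_sample(src_img, grid, align_corners=True)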

Multiplane images (MPI)

Why is this better than image-based disocclusion?
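
One answer: an MPI stores RGBA on a stack of fronto-parallel planes, so surfaces hidden in the reference view can persist on farther planes, and a novel view reveals them simply by warping each plane into the target camera and alpha-compositing back to front, with no per-view inpainting. A minimal sketch of the compositing step (plane warping omitted; shapes and plane ordering are assumptions):

    import torch

    def composite_mpi(rgba):
        """Alpha-composite MPI planes back to front (the 'over' operator).

        rgba: (D, 4, H, W) color+alpha for D planes, ordered far to near.
        In a full renderer each plane is first homography-warped into the
        target view; that step is omitted here.
        """
        out = torch.zeros_like(rgba[0, :3])
        for plane in rgba:                     # farthest plane first
            color, alpha = plane[:3], plane[3:4]
            out = color * alpha + out * (1 - alpha)
        return out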


  [48] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, T. S. Huang. Generative image inpainting with contextual attention. CVPR 2018.