Ben Mildenhall et al., ECCV 2020
Reviewer: Kunho Kim
arXiv: https://arxiv.org/abs/2003.08934
NeRF achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. The algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (the spatial location $(x, y, z)$ and the view direction $(\theta, \phi)$) and whose output is the volume density and the view-dependent emitted radiance at that spatial location. NeRF synthesizes images from novel views by querying 5D coordinates along camera rays and leveraging classic volume rendering techniques to project the output colors and densities into an image.
Why NeRF?
Novel view image synthesis is a long-standing problem in computer vision and computer graphics, and its importance keeps growing as demand for interactive media applications increases.
Neural networks had already achieved great success in representing highly detailed 3D shapes, but these methods could not reproduce realistic images as well as discrete representations such as meshes or voxel grids. → In other words, there was no rendering technique for continuous geometry & radiance distributions.
What is NeRF?
NeRF represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (the spatial location ($x, y, z$) and the view direction ($\theta, \phi$)) and whose output is the volume density and view-dependent emitted radiance at that spatial location.
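A minimal sketch of such a network, assuming PyTorch; the class name `NeRFMLP` and the layer widths, depth, and activations are illustrative placeholders rather than the paper's exact architecture. One detail it does preserve from the paper: density depends only on the position $\bold{x}$, while the emitted color additionally conditions on the viewing direction $\bold{d}$.

```python
import torch
import torch.nn as nn

class NeRFMLP(nn.Module):
    """Toy F_Theta: (x, d) -> (c, sigma). Widths and depth are illustrative."""
    def __init__(self, hidden=256):
        super().__init__()
        # Density depends only on position x, so the trunk sees x alone.
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)           # volume density
        # Color additionally conditions on the viewing direction d.
        self.color_head = nn.Sequential(
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),     # RGB in [0, 1]
        )

    def forward(self, x, d):
        h = self.trunk(x)                                # (N, hidden)
        sigma = torch.relu(self.sigma_head(h))           # (N, 1), nonnegative
        c = self.color_head(torch.cat([h, d], dim=-1))   # (N, 3)
        return c, sigma
```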
To render a neural radiance field (NeRF) from a particular viewpoint, we:
1. March camera rays through the scene and sample a set of 3D points along each ray.
2. Feed those points and their corresponding viewing directions into the network to produce an output set of colors and densities.
3. Use classic volume rendering techniques to accumulate those colors and densities into a 2D image (the integral below).
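For reference, the volume rendering in step 3 corresponds to the expected-color integral from the paper, where $T(t)$ is the accumulated transmittance along the camera ray $\bold{r}(t) = \bold{o} + t\bold{d}$ between the near and far bounds $t_n$, $t_f$:

$$ C(\bold{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\bold{r}(t))\,\bold{c}(\bold{r}(t), \bold{d})\,dt, \quad T(t) = \exp\left(-\int_{t_n}^{t} \sigma(\bold{r}(s))\,ds\right) $$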
Because this process is naturally differentiable, we can use gradient descent to optimize the model by minimizing the error between each observed image and the corresponding view rendered from the NeRF representation.
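A sketch of that optimization signal, assuming PyTorch and the numerical quadrature the paper uses to approximate the integral above; `render_rays` is an illustrative name, `NeRFMLP` continues the toy network sketched earlier, and the ray tensors here are random placeholders standing in for real training data.

```python
import torch

def render_rays(model, pts, dirs, deltas):
    """Composite per-sample colors/densities into one color per ray.

    pts:    (R, S, 3) sample locations, R rays with S samples each
    dirs:   (R, 3)    unit viewing directions, shared by all samples on a ray
    deltas: (R, S)    distances between adjacent samples
    """
    R, S, _ = pts.shape
    d = dirs[:, None, :].expand(R, S, 3)
    c, sigma = model(pts.reshape(-1, 3), d.reshape(-1, 3))
    c, sigma = c.reshape(R, S, 3), sigma.reshape(R, S)

    alpha = 1.0 - torch.exp(-sigma * deltas)               # per-segment opacity
    # Exclusive cumulative product: T_i = prod_{j<i} (1 - alpha_j), with T_1 = 1.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], -1), -1
    )[:, :-1]
    weights = trans * alpha                                # (R, S)
    return (weights[..., None] * c).sum(dim=-2)            # (R, 3) expected color

# Every operation above is differentiable, so a plain photometric loss works:
model = NeRFMLP()                                          # toy network from above
pts = torch.randn(1024, 64, 3)                             # placeholder samples
dirs = torch.nn.functional.normalize(torch.randn(1024, 3), dim=-1)
deltas = torch.full((1024, 64), 0.01)
target_rgb = torch.rand(1024, 3)                           # observed pixel colors
loss = ((render_rays(model, pts, dirs, deltas) - target_rgb) ** 2).mean()
loss.backward()                                            # gradients flow into Theta
```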
Main Contribution
The continuous scene of interest can be represented as a 5D vector-valued function $F_\Theta$ whose input & output are:
- Input: a 3D location $\bold{x} = (x, y, z)$ and a 2D viewing direction $(\theta, \phi)$
- Output: an emitted color $\bold{c} = (r, g, b)$ and a volume density $\sigma$
Thus, the relation between input and output in terms of $F_\Theta$ is:
$$ F_\Theta: (\bold{x},\bold{d}) \rightarrow (\bold{c}, \sigma) $$
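In code, this relation is just the forward signature of the toy network sketched earlier, with the viewing direction already expressed as a 3D unit vector (as noted in the next section); the batch size is illustrative:

```python
import torch

model = NeRFMLP()                          # toy F_Theta sketched earlier
x = torch.randn(4096, 3)                   # batch of spatial locations
d = torch.nn.functional.normalize(torch.randn(4096, 3), dim=-1)  # unit view dirs
c, sigma = model(x, d)                     # colors (4096, 3), densities (4096, 1)
```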
NeRF Overview
The 5D input vector is obtained by casting a ray through the scene and sampling points along it. Also, in practice the viewing direction $(\theta, \phi)$ is replaced by a 3D unit vector $\bold{d}$ pointing in the same direction.
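A sketch of that ray casting and sampling, assuming a pinhole camera model; `get_rays`, `sample_along_rays`, `H`, `W`, `focal`, and the camera-to-world matrix `c2w` are placeholder names and inputs, and the paper's stratified sampling of depths is simplified here to uniform spacing.

```python
import torch

def get_rays(H, W, focal, c2w):
    """Cast one ray per pixel through a pinhole camera.

    c2w: (3, 4) camera-to-world matrix. Returns ray origins and
    unit directions d, each of shape (H*W, 3).
    """
    j, i = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    # Pixel -> camera-space direction (y points down, camera looks along -z).
    dirs = torch.stack([(i - W * 0.5) / focal,
                        -(j - H * 0.5) / focal,
                        -torch.ones_like(i)], dim=-1)
    rays_d = dirs @ c2w[:3, :3].T                        # rotate into world space
    rays_d = torch.nn.functional.normalize(rays_d.reshape(-1, 3), dim=-1)
    rays_o = c2w[:3, 3].expand_as(rays_d)                # all rays share one origin
    return rays_o, rays_d

def sample_along_rays(rays_o, rays_d, near, far, n_samples):
    """Sample 3D points x = o + t*d at depths t in [near, far]."""
    t = torch.linspace(near, far, n_samples)             # (S,)
    pts = rays_o[:, None, :] + t[None, :, None] * rays_d[:, None, :]
    return pts, t                                        # (R, S, 3), (S,)

# Usage with an identity pose as a stand-in for a real camera:
c2w = torch.eye(4)[:3]
rays_o, rays_d = get_rays(100, 100, focal=50.0, c2w=c2w)
pts, t = sample_along_rays(rays_o, rays_d, near=2.0, far=6.0, n_samples=64)
```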