Ben Mildenhall et al., ECCV 2020

Reviewer: Kunho Kim

arXiv: https://arxiv.org/abs/2003.08934

1. Abstract

NeRF achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function from a sparse set of input views. The algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (the spatial location $(x, y, z)$ and the viewing direction $(\theta, \phi)$) and whose output is the volume density and the view-dependent emitted radiance at that spatial location. NeRF synthesizes images from novel views by querying 5D coordinates along camera rays and leveraging classic volume rendering techniques to project the output colors and densities into an image.
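As a sketch of the rendering step the abstract refers to, the snippet below alpha-composites the densities and colors sampled along a single ray into one pixel color, following the classic volume rendering quadrature NeRF adopts. The function name `composite_ray` and the tensor shapes are illustrative, not taken from the paper's code.

```python
import torch

def composite_ray(sigmas, colors, deltas):
    """Alpha-composite N samples along one ray into a pixel color.

    sigmas: (N,) volume densities, colors: (N, 3) emitted radiance,
    deltas: (N,) distances between adjacent samples on the ray.
    """
    # Opacity of each sample, from its density and segment length.
    alphas = 1.0 - torch.exp(-sigmas * deltas)                        # (N,)
    # Transmittance T_i: probability the ray reaches sample i unoccluded,
    # i.e. the running product of (1 - alpha) over the samples in front.
    ones = torch.ones_like(alphas[:1])
    trans = torch.cumprod(torch.cat([ones, 1.0 - alphas]), dim=0)[:-1]
    weights = trans * alphas                                          # (N,)
    # Expected color along the ray.
    return (weights[:, None] * colors).sum(dim=0)                     # (3,)
```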

2. Introduction

3. Problem Statement

The continuous scene of interest is represented as a 5D vector-valued function $F_\Theta$ whose input is a 3D location $\mathbf{x} = (x, y, z)$ and a 2D viewing direction $\mathbf{d} = (\theta, \phi)$, and whose output is an emitted color $\mathbf{c} = (r, g, b)$ and a volume density $\sigma$.

The mapping from input to output is thus written as:

$$ F_\Theta: (\mathbf{x}, \mathbf{d}) \rightarrow (\mathbf{c}, \sigma) $$
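A minimal PyTorch sketch of such a network is shown below. It maps $(\mathbf{x}, \mathbf{d})$ to $(\mathbf{c}, \sigma)$, with density predicted from position alone and color additionally conditioned on the viewing direction, as in the paper. The layer sizes are simplified relative to the paper's 8-layer, 256-unit MLP, positional encoding is omitted, and class and attribute names are illustrative.

```python
import torch
import torch.nn as nn

class RadianceField(nn.Module):
    """Illustrative MLP realizing F_Theta: (x, d) -> (c, sigma)."""

    def __init__(self, hidden=256):
        super().__init__()
        # Shared trunk processes the 3D location x only.
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Density depends on position alone.
        self.sigma_head = nn.Linear(hidden, 1)
        # Color also sees the 3D unit viewing direction d.
        self.color_head = nn.Sequential(
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, x, d):
        h = self.trunk(x)
        sigma = torch.relu(self.sigma_head(h))       # density is non-negative
        c = self.color_head(torch.cat([h, d], dim=-1))
        return c, sigma
```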

4. Method

NeRF Overview

Neural Radiance Field Scene Representation

The 5D input is obtained by casting camera rays through the scene and sampling points along each ray. In practice, the viewing direction $(\theta, \phi)$ is replaced by a 3D Cartesian unit vector $\mathbf{d}$ pointing in the same direction, as in the sketch below.
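The sketch below illustrates both points under stated assumptions: converting $(\theta, \phi)$ to the unit vector $\mathbf{d}$ (a polar-from-z spherical convention is assumed) and stratified sampling of points along a ray between hand-chosen near/far bounds. All function and variable names are hypothetical.

```python
import math
import torch

def angles_to_unit_vector(theta, phi):
    """Convert spherical viewing angles (theta, phi) to the Cartesian
    unit vector d (assumes theta is measured from the z-axis)."""
    return torch.tensor([math.sin(theta) * math.cos(phi),
                         math.sin(theta) * math.sin(phi),
                         math.cos(theta)])

def sample_along_ray(origin, direction, near, far, n_samples):
    """Stratified sampling of 3D points along one camera ray, producing
    the (x, d) pairs that are fed to F_Theta."""
    # Split [near, far] into even bins and draw one uniform sample per
    # bin, so the network sees slightly different depths each iteration.
    bins = torch.linspace(near, far, n_samples + 1)
    t = bins[:-1] + (bins[1:] - bins[:-1]) * torch.rand(n_samples)
    points = origin + t[:, None] * direction      # sample locations x, (N, 3)
    dirs = direction.expand(n_samples, 3)         # same unit d for each sample
    return points, dirs, t
```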