We introduce a method to render Neural Radiance Fields
(NeRFs) in real time using PlenOctrees, an octree-based
3D representation which supports view-dependent effects.
Our method can render 800×800 images at more than 150
FPS, which is over 3000 times faster than conventional
NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering
of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating
the NeRF into a PlenOctree. In order to preserve view-dependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss,
which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step
can be used to reduce the training time, as we no longer
need to wait for the NeRF training to converge fully. Our
real-time neural rendering approach may enable new applications such as 6-DOF industrial and product visualizations, as well as next-generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as
well; please visit the project page for the interactive online
demo, as well as video and code: https://alexyu.
Despite the progress of real-time graphics, interactive 3D content with truly photorealistic scenes and objects is still time-consuming and costly to produce, owing to the need for optimized 3D assets and dedicated shaders. Instead, many graphics applications opt for image-based solutions. E-commerce websites often use a fixed set of views to
showcase their products; VR experiences often rely on 360°
video recordings to avoid the costly production of real 3D
scenes, and mapping services such as Google Street View
stitch images into panoramic views limited to 3-DOF.
Recent advances in neural rendering, such as neural volumes and neural radiance fields (NeRFs), open a
promising new avenue to model arbitrary objects and scenes
in 3D from a set of calibrated images. NeRFs in particular can faithfully render detailed scenes and appearances
with non-Lambertian effects from any view, while simultaneously offering a high degree of compression in terms of
storage. Partly due to these exciting properties, there has recently been an explosion of research based on NeRF.
Nevertheless, for practical applications, runtime performance remains a critical limitation of NeRFs: due to the
extreme sampling requirements and costly neural network
queries, rendering a NeRF is agonizingly slow. For illustration, it takes roughly 30 seconds to render an 800×800
image from a NeRF using a high-performance GPU, making it impractical for real-time interactive applications.
In this work, we propose a method for rendering a NeRF
in real time, achieved by distilling the NeRF into a hierarchical 3D volumetric representation. Our approach preserves NeRF’s ability to synthesize arbitrarily complex geometry and view-dependent effects from any viewpoint and
requires no additional supervision. In fact, our method
achieves and in many cases surpasses the quality of the original NeRF formulation, while providing significant acceleration. Our model allows us to render an 800×800 image at
167.68 FPS on an NVIDIA V100 GPU and does not rely on
a deep neural network during test time. Moreover, our representation is amenable to modern web technologies, allowing interactive rendering in a browser on consumer laptops.
Naive NeRF rendering is slow because it requires dense
sampling of the scene, where every sample requires a neural
network inference. Because these queries depend on the
viewing direction as well as the spatial position, one cannot
naively cache these color values for all viewing directions.
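For example, the original NeRF draws 192 samples per ray (64 coarse plus 128 fine), so a single 800×800 frame requires roughly 800 × 800 × 192 ≈ 1.2 × 10^8 MLP evaluations.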
We overcome these challenges and enable real-time
rendering by pre-sampling the NeRF into a tabulated
view-dependent volume which we refer to as a PlenOctree, named after the plenoptic function of Adelson and Bergen. Specifically, we use a sparse voxel-based octree where every leaf of the tree stores the appearance and
density values required to model the radiance at a point in
the volume. In order to account for non-Lambertian materials that exhibit view-dependent effects, we propose to
represent the RGB values at a location with spherical harmonics (SH), a standard basis for functions defined on the
surface of the sphere. The spherical harmonics can be evaluated at arbitrary query viewing directions to recover the
view-dependent color.
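As a concrete illustration, the following NumPy sketch evaluates the 9 real SH basis functions for degree ≤ 2 and combines them with a leaf's stored coefficients to recover a view-dependent color. The constants are the standard real SH normalization factors; the function names are illustrative and not taken from our released code.

```python
import numpy as np

# Standard real spherical-harmonic normalization constants, degree <= 2.
SH_C0 = 0.28209479177387814                      # l = 0
SH_C1 = 0.4886025119029199                       # l = 1
SH_C2 = (1.0925484305920792, -1.0925484305920792,
         0.31539156525252005, -1.0925484305920792,
         0.5462742152960396)                     # l = 2

def eval_sh_basis(d):
    """Evaluate the 9 real SH basis functions at a unit direction d = (x, y, z)."""
    x, y, z = d
    return np.array([
        SH_C0,
        -SH_C1 * y, SH_C1 * z, -SH_C1 * x,
        SH_C2[0] * x * y, SH_C2[1] * y * z,
        SH_C2[2] * (2.0 * z * z - x * x - y * y),
        SH_C2[3] * x * z, SH_C2[4] * (x * x - y * y),
    ])

def leaf_color(sh_coeffs, view_dir):
    """Recover view-dependent RGB from a leaf's SH coefficients.

    sh_coeffs: (3, 9) array, 9 coefficients per color channel.
    view_dir:  unit-norm viewing direction, shape (3,).
    """
    raw = sh_coeffs @ eval_sh_basis(view_dir)    # weighted sum per channel, (3,)
    return 1.0 / (1.0 + np.exp(-raw))            # sigmoid keeps colors in [0, 1]
```

Evaluating this basis costs only a handful of multiply-adds per sample, which is why replacing a deep MLP query with a table lookup plus an SH evaluation yields such a large speedup.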
Although one could convert an existing NeRF into such
a representation via projection onto the SH basis functions,
we show that we can in fact modify a NeRF network to predict appearances explicitly in terms of spherical harmonics.
Specifically, we train a network that produces coefficients
for the SH functions instead of raw RGB values, so that
the predicted values can later be directly stored within the
leaves of the PlenOctree. We also introduce a sparsity prior
during NeRF training to improve the memory efficiency of
our octrees, consequently allowing us to render higher-quality images. Furthermore, once the structure is created, the
values stored in the PlenOctree can be optimized because the
rendering procedure remains differentiable. This enables
the PlenOctree to obtain similar or better image quality
compared to NeRF. Our pipeline is illustrated in Fig. 2.
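A rough PyTorch sketch of these two ingredients follows; the layer sizes, names, and degree-2 SH truncation are illustrative placeholders rather than our exact architecture, and the sparsity term shown is of the form (1/K) Σ_k |1 − exp(−λσ_k)| over densities at uniformly sampled points.

```python
import torch
import torch.nn as nn

N_SH = 9  # (l_max + 1)^2 basis functions; degree 2 shown as an illustrative choice

class NeRFSHHead(nn.Module):
    """Illustrative output head predicting SH coefficients instead of raw RGB.

    The viewing direction is no longer a network input; view dependence is
    recovered at render time by evaluating the SH basis (see sketch above).
    """
    def __init__(self, feat_dim=256):
        super().__init__()
        self.sigma = nn.Linear(feat_dim, 1)        # volume density
        self.sh = nn.Linear(feat_dim, 3 * N_SH)    # N_SH coefficients per channel

    def forward(self, feat):
        sigma = torch.relu(self.sigma(feat))       # densities are non-negative
        k = self.sh(feat).view(-1, 3, N_SH)        # SH coefficients per point
        return sigma, k

def sparsity_loss(sigma_samples, lam=0.01):
    """Sparsity prior on densities at uniformly sampled points.

    Pushes unobserved regions toward zero density so that the converted
    octree stays sparse; lam is a hyperparameter.
    """
    return torch.mean(torch.abs(1.0 - torch.exp(-lam * sigma_samples)))
```

With the direction removed from the network inputs, each octree leaf only needs to store a density and its SH coefficients, and view dependence is recovered at render time as in the sketch above.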
Additionally, we demonstrate how our proposed pipeline
can be used to accelerate NeRF model training, making our
solution more practical to train than the original NeRF approach. Specifically, we can stop training the NeRF model
early to convert it into a PlenOctree, which can then be
trained significantly faster as it no longer involves any neural networks.
Our experiments demonstrate that our approach can accelerate NeRF-based rendering by over three orders of magnitude
without loss in image quality. We compare our approach
on standard benchmarks with scenes and objects captured
from 360° views, and demonstrate state-of-the-art performance in both image quality and rendering speed.
Our interactive viewer enables operations such as object insertion, visualizing radiance distributions, decomposing the SH components, and slicing the scene. We hope that
these real-time operations can be useful to the community
for visualizing and debugging NeRF-based representations.
To summarize, we make the following contributions:
• The first method that achieves real-time rendering of
NeRFs with similar or improved quality.
• NeRF-SH: a modified NeRF that is trained to output
appearance in terms of spherical basis functions.
• PlenOctree, a data structure derived from NeRFs
that enables highly efficient view-dependent rendering of complex scenes.
• An accelerated NeRF training method using early training termination, followed by direct fine-tuning of the PlenOctree values.