【English Study: NeRF】Toward Real-Time Rendering of Neural Radiance Fields

Paper: PlenOctrees for Real-time Rendering of Neural Radiance Fields

Abstract

We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800×800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality, while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve view-dependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may enable new applications such as 6-DOF industrial and product visualizations, as well as next-generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu.net/plenoctrees.
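The abstract's key trick is to have the network output spherical harmonic (SH) coefficients instead of taking the viewing direction as an input, so that color for any direction can be recovered later in closed form. Below is a minimal sketch of that last step, assuming real SH up to degree 2 (9 coefficients per RGB channel) and a sigmoid to squash the SH sum into (0, 1); the degree, the sigmoid, and the function names `sh_basis_deg2` and `sh_to_rgb` are illustrative assumptions, not necessarily the paper's exact configuration.

```python
import numpy as np

# Real SH normalization constants for degrees 0-2 (the usual closed-form values).
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199
SH_C2 = (1.0925484305920792, -1.0925484305920792, 0.31539156525252005,
         -1.0925484305920792, 0.5462742152960396)


def sh_basis_deg2(d):
    """Evaluate the 9 real SH basis functions (degree <= 2) at unit direction d."""
    x, y, z = d
    return np.array([
        SH_C0,                                    # degree 0: constant (view-independent)
        -SH_C1 * y, SH_C1 * z, -SH_C1 * x,        # degree 1
        SH_C2[0] * x * y,                         # degree 2
        SH_C2[1] * y * z,
        SH_C2[2] * (2.0 * z * z - x * x - y * y),
        SH_C2[3] * x * z,
        SH_C2[4] * (x * x - y * y),
    ])


def sh_to_rgb(coeffs, d):
    """coeffs: (3, 9) SH coefficients, one row per RGB channel; d: viewing direction."""
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)          # SH are defined on the unit sphere
    raw = coeffs @ sh_basis_deg2(d)    # (3,) weighted sum of basis values
    return 1.0 / (1.0 + np.exp(-raw))  # sigmoid (assumption) keeps colors in (0, 1)


coeffs = np.zeros((3, 9))
coeffs[:, 2] = 1.0                     # a degree-1 lobe pointing along +z
print(sh_to_rgb(coeffs, [0, 0, 1]), sh_to_rgb(coeffs, [0, 0, -1]))
```

Because the basis is fixed and closed-form, the network only needs the 3D position; the viewing direction enters afterwards through `sh_basis_deg2`, which is what makes it possible to cache per-point coefficients.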


1. Introduction

Despite the progress of real-time graphics, interactive 3D content with truly photorealistic scenes and objects is still time-consuming and costly to produce, because it requires optimized 3D assets and dedicated shaders. Instead, many graphics applications opt for image-based solutions. E-commerce websites often use a fixed set of views to showcase their products; VR experiences often rely on 360° video recordings to avoid the costly production of real 3D scenes; and mapping services such as Google Street View stitch images into panoramic views limited to 3-DOF. Recent advances in neural rendering, such as neural volumes [24] and neural radiance fields (NeRFs) [30], open a promising new avenue to model arbitrary objects and scenes in 3D from a set of calibrated images. NeRFs in particular can faithfully render detailed scenes and appearances with non-Lambertian effects from any view, while also offering a high degree of compression in terms of storage. Partly because of these properties, there has recently been an explosion of research based on NeRF. Nevertheless, for practical applications, runtime performance remains a critical limitation of NeRFs: because of the extreme sampling requirements and costly neural network queries, rendering a NeRF is agonizingly slow. For illustration, it takes roughly 30 seconds to render an 800x800 image from a NeRF using a high-performance GPU, which makes it impractical for real-time interactive applications. In this work, we propose a method for rendering a NeRF in real time, achieved by distilling the NeRF into a hierarchical 3D volumetric representation. Our approach preserves NeRF's ability to synthesize arbitrarily complex geometry and view-dependent effects from any viewpoint and requires no additional supervision. In fact, our method matches, and in many cases surpasses, the quality of the original NeRF formulation while providing significant acceleration.
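To make "extreme sampling requirements" concrete, here is a back-of-envelope count of network queries for one frame. Only the 800x800 resolution and the roughly 30-second render time come from the text; the sample counts are an assumption taken from the original NeRF defaults (64 coarse samples per ray, then a fine pass over 64 + 128 samples).

```python
# Back-of-envelope cost of naive NeRF rendering for one 800x800 frame.
width, height = 800, 800
rays = width * height                       # one camera ray per pixel

# Assumption: original NeRF sampling defaults, i.e. 64 coarse samples per ray,
# then the fine network evaluated at 64 + 128 = 192 samples per ray.
queries_per_ray = 64 + (64 + 128)
queries = rays * queries_per_ray            # each sample costs one MLP evaluation

seconds_per_frame = 30.0                    # "roughly 30 seconds" per image, from the text
ns_per_query = seconds_per_frame / queries * 1e9

# Per-query budget if the same number of queries had to fit in a 30 FPS frame.
realtime_ns_per_query = (1.0 / 30.0) / queries * 1e9

print(f"rays per frame:          {rays:,}")                           # 640,000
print(f"MLP queries per frame:   {queries:,}")                        # 163,840,000
print(f"observed time per query: {ns_per_query:.0f} ns")              # 183 ns
print(f"30 FPS budget per query: {realtime_ns_per_query:.2f} ns")     # 0.20 ns
```

A sub-nanosecond budget per MLP evaluation is far out of reach, which is why the approach below avoids evaluating a network at render time instead of trying to make each query faster.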
Our model allows us to render an 800x800 image at 167.68 FPS on an NVIDIA V100 GPU and does not rely on a deep neural network at test time. Moreover, our representation is amenable to modern web technologies, allowing interactive rendering in a browser on consumer laptops. Naive NeRF rendering is slow because it requires dense sampling of the scene, where every sample requires a neural network inference. Because these queries depend on the viewing direction as well as the spatial position, one cannot naively cache these color values for all viewing directions. We overcome these challenges and enable real-time rendering by pre-sampling the NeRF into a tabulated view-dependent volume, which we refer to as a PlenOctree, named after the plenoptic function of Adelson and Bergen [1]. Specifically, we use a sparse voxel-based octree where every leaf of the tree stores the appearance and density values required to model the radiance at a point in the volume. To account for non-Lambertian materials that exhibit view-dependent effects, we propose to represent the RGB values at a location with spherical harmonics (SH), a standard basis for functions defined on the surface of the sphere. The spherical harmonics can be evaluated at arbitrary query viewing directions to recover the view-dependent color. Although one could convert an existing NeRF into such a representation via projection onto the SH basis functions, we show that we can in fact modify a NeRF network to predict appearance explicitly in terms of spherical harmonics. Specifically, we train a network that produces coefficients for the SH functions instead of raw RGB values, so that the predicted values can later be stored directly within the leaves of the PlenOctree. We also introduce a sparsity prior during NeRF training to improve the memory efficiency of our octrees, which in turn allows us to render higher-quality images.
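The sparse octree described above can be illustrated with a toy sketch: only cells that may contain geometry are subdivided, and each finest leaf stores a density plus SH coefficients tabulated once from the network. This assumes a fixed unit-cube bounding box, a caller-supplied `occupied` test, and a `fake_nerf_sh` function standing in for the NeRF-SH network; all names are hypothetical, and a practical implementation would typically use a flat, GPU-friendly array layout rather than Python objects.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

import numpy as np

Payload = Tuple[float, np.ndarray]  # (density sigma, SH coefficients of shape (3, 9))


@dataclass
class Node:
    """Octree cell covering the axis-aligned cube [lo, lo + size)^3."""
    lo: np.ndarray
    size: float
    children: Optional[List["Node"]] = None  # None means this node is a leaf
    density: float = 0.0                     # sigma at a leaf (0 = empty space)
    sh: Optional[np.ndarray] = None          # SH coefficients at a leaf

    def child_index(self, p) -> int:
        """Which of the 8 children contains p (bit 0: x, bit 1: y, bit 2: z)."""
        mid = self.lo + self.size / 2
        return (int(p[0] >= mid[0])
                | (int(p[1] >= mid[1]) << 1)
                | (int(p[2] >= mid[2]) << 2))


def build(lo, size: float, depth: int,
          occupied: Callable[[np.ndarray, float], bool],
          sample: Callable[[np.ndarray], Payload]) -> Node:
    """Subdivide only occupied cells; tabulate `sample` at the finest cell centers."""
    node = Node(np.asarray(lo, dtype=float), size)
    if not occupied(node.lo, size):
        return node                               # empty leaf, never subdivided
    if depth == 0:
        node.density, node.sh = sample(node.lo + size / 2)
        return node
    half = size / 2
    node.children = []
    for i in range(8):                            # same bit layout as child_index
        offset = np.array([i & 1, (i >> 1) & 1, (i >> 2) & 1]) * half
        node.children.append(build(node.lo + offset, half, depth - 1, occupied, sample))
    return node


def query(node: Node, p) -> Node:
    """Descend from the root to the leaf containing point p."""
    while node.children is not None:
        node = node.children[node.child_index(p)]
    return node


# Hypothetical scene: geometry only inside a sphere of radius 0.3 in the unit cube.
CENTER, RADIUS = np.array([0.5, 0.5, 0.5]), 0.3

def sphere_hits_cell(lo, size):
    closest = np.clip(CENTER, lo, lo + size)      # nearest point of the cube to CENTER
    return np.linalg.norm(closest - CENTER) <= RADIUS

def fake_nerf_sh(point):
    return 10.0, np.zeros((3, 9))                 # stand-in for a NeRF-SH query

root = build([0.0, 0.0, 0.0], 1.0, depth=4, occupied=sphere_hits_cell, sample=fake_nerf_sh)
print(query(root, [0.5, 0.5, 0.5]).size, query(root, [0.01, 0.01, 0.01]).size)  # → 0.0625 0.25
```

The point near the corner lands in a large empty leaf after only two levels, while the sphere's interior is refined to the full depth; this is where the memory savings of a sparse octree come from.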
Furthermore, once the structure is created, the values stored in the PlenOctree can be optimized because the rendering procedure remains differentiable. This enables the PlenOctree to obtain similar or better image quality compared to NeRF. Our pipeline is illustrated in Fig. 2. Additionally, we demonstrate how our proposed pipeline can be used to accelerate NeRF model training, making our solution more practical to train than the original NeRF approach. Specifically, we can stop training the NeRF model early and convert it into a PlenOctree, which can then be trained significantly faster because it no longer involves any neural networks. Our experiments demonstrate that our approach can accelerate NeRF-based rendering by 5 orders of magnitude without loss in image quality. We compare our approach on standard benchmarks with scenes and objects captured from 360° views, and demonstrate state-of-the-art performance in image quality and rendering speed. Our interactive viewer enables operations such as object insertion, visualizing radiance distributions, decomposing the SH components, and slicing the scene. We hope that these real-time operations will be useful to the community for visualizing and debugging NeRF-based representations. To summarize, we make the following contributions:
• The first method that achieves real-time rendering of NeRFs with similar or improved quality.
• NeRF-SH: a modified NeRF that is trained to output appearance in terms of spherical basis functions.
• PlenOctree: a data structure derived from NeRFs which enables highly efficient view-dependent rendering of complex scenes.
• An accelerated NeRF training method that terminates NeRF training early and then directly fine-tunes the PlenOctree values.
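Because a rendered pixel is a differentiable function of the values stored in the leaves, those values can be fine-tuned directly with gradient descent. The sketch below assumes the standard NeRF volume-rendering quadrature, where sample i gets weight w_i = T_i (1 - exp(-σ_i δ_i)) and T_i is the transmittance accumulated before it. For brevity it optimizes only plain RGB leaf colors along a single made-up ray (the densities, spacings, and target pixel are hypothetical), whereas the leaves described above store both density and SH coefficients.

```python
import numpy as np


def composite_weights(sigma, delta):
    """NeRF-style quadrature weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    alpha = 1.0 - np.exp(-sigma * delta)
    # T_i = prod_{j<i} (1 - alpha_j): fraction of light reaching sample i
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return trans * alpha


def render(sigma, delta, rgb):
    """Composite per-sample colors (n, 3) into one pixel color (3,)."""
    w = composite_weights(sigma, delta)
    return w @ rgb, w


# Toy ray crossing 4 leaves (hypothetical values); only leaf colors are optimized.
sigma = np.array([0.0, 5.0, 20.0, 1.0])
delta = np.full(4, 0.1)
rgb = np.full((4, 3), 0.5)                  # initial leaf colors
target = np.array([0.8, 0.2, 0.1])          # hypothetical ground-truth pixel

lr = 1.0
for _ in range(200):
    pred, w = render(sigma, delta, rgb)
    # L = ||pred - target||^2 and pred = sum_i w_i * rgb_i,
    # so dL/d rgb_i = 2 * (pred - target) * w_i.
    grad = 2.0 * np.outer(w, pred - target)
    rgb -= lr * grad

pred, _ = render(sigma, delta, rgb)
print(np.round(pred, 3))
```

The first leaf has zero density, so its weight, and therefore its gradient, is exactly zero: empty space receives no update, and no neural network is involved in any step of the loop.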

