pass over website

kkreis 2022-10-08 02:53:39 -07:00
parent 29cdb2d73f
commit 284d678b30


@ -387,12 +387,12 @@ pre {
<div style="clear: both"> <div style="clear: both">
<div class="paper-btn-parent"> <div class="paper-btn-parent">
<a class="paper-btn" href="https://arxiv.org/abs/2112.07068"> <a class="paper-btn" href="https://nv-tlabs.github.io/LION">
<span class="material-icons"> description </span> <span class="material-icons"> description </span>
Paper Paper
</a> </a>
<div class="paper-btn-coming-soon"> <div class="paper-btn-coming-soon">
<a class="paper-btn" href="https://github.com/nv-tlabs/LION"> <a class="paper-btn" href="https://nv-tlabs.github.io/LION">
<span class="material-icons"> code </span> <span class="material-icons"> code </span>
Code Code
</a> </a>
@ -422,7 +422,7 @@ pre {
<h2>News</h2> <h2>News</h2>
<div class="row"> <div class="row">
<div><span class="material-icons"> event </span> [Oct 2022] <a href="https://nv-tlabs.github.io/LION">Project page</a> released!</div> <div><span class="material-icons"> event </span> [Oct 2022] <a href="https://nv-tlabs.github.io/LION">Project page</a> released!</div>
<div><span class="material-icons"> event </span> [Oct 2022] Paper released on <a href="https://github.com/nv-tlabs/LION">arXiv</a>!</div> <div><span class="material-icons"> event </span> [Oct 2022] Paper released on <a href="https://nv-tlabs.github.io/LION">arXiv</a>!</div>
<div><span class="material-icons"> event </span> [Aug 2022] LION got accepted to <b>Advances in Neural Information Processing Systems (NeurIPS)</b>!</div> <div><span class="material-icons"> event </span> [Aug 2022] LION got accepted to <b>Advances in Neural Information Processing Systems (NeurIPS)</b>!</div>
</div> </div>
</section> </section>
@ -433,15 +433,15 @@ pre {
<div class="flex-row"> <div class="flex-row">
<p> <p>
Denoising diffusion models (DDMs) have shown promising results in 3D point cloud synthesis. To advance 3D DDMs and make them useful Denoising diffusion models (DDMs) have shown promising results in 3D point cloud synthesis. To advance 3D DDMs and make them useful
for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional for digital artists, we require <i>(i)</i> high generation quality, <i>(ii)</i> flexibility for manipulation and applications such as conditional
synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes. To this end, we introduce the synthesis and shape interpolation, and <i>(iii)</i> the ability to output smooth surfaces or meshes. To this end, we introduce the
hierarchical Latent Point Diffusion Model (LION) for 3D shape generation. LION is set up as a variational autoencoder (VAE) with hierarchical Latent Point Diffusion Model (LION) for 3D shape generation. LION is set up as a variational autoencoder (VAE) with
a hierarchical latent space that combines a global shape latent representation with a point-structured latent space. For generation, a hierarchical latent space that combines a global shape latent representation with a point-structured latent space. For generation,
we train two hierarchical DDMs in these latent spaces. The hierarchical VAE approach boosts performance compared to DDMs that operate we train two hierarchical DDMs in these latent spaces. The hierarchical VAE approach boosts performance compared to DDMs that operate
on point clouds directly, while the point-structured latents are still ideally suited for DDM-based modeling. Experimentally, LION on point clouds directly, while the point-structured latents are still ideally suited for DDM-based modeling. Experimentally, LION
achieves state-of-the-art generation performance on multiple ShapeNet benchmarks. Furthermore, our VAE framework allows us to easily achieves state-of-the-art generation performance on multiple ShapeNet benchmarks. Furthermore, our VAE framework allows us to easily
use LION for different relevant tasks: LION excels at multimodal shape denoising and voxel-conditioned synthesis, and it can be adapted use LION for different relevant tasks: LION excels at multimodal shape denoising and voxel-conditioned synthesis, and it can be adapted
for text- and image-driven 3D generation.. We also demonstrate shape autoencoding and latent shape interpolation, and we augment LION with modern for text- and image-driven 3D generation. We also demonstrate shape autoencoding and latent shape interpolation, and we augment LION with modern
surface reconstruction techniques to generate smooth 3D meshes. We hope that LION provides a powerful tool for artists working with surface reconstruction techniques to generate smooth 3D meshes. We hope that LION provides a powerful tool for artists working with
3D shapes due to its high-quality generation, flexibility, and surface reconstruction. 3D shapes due to its high-quality generation, flexibility, and surface reconstruction.
</p> </p>
@ -452,10 +452,36 @@ pre {
<h2>Method</h2> <h2>Method</h2>
<div class="flex-row"> <div class="flex-row">
<p> <p>
LION is set up as a hierarchical point cloud VAE with denoising diffusion models over the shape latent and latent point distributions. We introduce the Latent Point Diffusion Model (LION), a DDM for 3D shape generation.
Point-Voxel CNNs (PVCNN) with adaptive Group Normalization (Ada. GN) are used as neural networks. LION focuses on learning a 3D generative model directly from geometry data without image-based training.
The latent points can be interpreted as a smoothed version of the input point cloud. Similar to previous 3D DDMs in this setting, LION operates on point clouds. However, it is constructed as a VAE with DDMs in latent
Shape As Points (SAP) is optionally used for mesh reconstruction. space. LION comprises a hierarchical latent space with a vector-valued global shape latent and another
point-structured latent space. The latent representations are predicted with point cloud processing
encoders, and two latent DDMs are trained in these latent spaces. Synthesis in LION proceeds by drawing
novel latent samples from the hierarchical latent DDMs and decoding back to the original point
cloud space. Importantly, we also demonstrate how to augment LION with modern surface reconstruction methods to
synthesize smooth shapes as desired by artists. LION has multiple advantages:
</p>
<p>
<b>Expressivity:</b> By mapping point clouds into regularized latent spaces, the DDMs in latent space are
effectively tasked with learning a smoothed distribution. This is easier than training on potentially
complex point clouds directly, thereby improving expressivity. However, point clouds are, in
principle, an ideal representation for DDMs. Because of that, we use latent points, that is, we keep a
point cloud structure for our main latent representation. Augmenting the model with an additional
global shape latent variable in a hierarchical manner further boosts expressivity.
</p>
<p>
<b>Varying Output Types:</b> Extending LION with Shape As Points (SAP) geometry reconstruction
allows us to also output smooth meshes. Fine-tuning SAP on data generated by LION's autoencoder
reduces synthesis noise and enables us to generate high-quality geometry. LION combines (latent)
point cloud-based modeling, ideal for DDMs, with surface reconstruction, desired by artists.
</p>
<p>
<b>Flexibility:</b> Since LION is set up as a VAE, it can be easily adapted for different tasks without
retraining the latent DDMs: We can efficiently fine-tune LION's encoders on voxelized or noisy inputs,
which a user can provide for guidance. This enables multimodal voxel-guided synthesis and shape
denoising. We also leverage LION's latent spaces for shape interpolation and autoencoding. Optionally
training the DDMs conditioned on CLIP embeddings enables image- and text-driven 3D generation.
</p> </p>
</div> </div>
<center> <center>
@ -463,8 +489,11 @@ pre {
<a> <a>
<img width="80%" src="assets/pipeline.jpg"> <img width="80%" src="assets/pipeline.jpg">
</a> </a>
<p class="caption" style="margin-bottom: 24px;"> <p class="caption" style="margin-bottom: 24px;"><br>
Architecture of LION. LION is set up as a hierarchical point cloud VAE with denoising diffusion models over the shape latent and latent point distributions.
Point-Voxel CNNs (PVCNN) with adaptive Group Normalization (Ada. GN) are used as neural networks.
The latent points can be interpreted as a smoothed version of the input point cloud.
Shape As Points (SAP) is optionally used for mesh reconstruction.
</p> </p>
</figure> </figure>
@ -491,13 +520,16 @@ pre {
<section id="novelties"/> <section id="novelties"/>
<hr> <hr>
<h2>Technical Contributions</h2> <h2>Main Contributions</h2>
<div class="flex-row"> <div class="flex-row">
<p>We make the following technical contributions: <p>
<ul style="list-style-type:disc;"> <ul style="list-style-type:disc;">
<li>We explore the training of multiple denoising diffusion models (DDMs) in a latent space..</li> <li>We introduce LION, a novel generative model for 3D shape synthesis. We explore the training of multiple hierarchical denoising diffusion models in latent space.</li>
<li>We train latent DDMs in 3D generation.</li> <!-- <li>We train latent DDMs in 3D generation.</li> -->
<li>We outperform all baselines and demonstrate that LION scale to extremely diverse shape datasets, like modeling 13 or even 55 ShapeNet categories jointly without conditioning. </li> <li>We extensively validate LION's high synthesis quality and reach state-of-the-art performance on widely used ShapeNet benchmarks.</li>
<li>We demonstrate that LION scales to extremely diverse shape datasets. For instance, LION can model 13 or even 55 ShapeNet categories jointly without any class-conditioning. At the other extreme, we also verify that LION can be successfully trained on small datasets with fewer than 100 shapes.</li>
<li>We propose to combine LION with Shape As Points-based surface reconstruction to directly extract practically useful meshes.</li>
<li>We show our model's flexibility by demonstrating how LION can be adapted to various relevant tasks, such as multimodal shape denoising, voxel-guided synthesis, text- and image-driven shape generation, and more.</li>
</ul> </ul>
</p> </p>
</div> </div>
@ -506,10 +538,10 @@ pre {
<section id="results"> <section id="results">
<hr> <hr>
<h2>Generation (Single Category)</h2> <h2>Generation Results (Single Category Models)</h2>
<div class="flex-row"> <!-- <div class="flex-row">
<p>Samples from LION trained on single catgory. </p> <p>Samples from LION trained on single catgory. </p>
</div> </div> -->
<center> <center>
<figure> <figure>
@ -518,7 +550,7 @@ pre {
Your browser does not support the video tag. Your browser does not support the video tag.
</video> </video>
<p class="caption"> <p class="caption">
Generated point clouds and reconstructed mesh of airplanes. Generated point clouds and reconstructed meshes of airplanes.
</p> <br> </p> <br>
</figure> </figure>
<figure> <figure>
@ -527,7 +559,7 @@ pre {
Your browser does not support the video tag. Your browser does not support the video tag.
</video> </video>
<p class="caption"> <p class="caption">
Generated point clouds and reconstructed mesh of chair. Generated point clouds and reconstructed meshes of chairs.
</p> <br> </p> <br>
</figure> </figure>
<figure> <figure>
@ -536,7 +568,7 @@ pre {
Your browser does not support the video tag. Your browser does not support the video tag.
</video> </video>
<p class="caption"> <p class="caption">
Generated point clouds and reconstructed mesh of car. Generated point clouds and reconstructed meshes of cars.
</p> <br> </p> <br>
</figure> </figure>
@ -546,7 +578,7 @@ pre {
Your browser does not support the video tag. Your browser does not support the video tag.
</video> </video>
<p class="caption" style="margin-bottom: 24px;"> <p class="caption" style="margin-bottom: 24px;">
Generated point clouds and reconstructed mesh of Animal. Generated point clouds and reconstructed meshes of animals (model trained on only 553 shapes).
</p> <br> </p> <br>
</figure> </figure>
</center> </center>
@ -557,7 +589,7 @@ pre {
Your browser does not support the video tag. Your browser does not support the video tag.
</video> </video>
<p class="caption"> <p class="caption">
Generated point clouds and reconstructed mesh of bottle. Generated point clouds and reconstructed meshes of bottles (model trained on only 340 shapes).
</p> <br> </p> <br>
</figure> </figure>
</center> </center>
@ -569,17 +601,17 @@ pre {
Your browser does not support the video tag. Your browser does not support the video tag.
</video> </video>
<p class="caption"> <p class="caption">
Generated point clouds and reconstructed mesh of mug. Generated point clouds and reconstructed meshes of mugs (model trained on only 149 shapes).
</p> <br> </p> <br>
</figure> </figure>
</center> </center>
<hr> <hr>
<h2>Generation (Multi-Classes)</h2> <h2>Generation Results (Multi-Class)</h2>
<!-- <div class="flex-row"> <div class="flex-row">
<p>samples from LION trained on multiple ShapeNet catgories, without conditioning. </p> <p>Below we show samples from LION models that were trained on shapes from multiple ShapeNet categories, <i>without any class-conditioning</i>. We deliberately did not use conditioning in order to explore LION's scalability to diverse and multimodal datasets in the unconditional setting.</p>
</div> --> </div>
<center> <center>
<figure> <figure>
<video class="centered" width="100%" controls autoplay muted playsinline class="video-background " > <video class="centered" width="100%" controls autoplay muted playsinline class="video-background " >
@ -587,7 +619,7 @@ pre {
Your browser does not support the video tag. Your browser does not support the video tag.
</video> </video>
<p class="caption"> <p class="caption">
Generated point clouds and reconstructed mesh. LION model trained on 13 ShapeNet categories jointly without conditioning. Generated point clouds and reconstructed meshes. The LION model is trained on 13 ShapeNet categories jointly without conditioning.
</p> </p>
<br> <br>
</figure> </figure>
@ -600,7 +632,7 @@ pre {
Your browser does not support the video tag. Your browser does not support the video tag.
</video> </video>
<p class="caption"> <p class="caption">
Generated point clouds and reconstructed mesh. LION model trained on 55 ShapeNet categories jointly without conditioning. Generated point clouds from a LION model that was trained on all 55 ShapeNet categories jointly without conditioning.
</p> </p>
<br> <br>
</figure> </figure>
@ -610,10 +642,11 @@ pre {
<section id="more_results"> <section id="more_results">
<hr> <hr>
<h2>More Results</h2> <h2>More Results and Applications</h2>
<h3>Interpolation </h3> Our main goal was to introduce a high-performance 3D shape generative model. Here, we qualitatively demonstrate how LION can be used for a variety of interesting applications.
<h3>Shape Interpolation</h3>
<div class="flex-row"> <div class="flex-row">
<p>LION can interpolate two shapes by traversing the latent space. The generated shapes are clean and semantically plausible along the entire interpolation path. </p> <p>LION can interpolate shapes by traversing the latent space (interpolation is performed in the latent diffusion models' prior space, using the <i>Probability Flow ODE</i> for deterministic DDM-generation). The generated shapes are clean and semantically plausible along the entire interpolation path.</p>
</div> </div>
<figure> <figure>
<video class="centered" width="100%" controls muted playsinline class="video-background " > <video class="centered" width="100%" controls muted playsinline class="video-background " >
@ -621,7 +654,7 @@ pre {
Your browser does not support the video tag. Your browser does not support the video tag.
</video> </video>
<p class="caption"> <p class="caption">
Left most shape: the source shape. Right most shape: the target shape. The shapes in middle are interpolated results between source and target shape. Leftmost shape: the source shape. Rightmost shape: the target shape. The shapes in the middle are interpolated between source and target shape.
</p> </p>
</figure> </figure>
<center> <center>
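The interpolation described in this hunk can be illustrated with a minimal sketch. The page only states that interpolation happens in the prior space of the latent DDMs; the spherical interpolation rule, the function name `slerp`, and the toy 2D vectors below are assumptions for illustration, not LION's confirmed implementation. In the described pipeline, each interpolated prior sample would then be mapped deterministically to a shape via the Probability Flow ODE and the decoder, which is not shown here.

```python
import math

def slerp(a, b, t):
    """Spherically interpolate between non-zero vectors a and b at fraction t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Angle between the two vectors, clamped for numerical safety.
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # Nearly (anti-)parallel vectors: fall back to linear interpolation.
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    w_a = math.sin((1.0 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]

# Hypothetical prior samples standing in for a source and a target shape.
source = [1.0, 0.0]
target = [0.0, 1.0]

# Five points along the path, from the source (t=0) to the target (t=1).
path = [slerp(source, target, i / 4) for i in range(5)]
```

Spherical interpolation is often preferred over linear interpolation for Gaussian prior samples because it keeps intermediate points at a comparable norm; whether LION uses exactly this rule is not stated on the page.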
@ -638,8 +671,7 @@ pre {
<br> <br>
<h3>Fast Sampling with DDIM</h3> <h3>Fast Sampling with DDIM</h3>
<div class="flex-row"> <div class="flex-row">
<p>The sampling time of LION can be reduced by applying DDIM sampler. <p>LION's sampling time can be reduced by using a fast DDM sampler, such as the DDIM sampler. DDIM sampling with 25 steps already generates high-quality shapes in less than 1 sec. This enables real-time and interactive applications.</p>
DDIM sampler with 25 steps can already generate high-quality shapes, which takes less than 1 sec. </p>
</div> </div>
<center> <center>
<figure style="width: 100%;"> <figure style="width: 100%;">
@ -647,7 +679,7 @@ pre {
<img width="100%" src="assets/ddim_sample.png"> <img width="100%" src="assets/ddim_sample.png">
</a> </a>
<p class="caption" style="margin-bottom: 24px;"> <p class="caption" style="margin-bottom: 24px;">
DDIM samples from LION trained on different data. The top two rows show the number of steps and the wall-clock time used when drawing one sample. DDIM samples from LION trained on different data. The top two rows show the number of steps and the wall-clock time required when drawing one sample.
With DDIM sampling, we can reduce the sampling time from 27.09 sec (1000 steps) to less than 1 sec (25 steps) to generate an object. With DDIM sampling, we can reduce the sampling time from 27.09 sec (1000 steps) to less than 1 sec (25 steps) to generate an object.
</p> </p>
</figure> </figure>
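As a rough sanity check of the timings in the caption above, the sketch below picks an evenly spaced subset of diffusion timesteps and scales the reported 1000-step time linearly. The function names, the even spacing, and the assumption that cost scales linearly with the number of steps are illustrative and not taken from LION's code; only the 27.09 sec, 1000-step, and 25-step figures come from the page.

```python
def ddim_timesteps(num_train_steps, num_sample_steps):
    """Return an evenly spaced subset of timesteps, in descending order."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("num_sample_steps must be in [1, num_train_steps]")
    stride = num_train_steps / num_sample_steps
    steps = [int(i * stride) for i in range(num_sample_steps)]
    return steps[::-1]

def estimated_sampling_time(full_time_sec, full_steps, sample_steps):
    """Scale a measured sampling time linearly with the step count (an assumption)."""
    return full_time_sec * sample_steps / full_steps

# Figures reported in the caption: 27.09 sec for 1000 steps, 25 DDIM steps.
schedule = ddim_timesteps(1000, 25)
estimate = estimated_sampling_time(27.09, 1000, 25)
print(len(schedule), schedule[0], schedule[-1])
print(round(estimate, 3))  # about 0.677 sec under the linear-scaling assumption
```

Under these assumptions the estimate is consistent with the reported "less than 1 sec", although real wall-clock time also depends on decoding and other fixed overheads.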
@ -657,7 +689,7 @@ pre {
<h3> Multimodal Generation</h3> <h3> Multimodal Generation</h3>
<div class="flex-row"> <div class="flex-row">
<p> <p>
LION can synthesize different variations of a given shape, enabling multimodal generation in a controlled manner. This is achieved through a diffuse-denoise procedure. LION can synthesize different variations of a given shape, enabling multimodal generation in a controlled manner. This is achieved through a diffuse-denoise procedure, where shapes are diffused for only a few steps in the latent DDMs and then denoised again.
</p> </p>
</div> </div>
<!-- <!--
@ -712,7 +744,7 @@ pre {
<img width="30%" src="assets/multi_modal/airplane/airplane_D200/sap_0000/recon_2.png"> <img width="30%" src="assets/multi_modal/airplane/airplane_D200/sap_0000/recon_2.png">
</a> </a>
<p class="caption" style="margin-bottom: 24px;"> <p class="caption" style="margin-bottom: 24px;">
Multimodal generation of airplane. Multimodal generation of airplanes.
</p> </p>
</figure> </figure>
</center> </center>
@ -725,7 +757,7 @@ pre {
<img width="30%" src="assets/multi_modal/chair/chair_D160/sap_0000/recon_9.png"> <img width="30%" src="assets/multi_modal/chair/chair_D160/sap_0000/recon_9.png">
</a> </a>
<p class="caption" style="margin-bottom: 24px;"> <p class="caption" style="margin-bottom: 24px;">
Multimodal generation of chair. Multimodal generation of chairs.
</p> </p>
</figure> </figure>
</center> </center>
@ -738,15 +770,14 @@ pre {
<img width="30%" src="assets/multi_modal/car/recon_5.png"> <img width="30%" src="assets/multi_modal/car/recon_5.png">
</a> </a>
<p class="caption" style="margin-bottom: 24px;"> <p class="caption" style="margin-bottom: 24px;">
Multimodal generation of car. Multimodal generation of cars.
</p> </p>
</figure> </figure>
</center> </center>
<h3>Voxel-Conditioned Synthesis </h3> <h3>Voxel-Conditioned Synthesis </h3>
<div class="flex-row"> <div class="flex-row">
<p>Given a coarse voxel grid, LION can generate different plausible detailed shapes. </p> <p>Given a coarse voxel grid, LION can generate different plausible detailed shapes. In practice, an artist using a 3D generative model may have a rough idea of the desired shape. For instance, they may be able to quickly construct a coarse voxelized shape, to which the generative model then adds realistic details. We achieve this by fine-tuning our encoder networks on voxelized shapes, and performing a few steps of diffuse-denoise in latent space to generate various plausible detailed shapes.</p>
<p>In practice, an artist using a 3D generative model may have a rough idea of the desired shape. For instance, they may be able to quickly construct a coarse voxelized shape, to which the generative model then adds realistic details. </p>
</div> </div>
<center> <center>
@ -769,12 +800,12 @@ pre {
<h3> Single View Reconstruction </h3> <h3> Single View Reconstruction </h3>
<div class="flex-row"> <div class="flex-row">
<p> <p>
We extend LION to also allow for single view reconstruction (SVR) from RGB data. We render 2D We qualitatively demonstrate how LION can be extended to also allow for single view reconstruction (SVR) from RGB data, using the approach from CLIP-Forge.
images from the 3D ShapeNet shapes, extracted the images CLIP image embeddings, and We render 2D images from the 3D ShapeNet shapes, extract the images' CLIP image embeddings, and
trained LIONs latent diffusion models while conditioning on the shapes CLIP image embeddings. train LION's latent diffusion models while conditioning on the shapes' CLIP image embeddings.
At test time, we then take a single view 2D image, extract the CLIP image embedding, and generate At test time, we then take a single view 2D image, extract the CLIP image embedding, and generate
corresponding 3D shapes, thereby effectively performing SVR. We show SVR results from real corresponding 3D shapes, thereby effectively performing SVR. We show SVR results from real
RGB data RGB data.
</p> </p>
</div> </div>
<center> <center>
@ -784,7 +815,7 @@ pre {
<img width="49%" src="assets/svr/img2shape_cari2s_mm_mitsuba_full.jpg"> <img width="49%" src="assets/svr/img2shape_cari2s_mm_mitsuba_full.jpg">
</a> </a>
<p class="caption" style="margin-bottom: 24px;"> <p class="caption" style="margin-bottom: 24px;">
Single view reconstruction from RGB images of chair. For each input image, LION can generate multi-modal outputs. Single view reconstruction from RGB images of chairs. For each input image, LION can generate multi-modal outputs.
</p> </p>
</figure> </figure>
</center> </center>
@ -804,7 +835,7 @@ pre {
<img width="100%" src="assets/svr/img2shape_cari2s_mitsuba_full.jpg"> <img width="100%" src="assets/svr/img2shape_cari2s_mitsuba_full.jpg">
</a> </a>
<p class="caption" style="margin-bottom: 24px;"> <p class="caption" style="margin-bottom: 24px;">
More single view reconstruction from RGB images of car. More single view reconstructions from RGB images of cars.
</p> </p>
</figure> </figure>
</center> </center>
@ -822,14 +853,15 @@ pre {
<img width="60%" src="assets/clipforge_car.png"> <img width="60%" src="assets/clipforge_car.png">
</a> </a>
<p class="caption" style="margin-bottom: 24px;"> <p class="caption" style="margin-bottom: 24px;">
Text-driven shape generation of chairs with LION. Bottom row is the text input Text-driven shape generation of chairs with LION. The bottom row shows the text inputs.
</p> </p>
</figure> </figure>
</center> </center>
<h3> Per-sample Text-driven Texture Synthesis</h3> <h3> Per-sample Text-driven Texture Synthesis</h3>
<div class="flex-row"> <div class="flex-row">
<p> <p>
We apply Text2mesh on some generated meshes from LION to additionally synthesize textures in a text-driven manner, leveraging CLIP. The original mesh is generated by LION. We apply Text2mesh on some generated meshes from LION to additionally synthesize textures in a text-driven manner, leveraging CLIP. The original input meshes are generated by LION.
This is only possible because LION can output practically useful meshes with its SAP-based surface reconstruction component (even though the backbone generative modeling component is point cloud-based).
</p> </p>
</div> </div>
<div class="row"> <div class="row">
@ -872,16 +904,16 @@ pre {
<div class="flex-row"> <div class="flex-row">
<div class="download-thumb"> <div class="download-thumb">
<div style="box-sizing: border-box; padding: 16px; margin: auto;"> <div style="box-sizing: border-box; padding: 16px; margin: auto;">
<a href="https://nv-tlabs.github.io/CLD-SGM"><img class="screenshot" src="assets/cld_paper_preview.png"></a> <a href="https://nv-tlabs.github.io/LION"><img class="screenshot" src="assets/cld_paper_preview.png"></a>
</div> </div>
</div> </div>
<div class="paper-stuff"> <div class="paper-stuff">
<p><b>LION: Latent Point Diffusion Models for 3D Shape Generation</b></p> <p><b>LION: Latent Point Diffusion Models for 3D Shape Generation</b></p>
<p>Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis</p> <p>Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis</p>
<p><i>Advances in Neural Information Processing Systems (NeurIPS), 2022 <b></b></i></p> <p><i>Advances in Neural Information Processing Systems (NeurIPS), 2022 <b></b></i></p>
<div><span class="material-icons"> description </span><a href="https://github.com/nv-tlabs/LION"> arXiv version</a></div> <div><span class="material-icons"> description </span><a href="https://nv-tlabs.github.io/LION"> arXiv version</a></div>
<div><span class="material-icons"> insert_comment </span><a href="assets/zeng2022lion.bib"> BibTeX</a></div> <div><span class="material-icons"> insert_comment </span><a href="assets/zeng2022lion.bib"> BibTeX</a></div>
<div><span class="material-icons"> integration_instructions </span><a href="https://github.com/nv-tlabs/LION"> Code</a></div> <div><span class="material-icons"> integration_instructions </span><a href="https://nv-tlabs.github.io/LION"> Code</a></div>
</div> </div>
</div> </div>
</div> </div>