diff --git a/index.html b/index.html index 2af9bf2..cddd53e 100644 --- a/index.html +++ b/index.html @@ -387,12 +387,12 @@ pre {
- + description Paper
- + code Code @@ -422,7 +422,7 @@ pre {

News

event [Oct 2022] Project page released!
-
event [Oct 2022] Paper released on arXiv!
+
event [Oct 2022] Paper released on arXiv!
event [Aug 2022] LION got accepted to Advances in Neural Information Processing Systems (NeurIPS)!
@@ -433,15 +433,15 @@ pre {

Denoising diffusion models (DDMs) have shown promising results in 3D point cloud synthesis. To advance 3D DDMs and make them useful - for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional - synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes. To this end, we introduce the + for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and applications such as conditional + synthesis and shape interpolation, and (iii) the ability to output smooth surfaces or meshes. To this end, we introduce the hierarchical Latent Point Diffusion Model (LION) for 3D shape generation. LION is set up as a variational autoencoder (VAE) with a hierarchical latent space that combines a global shape latent representation with a point-structured latent space. For generation, we train two hierarchical DDMs in these latent spaces. The hierarchical VAE approach boosts performance compared to DDMs that operate on point clouds directly, while the point-structured latents are still ideally suited for DDM-based modeling. Experimentally, LION achieves state-of-the-art generation performance on multiple ShapeNet benchmarks. Furthermore, our VAE framework allows us to easily use LION for different relevant tasks: LION excels at multimodal shape denoising and voxel-conditioned synthesis, and it can be adapted - for text- and image-driven 3D generation.. We also demonstrate shape autoencoding and latent shape interpolation, and we augment LION with modern + for text- and image-driven 3D generation. We also demonstrate shape autoencoding and latent shape interpolation, and we augment LION with modern surface reconstruction techniques to generate smooth 3D meshes. We hope that LION provides a powerful tool for artists working with 3D shapes due to its high-quality generation, flexibility, and surface reconstruction.

@@ -452,10 +452,36 @@ pre {

Method

- LION is set up as a hierarchical point cloud VAE with denoising diffusion models over the shape latent and latent point distributions. - Point-Voxel CNNs (PVCNN) with adaptive Group Normalization (Ada. GN) are used as neural networks. - The latent points can be interpreted as a smoothed version of the input point cloud. - Shape As Points (SAP) is optionally used for mesh reconstruction. + We introduce the Latent Point Diffusion Model (LION), a DDM for 3D shape generation. + LION focuses on learning a 3D generative model directly from geometry data without image-based training. + Similar to previous 3D DDMs in this setting, LION operates on point clouds. However, it is constructed as a VAE with DDMs in latent + space. LION comprises a hierarchical latent space with a vector-valued global shape latent and a + point-structured latent space. The latent representations are predicted with point cloud processing + encoders, and two latent DDMs are trained in these latent spaces. Synthesis in LION proceeds by drawing + novel latent samples from the hierarchical latent DDMs and decoding back to the original point + cloud space. Importantly, we also demonstrate how to augment LION with modern surface reconstruction methods to + synthesize smooth shapes as desired by artists. LION has multiple advantages: +

+

+ Expressivity: By mapping point clouds into regularized latent spaces, the DDMs in latent space are + effectively tasked with learning a smoothed distribution. This is easier than training on potentially + complex point clouds directly, thereby improving expressivity. However, point clouds are, in + principle, an ideal representation for DDMs. Because of that, we use latent points, that is, we keep a + point cloud structure for our main latent representation. Augmenting the model with an additional + global shape latent variable in a hierarchical manner further boosts expressivity. +

+

+ Varying Output Types: Extending LION with Shape As Points (SAP) geometry reconstruction + allows us to also output smooth meshes. Fine-tuning SAP on data generated by LION’s autoencoder + reduces synthesis noise and enables us to generate high-quality geometry. LION combines (latent) + point cloud-based modeling, ideal for DDMs, with surface reconstruction, desired by artists. +

+

+ Flexibility: Since LION is set up as a VAE, it can be easily adapted for different tasks without + retraining the latent DDMs: We can efficiently fine-tune LION’s encoders on voxelized or noisy inputs, + which a user can provide for guidance. This enables multimodal voxel-guided synthesis and shape + denoising. We also leverage LION’s latent spaces for shape interpolation and autoencoding. Optionally + training the DDMs conditioned on CLIP embeddings enables image- and text-driven 3D generation.
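The hierarchical sampling pipeline described above can be sketched in a few lines. This is a toy illustration only: the denoisers and decoder below are hypothetical stand-ins (real LION uses trained PVCNN-based networks and learned latent DDMs), and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative latent sizes (hypothetical, not LION's actual configuration).
D_GLOBAL, N_POINTS, D_POINT = 128, 2048, 4

def denoise_global(z_t, t):
    """One reverse-diffusion step for the global shape latent (toy stand-in)."""
    return 0.99 * z_t  # a real model predicts noise with a neural network

def denoise_points(h_t, z, t):
    """One reverse step for the latent points, conditioned on global latent z."""
    return 0.99 * h_t + 0.01 * z.mean()

def decode(h, z):
    """Map latent points back to a 3D point cloud (toy linear decoder)."""
    return h[:, :3] + 0.0 * z[:3]

# Stage 1: sample the global shape latent with its DDM.
z = rng.standard_normal(D_GLOBAL)
for t in reversed(range(50)):
    z = denoise_global(z, t)

# Stage 2: sample the point-structured latents, conditioned on z.
h = rng.standard_normal((N_POINTS, D_POINT))
for t in reversed(range(50)):
    h = denoise_points(h, z, t)

# Stage 3: decode latent points to the output point cloud.
points = decode(h, z)
```

The key design point is the ordering: the global latent is sampled first and then conditions the point-structured latent DDM, so coarse shape identity guides the fine point structure.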

@@ -463,8 +489,11 @@ pre { -

- Architecture of LION. +


+ LION is set up as a hierarchical point cloud VAE with denoising diffusion models over the shape latent and latent point distributions. + Point-Voxel CNNs (PVCNN) with adaptive Group Normalization (Ada. GN) are used as neural networks. + The latent points can be interpreted as a smoothed version of the input point cloud. + Shape As Points (SAP) is optionally used for mesh reconstruction.

@@ -491,13 +520,16 @@ pre {

-

Technical Contributions

+

Main Contributions

-

We make the following technical contributions: +

    -
  • We explore the training of multiple denoising diffusion models (DDMs) in a latent space..
  • -
  • We train latent DDMs in 3D generation.
  • -
  • We outperform all baselines and demonstrate that LION scale to extremely diverse shape datasets, like modeling 13 or even 55 ShapeNet categories jointly without conditioning.
  • +
• We introduce LION, a novel generative model for 3D shape synthesis. We explore the training of multiple hierarchical denoising diffusion models in latent space.
  • + +
  • We extensively validate LION's high synthesis quality and reach state-of-the-art performance on widely used ShapeNet benchmarks.
  • +
• We demonstrate that LION scales to extremely diverse shape datasets. For instance, LION can model 13 or even 55 ShapeNet categories jointly without any class-conditioning. At the other extreme, we also verify that LION can be successfully trained on small datasets with fewer than 100 shapes.
  • +
  • We propose to combine LION with Shape As Points-based surface reconstruction to directly extract practically useful meshes.
  • +
  • We show our model's flexibility by demonstrating how LION can be adapted to various relevant tasks, such as multimodal shape denoising, voxel-guided synthesis, text- and image-driven shape generation, and more.

@@ -506,10 +538,10 @@ pre {

-

Generation (Single Category)

-
+

Generation Results (Single Category Models)

+
@@ -518,7 +550,7 @@ pre { Your browser does not support the video tag.

- Generated point clouds and reconstructed mesh of airplanes. + Generated point clouds and reconstructed meshes of airplanes.


@@ -527,7 +559,7 @@ pre { Your browser does not support the video tag.

- Generated point clouds and reconstructed mesh of chair. + Generated point clouds and reconstructed meshes of chairs.


@@ -536,7 +568,7 @@ pre { Your browser does not support the video tag.

- Generated point clouds and reconstructed mesh of car. + Generated point clouds and reconstructed meshes of cars.


@@ -546,7 +578,7 @@ pre { Your browser does not support the video tag.

- Generated point clouds and reconstructed mesh of Animal. + Generated point clouds and reconstructed meshes of animals (model trained on only 553 shapes).


@@ -557,7 +589,7 @@ pre { Your browser does not support the video tag.

- Generated point clouds and reconstructed mesh of bottle. + Generated point clouds and reconstructed meshes of bottles (model trained on only 340 shapes).


@@ -569,17 +601,17 @@ pre { Your browser does not support the video tag.

- Generated point clouds and reconstructed mesh of mug. + Generated point clouds and reconstructed meshes of mugs (model trained on only 149 shapes).



-

Generation (Multi-Classes)

- +

Generation Results (Multi-Class)

+
+

Below we show samples from LION models that were trained on shapes from multiple ShapeNet categories, without any class-conditioning. We deliberately did not use conditioning, in order to explore LION's scalability to diverse and multimodal datasets in the unconditional setting.

+

- Generated point clouds and reconstructed mesh. LION model trained on 13 ShapeNet categories jointly without conditioning. + Generated point clouds and reconstructed meshes. The LION model is trained on 13 ShapeNet categories jointly without conditioning.


@@ -600,7 +632,7 @@ pre { Your browser does not support the video tag.

- Generated point clouds and reconstructed mesh. LION model trained on 55 ShapeNet categories jointly without conditioning. + Generated point clouds from a LION model that was trained on all 55 ShapeNet categories jointly without conditioning.


@@ -610,10 +642,11 @@ pre {

-

More Results

-

Interpolation

+

More Results and Applications

+ Our main goal was to introduce a high-performance 3D shape generative model. Here, we qualitatively demonstrate how LION can be used for a variety of interesting applications. +

Shape Interpolation

-

LION can interpolate two shapes by traversing the latent space. The generated shapes are clean and semantically plausible along the entire interpolation path.

+

LION can interpolate shapes by traversing the latent space (interpolation is performed in the latent diffusion models' prior space, using the Probability Flow ODE for deterministic DDM generation). The generated shapes are clean and semantically plausible along the entire interpolation path.

- Left most shape: the source shape. Right most shape: the target shape. The shapes in middle are interpolated results between source and target shape. + Leftmost shape: the source shape. Rightmost shape: the target shape. The shapes in the middle are interpolated between the source and target shapes.
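A common way to interpolate between two Gaussian diffusion latents, sketched below, is spherical interpolation, which keeps intermediate latents close to the prior distribution. This is a toy sketch of only the latent mixing step; the deterministic decoding (the Probability Flow ODE in LION) is assumed and not shown, and all names are illustrative.

```python
import numpy as np

def slerp(z0, z1, alpha):
    """Spherical interpolation between two Gaussian latent vectors.
    Keeps intermediates on-distribution better than linear mixing."""
    z0n = z0 / np.linalg.norm(z0)
    z1n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return z0
    return (np.sin((1.0 - alpha) * omega) * z0
            + np.sin(alpha * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(1)
z_src = rng.standard_normal(128)  # prior latent of the source shape
z_tgt = rng.standard_normal(128)  # prior latent of the target shape

# Seven latents along the path; each would then be decoded deterministically.
path = [slerp(z_src, z_tgt, a) for a in np.linspace(0.0, 1.0, 7)]
```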

@@ -636,10 +669,9 @@ pre {

-

Fast Sampling with DDIM

+

Fast Sampling with DDIM

-

The sampling time of LION can be reduced by applying DDIM sampler. - DDIM sampler with 25 steps can already generate high-quality shapes, which takes less than 1 sec.

+

LION's sampling time can be reduced by using a fast DDM sampler, such as the DDIM sampler. DDIM sampling with only 25 steps already generates high-quality shapes and takes less than 1 sec. This enables real-time and interactive applications.
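The speed-up comes from DDIM's deterministic (eta = 0) update rule, which allows skipping most of the 1000 diffusion steps. The sketch below uses an idealized noise predictor on a toy target so it runs standalone; in LION the predictor is a trained latent DDM, and all names here are illustrative.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alphas_bar = np.cumprod(1.0 - betas)

x0_true = np.full(8, 3.0)  # the toy "latent" we want to recover

def eps_model(x_t, t):
    """Ideal noise predictor for the toy target (a trained net in practice)."""
    return (x_t - np.sqrt(alphas_bar[t]) * x0_true) / np.sqrt(1.0 - alphas_bar[t])

def ddim_sample(n_steps, seed=0):
    """Deterministic DDIM sampling on a strided subset of the timesteps."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(T - 1, 0, n_steps).round().astype(int)
    x = rng.standard_normal(8)
    for i, t in enumerate(ts):
        eps = eps_model(x, t)
        # Predict the clean sample, then jump directly to the previous stride.
        x0_pred = (x - np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alphas_bar[t])
        if i + 1 == len(ts):
            x = x0_pred
        else:
            t_prev = ts[i + 1]
            x = (np.sqrt(alphas_bar[t_prev]) * x0_pred
                 + np.sqrt(1.0 - alphas_bar[t_prev]) * eps)  # eta = 0
    return x

sample = ddim_sample(25)  # 25 steps instead of 1000
```

Because the update is deterministic, the same strided schedule gives the same sample every time for a fixed starting noise, which is what makes 25-step sampling practical.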

@@ -647,17 +679,17 @@ pre {

- DDIM samples from LION trained on different data. The top two rows show the number of steps and the wall-clock time used when drawing one sample. + DDIM samples from LION trained on different data. The top two rows show the number of steps and the wall-clock time required when drawing one sample. With DDIM sampling, we can reduce the sampling time from 27.09 sec (1000 steps) to less than 1 sec (25 steps) per generated object.


-

Multimodal Generation

+

Multimodal Generation

- LION can synthesize different variations of a given shape, enabling multimodal generation in a controlled manner. This is achieved through a diffuse-denoise procedure. + LION can synthesize different variations of a given shape, enabling multimodal generation in a controlled manner. This is achieved through a diffuse-denoise procedure, where shapes are diffused for only a few steps in the latent DDMs and then denoised again.
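The diffuse-denoise idea can be sketched as follows: forward-diffuse a source latent for only k of the T steps (this has a closed form), then run the reverse process from step k, so each draw stays close to the source. This is a toy sketch with an idealized denoiser; the function names and schedule are illustrative assumptions, not LION's API.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alphas_bar = np.cumprod(1.0 - betas)

def diffuse(z0, k, rng):
    """Forward-diffuse latent z0 for k steps using the closed-form marginal."""
    a = np.sqrt(alphas_bar[k - 1])
    noise = rng.standard_normal(z0.shape)
    return a * z0 + np.sqrt(1.0 - alphas_bar[k - 1]) * noise

def denoise(z_k, k):
    """Idealized reverse process (a trained latent DDM in the real model)."""
    return z_k / np.sqrt(alphas_bar[k - 1])  # undo the forward scaling

rng = np.random.default_rng(2)
z_source = rng.standard_normal(128)  # latent of the given shape

# Small k keeps variants close to the source; large k approaches
# unconditional generation.
k = 100
variants = [denoise(diffuse(z_source, k, rng), k) for _ in range(4)]
```

The number of diffusion steps k acts as a control knob between faithfulness to the input shape and diversity of the variations.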