<!DOCTYPE html>
<html>
<script type="text/javascript" charset="utf-8" src="https://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script>
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
<style type="text/css">
body {
font-family: "Titillium Web", "HelveticaNeue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;
font-weight: 300;
font-size: 17px;
margin-left: auto;
margin-right: auto;
}
@media screen and (min-width: 980px){
body {
width: 980px;
}
}
h1 {
font-weight:300;
line-height: 1.15em;
}
h2 {
font-size: 1.75em;
}
a:link,a:visited {
color: #B6486F;
text-decoration: none;
}
a:hover {
color: #208799;
}
h1, h2, h3 {
text-align: center;
}
h1 {
font-size: 40px;
font-weight: 500;
}
h2 {
font-weight: 400;
margin: 16px 0px 4px 0px;
}
.paper-title {
padding: 16px 0px 16px 0px;
}
section {
margin: 32px 0px 32px 0px;
text-align: justify;
clear: both;
}
.col-5 {
width: 20%;
float: left;
}
.col-4 {
width: 25%;
float: left;
}
.col-3 {
width: 33%;
float: left;
}
.col-2 {
width: 50%;
float: left;
}
.col-1 {
width: 100%;
float: left;
}
.author-row, .affil-row {
font-size: 26px;
}
.author-row-new {
text-align: center;
}
.author-row-new a {
display: inline-block;
font-size: 26px;
padding: 15px;
}
.author-row-new sup {
color: #313436;
font-size: 60%;
}
.affiliations-new {
font-size: 18px;
text-align: center;
width: 80%;
margin: 0 auto;
margin-bottom: 20px;
}
.row {
margin: 16px 0px 16px 0px;
}
.authors {
font-size: 26px;
}
.affiliations {
font-size: 18px;
}
.affil-row {
margin-top: 18px;
}
.teaser {
max-width: 100%;
}
.text-center {
text-align: center;
}
.screenshot {
width: 256px;
border: 1px solid #ddd;
}
.screenshot-el {
margin-bottom: 16px;
}
hr {
height: 1px;
border: 0;
border-top: 1px solid #ddd;
margin: 0;
}
.material-icons {
vertical-align: -6px;
}
p {
line-height: 1.25em;
}
.caption {
font-size: 16px;
color: #666;
margin-top: 4px;
margin-bottom: 10px;
}
video {
display: block;
margin: auto;
}
figure {
display: block;
margin: auto;
margin-top: 10px;
margin-bottom: 10px;
}
#bibtex pre {
font-size: 14px;
background-color: #eee;
padding: 16px;
}
.blue {
color: #2c82c9;
font-weight: bold;
}
.orange {
color: #d35400;
font-weight: bold;
}
.flex-row {
display: flex;
flex-flow: row wrap;
padding: 0;
margin: 0;
list-style: none;
}
.paper-btn-coming-soon {
position: relative;
top: 0;
left: 0;
}
.coming-soon {
position: absolute;
top: -15px;
right: -15px;
}
.paper-btn {
position: relative;
text-align: center;
display: inline-block;
margin: 8px;
padding: 8px 8px;
border-width: 0;
outline: none;
border-radius: 2px;
background-color: #B6486F;
color: white !important;
font-size: 20px;
width: 100px;
font-weight: 600;
}
.paper-btn-parent {
display: flex;
justify-content: center;
margin: 16px 0px;
}
.paper-btn:hover {
opacity: 0.85;
}
.container {
margin-left: auto;
margin-right: auto;
padding-left: 16px;
padding-right: 16px;
}
.venue {
font-size: 30px;
}
.topnav {
background-color: #EEEEEE;
overflow: hidden;
}
.topnav div {
max-width: 1070px;
margin: 0 auto;
}
.topnav a {
display: inline-block;
color: black;
text-align: center;
vertical-align: middle;
padding: 16px 16px;
text-decoration: none;
font-size: 18px;
}
.topnav img {
padding: 2px 0px;
width: 100%;
margin: 0.2em 0px 0.3em 0px;
vertical-align: middle;
}
pre {
font-size: 0.9em;
padding-left: 7px;
padding-right: 7px;
padding-top: 3px;
padding-bottom: 3px;
border-radius: 3px;
background-color: rgb(235, 235, 235);
overflow-x: auto;
}
.download-thumb {
display: flex;
}
@media only screen and (max-width: 620px) {
.download-thumb {
display: none;
}
}
.paper-stuff {
width: 50%;
font-size: 20px;
}
@media only screen and (max-width: 620px) {
.paper-stuff {
width: 100%;
}
}
</style>
<script type="text/javascript" src="../js/hidebib.js"></script>
<link href='https://fonts.googleapis.com/css?family=Titillium+Web:400,600,400italic,600italic,300,300italic' rel='stylesheet' type='text/css'>
<head>
<title>Score-Based Generative Modeling with Critically-Damped Langevin Diffusion</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta property="og:description" content="Score-Based Generative Modeling with Critically-Damped Langevin Diffusion"/>
<link href="https://fonts.googleapis.com/css2?family=Material+Icons" rel="stylesheet">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:creator" content="@timudk">
<meta name="twitter:title" content="Score-Based Generative Modeling with Critically-Damped Langevin Diffusion">
<meta name="twitter:description" content="Inspired by connections to statistical mechanics, we propose a novel diffusion process, critically-damped Langevin diffusion, that perturbs the data in a smoother manner by leveraging auxiliary velocity variables. This allows us to denoise more efficiently and learn higher quality generative models.">
<meta name="twitter:image" content="https://nv-tlabs.github.io/CLD-SGM/assets/cld_teaser_resized.png">
</head>
<body>
<div class="topnav" id="myTopnav">
<div>
<a href="https://www.nvidia.com/"><img width="100%" src="assets/nvidia.svg"></a>
<a href="https://nv-tlabs.github.io/" ><strong>Toronto AI Lab</strong></a>
</div>
</div>
<div class="container">
<div class="paper-title">
<h1>Score-Based Generative Modeling <br> with Critically-Damped Langevin Diffusion</h1>
</div>
<div id="authors">
<center>
<div class="author-row-new">
<a href="https://timudk.github.io/">Tim Dockhorn<sup>1,2,3</sup></a>
<a href="http://latentspace.cc/">Arash Vahdat<sup>1</sup></a>
<a href="https://karstenkreis.github.io/">Karsten Kreis<sup>1</sup></a>
</div>
</center>
<center>
<div class="affiliations">
<span><sup>1</sup> NVIDIA</span>
<span><sup>2</sup> University of Waterloo</span>
<span><sup>3</sup> Vector Institute</span> <br/>
</div>
<div class="affil-row">
<div class="venue text-center"><b>ICLR 2022 (spotlight)</b></div>
</div>
</center>
<div style="clear: both">
<div class="paper-btn-parent">
<a class="paper-btn" href="https://arxiv.org/abs/2112.07068">
<span class="material-icons"> description </span>
Paper
</a>
<div class="paper-btn-coming-soon">
<a class="paper-btn" href="https://github.com/nv-tlabs/CLD-SGM">
<span class="material-icons"> code </span>
Code
</a>
</div>
</div></div>
</div>
<br>
<section id="teaser-image">
<figure style="margin-top: 20px; margin-bottom: 20px;">
<img width="100%" src="./assets/cld_teaser.png" style="margin-bottom: 20px;">
<p class="caption">
In critically-damped Langevin diffusion, the data \(\bf{x}_t\) is augmented with a velocity \(\bf{v}_t\). A diffusion coupling \(\bf{x}_t\)
and \(\bf{v}_t\) is run in the joint data-velocity space (probabilities in red). Noise is injected only into \(\bf{v}_t\). This leads to smooth
diffusion trajectories (green) for the data \(\bf{x}_t\). Denoising only requires \(\nabla_{\bf{v}_t} \log p_t(\bf{v}_t |\bf{x}_t )\).
</p>
</figure>
</section>
<section id="news">
<h2>News</h2>
<hr>
<div class="row">
<div><span class="material-icons"> event </span> [Mar 2022] Our <a href=https://github.com/nv-tlabs/CLD-SGM/>code</a> has been released.</div>
<div><span class="material-icons"> event </span> [Feb 2022] Karsten presented our work at <a href=https://mlcollective.org/dlct/>Deep Learning: Classics and Trends</a> by ML Collective (<a href=https://drive.google.com/file/d/1USVGp5FwtJw6WD9-m5GCqcTkgRwWKy6A/view?usp=sharing>slides</a>).</div>
<div><span class="material-icons"> event </span> [Jan 2022] Our paper was accepted to the <b>International Conference on Learning Representations (ICLR)</b> as a <b>spotlight</b> presentation! It received an average <a href=https://openreview.net/forum?id=CzceR82CYc>reviewer rating</a> of 8.5, placing it in the <b>top 0.4%</b> of submissions!</div>
<div><span class="material-icons"> event </span> [Jan 2022] Tim presented our work at the Vector Institute.</div>
<div><span class="material-icons"> event </span> [Dec 2021] <a href=https://twitter.com/karsten_kreis/status/1471160404747251720>Twitter thread</a> explaining the work in detail.</div>
<div><span class="material-icons"> event </span> [Dec 2021] <a href="https://nv-tlabs.github.io/CLD-SGM">Project page</a> released!</div>
<div><span class="material-icons"> event </span> [Dec 2021] Draft released on <a href="https://arxiv.org/abs/2112.07068">arXiv</a>!</div>
</div>
</section>
<section id="abstract"/>
<h2>Abstract</h2>
<hr>
<div class="flex-row">
<p>
Score-based generative models (SGMs) have demonstrated remarkable synthesis quality. SGMs rely on a diffusion process that gradually
perturbs the data towards a tractable distribution, while the generative model learns to denoise. The complexity of this denoising task
is, apart from the data distribution itself, uniquely determined by the diffusion process. We argue that current SGMs employ overly simplistic
diffusions, leading to unnecessarily complex denoising processes, which limit generative modeling performance. Based on connections to statistical
mechanics, we propose a novel critically-damped Langevin diffusion (CLD) and show that CLD-based SGMs achieve superior performance. CLD can be
interpreted as running a joint diffusion in an extended space, where the auxiliary variables can be considered "velocities" that are coupled to
the data variables as in Hamiltonian dynamics. We derive a novel score matching objective for CLD and show that the model only needs to learn the
score function of the conditional distribution of the velocity given data, an easier task than learning scores of the data directly. We also
derive a new sampling scheme for efficient synthesis from CLD-based diffusion models. We find that CLD outperforms previous SGMs in synthesis
quality for similar network architectures and sampling compute budgets. We show that our novel sampler for CLD significantly outperforms
solvers such as Euler-Maruyama. Our framework provides new insights into score-based denoising diffusion models and can be readily used for
high-resolution image synthesis.
</p>
</div>
</section>
<section id="intro"/>
<h2>Score-Based Generative Modeling <br> with Critically-Damped Langevin Diffusion</h2>
<hr>
<div class="flex-row">
<p> Score-based generative models (SGMs) and denoising diffusion probabilistic models have emerged as a promising class of generative models.
SGMs offer high quality synthesis and sample diversity, do not require adversarial objectives, and have found applications in image, speech,
and music synthesis, image editing, super-resolution, image-to-image translation, and 3D shape generation. SGMs use a diffusion process to gradually
add noise to the data, transforming a complex data distribution to an analytically tractable prior distribution. A neural network is then utilized to
learn the score function—the gradient of the log probability density—of the perturbed data. The learnt scores can be used to solve a stochastic
differential equation (SDE) to synthesize new samples. This corresponds to an iterative denoising process, inverting the forward diffusion.
</p>
<p> It has been shown that the score function that needs to be learnt by the neural network is uniquely determined by the forward diffusion process.
Consequently, the complexity of the learning problem depends, apart from the data itself, only on the diffusion. Hence, the diffusion process
is the key component of SGMs that needs to be revisited to further improve SGMs, for example, in terms of synthesis quality or sampling speed.
</p>
<p> Inspired by statistical mechanics, we propose a novel forward diffusion process, the critically-damped Langevin diffusion (CLD). In CLD, the data
variable, \(\bf{x}_t\) (time \(t\) along the diffusion), is augmented with an additional "velocity" variable \(\bf{v}_t\) and a diffusion process is run in
the joint data-velocity space. Data and velocity are coupled to each other as in Hamiltonian dynamics, and noise is injected only into the velocity
variable. As in Hamiltonian Monte Carlo, the Hamiltonian component helps to efficiently traverse the joint data-velocity space
and to transform the data distribution into the prior distribution more smoothly. We derive the corresponding score matching objective and show that
for CLD the neural network is tasked with learning only the score of the conditional distribution of velocity given data
\(\nabla_{\bf{v}_t} \log p_t(\bf{v}_t |\bf{x}_t )\), which is arguably easier than learning the score of the diffused data distribution directly. Using
techniques from molecular dynamics, we also derive a novel SDE integrator tailored to CLD's reverse-time synthesis SDE.
</p>
</div>
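<p>
To make these dynamics concrete, the sketch below simulates the coupled forward SDE with a simple Euler-Maruyama discretization. The drift and noise terms follow the Langevin structure described above; the specific values of the mass \(M\), the time rescaling \(\beta\), and the initial velocity variance are placeholder assumptions rather than the paper's exact configuration.
</p>
<pre><code># Illustrative Euler-Maruyama simulation of the CLD forward diffusion.
# Assumed coefficients (M, beta, initial velocity variance) are placeholders.
import numpy as np

def cld_forward(x0, n_steps=1000, T=1.0, M=0.25, beta=4.0, seed=0):
    """Diffuse data x0 in the joint data-velocity space.

    dx =  (v / M) * beta * dt
    dv = (-x - Gamma * v / M) * beta * dt + sqrt(2 * Gamma * beta) * dW,
    with critical damping Gamma = 2 * sqrt(M); noise enters only through v.
    """
    rng = np.random.default_rng(seed)
    Gamma = 2.0 * np.sqrt(M)                      # critical damping: Gamma^2 = 4M
    dt = T / n_steps
    x = np.asarray(x0, dtype=float).copy()
    v = rng.normal(0.0, 0.2 * np.sqrt(M), size=x.shape)  # small initial velocity (assumption)
    traj = [(x.copy(), v.copy())]
    for _ in range(n_steps):
        z = rng.normal(size=x.shape)
        x_new = x + (v / M) * beta * dt
        v_new = v + (-x - Gamma * v / M) * beta * dt + np.sqrt(2.0 * Gamma * beta * dt) * z
        x, v = x_new, v_new
        traj.append((x.copy(), v.copy()))
    return traj

# Example: diffuse samples from a 1D mixture of three Normals, as in the animation below.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(m, 0.1, 100) for m in (-1.0, 0.0, 1.0)])
trajectory = cld_forward(data)
</code></pre>
<p>
Because the Brownian noise enters only through the velocity, the resulting \(\bf{x}_t\) trajectories are smooth, matching the green paths in the teaser figure above.
</p>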
</section>
<section id="teaser-video">
<figure>
<video class="centered video-background" width="100%" autoplay loop muted playsinline>
<source src="assets/animation.mp4#t=0.001" type="video/mp4">
Your browser does not support the video tag.
</video>
<p class="caption">
Schematic visualization of CLD's forward diffusion as well as reverse-time synthesis process: At the top, we visualize how a one-dimensional data
distribution (mixture of three Normals) together with the velocity diffuses towards the prior in the joint data-velocity space and how generation proceeds
in the reverse direction. We sample three different diffusion trajectories (in green) and also show the projections into data and velocity space on the
right. We can see smooth diffusion trajectories for the data variables. At the bottom, we visualize a similar diffusion and synthesis process for
(high-dimensional) image generation. We see that the velocities "encode" the data at intermediate times \(t\).
</p>
</figure>
</section>
<section id="novelties"/>
<h2>Technical Contributions</h2>
<hr>
<div class="flex-row">
<p>We make the following technical contributions:</p>
<ul style="list-style-type:disc;">
<li>We propose CLD, a novel diffusion process for SGMs.</li>
<li>We derive a score matching objective for CLD which requires only the score of the conditional distribution of velocity given data.</li>
<li>We propose hybrid denoising score matching, a new type of denoising score matching ideally suited for scalable training of CLD-based SGMs (a simplified sketch follows below).</li>
<li>We derive a tailored SDE integrator that enables efficient sampling from CLD-based models.</li>
<li>Overall, we provide novel insights into SGMs and point out important new connections to statistical mechanics.</li>
</ul>
</div>
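<p>
As a rough illustration of the score matching objective, the snippet below shows a minimal denoising-score-matching loss in the spirit of CLD. The helper <code>cld_perturbation_kernel</code> is hypothetical and stands in for the analytic Gaussian perturbation kernel of the linear CLD SDE; the paper's hybrid score matching conditions this kernel differently, so treat this as a sketch rather than the exact objective.
</p>
<pre><code># Minimal denoising-score-matching sketch for CLD (PyTorch).
# `cld_perturbation_kernel` is a hypothetical helper that samples (x_t, v_t)
# from the analytic Gaussian kernel of the linear CLD SDE and returns eps_v,
# the standardized velocity noise, and sigma_v, its conditional std given x_t.
import torch

def cld_dsm_loss(score_net, x0, cld_perturbation_kernel):
    """Train score_net(x_t, v_t, t) to approximate grad_v log p_t(v_t | x_t)."""
    t = torch.rand(x0.shape[0], device=x0.device)    # uniform diffusion times
    x_t, v_t, eps_v, sigma_v = cld_perturbation_kernel(x0, t)
    score = score_net(x_t, v_t, t)
    # Gaussian DSM identity: the target score is -eps_v / sigma_v, so we match
    # sigma_v * score against -eps_v (a variance-reweighted, stable form).
    return ((sigma_v.view(-1, 1) * score + eps_v) ** 2).sum(dim=1).mean()
</code></pre>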
</section>
<section id="results">
<h2>Experimental Results</h2>
<hr>
<div class="flex-row">
<p>We extensively validate CLD and the new SDE solver:</p>
<ul style="list-style-type:disc;">
<li>We show that the neural networks learnt in CLD-based SGMs are smoother than those of previous SGMs. We attribute this to the Hamiltonian component in the diffusion and to CLD's easier score function target, the score of the velocity-data conditional distribution \(\nabla_{\bf{v}_t} \log p_t(\bf{v}_t |\bf{x}_t )\).</li>
<li>On the CIFAR-10 image modeling benchmark, we demonstrate that CLD-based models outperform previous diffusion models in synthesis quality for
similar network architectures and sampling compute budgets. Our CLD-based SGMs achieve FID scores of 2.25 and 2.23 using probability flow ODE sampling and generative SDE sampling, respectively.</li>
<li>We show that our novel SDE integrator for CLD is well suited for synthesis with limited neural network calls and significantly outperforms the popular Euler-Maruyama method (illustrated in the sketch after this list).</li>
<li>We perform ablations on various aspects of CLD and find that CLD does not have difficult-to-tune hyperparameters.</li>
</ul>
<p>Samples from our CLD-based SGMs as well as latent space traversals and sample generation paths are visualized below.</p>
</div>
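<p>
For reference, the sketch below applies the Euler-Maruyama baseline to CLD's reverse-time SDE, starting from the prior in the joint data-velocity space. The coefficient conventions match the forward sketch above and are assumptions; the paper's tailored integrator is not reproduced here.
</p>
<pre><code># Illustrative Euler-Maruyama sampler for the reverse-time CLD SDE -- the
# baseline that the paper's tailored integrator outperforms. Coefficients
# (M, beta, prior variances) follow the forward sketch and are assumptions.
import numpy as np

def em_reverse_sampler(score_fn, shape, n_steps=500, T=1.0, M=0.25, beta=4.0, seed=0):
    rng = np.random.default_rng(seed)
    Gamma = 2.0 * np.sqrt(M)
    h = T / n_steps
    # Start from the (approximate) equilibrium prior: x ~ N(0, I), v ~ N(0, M*I).
    x = rng.normal(size=shape)
    v = rng.normal(0.0, np.sqrt(M), size=shape)
    t = T
    for _ in range(n_steps):
        s = score_fn(x, v, t)        # approximates grad_v log p_t(v | x)
        drift_x = (v / M) * beta
        drift_v = (-x - Gamma * v / M) * beta - 2.0 * Gamma * beta * s
        z = rng.normal(size=shape)
        # Step backward in time; noise is re-injected into the velocity only.
        x = x - h * drift_x
        v = v - h * drift_v + np.sqrt(2.0 * Gamma * beta * h) * z
        t -= h
    return x
</code></pre>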
<center>
<figure style="width: 100%;">
<img width="40%" src="assets/cifar10_main.png">
<img width="40%" src="assets/celeba_main.png">
<p class="caption" style="margin-bottom: 24px;">
Generated samples for CIFAR-10 (left) and CelebA-HQ-256 (right). CLD-based SGMs generate sharp, high-quality, and diverse samples.
</p>
</figure>
</center>
<br>
<figure>
<video class="centered video-background" width="100%" autoplay loop muted playsinline>
<source src="assets/latent_interpolation.mp4#t=0.001" type="video/mp4">
Your browser does not support the video tag.
</video>
<p class="caption">
The sequence above is generated by randomly traversing the latent space of our CLD-SGM model (using the probability flow ODE formulation).
</p>
</figure>
<br> <br>
<figure style="width: 100%;">
<a href="assets/celeba_gen.png">
<img width="100%" src="assets/celeba_gen.png">
</a>
<p class="caption" style="margin-bottom: 24px;">
Visualization of the generation paths of samples from our CelebA-HQ-256 model (synthesis uses only 150 steps). Odd and even rows visualize data and velocity variables, respectively.
The eight columns correspond to times \(t \in \{1.0, 0.5, 0.3, 0.2, 0.1, 10^{-2}, 10^{-3}, 10^{-5}\}\) (from left to right). The velocity distribution
converges to a Normal distribution (with different variances) for both \(t \to 0\) and \(t \to 1\). See Appendix F.3 in our paper for visualization details and discussion.
</p>
</figure>
</section>
<section id="paper">
<h2>Paper</h2>
<hr>
<div class="flex-row">
<div class="download-thumb">
<div style="box-sizing: border-box; padding: 16px; margin: auto;">
<a href="https://nv-tlabs.github.io/CLD-SGM"><img class="screenshot" src="assets/cld_paper_preview.png"></a>
</div>
</div>
<div class="paper-stuff">
<p><b>Score-Based Generative Modeling with Critically-Damped Langevin Diffusion</b></p>
<p>Tim Dockhorn, Arash Vahdat, Karsten Kreis</p>
<p><i>International Conference on Learning Representations (ICLR), 2022 <b>(spotlight)</b></i></p>
<div><span class="material-icons"> description </span><a href="https://arxiv.org/abs/2112.07068"> arXiv version</a></div>
<div><span class="material-icons"> insert_comment </span><a href="assets/dockhorn2021score.bib"> BibTeX</a></div>
<div><span class="material-icons"> integration_instructions </span><a href="https://github.com/nv-tlabs/CLD-SGM"> Code</a></div>
</div>
</div>
</section>
<section id="bibtex">
<h2>Citation</h2>
<hr>
<pre><code>@inproceedings{dockhorn2022score,
title={Score-Based Generative Modeling with Critically-Damped Langevin Diffusion},
author={Tim Dockhorn and Arash Vahdat and Karsten Kreis},
booktitle={International Conference on Learning Representations (ICLR)},
year={2022}
}</code></pre>
</section>
</div>
</body>
</html>