---
theme: academic
class: text-white
coverBackgroundUrl: https://plus.unsplash.com/premium_photo-1673553304257-018c85e606f8?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8
coverBackgroundSource: Unsplash
coverBackgroundSourceUrl: https://unsplash.com/photos/g4I556WCJT0
coverDate: ""
themeConfig:
  paginationX: r
  paginationY: t
  paginationPagesDisabled:
    - 1
title: Projet Long
---
<h1 style="font-size: 2.3rem;">Sphere detection and multimedia applications</h1>
<span class="absolute p-2 text-xs right-0 top-0 opacity-50">
2023-03-09
</span>
<span class="absolute bottom-12 opacity-50">
Laurent Fainsin, Pierre-Eliot Jourdan, Raphaëlle Monville-Letu, Jade Neav
</span>
---
# Contents
<div class="h-100 flex items-center text-2xl">
- Types of spheres
- Automatic sphere detection
- Lighting intensity estimation
- Lighting direction estimation
</div>
<figure class="absolute top-15 right-25 w-35">
<img src="https://images.pexels.com/photos/13849458/pexels-photo-13849458.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"/>
<figcaption class="text-center">Architecture</figcaption>
</figure>
<figure class="absolute top-40 right-75 w-50">
<img src="https://images.pexels.com/photos/3945321/pexels-photo-3945321.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"/>
<figcaption class="text-center">Cinema</figcaption>
</figure>
<figure class="absolute top-72 right-30 w-45">
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTzg_yM_NbCIYXfZ55WdtFbAtaF7EUGSKSVBQ&usqp=CAU"/>
<figcaption class="text-center">3D Reconstruction</figcaption>
</figure>
<a href="https://www.pexels.com" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">pexels</a>
---
class: text-white custombg
---
<style>
.custombg {
background-repeat: no-repeat;
background-position: center center;
background-size: cover;
background-image: url("/assets/spheres.png");
}
</style>
# Types of spheres
---
class: text-white custombg2
---
<style>
.custombg2 {
background-repeat: no-repeat;
background-position: center center;
background-size: cover;
background-image: url("https://media.caveacademy.com/wp-content/uploads/2021/05/04000307/cave_prop1002_chrome_v001_r001.jpg");
}
</style>
## Chrome sphere
<a href="https://caveacademy.com/wiki/onset-production/data-acquisition/data-acquisition-training/the-grey-the-chrome-and-the-macbeth-chart/" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">CaveAcademy</a>
---
## Acquisition techniques
<div class="h-full flex items-center">
<img src="/assets/capture_hdri.jpg" class="m-auto"/>
</div>
<a href="https://www.youtube.com/watch?v=kwGZa5qTeAI" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">Louis du Mont</a>
<!-- https://www.youtube.com/watch?v=HCfHQL4kLnw -->
---
## Realistic lighting
<div class="grid grid-cols-2 col-auto m-auto h-100 gap-1">
<img src="/assets/image-026.png" class="m-auto w-full"/>
<img src="/assets/image-027.png" class="m-auto w-full"/>
</div>
<a href="https://dl.acm.org/doi/pdf/10.1145/1103900.1103914" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">High Dynamic Range Imaging, Paul Debevec</a>
---
class: text-white custombg3
---
<style>
.custombg3 {
background-repeat: no-repeat;
background-position: center center;
background-size: cover;
background-image: url("/assets/shiny.jpg");
}
</style>
## Shiny sphere
<a href="https://caveacademy.com/wiki/onset-production/data-acquisition/data-acquisition-training/the-grey-the-chrome-and-the-macbeth-chart/" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">CaveAcademy</a>
---
class: text-white custombg4
---
<style>
.custombg4 {
background-repeat: no-repeat;
background-position: center center;
background-size: cover;
background-image: url("https://media.caveacademy.com/wp-content/uploads/2021/05/04000316/cave_prop1002_grey_v001_r001.jpg");
}
</style>
## Matte sphere
<a href="https://caveacademy.com/wiki/onset-production/data-acquisition/data-acquisition-training/the-grey-the-chrome-and-the-macbeth-chart/" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">CaveAcademy</a>
---
# Automatic sphere detection
<div class="h-100 flex items-center text-2xl">
- Model
- Datasets
- Results
- Perspectives
</div>
<!--
So we have a few applications that use spheres, but if we actually want to perform them, we need to know the locations of said spheres.
Well, there is no known traditional method to directly detect spheres (especially chrome spheres) in images,
so we have no choice but to use deep neural networks.
-->
---
## Model
<div class="h-100 flex items-center">
<img src="/assets/DETR.png" class="m-auto"/>
</div>
<a href="https://arxiv.org/abs/2005.12872" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">End-to-End Object Detection with Transformers, arXiv:2005.12872</a>
<!--
Let's do deep learning then!
First we need a model; for our problem we chose DETR, from the paper "End-to-End Object Detection with Transformers",
published by Facebook Research in 2020.
We chose this model since it has proven to achieve state-of-the-art performance without too much difficulty.
We also chose it because it is quite recent, well supported by frameworks,
and mostly because pretrained weights are publicly available online.
And here is the architecture of the model: a simple CNN backbone, followed by an encoder-decoder transformer, in turn followed by the prediction heads.
-->
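A minimal sketch of this starting point, assuming the HuggingFace `transformers` checkpoint `facebook/detr-resnet-50` (the single-class sphere setup is our illustration, not the exact training script):

```python
# Load pretrained DETR weights and swap in a 1-class head for sphere detection.
from transformers import DetrForObjectDetection, DetrImageProcessor

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    num_labels=1,                  # a single class: sphere
    ignore_mismatched_sizes=True,  # reinitialize the classification head
)
# The model can then be fine-tuned on (image, bounding boxes) pairs as usual.
```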
---
## Datasets (1/4)
<div class="h-full flex items-center">
<img src="/assets/antoine.webp" class="m-auto h-100"/>
</div>
<span class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">Antoine Laurent</span>
<!--
Secondly we need data, and a lot of it if possible.
The first dataset we got our hands on consists of archaeological photographs by Antoine Laurent, taken for 3D reconstruction in heritage preservation.
It contains ~1000 images of rocky objects in front of dark rocky backgrounds. This dataset was a good start but had a couple of weaknesses: it only contained white matte spheres and red & black shiny spheres, no chrome spheres. Also, since it was shot for 3D reconstruction, many images were "the same", with only the direction of the light varying, which made it very prone to overfitting.
-->
---
## Datasets (2/4)
<div class="h-full flex items-center">
<img src="/assets/illumination.webp" class="m-auto h-100"/>
</div>
<a href="https://projects.csail.mit.edu/illumination/" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">A Dataset of Multi-Illumination Images in the Wild</a>
<!--
The second dataset we got our hands on is similar to the first one. It comes from the paper "A Dataset of Multi-Illumination Images in the Wild", whose purpose was scene relighting, so nothing to do with what we want to do, but at least each image contains a chrome and a grey sphere. The main weakness of this dataset is that it only has indoor images.
We trained a first model on these two datasets, and it did not work very well on totally new images. It was clear that this data was not enough to achieve generalization.
-->
---
## Datasets (3/4)
<div class="h-full flex items-center">
<img src="/assets/compositing.webp" class="m-auto h-100"/>
</div>
<span class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs"><a href="https://cocodataset.org/#home">MS COCO</a> compositing</span>
<!--
So we turned ourselves to synthetic datasets.
The first dataset we created used compositing: we basically pasted spheres on top of random images. It worked quite well, but since we do not know the environment around an image, we could not paste coherent chrome spheres.
-->
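A minimal sketch of the compositing step, with hypothetical file names:

```python
# Paste an alpha-masked sphere cutout at a random spot on a COCO background.
import random
from PIL import Image

background = Image.open("coco_image.jpg").convert("RGB")   # placeholder paths
sphere = Image.open("sphere_cutout.png").convert("RGBA")

size = int(min(background.size) * random.uniform(0.05, 0.3))
sphere = sphere.resize((size, size))
x = random.randint(0, background.width - size)
y = random.randint(0, background.height - size)
background.paste(sphere, (x, y), mask=sphere)              # alpha-aware paste

bbox = (x, y, x + size, y + size)                          # detector annotation
```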
---
## Datasets (4/4)
<div class="h-full flex items-center">
<img src="/assets/render.webp" class="m-auto h-100"/>
</div>
<span class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">
<a href="https://www.blender.org/">Blender</a>,
<a href="https://polyhaven.com/">PolyHaven</a>
</span>
<!--
We thus rendered coherent images with Blender: coherent reflections thanks to PolyHaven HDRI environments,
sprinkled with some light augmentations.
-->
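A sketch of the Blender side using the `bpy` API; the material settings and the HDRI path are illustrative:

```python
import bpy

# Chrome sphere: fully metallic, zero-roughness Principled BSDF.
bpy.ops.mesh.primitive_uv_sphere_add(radius=0.1, location=(0, 0, 0.1))
sphere = bpy.context.object
bpy.ops.object.shade_smooth()
mat = bpy.data.materials.new("chrome")
mat.use_nodes = True
bsdf = mat.node_tree.nodes["Principled BSDF"]
bsdf.inputs["Metallic"].default_value = 1.0
bsdf.inputs["Roughness"].default_value = 0.0
sphere.data.materials.append(mat)

# Coherent reflections: light the scene with a PolyHaven HDRI environment.
world = bpy.context.scene.world
world.use_nodes = True
env = world.node_tree.nodes.new("ShaderNodeTexEnvironment")
env.image = bpy.data.images.load("/path/to/polyhaven_hdri.exr")
world.node_tree.links.new(env.outputs["Color"],
                          world.node_tree.nodes["Background"].inputs["Color"])

bpy.ops.render.render(write_still=True)
```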
---
## Results (1/8)
<div class="h-full flex items-center">
<img src="/assets/image2_0_0.jpg" class="m-auto h-110">
</div>
<!-- a standard sphere, somewhat degraded, still correctly recognized -->
---
## Results (2/8)
<div class="h-full flex items-center">
<img src="/assets/image2_0_2.jpg" class="m-auto h-110">
</div>
<!--
occlusion
reflection inside chrome not detected
-->
---
## Results (3/8)
<div class="h-full flex items-center">
<img src="/assets/image2_0_3.jpg" class="m-auto h-110">
</div>
<!--
closeup
-->
---
## Results (4/8)
<div class="h-full flex items-center">
<img src="/assets/image2_0_4.jpg" class="m-auto h-110">
</div>
<!--
detection not so great, though the scene is pretty complex
-->
---
## Results (5/8)
<div class="h-full flex items-center">
<img src="/assets/image2_0_5.jpg" class="m-auto h-110">
</div>
---
## Results (6/8)
<div class="h-full flex items-center">
<img src="/assets/image2_1_0.jpg" class="m-auto h-110">
</div>
---
## Results (7/8)
<div class="h-full flex items-center">
<img src="/assets/image2_1_1.jpg" class="m-auto h-110">
</div>
---
## Results (8/8)
<div class="h-full flex items-center">
<img src="/assets/image2_0_1.jpg" class="m-auto h-110">
</div>
<!--
one false positive, though it can be filtered out, e.g. with a 0.99 instead of a 0.95 score threshold
-->
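A sketch of that score-based filtering (threshold values are illustrative):

```python
# Keep only detections whose confidence exceeds a stricter threshold:
# a 0.95 false positive survives a 0.95 cut-off but not a 0.99 one.
def filter_detections(boxes, scores, threshold=0.99):
    return [(box, score) for box, score in zip(boxes, scores) if score > threshold]
```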
---
## Perspectives
<div class="h-full flex items-center">
<img src="/assets/surface-imperfections.png" class="m-auto h-110"/>
</div>
<a href="https://www.poliigon.com/textures/surface-imperfections" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">Poliigon.com</a>
<!--
augmentations inside Blender to make the spheres more realistic: PBR textures, scratches, fingerprints...
a bigger backbone (ResNet-101)
alternative architectures: deformable attention, Conditional DETR, DINO
-->
---
# Lighting intensity estimation
<div class="h-100 flex items-center text-2xl">
- Photometric Stereo
- Lambert's Law
- Problem formulation
- Algorithms
- Generated images
- Results
- Perspectives
</div>
<!--
2nd problem = estimate the intensity of the lighting in an image
-> important problem in 3D reconstruction (Photometric Stereo)
-->
---
## Photometric Stereo
<div class="h-100 flex items-center">
<img src= "https://upload.wikimedia.org/wikipedia/commons/b/b5/Photometric_stereo.png" class="m-auto h-90"/>
</div>
- Estimate the surface normals of an object
- Shiny spheres $\rightarrow$ direction of the lighting
<a href="https://en.wikipedia.org/wiki/Photometric_stereo" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">Wikipedia</a>
<!--
Photometric stereo = technique for estimating the surface normals of objects by observing that object under different lighting conditions
-->
---
## Lambert's law
<div class="h-100 flex items-center">
<span>
$I(q) = \rho(Q) \times \vec{n}(Q) \cdot \vec{s}(Q)$
- $\rho(Q)$ is the albedo
- $\vec{n}(Q)$ is the normal vector
- $\vec{s}(Q) = \phi \, \vec{s}_0(Q)$ is the lighting vector, with intensity $\phi$ and unit direction $\vec{s}_0(Q)$
</span>
</div>
<img src="/assets/stereo.png" class="h-100 absolute right-10 top-10"/>
<a href="https://www.laserfocusworld.com/lasers-sources/article/14035413/photometric-stereo-techniques-analyze-reflections-to-improve-image-contrast" class="absolute bottom-0 font-extralight mb-1 mr-2 right-0 text-xs">LaserFocusWorld</a>
<!--
Q is a 3D point -> q is its projection in the image
Albedo = fraction of light that a surface reflects
-->
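A numerical reading of the law, with made-up values:

```python
import numpy as np

rho = 0.8                                  # albedo of the surface point Q
n = np.array([0.0, 0.0, 1.0])              # unit normal at Q
s0 = np.array([1.0, 0.0, 1.0])
s0 /= np.linalg.norm(s0)                   # unit lighting direction
phi = 2.0                                  # lighting intensity

I_q = rho * max(n @ (phi * s0), 0.0)       # clamped: shadowed points get 0
print(I_q)                                 # ~1.13
```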
---
## Problem formulation
<div class="h-100 flex items-center">
<span>
$N$ lightings, $P$ pixels \
$\rightarrow I = M \times S \times D_{\phi}$
- $I \in \mathbb{R}^{P \times N} \rightarrow$ grayscale levels $\rightarrow$ known from image pixels
- $M \in \mathbb{R}^{P \times 3} \rightarrow$ the albedo and the normals $\rightarrow$ **unknown**
- $S \in \mathbb{R}^{3 \times N} \rightarrow$ directions of the lightings $\rightarrow$ known from shiny spheres
- $D_{\phi} = \mathrm{diag}(\phi_1, \dots, \phi_N) \in \mathbb{R}^{N \times N} \rightarrow$ intensities of the lightings $\rightarrow$ **to be determined**
</span>
</div>
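A quick shape check of this formulation, on random placeholder data:

```python
import numpy as np

P, N = 1000, 9                       # pixels, lightings
M = np.random.rand(P, 3)             # albedo times normals (unknown)
S = np.random.rand(3, N)             # lighting directions (from shiny spheres)
D_phi = np.diag(np.random.rand(N))   # lighting intensities (to be determined)

I = M @ S @ D_phi                    # grayscale levels
assert I.shape == (P, N)
```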
---
## Algorithm 1
<div class="h-100 flex items-center">
<img src="/assets/algo1.svg" class="m-auto h-72"/>
<span>
- Initialize the intensities $[\phi_1, \dots, \phi_N]$
- Try new values $\phi_j \pm \delta, \ j \in \{1, \dots, N\}$
- Estimate the matrix $M$
- Mean-squared error: $\underset{\phi_j}{\min} || I - M S D_{\phi} ||_2^2$
- Update the value of $\phi_j$
- Repeat the previous steps
</span>
</div>
<!--
IDEA: first determine the matrix M, then find the best values for phi
Initialize the intensities phi
For each iteration: fix all the intensities except phi_j
Determine the matrix M
delta -> small increment
Diagram = shows the importance of the intensity initialization -> local/global minimum
-->
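A sketch of this coordinate-descent loop, as we read it from the steps above:

```python
import numpy as np

def residual(I, S, phi):
    """Estimate M by linear least squares for fixed phi, then score the fit."""
    A = S @ np.diag(phi)                             # S D_phi, shape 3 x N
    M = np.linalg.lstsq(A.T, I.T, rcond=None)[0].T   # solves M A ≈ I
    return np.linalg.norm(I - M @ A) ** 2

def algorithm1(I, S, phi0, delta=0.05, n_iter=100):
    """Nudge each phi_j by ±delta and keep the change when the error drops."""
    phi = phi0.copy()
    for _ in range(n_iter):
        for j in range(len(phi)):
            for candidate in (phi[j] - delta, phi[j] + delta):
                trial = phi.copy()
                trial[j] = candidate
                if residual(I, S, trial) < residual(I, S, phi):
                    phi = trial
    return phi
```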
---
## Algorithm 2
<div class="h-100 flex items-center">
<div class="w-full">
Algorithm 1 $\rightarrow$ too slow
$$I = M S D_{\phi} \iff M = I(S D_{\phi})^\dagger = I (S D_{\phi})^T [(S D_{\phi})(S D_{\phi})^T]^{-1}$$
Lambert's law:
$$
\begin{align*}
I &= I (S D_{\phi})^T [(S D_{\phi})(S D_{\phi})^T]^{-1} S D_{\phi} \\
&= I D_{\phi} S^T [S D_{\phi}^2 S^T]^{-1} S D_{\phi}
\end{align*}
$$
New residual :
$$\underset{\phi}{\min} || I - I D_{\phi} S^T [S D_{\phi}^2 S^T]^{-1} S D_{\phi} ||_2^2$$
</div>
</div>
<!--
Too slow = 2 nested for loops + M estimated twice for each intensity and iteration
Write the Lambert law as a function of D_phi only
Non-linear problem = can be solved directly using the MATLAB function lsqnonlin()
Execution time = 10 seconds instead of 5 min (for 1000 iterations)
-->
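The deck solves this with MATLAB's `lsqnonlin()`; an equivalent sketch with `scipy.optimize.least_squares` on synthetic data:

```python
import numpy as np
from scipy.optimize import least_squares

def residual(phi, I, S):
    A = S @ np.diag(phi)                     # S D_phi, 3 x N
    proj = A.T @ np.linalg.inv(A @ A.T) @ A  # (S D_phi)^+ (S D_phi)
    return (I - I @ proj).ravel()            # depends on the intensities only

# Synthetic sanity check: intensities are recovered up to a global scale,
# hence the normalization by the first one (as on the results slide).
rng = np.random.default_rng(0)
P, N = 500, 9
S = rng.random((3, N))
phi_true = rng.uniform(0.5, 2.0, size=N)
I = rng.random((P, 3)) @ S @ np.diag(phi_true)

phi_hat = least_squares(residual, np.ones(N), args=(I, S)).x
print(phi_hat / phi_hat[0], phi_true / phi_true[0])
```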
---
## Generated images
<div class="grid grid-cols-4 col-auto h-110 m-auto">
<img src="/assets/im2.jpg" class="m-auto h-50"/>
<img src="/assets/im3.jpg" class="m-auto h-50"/>
<img src="/assets/im4.jpg" class="m-auto h-50"/>
<img src="/assets/im5.jpg" class="m-auto h-50"/>
<img src="/assets/im12.jpg" class="m-auto h-50"/>
<img src="/assets/im13.jpg" class="m-auto h-50"/>
<img src="/assets/im14.jpg" class="m-auto h-50"/>
<img src="/assets/im15.jpg" class="m-auto h-50"/>
</div>
<!--
Generated images of shiny half-spheres under 9 different lightings
4 different directions shown here + different intensities
500 by 500 pixels
Since we know the intensities, we can verify that the algorithm works on generated data
-->
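An illustrative way to render one such image: Lambertian shading plus a Blinn-Phong highlight for the shiny look (all parameters are guesses):

```python
import numpy as np

size, phi = 500, 1.5                     # image size, lighting intensity
s = np.array([0.3, 0.3, 0.9])
s /= np.linalg.norm(s)                   # lighting direction

ys, xs = np.mgrid[:size, :size]
u = (xs - size / 2) / (size / 2)
v = (ys - size / 2) / (size / 2)
mask = u**2 + v**2 < 1.0
w = np.sqrt(np.clip(1.0 - u**2 - v**2, 0.0, None))
n = np.stack([u, v, w], axis=-1)         # unit normals on the half-sphere

diffuse = np.clip(n @ s, 0.0, None)
h = s + np.array([0.0, 0.0, 1.0])
h /= np.linalg.norm(h)                   # half-vector towards the viewer
specular = np.clip(n @ h, 0.0, None) ** 50

image = phi * (0.7 * diffuse + 0.3 * specular) * mask
```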
---
## Results (1/2)
<div class="h-100 flex items-center">
<img src="/assets/residu_4.jpg" class="m-auto w-full"/>
<img src="/assets/residu2d_3.jpg" class="m-auto w-full"/>
</div>
<!--
Plot of the residual as a function of 1 or 2 intensities
Shows the existence of a minimum at the true value of the intensity
-->
---
## Results (2/2)
<div class="h-100 flex items-center">
<img src="/assets/resultats_finaux.jpg" class="m-auto h-110"/>
</div>
<!--
Fix intensity 1 and determine the 8 other intensities
Blue = real / Orange = found => convincing
-->
---
## Real images
<div class="h-full flex items-center">
<img src="/assets/raph/real_images.svg" class="m-auto h-full"/>
</div>
<!--
Real data: a tapestry of a comet + a sculpture
12 different lightings for the 1st / 17 for the 2nd
The images are large, so we crop them -> selected region in red (1000 by 1000 pixels)
No knowledge of the intensities = to be determined
-->
---
## Results
<div class="h-100 flex items-center">
<img src="/assets/comete.svg" class="m-auto w-full"/>
<img src="/assets/stsernin.svg" class="m-auto w-full"/>
</div>
<!--
Values are coherent with the lighting conditions we can observe in the images
-->
---
## Perspectives
3D reconstruction
<img src="/assets/3d_estimation.svg" class="m-auto w-full"/>
<!--
Main use = Photometric Stereo
Lab class => script to compute the surface normals of the object
Results are not very satisfactory (pringles != half-sphere); maybe a coordinate-system problem?
To be improved in future work... -->
---
# Lighting direction estimation
<div class="h-100 flex items-center text-2xl">
- Estimation of lighting vector
- Neural Network
- Real data
- Generated data
- Results
- Perspectives
</div>
---
## Estimation of lighting vector
<style>
.mermaid {
margin: auto;
width: 75%;
}
</style>
<div class="h-100 flex items-center">
```mermaid
flowchart LR
id1[Bounding box of the sphere]
id2[Deduce the normals]
id3[Solve I = s * n]
id1 --> id2
id2 --> id3
```
</div>
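A sketch of this classical pipeline for a matte sphere, assuming the bounding box is already known:

```python
import numpy as np

def lighting_from_sphere(image, bbox):
    """image: 2-D gray-level array; bbox: integer (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = bbox
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    r = (x_max - x_min) / 2

    # Normals are known analytically: (u, v, sqrt(1 - u^2 - v^2)) on the sphere.
    ys, xs = np.mgrid[y_min:y_max, x_min:x_max]
    u, v = (xs - cx) / r, (ys - cy) / r
    inside = u**2 + v**2 < 1.0
    w = np.sqrt(np.clip(1.0 - u**2 - v**2, 0.0, None))
    n = np.stack([u[inside], v[inside], w[inside]], axis=1)

    # Solve I = n · s in the least-squares sense for the (albedo-scaled) lighting.
    I = image[y_min:y_max, x_min:x_max][inside]
    s, *_ = np.linalg.lstsq(n, I, rcond=None)
    return s
```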
---
## Neural Network
<div class="h-100 flex items-center">
<img src="/assets/raph/neural_network.svg" class="m-auto w-full"/>
</div>
---
## ResNet-50
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/resnet_architecture.png" class="m-auto h-full"/>
---
## Real data: mask creation
<img src="/assets/raph/mask_crea.svg" class="m-auto h-full"/>
---
## Generated data with Blender
<style>
table, td, th, tr {
  border: none !important;
  border-collapse: collapse !important;
  border-style: none !important;
  background-color: unset !important;
  overflow: hidden;
  margin: auto;
  text-align: center;
}
</style>
<table>
<tr>
<td>Simulated matte spheres</td>
<td><img src="/assets/raph/matte_ball_3.png" class="m-auto h-50"></td>
<td><img src="/assets/raph/matte_ball.png" class="m-auto h-50"></td>
</tr>
<tr>
<td>Generated data with different lightings</td>
<td><img src="/assets/raph/auto_82.png" class="m-auto h-50"></td>
<td><img src="/assets/raph/auto_91.png" class="m-auto h-50"></td>
</tr>
</table>
---
## Results
<img src="/assets/raph/results.png" class="m-auto h-full"/>
---
## Perspectives
<div class="h-100 flex items-center">
<span>
- Create more data to prevent overfitting
- Diversify the lighting in the data (more than 8 directions)
- Generalize the model: \
$\rightarrow$ from {image of a sphere, lighting vector} to {image of objects, lighting vector}
</span>
</div>