more stuff

Laureηt 2023-01-29 20:38:40 +01:00
parent f3cda0b3d6
commit b8940f367d
4 changed files with 42 additions and 7 deletions

{
  "explorer.excludeGitIgnore": true,
  "latex-workshop.latex.recipe.default": "latexmk (lualatex)",
  "gitlens.codeLens.authors.enabled": false,
  "gitlens.codeLens.recentChange.enabled": false
}

assets/tasks.png (new binary file, 2.3 MiB); one further binary file not shown.
Previous work by Laurent Fainsin et al. in~\cite{spheredetect} attempted to address this problem.
The automatic detection (or segmentation) of spheres in scenes is a rather niche task, and as a result there exists no known direct method to solve this problem.
% TODO: discuss work that is not directly related here but that gives hope: Jade's approach and PE's approach.
\section{Datasets}
In~\cite{spheredetect}, the authors explain that clean photographs containing spherical markers, suitable for 3D reconstruction techniques, are unsurprisingly rare. To address this issue, they created a dataset for training their model using custom Python and Blender scripts, compositing known spherical markers (real or synthetic) onto background images from the COCO dataset~\cite{COCO}. The resulting dataset can be seen in Figure~\ref{fig:spheredetect_dataset}; a minimal sketch of the compositing idea is given after the figure.
\label{fig:spheredetect_dataset}
\end{figure}
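As an illustration, below is a minimal sketch of this compositing idea using Pillow; the file paths and scale range are placeholders, not the values used in~\cite{spheredetect}.

\begin{verbatim}
from PIL import Image
import random

# Paste a pre-cut sphere crop (with alpha channel) onto a COCO
# background at a random position and scale; paths are placeholders.
background = Image.open("coco/000000000139.jpg").convert("RGB")
sphere = Image.open("markers/sphere.png").convert("RGBA")

scale = random.uniform(0.05, 0.25)   # marker size relative to the image
size = int(min(background.size) * scale)
sphere = sphere.resize((size, size))

x = random.randint(0, background.width - size)
y = random.randint(0, background.height - size)
background.paste(sphere, (x, y), mask=sphere)  # alpha-aware paste
background.save("composited.png")
\end{verbatim}

A convenient side effect of this approach is that the paste position and size directly provide the ground-truth bounding box for the composited marker.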
Additionally, synthetic images of chrome spheres can be generated using free (CC0 1.0 Universal Public Domain Dedication) environment maps from PolyHaven~\cite{haven_hdris_nodate}. These environment maps provide a wide range of realistic lighting conditions and can be used to simulate different scenarios, such as different times of day, weather conditions, or indoor lighting setups. This further increases the diversity of the dataset and makes the model more robust to varied lighting, which is crucial for the task of detecting chrome sphere markers.
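A possible Blender (bpy) sketch of such a rendering setup is shown below; it assumes Blender's default scene (which includes a camera), and the HDRI path and material parameters are our own placeholders.

\begin{verbatim}
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'

# Light the scene with a (locally downloaded) PolyHaven HDRI.
world = bpy.data.worlds.new("EnvWorld")
world.use_nodes = True
nodes, links = world.node_tree.nodes, world.node_tree.links
env = nodes.new("ShaderNodeTexEnvironment")
env.image = bpy.data.images.load("/path/to/polyhaven_hdri.exr")
links.new(env.outputs["Color"], nodes["Background"].inputs["Color"])
scene.world = world

# A mirror-like sphere: fully metallic, zero roughness.
bpy.ops.mesh.primitive_uv_sphere_add(radius=1.0, location=(0.0, 0.0, 0.0))
sphere = bpy.context.active_object
mat = bpy.data.materials.new("Chrome")
mat.use_nodes = True
bsdf = mat.node_tree.nodes["Principled BSDF"]
bsdf.inputs["Metallic"].default_value = 1.0
bsdf.inputs["Roughness"].default_value = 0.0
sphere.data.materials.append(mat)

scene.render.filepath = "//chrome_sphere.png"
bpy.ops.render.render(write_still=True)
\end{verbatim}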
\subsection{Antoine Laurent}
The authors propose a deep learning-based model called DeepLight, which takes an …
\subsection{Multi-Illumination Images in the Wild}
In the paper ``A Dataset of Multi-Illumination Images in the Wild''~\cite{murmann_dataset_2019}, the authors present a dataset of over 1000 real-world scenes, each captured under 25 different lighting conditions and accompanied by panoptic segmentation annotations. It is a valuable resource for computer vision tasks such as relighting, image recognition, object detection, and image segmentation. Its wide variety of lighting conditions also makes it useful for training models to detect chrome spheres, as it would allow the model to be robust to different scenarios, improving its performance in real-world applications.
\begin{figure}[ht]
\centering
The output of such annotators can be integrated with HuggingFace Datasets~\cite{...} …
\section{Models}
Computer vision encompasses a range of tasks, including classification, classification with localization, object detection, semantic segmentation, instance segmentation, and panoptic segmentation, as illustrated in Figure~\ref{fig:tasks}.
Each of these tasks involves different objectives and challenges, and advances in these areas have greatly improved the ability of computers to understand and interpret visual information. For example, classification tasks aim to identify the class of an object in an image, while object detection tasks seek to locate and classify multiple objects within an image. Semantic segmentation and instance segmentation focus on understanding the relationships between objects and their parts, and panoptic segmentation seeks to merge these tasks into a single comprehensive solution. We will examine a variety of models for our computer vision problem.
\begin{figure}[ht]
\centering
\includegraphics[height=0.35\linewidth]{tasks.png}
\caption{The different types of tasks in Computer Vision.}
\label{fig:tasks}
\end{figure}
\subsection{Mask R-CNN}
In~\cite{spheredetect}, the authors use Mask R-CNN~\cite{MaskRCNN} as a base model for their task. Mask R-CNN is a neural network that is able to perform instance segmentation, which is the task of detecting and segmenting objects in an image.
@ -156,7 +170,7 @@ The network is composed of two parts: a backbone network and a region proposal n
The network is trained using a loss function that is composed of three terms: the classification loss, the bounding box regression loss, and the mask loss. The classification loss is used to train the network to classify each region proposal as either a sphere or not a sphere. The bounding box regression loss is used to train the network to regress the bounding box of each region proposal. The mask loss is used to train the network to generate a mask for each region proposal. The original network was trained using the COCO dataset~\cite{COCO}.
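As a rough illustration, torchvision's Mask R-CNN implementation (not necessarily the exact setup of~\cite{spheredetect}) returns these loss terms as a dictionary when called in training mode; the single ``sphere'' class below is our assumption.

\begin{verbatim}
import torch
import torchvision

# Pretrained Mask R-CNN; in training mode it returns losses, not predictions.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.train()

images = [torch.rand(3, 512, 512)]
targets = [{
    "boxes": torch.tensor([[100.0, 100.0, 200.0, 200.0]]),  # (x1, y1, x2, y2)
    "labels": torch.tensor([1]),            # hypothetical "sphere" class id
    "masks": torch.zeros(1, 512, 512, dtype=torch.uint8),
}]
losses = model(images, targets)
# Keys include loss_classifier, loss_box_reg and loss_mask,
# matching the three terms described above.
total_loss = sum(losses.values())
\end{verbatim}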
The authors of the paper~\cite{spheredetect} achieved favorable results using the network on matte spheres; however, its performance declined when shiny spheres were introduced. This can be attributed to the fact that convolutional neural networks typically extract local features from images, whereas a chrome sphere can only really be identified by observing both the interior and exterior of the sphere, delimited by a ``distortion'' effect.
\subsection{Ellipse R-CNN}
DETR (DEtection TRansformer)~\cite{carion_end--end_2020} is a new method proposed for end-to-end object detection.
DETR uses a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture, as seen in Figure~\ref{fig:detr}. Given a fixed small set of learned object queries, the model reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. This makes the model conceptually simple and does not require a specialized library, unlike many other modern detectors.
DETR demonstrates accuracy and run-time performance on par with the well-established and highly optimized Faster R-CNN baseline on the challenging COCO object detection dataset. Moreover, DETR can easily be generalized to produce panoptic segmentation in a unified manner, and it significantly outperforms competitive baselines.
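Since DETR is available in HuggingFace Transformers, a first experiment on our data could be only a few lines; the sketch below assumes the \texttt{facebook/detr-resnet-50} checkpoint and a placeholder image path.

\begin{verbatim}
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

image = Image.open("scene.jpg")  # placeholder input image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Turn the fixed set of object queries into thresholded boxes.
sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=sizes)[0]
print(detections["scores"], detections["labels"], detections["boxes"])
\end{verbatim}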
\begin{figure}[ht]
\centering
DINO (DETR with Improved deNoising anchOr boxes)~\cite{zhang_dino_2022} is a state-of-the-art end-to-end object detector.
\subsection{Mask2Former}
Mask2Former~\cite{cheng_masked-attention_2022} is a novel method for instance segmentation. It is a transformer-based approach that leverages the strengths of the Transformer architecture to perform instance segmentation in a direct and simple manner. The main idea behind Mask2Former is to treat instance segmentation as a direct prediction problem, where the goal is to predict a set of instance masks directly from an input image. Unlike traditional instance segmentation methods that require multiple stages and hand-designed components, such as anchor generation, non-maximum suppression, or post-processing steps, Mask2Former streamlines the instance segmentation pipeline.
\begin{figure}[ht]
\centering
\includegraphics[height=0.4\linewidth]{Mask2Former.pdf}
\label{fig:mask2former}
\end{figure}
Mask2Former uses a set-based loss function and a transformer encoder-decoder architecture to perform instance segmentation. Given a fixed set of instance queries, Mask2Former uses its encoder to extract features from the input image and the decoder to directly output the final set of instance masks. The set-based loss function enforces unique predictions and ensures that the output masks are well-formed and accurate. The use of the transformer architecture in Mask2Former enables it to effectively model the relations between the instances and the image context, leading to improved instance segmentation performance.
Overall, Mask2Former offers a simple and effective approach to instance segmentation that can achieve state-of-the-art performance on standard instance segmentation benchmarks. Its direct and efficient pipeline makes it well-suited for real-world applications, and its ability to leverage the strengths of the Transformer architecture makes it an attractive choice for researchers and practitioners alike.
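For reference, the pretrained COCO instance checkpoint can also be queried through HuggingFace Transformers; this is only a sketch, and it assumes the Swin-tiny variant of the model.

\begin{verbatim}
import torch
from PIL import Image
from transformers import (Mask2FormerImageProcessor,
                          Mask2FormerForUniversalSegmentation)

name = "facebook/mask2former-swin-tiny-coco-instance"
processor = Mask2FormerImageProcessor.from_pretrained(name)
model = Mask2FormerForUniversalSegmentation.from_pretrained(name)

image = Image.open("scene.jpg")  # placeholder input image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the query predictions into per-instance masks.
result = processor.post_process_instance_segmentation(
    outputs, target_sizes=[image.size[::-1]])[0]
segmentation, segments = result["segmentation"], result["segments_info"]
\end{verbatim}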
\section{Training}
For the training process, we plan to utilize PyTorch Lightning, a high-level library for PyTorch, and the HuggingFace Transformers library for our transformer model. The optimizer we plan to use is AdamW, a variation of the Adam optimizer that is well-suited for training deep learning models. We aim to ensure reproducibility by using Nix for our setup. The development environment will be in Visual Studio Code and we will use Poetry for managing Python dependencies. This combination of tools is expected to streamline the training process and ensure reliable results.
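A minimal sketch of the planned setup is given below; the wrapped model, the loss-dictionary interface (as exposed by Mask R-CNN, for example), and the hyperparameter values are all assumptions.

\begin{verbatim}
import pytorch_lightning as pl
import torch

class SphereDetector(pl.LightningModule):
    """Hypothetical wrapper around a detection/segmentation model."""

    def __init__(self, model, lr: float = 1e-4, weight_decay: float = 1e-2):
        super().__init__()
        self.model = model
        self.lr = lr
        self.weight_decay = weight_decay

    def training_step(self, batch, batch_idx):
        images, targets = batch
        loss_dict = self.model(images, targets)  # e.g. Mask R-CNN losses
        loss = sum(loss_dict.values())
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(
            self.parameters(), lr=self.lr, weight_decay=self.weight_decay)

# trainer = pl.Trainer(max_epochs=50)
# trainer.fit(SphereDetector(model), train_dataloader)
\end{verbatim}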
\subsection{Loss functions}
\subsection{Metrics}
To evaluate segmentation quality, we plan to track standard metrics such as the Dice coefficient and the Intersection over Union (IoU), e.g. using the TorchMetrics package.
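Below is a minimal sketch of both metrics computed on binary masks in plain PyTorch; TorchMetrics provides equivalent, batched implementations.

\begin{verbatim}
import torch

def iou(pred, target, eps=1e-6):
    """Intersection over Union (Jaccard index) of two boolean masks."""
    intersection = (pred & target).sum().float()
    union = (pred | target).sum().float()
    return (intersection + eps) / (union + eps)

def dice(pred, target, eps=1e-6):
    """Dice coefficient: 2 * |A intersect B| / (|A| + |B|)."""
    intersection = (pred & target).sum().float()
    return (2 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = torch.rand(512, 512) > 0.5    # stand-ins for predicted and
target = torch.rand(512, 512) > 0.5  # ground-truth masks
print(iou(pred, target), dice(pred, target))
\end{verbatim}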
\subsection{Experiment tracking}
To keep track of our experiments and their results, we will utilize Weights \& Biases (W\&B) and Aim. W\&B is a popular experiment tracking tool that provides a simple interface for logging and visualizing metrics, models, and artifacts. Aim is a collaborative machine learning platform that provides a unified way to track, compare, and explain experiments across teams and tools. By utilizing these tools, we aim to efficiently track our experiments, compare results, and make data-driven decisions; time permitting, this should also lead to better results.
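As a sketch, logging to W\&B takes only a few calls; the project name and metric values below are placeholders.

\begin{verbatim}
import wandb

run = wandb.init(project="sphere-detection", config={"lr": 1e-4})
for step in range(100):
    loss = 1.0 / (step + 1)  # stand-in for the real training loss
    wandb.log({"train/loss": loss}, step=step)
run.finish()
\end{verbatim}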
\section{Deployment}
For deployment, we plan to use the ONNX format. This format provides a standard for interoperability between different AI frameworks and helps ensure compatibility with a wide range of deployment scenarios. To ensure the deployment process is seamless, we will carefully choose an architecture that is exportable, though most popular architectures are compatible with ONNX. Our model will be run in production using ONNXRuntime, a framework that allows for efficient inference using ONNX models. This combination of tools and formats will ensure that our model can be deployed quickly and easily in a variety of production environments such as AliceVision Meshroom.
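A minimal export-and-inference sketch is shown below; the toy model stands in for the trained detector, and the input/output names and shapes are assumptions.

\begin{verbatim}
import torch
import torch.nn as nn
import onnxruntime as ort

# Toy stand-in for the trained detector.
model = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid()).eval()

dummy = torch.rand(1, 3, 512, 512)
torch.onnx.export(
    model, dummy, "spheres.onnx",
    input_names=["image"], output_names=["mask"],
    dynamic_axes={"image": {0: "batch", 2: "height", 3: "width"}},
)

# Inference through ONNXRuntime, as planned for Meshroom integration.
session = ort.InferenceSession("spheres.onnx")
(mask,) = session.run(None, {"image": dummy.numpy()})
\end{verbatim}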
\section{Conclusion}
From this review, we can now outline a plan for addressing our problem. The detection of matte spheres has been explored and is feasible; however, the automatic detection of chrome spheres has not been fully investigated. The initial step towards this goal will be to evaluate the capabilities of transformer-based architectures, such as DETR, in detecting chrome spheres. If successful, further improvements can include predicting bounding ellipses instead of just bounding boxes, exporting the model to the ONNX format, and integrating it into the AliceVision Meshroom software.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%