feat: second day

This commit is contained in:
Laureηt 2023-01-24 17:48:08 +01:00
parent 8f65a57b33
commit 41547c2133
Signed by: Laurent
SSH key fingerprint: SHA256:kZEpW8cMJ54PDeCvOhzreNr4FSh6R13CMGH/POoO8DI
10 changed files with 393 additions and 317 deletions

View file

@ -1,3 +1,4 @@
{
"explorer.excludeGitIgnore": true,
"explorer.excludeGitIgnore": true,
"latex-workshop.latex.recipe.default": "latexmk (lualatex)"
}

Binary file not shown.

Before

Width:  |  Height:  |  Size: 13 MiB

After

Width:  |  Height:  |  Size: 9.7 MiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.1 MiB

BIN
assets/deeplight.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.4 MiB

BIN
assets/dir_7_mip2.jpg Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 181 KiB

BIN
assets/materials_mip2.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 20 KiB

BIN
src/paper.pdf Normal file

Binary file not shown.

View file

@ -1,11 +1,26 @@
\documentclass[a4paper, 11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\usepackage{amsfonts}
\usepackage{color}
\usepackage[pagebackref,breaklinks,colorlinks]{hyperref}
\documentclass[
11pt,
a4paper
]{article}
% Packages
\usepackage{fontspec}
\usepackage{libertinus-otf}
\usepackage[a4paper, hmargin=2cm, vmargin=3cm]{geometry}
\usepackage{graphicx}
\usepackage{microtype}
\usepackage{amsmath}
% pdfx loads both hyperref and xcolor internally
% \usepackage{hyperref}
% \usepackage{xcolor}
\usepackage[a-3u]{pdfx}
% We use \hypersetup to pass options to hyperref
\hypersetup{
colorlinks = true,
breaklinks = true,
}
\graphicspath{{../assets/}}
@ -13,10 +28,9 @@
\title{"Projet Long" Bibliography}
\author{Laurent Fainsin}
\date{\the\year-\ifnum\month<10\relax0\fi\the\month-\ifnum\day<10\relax0\fi\the\day}
\date{2023-01-24}
\maketitle
\newpage
{
\hypersetup{hidelinks}
@ -30,7 +44,7 @@
3D reconstruction techniques in photography, such as Reflectance Transformation Imaging (RTI)~\cite{giachetti2018} and Photometric Stereo~\cite{durou2020}, often require a precise understanding of the lighting conditions in the scene being captured. One common method for calibrating the lighting is to include one or more spheres in the scene, as shown in the left example of Figure~\ref{fig:intro}. However, manually outlining these spheres can be tedious and time-consuming, especially in the field of visual effects, where the presence of chrome spheres is prevalent~\cite{jahirul_grey_2021}. This task can be made more efficient by using deep learning methods for detection. The goal of this project is to develop a neural network that can accurately detect both matte and shiny spheres in a scene.
\begin{figure}[h]
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.35\linewidth]{matte.jpg} &
@ -44,7 +58,7 @@ The field of 3D reconstruction techniques in photography, such as Reflectance Tr
Previous work by Laurent Fainsin et al. in~\cite{spheredetect} attempted to address this problem by using a neural network called Mask R-CNN~\cite{MaskRCNN} for instance segmentation of spheres in images. However, this approach is limited in its ability to detect shiny spheres, as demonstrated in the right image of Figure~\ref{fig:previouswork}. The network was trained on images of matte spheres and was unable to generalize to shiny spheres, which highlights the need for further research in this area.
\begin{figure}[h]
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.35\linewidth]{matte_inference.png} &
@ -54,39 +68,75 @@ Previous work by Laurent Fainsin et al. in~\cite{spheredetect} attempted to addr
\label{fig:previouswork}
\end{figure}
\section{Current state of the art}
The automatic detection (or segmentation) of spheres in scenes is a rather niche task and as a result there exists no known direct method to solve this problem.
\subsection{Datasets}
\section{Datasets}
In~\cite{spheredetect}, it is explained that obtaining clean photographs with spherical markers for use in 3D reconstruction techniques are unsurprisingly rare. To address this issue, the authors of the paper crafted a training custom dataset using python and blender scripts. This was done by compositing known spherical markers (real or synthetic) onto background images from the COCO dataset~\cite{COCO}. The result of such technique is visible in Figure~\ref{fig:spheredetectdataset}.
In~\cite{spheredetect}, it is explained that clean photographs with spherical markers for use in 3D reconstruction techniques are unsurprisingly rare. To address this issue, the authors of the paper crafted a training dataset using Python and Blender scripts. This was done by compositing known spherical markers (real or synthetic) onto background images from the COCO dataset~\cite{COCO}. The result of this technique is visible in Figure~\ref{fig:spheredetect_dataset}.
\begin{figure}[h]
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.3\linewidth]{dataset1.jpg} &
\includegraphics[height=0.3\linewidth]{dataset2.jpg}
\end{tabular}
\caption{Example of the synthetic dataset used in~\cite{spheredetect}.}
\label{fig:spheredetectdataset}
\label{fig:spheredetect_dataset}
\end{figure}
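As a rough illustration of this compositing approach (a minimal sketch, not the authors' actual pipeline; the file paths below are hypothetical), a pre-rendered sphere with an alpha channel can be pasted onto a COCO background at a random position and scale:
\begin{verbatim}
# Minimal sketch of the compositing idea (illustrative only, hypothetical paths).
import random
from PIL import Image

def composite_sphere(background_path, sphere_path):
    """Paste an RGBA sphere render onto a background at a random position/scale."""
    background = Image.open(background_path).convert("RGB")
    sphere = Image.open(sphere_path).convert("RGBA")

    # Random scale relative to the smaller background dimension.
    size = random.randint(min(background.size) // 10, min(background.size) // 4)
    sphere = sphere.resize((size, size))

    # Random position such that the sphere stays inside the frame.
    x = random.randint(0, background.width - size)
    y = random.randint(0, background.height - size)

    # The alpha channel of the render acts as the compositing mask.
    background.paste(sphere, (x, y), mask=sphere)
    return background, (x, y, x + size, y + size)  # image and ground-truth box

image, bbox = composite_sphere("coco/000000001234.jpg", "renders/chrome_sphere.png")
image.save("composited.jpg")
\end{verbatim}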
During the research for this bibliography we found some additional datasets that we may be able to use, such as DeepLight~\cite{legendre_deeplight_2019} and Multi-Illumination Images in the Wild~\cite{murmann_dataset_2019}, presented below. In the same way, one could also generate synthetic images of chrome spheres by rendering them under free (CC0) environment maps from~\cite{haven_hdris_nodate}.
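As a quick sanity check of this idea, the reflection mapping involved can be sketched in a few lines of NumPy, assuming an equirectangular (latitude--longitude) environment map. This is only an orthographic mirror-ball approximation, not a substitute for a proper renderer such as Blender, and the longitude/latitude convention is an assumption:
\begin{verbatim}
# Sketch: orthographic mirror ball rendered from an equirectangular environment
# map (pure reflection, no Fresnel, nearest-neighbour sampling).
import numpy as np

def render_chrome_sphere(env_map, size=512):
    h, w, _ = env_map.shape
    # Pixel grid in [-1, 1]^2; the sphere occupies the unit disc.
    v, u = np.meshgrid(np.linspace(1, -1, size), np.linspace(-1, 1, size),
                       indexing="ij")
    mask = u**2 + v**2 <= 1.0
    z = np.sqrt(np.clip(1.0 - u**2 - v**2, 0.0, 1.0))

    # View direction (0, 0, -1) reflected about the sphere normal (u, v, z).
    rx, ry, rz = 2 * u * z, 2 * v * z, 2 * z**2 - 1

    # Reflection vector to equirectangular coordinates (convention assumed).
    lon = np.arctan2(rx, rz)
    lat = np.arcsin(np.clip(ry, -1.0, 1.0))
    x_tex = ((lon / (2 * np.pi) + 0.5) * (w - 1)).astype(int)
    y_tex = ((0.5 - lat / np.pi) * (h - 1)).astype(int)

    out = np.zeros((size, size, 3), dtype=env_map.dtype)
    out[mask] = env_map[y_tex[mask], x_tex[mask]]
    return out

env = np.random.rand(512, 1024, 3).astype(np.float32)  # stand-in for a CC0 HDRI
ball = render_chrome_sphere(env)
\end{verbatim}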
\subsection{Models}
\subsection{Antoine Laurent}
\subsubsection{Mask R-CNN}
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.3\linewidth]{antoine_laurent_1.jpg} &
\includegraphics[height=0.3\linewidth]{antoine_laurent_2.jpg}
\end{tabular}
\caption{Example of clean photographs with spherical markers from Antoine Laurent.}
\label{fig:antoine_laurent_dataset}
\end{figure}
\subsection{DeepLight}
\begin{figure}[ht]
\centering
\includegraphics[height=0.3\linewidth]{deeplight.png}
\caption{Example of the dataset from~\cite{legendre_deeplight_2019}.}
\label{fig:deeplight_dataset}
\end{figure}
\subsection{Multi-Illumination Images in the Wild}
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.3\linewidth]{dir_7_mip2.jpg} &
\includegraphics[height=0.3\linewidth]{materials_mip2.png}
\end{tabular}
\caption{Example data from~\cite{murmann_dataset_2019}.}
\label{fig:murmann_dataset}
\end{figure}
\subsection{Labelling}
Additional training images can be annotated by hand using an open-source tool such as Label Studio~\cite{noauthor_label_nodate}.
\subsection{Versioning}
The resulting datasets can then be versioned and distributed with the Hugging Face \texttt{datasets} library~\cite{noauthor_datasets_nodate}.
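A minimal sketch of this workflow with the \texttt{datasets} library follows (the folder and repository names are hypothetical):
\begin{verbatim}
# Load a local image folder as a Hugging Face dataset and push a version to the Hub.
from datasets import load_dataset

dataset = load_dataset("imagefolder", data_dir="./spheres")  # hypothetical folder
dataset.push_to_hub("username/spheres", private=True)        # hypothetical repo
\end{verbatim}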
\section{Models}
\subsection{Mask R-CNN}
In~\cite{spheredetect}, the authors use Mask R-CNN~\cite{MaskRCNN} as a base model for their task. Mask R-CNN is a neural network that is able to perform instance segmentation, which is the task of detecting and segmenting objects in an image.
The network is composed of two parts: a backbone network and a region proposal network (RPN). The backbone is a convolutional neural network that extracts features from the input image. The RPN is a fully convolutional network that generates region proposals, i.e. bounding boxes used to crop the corresponding feature maps. A dedicated mask head then predicts a segmentation mask for each region proposal, which is used to segment the object in the image.
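For reference, a Mask R-CNN pretrained on COCO is readily available in \texttt{torchvision}; a minimal inference sketch (assuming torchvision 0.13 or newer, and not the fine-tuned model of~\cite{spheredetect}) is shown below:
\begin{verbatim}
# Minimal inference sketch with torchvision's pretrained Mask R-CNN (COCO weights).
import torch
from PIL import Image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("photo.jpg").convert("RGB"))  # any test photograph
with torch.no_grad():
    prediction = model([image])[0]

# Each detection comes with a box, a label, a confidence score and a soft mask.
keep = prediction["scores"] > 0.5
print(prediction["boxes"][keep], prediction["labels"][keep])
\end{verbatim}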
\begin{figure}[h]
\begin{figure}[ht]
\centering
\includegraphics[width=0.6\linewidth]{MaskRCNN.png}
\caption{The Mask-RCNN~\cite{MaskRCNN} architecture.}
@ -97,31 +147,31 @@ The network is trained using a loss function that is composed of three terms: th
While the authors of the paper~\cite{spheredetect} obtain good results from this network on matte spheres, its performance drops when shiny spheres are introduced. This could be explained by the fact that convolutional neural networks tend to extract local features from images. Indeed, one can only really identify a chrome sphere by observing the ``interior and exterior'' of the sphere, delimited by a ``distortion'' effect.
\subsubsection{Ellipse R-CNN}
\subsection{Ellipse R-CNN}
To detect spheres in images, it is in principle sufficient to estimate the center and radius of their projected contours. However, due to the perspective nature of photographs, these contours are generally not exact circles and instead appear as ellipses.
The Ellipse R-CNN~\cite{dong_ellipse_2021} is a modified version of Mask R-CNN~\cite{MaskRCNN} that can detect ellipses in images. It addresses this issue by using an additional branch in the network that predicts the axes and orientation of each ellipse, which allows for more accurate detection of elliptical objects, and in our case spheres. It also handles occlusion: by predicting a segmentation mask for each ellipse, it can deal with overlapping and occluded objects. This makes it an ideal choice for detecting spheres in real-world images with complex backgrounds and variable lighting conditions.
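The baseline that Ellipse R-CNN is compared against in~\cite{dong_ellipse_2021}, namely Mask R-CNN followed by ellipse fitting, can be sketched with OpenCV (the binary mask below stands in for a predicted segmentation mask):
\begin{verbatim}
# Sketch of the "Mask R-CNN + ellipse fitting" baseline: fit an ellipse to a mask.
import cv2
import numpy as np

def mask_to_ellipse(mask):
    """Fit an ellipse (center, axes, angle) to the largest contour of a binary mask."""
    mask = (mask > 0.5).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea)
    (cx, cy), (ax1, ax2), angle = cv2.fitEllipse(contour)  # needs >= 5 points
    return cx, cy, ax1, ax2, angle

# Toy example: a filled circle should be recovered as a near-circular ellipse.
mask = np.zeros((256, 256), dtype=np.float32)
cv2.circle(mask, (128, 128), 60, 1.0, thickness=-1)
print(mask_to_ellipse(mask))
\end{verbatim}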
\begin{figure}[h]
\begin{figure}[ht]
\centering
\includegraphics[width=0.6\linewidth]{EllipseRCNN.png}
\caption{The Ellipse R-CNN~\cite{dong_ellipse_2021} architecture.}
\label{fig:ellipsercnn}
\end{figure}
\subsubsection{GPN}
\subsection{GPN}
\begin{figure}[h]
\begin{figure}[ht]
\centering
\includegraphics[width=0.6\linewidth]{GPN.png}
\caption{The GPN~\cite{li_detecting_2019} architecture.}
\label{fig:gpn}
\end{figure}
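According to~\cite{li_detecting_2019}, GPN represents each bounding ellipse as a 2D Gaussian distribution on the image plane and learns localization by minimizing the Kullback--Leibler divergence between the proposed and ground-truth Gaussians. A small NumPy sketch of that divergence (illustrative only, not the authors' code) follows:
\begin{verbatim}
# KL divergence between two 2D Gaussians, as used for localization in GPN
# (illustrative only, not the authors' code).
import numpy as np

def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Represent an ellipse (center, semi-axes a/b, rotation theta) as N(mu, Sigma)."""
    mu = np.array([cx, cy], dtype=float)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Sigma = R @ np.diag([a**2, b**2]) @ R.T
    return mu, Sigma

def kl_gaussian(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) in k = 2 dimensions."""
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - 2
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

pred = ellipse_to_gaussian(100, 100, 40, 20, np.deg2rad(30))
gt = ellipse_to_gaussian(105, 98, 42, 18, np.deg2rad(25))
print(kl_gaussian(*pred, *gt))  # zero iff the two ellipses coincide
\end{verbatim}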
\subsubsection{DETR}
\subsection{DETR}
\begin{figure}[h]
\begin{figure}[ht]
\centering
\includegraphics[width=0.8\linewidth]{DETR.png}
\caption{The DETR~\cite{carion_end--end_2020} architecture.}
@ -130,6 +180,16 @@ The Ellipse R-CNN~\cite{dong_ellipse_2021} is a modified version of the Mask R-C
+ \cite{zhang_dino_2022}
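At the core of DETR~\cite{carion_end--end_2020} is a set-based global loss that forces unique predictions via bipartite matching between predicted and ground-truth objects. The matching step can be sketched with SciPy's Hungarian solver (here with a plain L1 box cost instead of DETR's full matching cost):
\begin{verbatim}
# Sketch of the bipartite (Hungarian) matching at the core of DETR's set loss,
# using a plain L1 cost between boxes instead of the full matching cost.
import numpy as np
from scipy.optimize import linear_sum_assignment

pred_boxes = np.array([[0.20, 0.20, 0.4, 0.4],   # N predicted boxes (cx, cy, w, h)
                       [0.70, 0.70, 0.2, 0.2],
                       [0.50, 0.50, 0.9, 0.9]])
gt_boxes = np.array([[0.21, 0.19, 0.4, 0.4],     # M ground-truth boxes
                     [0.72, 0.69, 0.2, 0.2]])

# Cost matrix: L1 distance between every prediction and every ground truth.
cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)

# One-to-one assignment minimizing the total cost; unmatched predictions
# are supervised as "no object" in the full DETR loss.
pred_idx, gt_idx = linear_sum_assignment(cost)
print(list(zip(pred_idx, gt_idx)))
\end{verbatim}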
\subsection{Mask2Former}
\section{Training}
\subsection{Loss functions}
\subsection{Metrics}
\subsection{Experiment tracking}
\section{Conclusion}
From what we know, it is now rather straightforward to devise a plan for tackling our problem. ...
@ -137,6 +197,7 @@ From what we know it is now rather easy to elaborate a plan to try to solve our
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\bibliography{zotero,qcav}
\bibliographystyle{plain}

14
src/paper.xmpdata Normal file
View file

@ -0,0 +1,14 @@
\Author{Laurent Fainsin}
\Title{
"Projet Long" Bibliography
}
\Language{English}
\Keywords{}
\Publisher{Self-Published}
\Subject{
Bibliography
}
\Date{2023-01-24}
\PublicationType{Bibliography}
\Source{}
\URLlink{}

View file

@ -1,381 +1,381 @@
@misc{van_strien_training_2022,
title = {Training an object detection model using {Hugging} {Face}},
url = {https://danielvanstrien.xyz/huggingface/huggingface-datasets/transformers/2022/08/16/detr-object-detection.html},
abstract = {training a Detr object detection model using Hugging Face transformers and datasets},
language = {en},
urldate = {2023-01-17},
journal = {Daniel van Strien},
author = {Van Strien, Daniel},
month = aug,
year = {2022},
file = {Snapshot:/home/laurent/Zotero/storage/DXQJISMX/detr-object-detection.html:text/html},
}
@article{dror_recognition_nodate,
title = {Recognition of {Surface} {Reflectance} {Properties} from a {Single} {Image} under {Unknown} {Real}-{World} {Illumination}},
abstract = {This paper describes a machine vision system that classifies reflectance properties of surfaces such as metal, plastic, or paper, under unknown real-world illumination. We demonstrate performance of our algorithm for surfaces of arbitrary geometry. Reflectance estimation under arbitrary omnidirectional illumination proves highly underconstrained. Our reflectance estimation algorithm succeeds by learning relationships between surface reflectance and certain statistics computed from an observed image, which depend on statistical regularities in the spatial structure of real-world illumination. Although the algorithm assumes known geometry, its statistical nature makes it robust to inaccurate geometry estimates.},
language = {en},
author = {Dror, Ron O and Adelson, Edward H and Willsky, Alan S},
file = {Dror et al. - Recognition of Surface Reflectance Properties from .pdf:/home/laurent/Zotero/storage/HJXFDDT6/Dror et al. - Recognition of Surface Reflectance Properties from .pdf:application/pdf},
}
@article{legendre_deeplight_2019,
title = {{DeepLight}: {Learning} {Illumination} for {Unconstrained} {Mobile} {Mixed} {Reality}},
abstract = {We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). For training data, we collect videos of various reflective spheres placed within the cameras FOV, leaving most of the background unoccluded, leveraging that materials with diverse reflectance functions reveal different lighting cues in a single exposure. We train a deep neural network to regress from the LDR background image to HDR lighting by matching the LDR ground truth sphere images to those rendered with the predicted illumination using imagebased relighting, which is differentiable. Our inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality. Training on automatically exposed and white-balanced videos, we improve the realism of rendered objects compared to the state-of-the art methods for both indoor and outdoor scenes.},
language = {en},
author = {LeGendre, Chloe and Ma, Wan-Chun and Fyffe, Graham and Flynn, John and Charbonnel, Laurent and Busch, Jay and Debevec, Paul},
year = {2019},
file = {LeGendre et al. - DeepLight Learning Illumination for Unconstrained.pdf:/home/laurent/Zotero/storage/7FGL25G5/LeGendre et al. - DeepLight Learning Illumination for Unconstrained.pdf:application/pdf},
}
@misc{tazi_fine-tuning_nodate,
title = {Fine-tuning {DETR} for license plates detection},
url = {https://kaggle.com/code/nouamane/fine-tuning-detr-for-license-plates-detection},
abstract = {Explore and run machine learning code with Kaggle Notebooks {\textbar} Using data from multiple data sources},
language = {en},
urldate = {2023-01-17},
author = {Tazi, Nouamane},
file = {Snapshot:/home/laurent/Zotero/storage/WHFVB3QC/fine-tuning-detr-for-license-plates-detection.html:text/html},
}
@inproceedings{murmann_dataset_2019,
address = {Seoul, Korea (South)},
title = {A {Dataset} of {Multi}-{Illumination} {Images} in the {Wild}},
isbn = {978-1-72814-803-8},
url = {https://ieeexplore.ieee.org/document/9008252/},
doi = {10.1109/ICCV.2019.00418},
abstract = {Collections of images under a single, uncontrolled illumination [42] have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation [26, 43, 18]. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. The data simply does not contain the necessary supervisory signals. Multiillumination datasets are notoriously hard to capture, so the data is typically collected at small scale, in controlled environments, either using multiple light sources [10, 53], or robotic gantries [8, 20]. This leads to image collections that are not representative of the variety and complexity of real-world scenes. We introduce a new multi-illumination dataset of more than 1000 real scenes, each captured in high dynamic range and high resolution, under 25 lighting conditions. We demonstrate the richness of this dataset by training state-of-the-art models for three challenging applications: single-image illumination estimation, image relighting, and mixed-illuminant white balance.},
language = {en},
urldate = {2023-01-17},
booktitle = {2019 {IEEE}/{CVF} {International} {Conference} on {Computer} {Vision} ({ICCV})},
publisher = {IEEE},
author = {Murmann, Lukas and Gharbi, Michael and Aittala, Miika and Durand, Fredo},
month = oct,
year = {2019},
pages = {4079--4088},
file = {Murmann et al. - 2019 - A Dataset of Multi-Illumination Images in the Wild.pdf:/home/laurent/Zotero/storage/KH9HA9SQ/Murmann et al. - 2019 - A Dataset of Multi-Illumination Images in the Wild.pdf:application/pdf},
}
@misc{arora_annotated_2021,
title = {The {Annotated} {DETR}},
url = {https://amaarora.github.io/2021/07/26/annotateddetr.html},
abstract = {This is a place where I write freely and try to uncomplicate the complicated for myself and everyone else through Python code.},
language = {en},
urldate = {2023-01-17},
journal = {Committed towards better future},
author = {Arora, Aman},
month = jul,
year = {2021},
file = {Snapshot:/home/laurent/Zotero/storage/G78PSBHE/annotateddetr.html:text/html},
}
@misc{carion_end--end_2020,
title = {End-to-{End} {Object} {Detection} with {Transformers}},
url = {http://arxiv.org/abs/2005.12872},
doi = {10.48550/arXiv.2005.12872},
abstract = {We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines. Training code and pretrained models are available at https://github.com/facebookresearch/detr.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey},
month = may,
year = {2020},
note = {arXiv:2005.12872 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/KBRPD4CU/Carion et al. - 2020 - End-to-End Object Detection with Transformers.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/6445LQV5/2005.html:text/html},
}
@misc{li_detecting_2019,
title = {Detecting {Lesion} {Bounding} {Ellipses} {With} {Gaussian} {Proposal} {Networks}},
url = {http://arxiv.org/abs/1902.09658},
doi = {10.48550/arXiv.1902.09658},
abstract = {Lesions characterized by computed tomography (CT) scans, are arguably often elliptical objects. However, current lesion detection systems are predominantly adopted from the popular Region Proposal Networks (RPNs) that only propose bounding boxes without fully leveraging the elliptical geometry of lesions. In this paper, we present Gaussian Proposal Networks (GPNs), a novel extension to RPNs, to detect lesion bounding ellipses. Instead of directly regressing the rotation angle of the ellipse as the common practice, GPN represents bounding ellipses as 2D Gaussian distributions on the image plain and minimizes the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground truth Gaussian for object localization. We show the KL divergence loss approximately incarnates the regression loss in the RPN framework when the rotation angle is 0. Experiments on the DeepLesion dataset show that GPN significantly outperforms RPN for lesion bounding ellipse detection thanks to lower localization error. GPN is open sourced at https://github.com/baidu-research/GPN},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Li, Yi},
month = feb,
year = {2019},
note = {arXiv:1902.09658 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/IB8AWGHV/Li - 2019 - Detecting Lesion Bounding Ellipses With Gaussian P.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/ZGKBBB98/1902.html:text/html},
}
@misc{noauthor_detr_nodate,
title = {{DETR}},
url = {https://huggingface.co/docs/transformers/model_doc/detr},
abstract = {We're on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/2AQYDSL3/detr.html:text/html},
}
@misc{noauthor_opencv_nodate,
title = {{OpenCV}: {Camera} {Calibration}},
url = {https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html},
urldate = {2023-01-17},
file = {OpenCV\: Camera Calibration:/home/laurent/Zotero/storage/7C3DT2WU/tutorial_py_calibration.html:text/html},
}
@misc{jahirul_grey_2021,
title = {The {Grey}, the {Chrome} and the {Macbeth} {Chart} {CAVE} {Academy}},
url = {https://caveacademy.com/wiki/onset-production/data-acquisition/data-acquisition-training/the-grey-the-chrome-and-the-macbeth-chart/},
language = {en-US},
urldate = {2023-01-17},
author = {Jahirul, Amin},
month = jul,
year = {2021},
file = {Snapshot:/home/laurent/Zotero/storage/TM2TJKMH/the-grey-the-chrome-and-the-macbeth-chart.html:text/html},
}
@misc{doppenberg_lunar_2022,
title = {Lunar {Orbit} {Navigation} {Using} {Ellipse} {R}-{CNN} and {Crater} {Pattern} {Matching}},
copyright = {MIT},
url = {https://github.com/wdoppenberg/crater-detection},
abstract = {Autonomous Lunar Orbit Navigation Using Ellipse R-CNN and Crater Pattern Matching},
urldate = {2023-01-17},
author = {Doppenberg, Wouter},
month = aug,
year = {2022},
note = {original-date: 2020-10-19T16:32:29Z},
keywords = {crater-detection, ellipse-rcnn, faster-rcnn, space-engineering},
}
@misc{doppenberg_ellipse_2022,
title = {Ellipse {R}-{CNN}},
copyright = {MIT},
url = {https://github.com/wdoppenberg/ellipse-rcnn},
abstract = {A PyTorch implementation of Ellipse R-CNN},
urldate = {2023-01-17},
author = {Doppenberg, Wouter},
month = dec,
year = {2022},
note = {original-date: 2021-06-25T09:21:44Z},
keywords = {ellipse-rcnn, deep-learning, pytorch, pytorch-lightning, region-based},
}
@misc{wok_finetune_2022,
title = {Finetune {DETR}},
copyright = {MIT},
url = {https://github.com/woctezuma/finetune-detr},
abstract = {Fine-tune Facebook's DETR (DEtection TRansformer) on Colaboratory.},
urldate = {2023-01-17},
author = {Wok},
month = dec,
year = {2022},
note = {original-date: 2020-08-03T17:17:35Z},
keywords = {balloon, balloons, colab, colab-notebook, colaboratory, detr, facebook, finetune, finetunes, finetuning, google-colab, google-colab-notebook, google-colaboratory, instance, instance-segmentation, instances, segementation, segment},
}
@misc{noauthor_datasets_nodate,
title = {Datasets},
url = {https://huggingface.co/docs/datasets/index},
abstract = {We're on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/RYXSCZR7/index.html:text/html},
}
@misc{rogge_transformers_2020,
title = {Transformers {Tutorials}},
copyright = {MIT},
url = {https://github.com/NielsRogge/Transformers-Tutorials},
abstract = {This repository contains demos I made with the Transformers library by HuggingFace.},
urldate = {2023-01-17},
author = {Rogge, Niels},
month = sep,
year = {2020},
doi = {10.5281/zenodo.1234},
}
@misc{noauthor_recommendations_2020,
title = {Recommendations for training {Detr} on custom dataset? · {Issue} \#9 · facebookresearch/detr},
shorttitle = {Recommendations for training {Detr} on custom dataset?},
url = {https://github.com/facebookresearch/detr/issues/9},
abstract = {Very impressed with the all new innovative architecture in Detr! Can you clarify recommendations for training on a custom dataset? Should we build a model similar to demo and train, or better to us...},
language = {en},
urldate = {2023-01-17},
journal = {GitHub},
month = may,
year = {2020},
file = {Snapshot:/home/laurent/Zotero/storage/G2S6584X/9.html:text/html},
}
@misc{noauthor_auto_nodate,
title = {Auto {Classes}},
url = {https://huggingface.co/docs/transformers/model_doc/auto},
abstract = {Were on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
}
@misc{noauthor_swin_nodate,
title = {Swin {Transformer}},
url = {https://huggingface.co/docs/transformers/v4.24.0/en/model_doc/swin},
abstract = {Were on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/K2NDEY49/swin.html:text/html},
}
@misc{rajesh_pytorch_2022,
title = {{PyTorch} {Implementations} of various state of the art architectures.},
url = {https://github.com/04RR/SOTA-Vision},
abstract = {Implementation of various state of the art architectures used in computer vision.},
urldate = {2023-01-17},
author = {Rajesh, Rohit},
month = sep,
year = {2022},
note = {original-date: 2021-05-02T03:32:10Z},
keywords = {deep-learning, pytorch, deep-learning-algorithms, pytorch-implementation, transformer-architecture},
}
@misc{mmdetection_contributors_openmmlab_2018,
title = {{OpenMMLab} {Detection} {Toolbox} and {Benchmark}},
copyright = {Apache-2.0},
url = {https://github.com/open-mmlab/mmdetection},
abstract = {OpenMMLab Detection Toolbox and Benchmark},
urldate = {2023-01-17},
author = {{MMDetection Contributors}},
month = aug,
year = {2018},
note = {original-date: 2018-08-22T07:06:06Z},
}
@misc{noauthor_awesome_2023,
title = {Awesome {Detection} {Transformer}},
url = {https://github.com/IDEA-Research/awesome-detection-transformer},
abstract = {Collect some papers about transformer for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)},
urldate = {2023-01-17},
publisher = {IDEA-Research},
month = jan,
year = {2023},
note = {original-date: 2022-03-09T05:11:49Z},
}
@misc{arakelyan_aim_2020,
title = {Aim},
copyright = {Apache-2.0},
url = {https://github.com/aimhubio/aim},
abstract = {Aim 💫 — easy-to-use and performant open-source ML experiment tracker.},
urldate = {2023-01-17},
author = {Arakelyan, Gor and Soghomonyan, Gevorg and {The Aim team}},
month = jun,
year = {2020},
doi = {10.5281/zenodo.6536395},
}
@misc{noauthor_label_nodate,
title = {Label {Studio}},
url = {https://labelstud.io/},
abstract = {A flexible data labeling tool for all data types. Prepare training data for computer vision, natural language processing, speech, voice, and video models.},
language = {en},
urldate = {2023-01-17},
journal = {Label Studio},
file = {Snapshot:/home/laurent/Zotero/storage/7Y3X7GTY/labelstud.io.html:text/html},
}
@misc{noauthor_miscellaneous_nodate,
title = {Miscellaneous {Transformations} and {Projections}},
url = {http://paulbourke.net/geometry/transformationprojection/},
urldate = {2023-01-17},
file = {Miscellaneous Transformations and Projections:/home/laurent/Zotero/storage/WP7ZDCKF/transformationprojection.html:text/html},
}
@article{jun-fang_wu_nonmetric_2010,
title = {Nonmetric calibration of camera lens distortion using concentric circles pattern},
url = {http://ieeexplore.ieee.org/document/5535290/},
doi = {10.1109/MACE.2010.5535290},
abstract = {A method of distortion calibration for camera is proposed. The distortion center and distortion coefficients are estimated separately. The planar concentric circles are used as the calibration pattern. By analyzing the geometrical and projective characters of concentric circles, we deduce that the line connecting the centroids of distorted concentric circles must go through the distortion center. This is utilized to compute the distortion parameters and the solution in the sense of least square are obtained. The proposed approach is entirely noniterative, therefore it keeps away from the procedure of iterative optimization. On the other hand, it is nonmetric, thus it is low cost. Experiments on both synthetic and real image data are reported. The results show our method behaves excellently. Moreover, the capability of our method to resist noise is satisfying.},
urldate = {2023-01-17},
journal = {2010 International Conference on Mechanic Automation and Control Engineering},
author = {{Jun-Fang Wu} and {Gui-Xiong Liu}},
month = jun,
year = {2010},
note = {Conference Name: 2010 International Conference on Mechanic Automation and Control Engineering (MACE)
ISBN: 9781424477371
Place: Wuhan, China
Publisher: IEEE},
pages = {3338--3341},
annote = {[TLDR] The proposed approach is entirely noniterative, therefore it keeps away from the procedure of iterative optimization and is nonmetric, thus it is low cost and the capability of the method to resist noise is satisfying.},
}
@misc{qiu_describing_2021,
title = {Describing and {Localizing} {Multiple} {Changes} with {Transformers}},
url = {http://arxiv.org/abs/2103.14146},
doi = {10.48550/arXiv.2103.14146},
abstract = {Change captioning tasks aim to detect changes in image pairs observed before and after a scene change and generate a natural language description of the changes. Existing change captioning studies have mainly focused on a single change.However, detecting and describing multiple changed parts in image pairs is essential for enhancing adaptability to complex scenarios. We solve the above issues from three aspects: (i) We propose a simulation-based multi-change captioning dataset; (ii) We benchmark existing state-of-the-art methods of single change captioning on multi-change captioning; (iii) We further propose Multi-Change Captioning transformers (MCCFormers) that identify change regions by densely correlating different regions in image pairs and dynamically determines the related change regions with words in sentences. The proposed method obtained the highest scores on four conventional change captioning evaluation metrics for multi-change captioning. Additionally, our proposed method can separate attention maps for each change and performs well with respect to change localization. Moreover, the proposed framework outperformed the previous state-of-the-art methods on an existing change captioning benchmark, CLEVR-Change, by a large margin (+6.1 on BLEU-4 and +9.7 on CIDEr scores), indicating its general ability in change captioning tasks.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Qiu, Yue and Yamamoto, Shintaro and Nakashima, Kodai and Suzuki, Ryota and Iwata, Kenji and Kataoka, Hirokatsu and Satoh, Yutaka},
month = sep,
year = {2021},
note = {arXiv:2103.14146 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
annote = {Comment: Accepted by ICCV2021. 18 pages, 15 figures, project page: https://cvpaperchallenge.github.io/Describing-and-Localizing-Multiple-Change-with-Transformers/},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/6GLDC5C7/Qiu et al. - 2021 - Describing and Localizing Multiple Changes with Tr.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/4ZUPCEKT/2103.html:text/html},
}
@misc{lahoud_3d_2022,
title = {{3D} {Vision} with {Transformers}: {A} {Survey}},
shorttitle = {{3D} {Vision} with {Transformers}},
url = {http://arxiv.org/abs/2208.04309},
doi = {10.48550/arXiv.2208.04309},
abstract = {The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field. The transformer has been used as a replacement for the widely used convolution operators, due to its ability to learn long-range dependencies. This replacement was proven to be successful in numerous tasks, in which several state-of-the-art methods rely on transformers for better learning. In computer vision, the 3D field has also witnessed an increase in employing the transformer for 3D convolution neural networks and multi-layer perceptron networks. Although a number of surveys have focused on transformers in vision in general, 3D vision requires special attention due to the difference in data representation and processing when compared to 2D vision. In this work, we present a systematic and thorough review of more than 100 transformers methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others. We discuss transformer design in 3D vision, which allows it to process data with various 3D representations. For each application, we highlight key properties and contributions of proposed transformer-based methods. To assess the competitiveness of these methods, we compare their performance to common non-transformer methods on 12 3D benchmarks. We conclude the survey by discussing different open directions and challenges for transformers in 3D vision. In addition to the presented papers, we aim to frequently update the latest relevant papers along with their corresponding implementations at: https://github.com/lahoud/3d-vision-transformers.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
month = aug,
year = {2022},
note = {arXiv:2208.04309 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/AN3SNSVC/Lahoud et al. - 2022 - 3D Vision with Transformers A Survey.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/6BXWCFI5/2208.html:text/html},
}
@misc{noauthor_weights_nodate,
title = {Weights \& {Biases} {Developer} tools for {ML}},
url = {https://wandb.ai/site/, http://wandb.ai/site},
abstract = {WandB is a central dashboard to keep track of your hyperparameters, system metrics, and predictions so you can compare models live, and share your findings.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/GRIMYX6J/site.html:text/html},
}
@article{dong_ellipse_2021,
title = {Ellipse {R}-{CNN}: {Learning} to {Infer} {Elliptical} {Object} from {Clustering} and {Occlusion}},
volume = {30},
issn = {1057-7149, 1941-0042},
shorttitle = {Ellipse {R}-{CNN}},
url = {http://arxiv.org/abs/2001.11584},
doi = {10.1109/TIP.2021.3050673},
abstract = {Images of heavily occluded objects in cluttered scenes, such as fruit clusters in trees, are hard to segment. To further retrieve the 3D size and 6D pose of each individual object in such cases, bounding boxes are not reliable from multiple views since only a little portion of the object's geometry is captured. We introduce the first CNN-based ellipse detector, called Ellipse R-CNN, to represent and infer occluded objects as ellipses. We first propose a robust and compact ellipse regression based on the Mask R-CNN architecture for elliptical object detection. Our method can infer the parameters of multiple elliptical objects even they are occluded by other neighboring objects. For better occlusion handling, we exploit refined feature regions for the regression stage, and integrate the U-Net structure for learning different occlusion patterns to compute the final detection score. The correctness of ellipse regression is validated through experiments performed on synthetic data of clustered ellipses. We further quantitatively and qualitatively demonstrate that our approach outperforms the state-of-the-art model (i.e., Mask R-CNN followed by ellipse fitting) and its three variants on both synthetic and real datasets of occluded and clustered elliptical objects.},
urldate = {2023-01-17},
journal = {IEEE Transactions on Image Processing},
author = {Dong, Wenbo and Roy, Pravakar and Peng, Cheng and Isler, Volkan},
year = {2021},
note = {arXiv:2001.11584 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics},
pages = {2193--2206},
annote = {Comment: 18 pages, 20 figures, 7 tables},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/QERXUH24/Dong et al. - 2021 - Ellipse R-CNN Learning to Infer Elliptical Object.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/KNUA7S3S/2001.html:text/html},
}
@misc{haven_hdris_nodate,
title = {{HDRIs}},
url = {https://polyhaven.com/hdris/},
abstract = {Hundreds of free HDRI environments, ready to use for any purpose.},
language = {en},
urldate = {2023-01-17},
journal = {Poly Haven},
author = {{Poly Haven}},
}
@misc{zhang_dino_2022,
title = {{DINO}: {DETR} with {Improved} {DeNoising} {Anchor} {Boxes} for {End}-to-{End} {Object} {Detection}},
shorttitle = {{DINO}},
url = {http://arxiv.org/abs/2203.03605},
doi = {10.48550/arXiv.2203.03605},
abstract = {We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive way for denoising training, a mixed query selection method for anchor initialization, and a look forward twice scheme for box prediction. DINO achieves 49.4 AP in 12 epochs and 51.3 AP in 24 epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding a significant improvement of +6.0 AP and +2.7 AP, respectively, compared to DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO val2017 (63.2 AP) and test-dev (63.3 AP). Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results. Our code will be available at https://github.com/IDEACVR/DINO.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Zhang, Hao and Li, Feng and Liu, Shilong and Zhang, Lei and Su, Hang and Zhu, Jun and Ni, Lionel M. and Shum, Heung-Yeung},
month = jul,
year = {2022},
note = {arXiv:2203.03605 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/NFL7ASJI/Zhang et al. - 2022 - DINO DETR with Improved DeNoising Anchor Boxes fo.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/IJEI9W7E/2203.html:text/html},
}