first day

Laureηt 2023-01-17 19:35:28 +01:00
parent 7252ac6f4d
commit ca2f6e11e0
Signed by: Laurent
SSH key fingerprint: SHA256:kZEpW8cMJ54PDeCvOhzreNr4FSh6R13CMGH/POoO8DI
17 changed files with 792 additions and 14 deletions

16
.vscode/settings.json vendored

@@ -1,15 +1,3 @@
-{
-    "files.exclude": {
-        "**/.git": true,
-        "**/.svn": true,
-        "**/.hg": true,
-        "**/CVS": true,
-        "**/.DS_Store": true,
-        "**/Thumbs.db": true,
-        // "**/*.aux": true,
-        // "**/*.fdb_latexmk": true,
-        // "**/*.fls": true,
-        // "**/*.log": true,
-        // "**/*.synctex.gz": true,
-    }
-}
+{
+    "explorer.excludeGitIgnore": true,
+}

BIN
assets/DETR.png Normal file

Binary file not shown.

Size: 323 KiB

BIN
assets/EllipseRCNN.png Normal file

Binary file not shown.

Size: 196 KiB

BIN
assets/GPN.png Normal file

Binary file not shown.

Size: 251 KiB

BIN
assets/MaskRCNN.png Normal file

Binary file not shown.

Size: 178 KiB

BIN
assets/dataset1.jpg Normal file

Binary file not shown.

Size: 124 KiB

BIN
assets/dataset2.jpg Normal file

Binary file not shown.

Size: 86 KiB

BIN
assets/matte.jpg Normal file

Binary file not shown.

Size: 2.6 MiB

BIN
assets/matte_inference.png Normal file

Binary file not shown.

Size: 346 KiB

BIN
assets/shiny.jpg Normal file

Binary file not shown.

Size: 5.4 MiB

BIN
assets/shiny_inference.png Normal file

Binary file not shown.

Size: 2.5 MiB


@@ -0,0 +1,124 @@
% CVPR 2022 Paper Template
% based on the CVPR template provided by Ming-Ming Cheng (https://github.com/MCG-NKU/CVPR_Template)
% modified and extended by Stefan Roth (stefan.roth@NOSPAMtu-darmstadt.de)
\documentclass[10pt,twocolumn,a4paper]{article}
%%%%%%%%% PAPER TYPE - PLEASE UPDATE FOR FINAL VERSION
%\usepackage[review]{cvpr} % To produce the REVIEW version
\usepackage{cvpr} % To produce the CAMERA-READY version
%\usepackage[pagenumbers]{cvpr} % To force page numbers, e.g. for an arXiv version
% Include other packages here, before hyperref.
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{booktabs}
\usepackage[a4paper, hmargin=2cm, vmargin=3cm]{geometry}
% It is strongly recommended to use hyperref, especially for the review version.
% hyperref with option pagebackref eases the reviewers' job.
% Please disable hyperref *only* if you encounter grave issues, e.g. with the
% file validation for the camera-ready version.
%
% If you comment hyperref and then uncomment it, you should delete
% ReviewTemplate.aux before re-running LaTeX.
% (Or just hit 'q' on the first LaTeX run, let it finish, and you
% should be clear).
\usepackage[pagebackref,breaklinks,colorlinks]{hyperref}
% Support for easy cross-referencing
\usepackage[capitalize]{cleveref}
\crefname{section}{Sec.}{Secs.}
\Crefname{section}{Section}{Sections}
\Crefname{table}{Table}{Tables}
\crefname{table}{Tab.}{Tabs.}
\begin{document}
%%%%%%%%% TITLE
\title{Neural sphere detection in images for lighting calibration}
\author{
Laurent Fainsin\\
ENSEEIHT\\
{\tt\small laurent@fainsin.bzh}
}
\maketitle
%%%%%%%%% ABSTRACT
\begin{abstract}
We present a method for the automatic recognition of spherical markers in images using deep learning. The markers are used for precise lighting calibration required for photometric 3D-vision techniques such as RTI or photometric stereo. We use the Mask R-CNN model for instance segmentation, and train it on a dataset of synthetically generated images. We demonstrate that our method can accurately detect markers in real images and that it outperforms traditional methods.
\end{abstract}
\begin{keywords}
Sphere detection, Instance segmentation, Neural network, Mask R-CNN, Lighting calibration.
\end{keywords}
%%%%%%%%% BODY TEXT
\section{Introduction}
\label{sec:intro}
During my 2022 summer internship, as part of my engineering curriculum, I chose a research-oriented technical internship to discover this aspect of computer science engineering. Halfway through the year, I received an offer from one of my professors to work with them in the REVA team at the Research Institute in Computer Science of Toulouse (IRIT). As a research intern in computer vision, I worked on the automatic recognition of spherical markers in images using deep learning, for the precise lighting calibration required by photometric stereo.
\section{Account of the work}
\label{sec:work}
The work mainly consisted of improving a previous marker-detection method, which was not based on deep learning but relied only on traditional algorithms, making it a very manual process.
\subsection{Model}
For this purpose, many papers were analyzed to get an idea of the state of the art: multiple deep learning models were investigated, and their performance and flexibility were compared. We decided to use Mask R-CNN, as it is a well-established model with a good standard implementation for instance segmentation. As in any deep learning project, most of the work was spent on training and fine-tuning hyper-parameters in order to detect the markers as accurately as possible.
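For reference, here is a minimal sketch of how such a model can be instantiated from TorchVision with a single foreground class for the spheres (illustrative code with an arbitrary helper name; the actual training scripts are not reproduced in this report):
\begin{verbatim}
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_sphere_maskrcnn(num_classes: int = 2):
    # num_classes = 2: background + sphere
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
    # replace the box classification/regression head
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # replace the mask prediction head
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model
\end{verbatim}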
\subsection{Dataset}
The work also consisted of investigating new ways to generate the training data, as there were no datasets available for our specific application. Clean photos used in photometric stereo are unsurprisingly rare. Our final training image set consisted of synthetic images generated via compositing. We used the 2017 COCO unlabelled images dataset, containing 123287 images into which we embedded spherical markers. These markers originated from photographs of spheres in situ under various illuminations, and from synthetic renders from Blender. We present an example of such a picture in Figure \ref{fig:train}. Combined with various data augmentation transformations, this allowed us to easily obtain an image set of considerable size with the associated ground truth.
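To give an idea of the compositing itself, here is a much simplified sketch (hypothetical file paths, Pillow used only for illustration; the real pipeline also randomizes scale and colour and applies data augmentation):
\begin{verbatim}
import random
from PIL import Image

def composite_sphere(background_path, sphere_path):
    """Paste a sphere cutout (RGBA) onto a background image and return
    the composite together with its ground-truth instance mask."""
    background = Image.open(background_path).convert("RGB")
    sphere = Image.open(sphere_path).convert("RGBA")
    # random position, assuming the cutout is smaller than the background
    x = random.randint(0, background.width - sphere.width)
    y = random.randint(0, background.height - sphere.height)
    background.paste(sphere, (x, y), mask=sphere)  # alpha channel as paste mask
    mask = Image.new("L", background.size, 0)
    mask.paste(sphere.split()[3], (x, y))          # ground-truth mask from alpha
    return background, mask
\end{verbatim}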
\begin{figure}[t]
\centering
%\fbox{\rule{0pt}{2in} \rule{0.9\linewidth}{0pt}}
\includegraphics[width=\linewidth]{image.jpg}
\caption{Example of synthetic data from our dataset: picture from COCO with composited spheres on top}
\label{fig:train}
\end{figure}
\begin{figure}[t]
\centering
%\fbox{\rule{0pt}{2in} \rule{0.9\linewidth}{0pt}}
\includegraphics[width=\linewidth]{RESULTAT2.png}
\caption{Inference output of a test image}
\label{fig:test}
\end{figure}
\subsection{Deploying}
The final task was to deploy the trained model to production. The model was first converted to the popular ONNX format. The popular open-source 3D reconstruction software Meshroom, by AliceVision, was then modified to run the model through ONNX Runtime. Scientists and archaeologists are now able to automatically compute the light direction in their images when reconstructing scenes in which a white sphere is present, just like in Figure \ref{fig:test}.
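Conceptually, the export and inference steps look like the following sketch (file names are illustrative, and `model' stands for the trained network; the exact Meshroom integration is not shown):
\begin{verbatim}
import numpy as np
import onnxruntime as ort
import torch

# 1. export the trained PyTorch model to ONNX (done once)
model.eval()
dummy = [torch.rand(3, 800, 800)]  # torchvision detection models take a list of images
torch.onnx.export(model, dummy, "sphere_detector.onnx", opset_version=11)

# 2. run it from any ONNX Runtime host (what the Meshroom node does, conceptually)
session = ort.InferenceSession("sphere_detector.onnx")
input_name = session.get_inputs()[0].name
image = np.random.rand(3, 800, 800).astype(np.float32)  # placeholder for a real photo
boxes, labels, scores, masks = session.run(None, {input_name: image})
\end{verbatim}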
\section{Feedback analysis}
\label{sec:feedback}
This internship was my first experience in applied research, and I learned a lot about working in a research environment. In particular, I learned how to use various deep learning tools (PyTorch, Weights \& Biases, etc.), how to train and fine-tune models, and how to evaluate their performance. I also learned how to generate synthetic data, which is a very important skill in the field of computer vision, where real data is often scarce.
I also learned a lot about the process of research itself, from the formulation of the problem to the publication of the results. In particular, I learned how to write a scientific paper, which is a very valuable skill for any computer scientist.
Finally, I learned how to work in a team of researchers, and how to communicate my work to other people. This is a very important skill for any computer scientist, as research is often a very collaborative effort.
\section{Conclusion}
\label{sec:conclusion}
Overall, I had a very positive experience during my internship. I learned a lot of new skills, and gained a better understanding of the research process. I would definitely recommend this type of internship to any computer science student who is interested in research. This internship was a very valuable experience for me, and I am very grateful to have had the opportunity to work in such a stimulating environment.
\subsection{Acknowledgement}
I would like to thank my supervisors, Jean Mélou and Jean-Denis Durou, for their guidance and support during my internship. I would also like to thank the REVA team, and the Research Institute in Computer Science of Toulouse, for their hospitality and for providing me with the resources I needed to complete my work.
%%%%%%%%% REFERENCES
{\small
\bibliographystyle{ieee_fullname}
\bibliography{egbib}
\nocite{*}
}
\end{document}

48
src/qcav.bib Normal file

@@ -0,0 +1,48 @@
@inproceedings{MaskRCNN,
author = {He, Kaiming and Gkioxari, Georgia and Dollár, Piotr and Girshick, Ross},
booktitle = {Proceedings of ICCV},
title = {{Mask R-CNN}},
year = {2017},
doi = {10.1109/ICCV.2017.322}
}
@inproceedings{CoCo,
author = {Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro
and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C. Lawrence},
title = {{Microsoft COCO: Common Objects in Context}},
booktitle = {Proceedings of ECCV},
year = {2014}
}
@inproceedings{girshick2015fast,
author = {Girshick, Ross},
title = {{Fast R-CNN}},
booktitle = {Proceedings of ICCV},
year = {2015}
}
@incollection{durou2020,
author = {Durou, Jean-Denis and Falcone, Maurizio and Qu{\'e}au, Yvain and Tozza, Silvia},
title = {{A Comprehensive Introduction to Photometric 3D-reconstruction}},
booktitle = {{Advances in Photometric 3D-Reconstruction}},
pages = {1--29},
publisher = {{Springer}},
collection = {{Advances in Computer Vision and Pattern Recognition}},
year = {2020}
}
@article{giachetti2018,
author = {Giachetti, Andrea and Ciortan, Irina Mihaela and Daffara, Claudia and Marchioro, Giacomo and Pintus, Ruggero and Gobbetti, Enrico},
title = {A novel framework for highlight reflectance transformation imaging},
journal = {CVIU},
volume = {168},
pages = {118-131},
year = {2018}
}
@inproceedings{spheredetect,
author = {Laurent Fainsin and Jean Mélou and Lilian Calvet and Antoine Laurent and Axel Carlier and Jean-Denis Durou},
title = {Neural sphere detection in images for lighting calibration},
booktitle = {Proceedings of QCAV},
year = {2023}
}

94
src/qcav.tex.old Normal file

@@ -0,0 +1,94 @@
\documentclass[]{spie} %>>> use for US letter paper
%\documentclass[a4paper]{spie} %>>> use this instead for A4 paper
%\documentclass[nocompress]{spie} %>>> to avoid compression of citations
\renewcommand{\baselinestretch}{1.0} % Change to 1.65 for double spacing
\usepackage{amsmath,amsfonts,amssymb}
\usepackage{graphicx}
\usepackage[colorlinks=true, allcolors=blue]{hyperref}
\title{Neural sphere detection in images for lighting calibration}
\author{Laurent \textsc{Fainsin}}
\author{Jean \textsc{M\'elou}}
\author{Lilian \textsc{Calvet}}
\author{Axel \textsc{Carlier}}
\author{Jean-Denis \textsc{Durou}}
\affil{IRIT, UMR CNRS 5505, Universit{\'e} de Toulouse, France}
% Option to view page numbers
\pagestyle{empty} % change to \pagestyle{plain} for page numbers
\setcounter{page}{301} % Set start page numbering at e.g. 301
\begin{document}
\maketitle
\begin{abstract}
The detection of spheres in images is useful for photometric 3D-vision techniques such as RTI~\cite{giachetti2018} or photometric stereo~\cite{durou2020}, for which a precise calibration of the lighting is required. We propose to train a neural network called Mask R-CNN for this task, and show that the segmentation of any number of spheres in an image using this network is at least as accurate, and much faster, than manual segmentation.
\end{abstract}
% Include a list of keywords after the abstract
\keywords{Sphere detection, Instance segmentation, Neural network, Mask R-CNN, Lighting calibration.}
\section{Methodology}
\label{sec:methodo}
Our training dataset consists of synthetic images generated via compositing. We used the 2017 COCO~\cite{CoCo} unlabelled images dataset, containing 123287 images into which we embedded spherical markers. These markers originated from photographs of spheres in situ under various illuminations, and from synthetic renders from Blender. We present an example of such a picture in Figure~\ref{fig:train}. Combined with various data augmentation transformations, this allowed us to easily obtain an image set of considerable size with the associated ground truth.
The Mask R-CNN~\cite{MaskRCNN} neural network is particularly well-suited to our problem since it performs instance segmentation, which allows us to apply different treatments to each of the detected spheres. Indeed, detection networks like Faster R-CNN have two outputs: the class of the detected object and its bounding box. Mask R-CNN adds a third branch (see Figure~\ref{fig:maskRCNN}) which provides the mask of the object.
\begin{figure}[!h]
\centering
\includegraphics[width=0.5\linewidth]{Figures/MaskRCNN.png}
\caption{Third branch of Mask R-CNN, which allows instance segmentation (image extracted from~\cite{MaskRCNN}).}
\label{fig:maskRCNN}
\end{figure}
We chose the official PyTorch implementation of Mask R-CNN from the TorchVision module. This implementation performs additional transforms on our images before feeding them to the model: our images are thus resized and normalized appropriately. Indeed, some of the images are authentic archaeological photographs used for metrological purposes and are therefore very large.
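For example, the resizing bounds of this internal transform can be loosened when building the model, so that large images are not downscaled too aggressively (the values below are illustrative, not the ones actually used):
\begin{verbatim}
import torchvision

# min_size / max_size control the internal resizing performed by the
# GeneralizedRCNNTransform wrapper; illustrative values only.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    pretrained=True, min_size=1024, max_size=2048,
)
\end{verbatim}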
We used the original Mask R-CNN loss function: $L = L_\text{cls} + L_\text{box} + L_\text{mask}$. As $L_\text{cls}$ concerns classification and is a log-loss, it is not of much interest to us for the moment since we currently have only one class. On the other hand, $L_\text{box}$ robustly measures the adequacy of the estimated bounding box with respect to the ground truth~\cite{girshick2015fast} via a smooth L1 loss, whereas $L_\text{mask}$ evaluates the resulting mask using an average binary cross-entropy loss.
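For reference, the standard forms of these two ingredients, restated here from the cited papers with $x$ a box-coordinate residual and $y_{ij}$, $\hat{y}_{ij}$ the ground-truth and predicted values of an $m \times m$ mask, are:
\begin{align*}
    \mathrm{smooth}_{L_1}(x) &=
    \begin{cases}
        0.5\,x^2  & \text{if } |x| < 1,\\
        |x| - 0.5 & \text{otherwise,}
    \end{cases}\\
    L_\text{mask} &= -\frac{1}{m^2} \sum_{i,j} \left[\, y_{ij} \log \hat{y}_{ij} + (1 - y_{ij}) \log\left(1 - \hat{y}_{ij}\right) \right].
\end{align*}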
We used the mean Average Precision (mAP) as our main metric. It is a classical metric in object detection, based on the principle of Intersection over Union (IoU). The network was trained using an Adam optimizer with a learning rate of $10^{-3}$, a training batch size of $6$, and an unlimited number of epochs, as we opted for an early stopping strategy on the mAP with a patience of $5$ and a minimum delta of $0.01$. We ultimately obtain a bounding box mAP of about $0.8$, which indicates a good detection of our spheres.
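As a reminder, the IoU on which both the matching of detections and the mAP computation rely can be sketched as follows for two axis-aligned boxes (illustrative code, not the evaluation implementation we used):
\begin{verbatim}
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
\end{verbatim}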
\begin{figure}[!h]
\centering
\begin{tabular}{ccc}
\includegraphics[width=0.3\linewidth]{Figures/Train/1/image.jpg} &
\includegraphics[width=0.3\linewidth]{Figures/Train/1/MASK.PNG} &
\includegraphics[width=0.3\linewidth]{Figures/Train/1/result.png}
\end{tabular}
\caption{Example of synthetic data from our dataset. From left to right: picture from COCO with composited spheres on top; generated ground truth mask of the spheres (each color denotes an instance); inference output of our network.}
\label{fig:train}
\end{figure}
\section{Results}
\label{sec:results}
Once segmented, the silhouette of a sphere can indeed give us a lot of information about the lighting environment of the 3D-scene. In the particular case where the sphere is matte, the brightest point is the one where the normal points towards the light source. The left image in Figure \ref{fig:results} shows an example of a capture made in a painted cave, where such a sphere has been placed near the wall, in order to implement photometric stereo.
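A rough sketch of this estimation (assuming the segmented silhouette has already been fitted by a circle of centre $(c_x, c_y)$ and radius $r$, an orthographic projection, and a single distant light source):
\begin{verbatim}
import numpy as np

def light_direction_from_matte_sphere(gray, mask, cx, cy, r):
    """Light direction = surface normal at the brightest pixel of the sphere."""
    ys, xs = np.nonzero(mask)              # pixels belonging to the sphere
    brightest = np.argmax(gray[ys, xs])
    bx, by = xs[brightest], ys[brightest]
    nx, ny = (bx - cx) / r, (by - cy) / r  # normal from the circle geometry
    nz = np.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    return np.array([nx, ny, nz])          # unit vector pointing towards the light
\end{verbatim}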
\begin{figure}[!h]
\centering
\begin{tabular}{ccc}
\includegraphics[height=0.25\linewidth]{Figures/Test/RESULTAT2.png} &
\includegraphics[height=0.25\linewidth]{Figures/Results/4.png} &
\includegraphics[height=0.25\linewidth]{Figures/Results/2b.png}
\end{tabular}
\caption{Inference outputs of three test images: the chrome sphere in the right image has not been detected.}
\label{fig:results}
\end{figure}
\section{Conclusion and Perspectives}
\label{sec:conclusion}
In this paper, we present a new deep learning method for detecting calibration spheres, which is necessary for several 3D-reconstruction techniques such as RTI or photometric stereo. This is a rather simple task (a Hough transform does the trick), but problems arise when robust detection is required, as cast shadows or any circular patterns create false positives. We therefore propose a neural-network-based approach, which is much faster than manual detection and, in practice, even more accurate when shadows are located near the silhouette boundary.
We deliberately put aside the classification capability of the Mask R-CNN neural network. We hope to use this aspect in future work to detect more types of spheres, especially the chrome spheres that are used in the post-production industry to collect a complete mapping of the light environment (such a sphere has not been detected in the right image in Figure \ref{fig:results}).
\bibliography{biblio} % bibliography data in report.bib
\bibliographystyle{spiebib} % makes bibtex use spiebib.bst
\end{document}

BIN
src/rapport3-lfainsin.pdf Normal file

Binary file not shown.

143
src/rapport3-lfainsin.tex Normal file

@@ -0,0 +1,143 @@
\documentclass[a4paper, 11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{graphicx}
\usepackage{amsfonts}
\usepackage{color}
\usepackage[pagebackref,breaklinks,colorlinks]{hyperref}
\usepackage[a4paper, hmargin=2cm, vmargin=3cm]{geometry}
\graphicspath{{../assets/}}
\begin{document}
\title{"Projet Long" Bibliography}
\author{Laurent Fainsin}
\date{\the\year-\ifnum\month<10\relax0\fi\the\month-\ifnum\day<10\relax0\fi\the\day}
\maketitle
\newpage
{
\hypersetup{hidelinks}
\tableofcontents
}
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
3D reconstruction techniques in photography, such as Reflectance Transformation Imaging (RTI)~\cite{giachetti2018} and photometric stereo~\cite{durou2020}, often require a precise understanding of the lighting conditions in the scene being captured. One common method for calibrating the lighting is to include one or more spheres in the scene, as shown in the left example of Figure~\ref{fig:intro}. However, manually outlining these spheres can be tedious and time-consuming, especially in the field of visual effects, where the use of chrome spheres is prevalent~\cite{jahirul_grey_2021}. This task can be made more efficient by using deep learning methods for detection. The goal of this project is to develop a neural network that can accurately detect both matte and shiny spheres in a scene.
\begin{figure}[h]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.35\linewidth]{matte.jpg} &
\includegraphics[height=0.35\linewidth]{shiny.jpg}
\end{tabular}
\caption{Left: a scene with matte spheres. Right: a scene with a shiny sphere.}
\label{fig:intro}
\end{figure}
\section{Previous work}
Previous work by Laurent Fainsin et al. in~\cite{spheredetect} attempted to address this problem by using a neural network called Mask R-CNN~\cite{MaskRCNN} for instance segmentation of spheres in images. However, this approach is limited in its ability to detect shiny spheres, as demonstrated in the right image of Figure~\ref{fig:previouswork}. The network was trained on images of matte spheres and was unable to generalize to shiny spheres, which highlights the need for further research in this area.
\begin{figure}[h]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.35\linewidth]{matte_inference.png} &
\includegraphics[height=0.35\linewidth]{shiny_inference.png}
\end{tabular}
\caption{Mask R-CNN~\cite{MaskRCNN} inferences from~\cite{spheredetect} on Figure~\ref{fig:intro}.}
\label{fig:previouswork}
\end{figure}
\section{Current state of the art}
The automatic detection (or segmentation) of spheres in scenes is a rather niche task, and as a result there is no known direct method to solve this problem.
\subsection{Datasets}
In~\cite{spheredetect}, it is explained that clean photographs with spherical markers, as used in 3D reconstruction techniques, are unsurprisingly rare. To address this issue, the authors of the paper crafted a custom training dataset using Python and Blender scripts, by compositing known spherical markers (real or synthetic) onto background images from the COCO dataset~\cite{CoCo}. The result of this technique is visible in Figure~\ref{fig:spheredetectdataset}.
\begin{figure}[h]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.3\linewidth]{dataset1.jpg} &
\includegraphics[height=0.3\linewidth]{dataset2.jpg}
\end{tabular}
\caption{Example of the synthetic dataset used in~\cite{spheredetect}.}
\label{fig:spheredetectdataset}
\end{figure}
During the research for this bibliography, we also found some additional datasets that we may be able to use~\cite{legendre_deeplight_2019, haven_hdris_nodate, murmann_dataset_2019}.
\subsection{Models}
\subsubsection{Mask R-CNN}
In~\cite{spheredetect}, the authors use Mask R-CNN~\cite{MaskRCNN} as a base model for their task. Mask R-CNN is a neural network that is able to perform instance segmentation, which is the task of detecting and segmenting objects in an image.
The network is composed of two parts: a backbone network and a region proposal network (RPN). The backbone is a convolutional neural network used to extract features from the input image. The RPN is a fully convolutional network that generates region proposals, i.e. bounding boxes used to crop the extracted feature maps. A dedicated mask head then predicts a segmentation mask for each region proposal, which is used to segment the object in the image.
\begin{figure}[h]
\centering
\includegraphics[width=0.6\linewidth]{MaskRCNN.png}
\caption{The Mask-RCNN~\cite{MaskRCNN} architecture.}
\label{fig:maskrcnn}
\end{figure}
The network is trained using a loss function composed of three terms: the classification loss, the bounding box regression loss, and the mask loss. The classification loss trains the network to classify each region proposal as either a sphere or not a sphere, the bounding box regression loss trains it to regress the bounding box of each region proposal, and the mask loss trains it to generate a mask for each region proposal. The original network was trained on the COCO dataset~\cite{CoCo}.
While the authors of~\cite{spheredetect} obtain good results from this network on matte spheres, performance drops when shiny spheres are introduced. This could be explained by the fact that convolutional neural networks tend to extract local features from images. Indeed, one can only really identify a chrome sphere by observing both the ``interior'' and the ``exterior'' of the sphere, delimited by a ``distortion'' effect.
\subsubsection{Ellipse R-CNN}
To detect spheres in images, it is sufficient to estimate the parameters of their projected outlines. However, due to the perspective nature of photographs, these outlines are generally not circles but ellipses.
Ellipse R-CNN~\cite{dong_ellipse_2021} is a modified version of Mask R-CNN~\cite{MaskRCNN} that can detect ellipses in images. It addresses this issue with an additional branch that predicts the axes and orientation of each ellipse, which allows for a more accurate localization of elliptical objects, in our case spheres. It also handles occlusion: by predicting a segmentation mask for each ellipse, it can deal with overlapping and occluded objects. This makes it a good candidate for detecting spheres in real-world images with complex backgrounds and variable lighting conditions.
\begin{figure}[h]
\centering
\includegraphics[width=0.6\linewidth]{EllipseRCNN.png}
\caption{The Ellipse R-CNN~\cite{dong_ellipse_2021} architecture.}
\label{fig:ellipsercnn}
\end{figure}
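As a side note, an ellipse with centre $\mu$, semi-axes $a$ and $b$ and orientation $\theta$ can equivalently be encoded by a 2D Gaussian whose covariance is (a standard convention, not specific to~\cite{dong_ellipse_2021}):
\[
\Sigma \;=\; R(\theta)\,\mathrm{diag}(a^2, b^2)\,R(\theta)^\top ,
\]
where $R(\theta)$ is the 2D rotation matrix of angle $\theta$ and the ellipse is the level set $(x-\mu)^\top \Sigma^{-1} (x-\mu) = 1$. This Gaussian encoding is precisely the representation adopted by the model of the next subsection.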
\subsubsection{GPN}
The Gaussian Proposal Network (GPN)~\cite{li_detecting_2019} is an extension of the region proposal network that detects bounding ellipses instead of bounding boxes: each ellipse is represented as a 2D Gaussian distribution on the image plane, and the network is trained by minimizing the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground-truth Gaussian, which approximately reduces to the standard RPN regression loss when the rotation angle is zero. Although it was introduced for lesion detection in CT scans, the same idea applies to our elliptical sphere silhouettes.
\begin{figure}[h]
\centering
\includegraphics[width=0.6\linewidth]{GPN.png}
\caption{The GPN~\cite{li_detecting_2019} architecture.}
\label{fig:gpn}
\end{figure}
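For completeness, the general closed form of this divergence between two 2D Gaussians $\mathcal{N}_0(\mu_0, \Sigma_0)$ and $\mathcal{N}_1(\mu_1, \Sigma_1)$ (the textbook formula, not GPN's exact implementation) is:
\[
D_{\mathrm{KL}}\!\left(\mathcal{N}_0 \,\|\, \mathcal{N}_1\right)
= \frac{1}{2}\left[ \mathrm{tr}\left(\Sigma_1^{-1}\Sigma_0\right)
+ (\mu_1 - \mu_0)^\top \Sigma_1^{-1} (\mu_1 - \mu_0)
- 2 + \ln\frac{\det\Sigma_1}{\det\Sigma_0} \right].
\]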
\subsubsection{DETR}
DETR~\cite{carion_end--end_2020} views object detection as a direct set prediction problem: a transformer encoder-decoder reasons about a fixed set of learned object queries and the global image context, and a set-based global loss enforces unique predictions through bipartite matching, removing hand-designed components such as non-maximum suppression and anchor generation.
\begin{figure}[h]
\centering
\includegraphics[width=0.8\linewidth]{DETR.png}
\caption{The DETR~\cite{carion_end--end_2020} architecture.}
\label{fig:detr}
\end{figure}
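To illustrate the matching step mentioned above, here is a toy sketch using the Hungarian algorithm (the cost is a plain L1 distance between boxes, whereas DETR combines a class cost, an L1 box cost and a generalized IoU cost):
\begin{verbatim}
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_predictions(pred_boxes, gt_boxes):
    """pred_boxes: (N, 4) and gt_boxes: (M, 4) in (x1, y1, x2, y2) format;
    returns the list of (prediction index, ground-truth index) pairs."""
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (N, M)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))
\end{verbatim}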
DINO~\cite{zhang_dino_2022} is a more recent DETR-like detector which improves performance and efficiency through contrastive denoising training, mixed query selection for anchor initialization, and a ``look forward twice'' scheme for box prediction.
\section{Conclusion}
With this overview of the state of the art, we can now devise a plan to tackle our problem. ...
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\bibliography{zotero,qcav}
\bibliographystyle{plain}
\end{document}

381
src/zotero.bib Normal file

@@ -0,0 +1,381 @@
@misc{van_strien_training_2022,
title = {Training an object detection model using {Hugging} {Face}},
url = {https://danielvanstrien.xyz/huggingface/huggingface-datasets/transformers/2022/08/16/detr-object-detection.html},
abstract = {training a Detr object detection model using Hugging Face transformers and datasets},
language = {en},
urldate = {2023-01-17},
journal = {Daniel van Strien},
author = {Van Strien, Daniel},
month = aug,
year = {2022},
file = {Snapshot:/home/laurent/Zotero/storage/DXQJISMX/detr-object-detection.html:text/html}
}
@article{dror_recognition_nodate,
title = {Recognition of {Surface} {Reflectance} {Properties} from a {Single} {Image} under {Unknown} {Real}-{World} {Illumination}},
abstract = {This paper describes a machine vision system that classifies reflectance properties of surfaces such as metal, plastic, or paper, under unknown real-world illumination. We demonstrate performance of our algorithm for surfaces of arbitrary geometry. Reflectance estimation under arbitrary omnidirectional illumination proves highly underconstrained. Our reflectance estimation algorithm succeeds by learning relationships between surface reflectance and certain statistics computed from an observed image, which depend on statistical regularities in the spatial structure of real-world illumination. Although the algorithm assumes known geometry, its statistical nature makes it robust to inaccurate geometry estimates.},
language = {en},
author = {Dror, Ron O and Adelson, Edward H and Willsky, Alan S},
file = {Dror et al. - Recognition of Surface Reflectance Properties from .pdf:/home/laurent/Zotero/storage/HJXFDDT6/Dror et al. - Recognition of Surface Reflectance Properties from .pdf:application/pdf}
}
@article{legendre_deeplight_2019,
title = {{DeepLight}: {Learning} {Illumination} for {Unconstrained} {Mobile} {Mixed} {Reality}},
abstract = {We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). For training data, we collect videos of various reflective spheres placed within the cameras FOV, leaving most of the background unoccluded, leveraging that materials with diverse reflectance functions reveal different lighting cues in a single exposure. We train a deep neural network to regress from the LDR background image to HDR lighting by matching the LDR ground truth sphere images to those rendered with the predicted illumination using imagebased relighting, which is differentiable. Our inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality. Training on automatically exposed and white-balanced videos, we improve the realism of rendered objects compared to the state-of-the art methods for both indoor and outdoor scenes.},
language = {en},
author = {LeGendre, Chloe and Ma, Wan-Chun and Fyffe, Graham and Flynn, John and Charbonnel, Laurent and Busch, Jay and Debevec, Paul},
year = {2019},
file = {LeGendre et al. - DeepLight Learning Illumination for Unconstrained.pdf:/home/laurent/Zotero/storage/7FGL25G5/LeGendre et al. - DeepLight Learning Illumination for Unconstrained.pdf:application/pdf}
}
@misc{tazi_fine-tuning_nodate,
title = {Fine-tuning {DETR} for license plates detection},
url = {https://kaggle.com/code/nouamane/fine-tuning-detr-for-license-plates-detection},
abstract = {Explore and run machine learning code with Kaggle Notebooks {\textbar} Using data from multiple data sources},
language = {en},
urldate = {2023-01-17},
author = {Tazi, Nouamane},
file = {Snapshot:/home/laurent/Zotero/storage/WHFVB3QC/fine-tuning-detr-for-license-plates-detection.html:text/html}
}
@inproceedings{murmann_dataset_2019,
address = {Seoul, Korea (South)},
title = {A {Dataset} of {Multi}-{Illumination} {Images} in the {Wild}},
isbn = {978-1-72814-803-8},
url = {https://ieeexplore.ieee.org/document/9008252/},
doi = {10.1109/ICCV.2019.00418},
abstract = {Collections of images under a single, uncontrolled illumination [42] have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation [26, 43, 18]. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. The data simply does not contain the necessary supervisory signals. Multiillumination datasets are notoriously hard to capture, so the data is typically collected at small scale, in controlled environments, either using multiple light sources [10, 53], or robotic gantries [8, 20]. This leads to image collections that are not representative of the variety and complexity of real-world scenes. We introduce a new multi-illumination dataset of more than 1000 real scenes, each captured in high dynamic range and high resolution, under 25 lighting conditions. We demonstrate the richness of this dataset by training state-of-the-art models for three challenging applications: single-image illumination estimation, image relighting, and mixed-illuminant white balance.},
language = {en},
urldate = {2023-01-17},
booktitle = {2019 {IEEE}/{CVF} {International} {Conference} on {Computer} {Vision} ({ICCV})},
publisher = {IEEE},
author = {Murmann, Lukas and Gharbi, Michael and Aittala, Miika and Durand, Fredo},
month = oct,
year = {2019},
pages = {4079--4088},
file = {Murmann et al. - 2019 - A Dataset of Multi-Illumination Images in the Wild.pdf:/home/laurent/Zotero/storage/KH9HA9SQ/Murmann et al. - 2019 - A Dataset of Multi-Illumination Images in the Wild.pdf:application/pdf}
}
@misc{arora_annotated_2021,
title = {The {Annotated} {DETR}},
url = {https://amaarora.github.io/2021/07/26/annotateddetr.html},
abstract = {This is a place where I write freely and try to uncomplicate the complicated for myself and everyone else through Python code.},
language = {en},
urldate = {2023-01-17},
journal = {Committed towards better future},
author = {Arora, Aman},
month = jul,
year = {2021},
file = {Snapshot:/home/laurent/Zotero/storage/G78PSBHE/annotateddetr.html:text/html}
}
@misc{carion_end--end_2020,
title = {End-to-{End} {Object} {Detection} with {Transformers}},
url = {http://arxiv.org/abs/2005.12872},
doi = {10.48550/arXiv.2005.12872},
abstract = {We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines. Training code and pretrained models are available at https://github.com/facebookresearch/detr.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey},
month = may,
year = {2020},
note = {arXiv:2005.12872 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/KBRPD4CU/Carion et al. - 2020 - End-to-End Object Detection with Transformers.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/6445LQV5/2005.html:text/html}
}
@misc{li_detecting_2019,
title = {Detecting {Lesion} {Bounding} {Ellipses} {With} {Gaussian} {Proposal} {Networks}},
url = {http://arxiv.org/abs/1902.09658},
doi = {10.48550/arXiv.1902.09658},
abstract = {Lesions characterized by computed tomography (CT) scans, are arguably often elliptical objects. However, current lesion detection systems are predominantly adopted from the popular Region Proposal Networks (RPNs) that only propose bounding boxes without fully leveraging the elliptical geometry of lesions. In this paper, we present Gaussian Proposal Networks (GPNs), a novel extension to RPNs, to detect lesion bounding ellipses. Instead of directly regressing the rotation angle of the ellipse as the common practice, GPN represents bounding ellipses as 2D Gaussian distributions on the image plain and minimizes the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground truth Gaussian for object localization. We show the KL divergence loss approximately incarnates the regression loss in the RPN framework when the rotation angle is 0. Experiments on the DeepLesion dataset show that GPN significantly outperforms RPN for lesion bounding ellipse detection thanks to lower localization error. GPN is open sourced at https://github.com/baidu-research/GPN},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Li, Yi},
month = feb,
year = {2019},
note = {arXiv:1902.09658 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/IB8AWGHV/Li - 2019 - Detecting Lesion Bounding Ellipses With Gaussian P.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/ZGKBBB98/1902.html:text/html}
}
@misc{noauthor_detr_nodate,
title = {{DETR}},
url = {https://huggingface.co/docs/transformers/model_doc/detr},
abstract = {We're on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/2AQYDSL3/detr.html:text/html}
}
@misc{noauthor_opencv_nodate,
title = {{OpenCV}: {Camera} {Calibration}},
url = {https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html},
urldate = {2023-01-17},
file = {OpenCV\: Camera Calibration:/home/laurent/Zotero/storage/7C3DT2WU/tutorial_py_calibration.html:text/html}
}
@misc{jahirul_grey_2021,
title = {The {Grey}, the {Chrome} and the {Macbeth} {Chart} {CAVE} {Academy}},
url = {https://caveacademy.com/wiki/onset-production/data-acquisition/data-acquisition-training/the-grey-the-chrome-and-the-macbeth-chart/},
language = {en-US},
urldate = {2023-01-17},
author = {Jahirul, Amin},
month = jul,
year = {2021},
file = {Snapshot:/home/laurent/Zotero/storage/TM2TJKMH/the-grey-the-chrome-and-the-macbeth-chart.html:text/html}
}
@misc{doppenberg_lunar_2022,
title = {Lunar {Orbit} {Navigation} {Using} {Ellipse} {R}-{CNN} and {Crater} {Pattern} {Matching}},
copyright = {MIT},
url = {https://github.com/wdoppenberg/crater-detection},
abstract = {Autonomous Lunar Orbit Navigation Using Ellipse R-CNN and Crater Pattern Matching},
urldate = {2023-01-17},
author = {Doppenberg, Wouter},
month = aug,
year = {2022},
note = {original-date: 2020-10-19T16:32:29Z},
keywords = {crater-detection, ellipse-rcnn, faster-rcnn, space-engineering}
}
@misc{doppenberg_ellipse_2022,
title = {Ellipse {R}-{CNN}},
copyright = {MIT},
url = {https://github.com/wdoppenberg/ellipse-rcnn},
abstract = {A PyTorch implementation of Ellipse R-CNN},
urldate = {2023-01-17},
author = {Doppenberg, Wouter},
month = dec,
year = {2022},
note = {original-date: 2021-06-25T09:21:44Z},
keywords = {ellipse-rcnn, deep-learning, pytorch, pytorch-lightning, region-based}
}
@misc{wok_finetune_2022,
title = {Finetune {DETR}},
copyright = {MIT},
url = {https://github.com/woctezuma/finetune-detr},
abstract = {Fine-tune Facebook's DETR (DEtection TRansformer) on Colaboratory.},
urldate = {2023-01-17},
author = {Wok},
month = dec,
year = {2022},
note = {original-date: 2020-08-03T17:17:35Z},
keywords = {balloon, balloons, colab, colab-notebook, colaboratory, detr, facebook, finetune, finetunes, finetuning, google-colab, google-colab-notebook, google-colaboratory, instance, instance-segmentation, instances, segementation, segment}
}
@misc{noauthor_datasets_nodate,
title = {Datasets},
url = {https://huggingface.co/docs/datasets/index},
abstract = {We're on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/RYXSCZR7/index.html:text/html}
}
@misc{rogge_transformers_2020,
title = {Transformers {Tutorials}},
copyright = {MIT},
url = {https://github.com/NielsRogge/Transformers-Tutorials},
abstract = {This repository contains demos I made with the Transformers library by HuggingFace.},
urldate = {2023-01-17},
author = {Rogge, Niels},
month = sep,
year = {2020},
doi = {10.5281/zenodo.1234}
}
@misc{noauthor_recommendations_2020,
title = {Recommendations for training {Detr} on custom dataset? · {Issue} \#9 · facebookresearch/detr},
shorttitle = {Recommendations for training {Detr} on custom dataset?},
url = {https://github.com/facebookresearch/detr/issues/9},
abstract = {Very impressed with the all new innovative architecture in Detr! Can you clarify recommendations for training on a custom dataset? Should we build a model similar to demo and train, or better to us...},
language = {en},
urldate = {2023-01-17},
journal = {GitHub},
month = may,
year = {2020},
file = {Snapshot:/home/laurent/Zotero/storage/G2S6584X/9.html:text/html}
}
@misc{noauthor_auto_nodate,
title = {Auto {Classes}},
url = {https://huggingface.co/docs/transformers/model_doc/auto},
abstract = {We're on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17}
}
@misc{noauthor_swin_nodate,
title = {Swin {Transformer}},
url = {https://huggingface.co/docs/transformers/v4.24.0/en/model_doc/swin},
abstract = {We're on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/K2NDEY49/swin.html:text/html}
}
@misc{rajesh_pytorch_2022,
title = {{PyTorch} {Implementations} of various state of the art architectures.},
url = {https://github.com/04RR/SOTA-Vision},
abstract = {Implementation of various state of the art architectures used in computer vision.},
urldate = {2023-01-17},
author = {Rajesh, Rohit},
month = sep,
year = {2022},
note = {original-date: 2021-05-02T03:32:10Z},
keywords = {deep-learning, pytorch, deep-learning-algorithms, pytorch-implementation, transformer-architecture}
}
@misc{mmdetection_contributors_openmmlab_2018,
title = {{OpenMMLab} {Detection} {Toolbox} and {Benchmark}},
copyright = {Apache-2.0},
url = {https://github.com/open-mmlab/mmdetection},
abstract = {OpenMMLab Detection Toolbox and Benchmark},
urldate = {2023-01-17},
author = {{MMDetection Contributors}},
month = aug,
year = {2018},
note = {original-date: 2018-08-22T07:06:06Z}
}
@misc{noauthor_awesome_2023,
title = {Awesome {Detection} {Transformer}},
url = {https://github.com/IDEA-Research/awesome-detection-transformer},
abstract = {Collect some papers about transformer for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)},
urldate = {2023-01-17},
publisher = {IDEA-Research},
month = jan,
year = {2023},
note = {original-date: 2022-03-09T05:11:49Z}
}
@misc{arakelyan_aim_2020,
title = {Aim},
copyright = {Apache-2.0},
url = {https://github.com/aimhubio/aim},
abstract = {Aim 💫 — easy-to-use and performant open-source ML experiment tracker.},
urldate = {2023-01-17},
author = {Arakelyan, Gor and Soghomonyan, Gevorg and {The Aim team}},
month = jun,
year = {2020},
doi = {10.5281/zenodo.6536395}
}
@misc{noauthor_open_nodate,
title = {Open {Source} {Data} {Labeling}},
url = {https://labelstud.io/},
abstract = {A flexible data labeling tool for all data types. Prepare training data for computer vision, natural language processing, speech, voice, and video models.},
language = {en},
urldate = {2023-01-17},
journal = {Label Studio},
file = {Snapshot:/home/laurent/Zotero/storage/7Y3X7GTY/labelstud.io.html:text/html}
}
@misc{noauthor_miscellaneous_nodate,
title = {Miscellaneous {Transformations} and {Projections}},
url = {http://paulbourke.net/geometry/transformationprojection/},
urldate = {2023-01-17},
file = {Miscellaneous Transformations and Projections:/home/laurent/Zotero/storage/WP7ZDCKF/transformationprojection.html:text/html}
}
@article{jun-fang_wu_nonmetric_2010,
title = {Nonmetric calibration of camera lens distortion using concentric circles pattern},
url = {http://ieeexplore.ieee.org/document/5535290/},
doi = {10.1109/MACE.2010.5535290},
abstract = {A method of distortion calibration for camera is proposed. The distortion center and distortion coefficients are estimated separately. The planar concentric circles are used as the calibration pattern. By analyzing the geometrical and projective characters of concentric circles, we deduce that the line connecting the centroids of distorted concentric circles must go through the distortion center. This is utilized to compute the distortion parameters and the solution in the sense of least square are obtained. The proposed approach is entirely noniterative, therefore it keeps away from the procedure of iterative optimization. On the other hand, it is nonmetric, thus it is low cost. Experiments on both synthetic and real image data are reported. The results show our method behaves excellently. Moreover, the capability of our method to resist noise is satisfying.},
urldate = {2023-01-17},
journal = {2010 International Conference on Mechanic Automation and Control Engineering},
author = {{Jun-Fang Wu} and {Gui-Xiong Liu}},
month = jun,
year = {2010},
note = {Conference Name: 2010 International Conference on Mechanic Automation and Control Engineering (MACE)
ISBN: 9781424477371
Place: Wuhan, China
Publisher: IEEE},
pages = {3338--3341},
annote = {[TLDR] The proposed approach is entirely noniterative, therefore it keeps away from the procedure of iterative optimization and is nonmetric, thus it is low cost and the capability of the method to resist noise is satisfying.}
}
@misc{qiu_describing_2021,
title = {Describing and {Localizing} {Multiple} {Changes} with {Transformers}},
url = {http://arxiv.org/abs/2103.14146},
doi = {10.48550/arXiv.2103.14146},
abstract = {Change captioning tasks aim to detect changes in image pairs observed before and after a scene change and generate a natural language description of the changes. Existing change captioning studies have mainly focused on a single change.However, detecting and describing multiple changed parts in image pairs is essential for enhancing adaptability to complex scenarios. We solve the above issues from three aspects: (i) We propose a simulation-based multi-change captioning dataset; (ii) We benchmark existing state-of-the-art methods of single change captioning on multi-change captioning; (iii) We further propose Multi-Change Captioning transformers (MCCFormers) that identify change regions by densely correlating different regions in image pairs and dynamically determines the related change regions with words in sentences. The proposed method obtained the highest scores on four conventional change captioning evaluation metrics for multi-change captioning. Additionally, our proposed method can separate attention maps for each change and performs well with respect to change localization. Moreover, the proposed framework outperformed the previous state-of-the-art methods on an existing change captioning benchmark, CLEVR-Change, by a large margin (+6.1 on BLEU-4 and +9.7 on CIDEr scores), indicating its general ability in change captioning tasks.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Qiu, Yue and Yamamoto, Shintaro and Nakashima, Kodai and Suzuki, Ryota and Iwata, Kenji and Kataoka, Hirokatsu and Satoh, Yutaka},
month = sep,
year = {2021},
note = {arXiv:2103.14146 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
annote = {Comment: Accepted by ICCV2021. 18 pages, 15 figures, project page: https://cvpaperchallenge.github.io/Describing-and-Localizing-Multiple-Change-with-Transformers/},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/6GLDC5C7/Qiu et al. - 2021 - Describing and Localizing Multiple Changes with Tr.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/4ZUPCEKT/2103.html:text/html}
}
@misc{lahoud_3d_2022,
title = {{3D} {Vision} with {Transformers}: {A} {Survey}},
shorttitle = {{3D} {Vision} with {Transformers}},
url = {http://arxiv.org/abs/2208.04309},
doi = {10.48550/arXiv.2208.04309},
abstract = {The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field. The transformer has been used as a replacement for the widely used convolution operators, due to its ability to learn long-range dependencies. This replacement was proven to be successful in numerous tasks, in which several state-of-the-art methods rely on transformers for better learning. In computer vision, the 3D field has also witnessed an increase in employing the transformer for 3D convolution neural networks and multi-layer perceptron networks. Although a number of surveys have focused on transformers in vision in general, 3D vision requires special attention due to the difference in data representation and processing when compared to 2D vision. In this work, we present a systematic and thorough review of more than 100 transformers methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others. We discuss transformer design in 3D vision, which allows it to process data with various 3D representations. For each application, we highlight key properties and contributions of proposed transformer-based methods. To assess the competitiveness of these methods, we compare their performance to common non-transformer methods on 12 3D benchmarks. We conclude the survey by discussing different open directions and challenges for transformers in 3D vision. In addition to the presented papers, we aim to frequently update the latest relevant papers along with their corresponding implementations at: https://github.com/lahoud/3d-vision-transformers.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
month = aug,
year = {2022},
note = {arXiv:2208.04309 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/AN3SNSVC/Lahoud et al. - 2022 - 3D Vision with Transformers A Survey.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/6BXWCFI5/2208.html:text/html}
}
@misc{noauthor_weights_nodate,
title = {Weights \& {Biases} {Developer} tools for {ML}},
url = {https://wandb.ai/site/, http://wandb.ai/site},
abstract = {WandB is a central dashboard to keep track of your hyperparameters, system metrics, and predictions so you can compare models live, and share your findings.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/GRIMYX6J/site.html:text/html}
}
@article{dong_ellipse_2021,
title = {Ellipse {R}-{CNN}: {Learning} to {Infer} {Elliptical} {Object} from {Clustering} and {Occlusion}},
volume = {30},
issn = {1057-7149, 1941-0042},
shorttitle = {Ellipse {R}-{CNN}},
url = {http://arxiv.org/abs/2001.11584},
doi = {10.1109/TIP.2021.3050673},
abstract = {Images of heavily occluded objects in cluttered scenes, such as fruit clusters in trees, are hard to segment. To further retrieve the 3D size and 6D pose of each individual object in such cases, bounding boxes are not reliable from multiple views since only a little portion of the object's geometry is captured. We introduce the first CNN-based ellipse detector, called Ellipse R-CNN, to represent and infer occluded objects as ellipses. We first propose a robust and compact ellipse regression based on the Mask R-CNN architecture for elliptical object detection. Our method can infer the parameters of multiple elliptical objects even they are occluded by other neighboring objects. For better occlusion handling, we exploit refined feature regions for the regression stage, and integrate the U-Net structure for learning different occlusion patterns to compute the final detection score. The correctness of ellipse regression is validated through experiments performed on synthetic data of clustered ellipses. We further quantitatively and qualitatively demonstrate that our approach outperforms the state-of-the-art model (i.e., Mask R-CNN followed by ellipse fitting) and its three variants on both synthetic and real datasets of occluded and clustered elliptical objects.},
urldate = {2023-01-17},
journal = {IEEE Transactions on Image Processing},
author = {Dong, Wenbo and Roy, Pravakar and Peng, Cheng and Isler, Volkan},
year = {2021},
note = {arXiv:2001.11584 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics},
pages = {2193--2206},
annote = {Comment: 18 pages, 20 figures, 7 tables},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/QERXUH24/Dong et al. - 2021 - Ellipse R-CNN Learning to Infer Elliptical Object.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/KNUA7S3S/2001.html:text/html}
}
@misc{haven_hdris_nodate,
title = {{HDRIs}},
url = {https://polyhaven.com/hdris/},
abstract = {Hundreds of free HDRI environments, ready to use for any purpose.},
language = {en},
urldate = {2023-01-17},
journal = {Poly Haven},
author = {Haven, Poly}
}
@misc{zhang_dino_2022,
title = {{DINO}: {DETR} with {Improved} {DeNoising} {Anchor} {Boxes} for {End}-to-{End} {Object} {Detection}},
shorttitle = {{DINO}},
url = {http://arxiv.org/abs/2203.03605},
doi = {10.48550/arXiv.2203.03605},
abstract = {We present DINO (DETR with Improved DeNoising Anchor Boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive way for denoising training, a mixed query selection method for anchor initialization, and a look forward twice scheme for box prediction. DINO achieves 49.4 AP in 12 epochs and 51.3 AP in 24 epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding a significant improvement of +6.0 AP and +2.7 AP, respectively, compared to DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO val2017 (63.2 AP) and test-dev (63.3 AP). Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results. Our code will be available at https://github.com/IDEACVR/DINO.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Zhang, Hao and Li, Feng and Liu, Shilong and Zhang, Lei and Su, Hang and Zhu, Jun and Ni, Lionel M. and Shum, Heung-Yeung},
month = jul,
year = {2022},
note = {arXiv:2203.03605 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/NFL7ASJI/Zhang et al. - 2022 - DINO DETR with Improved DeNoising Anchor Boxes fo.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/IJEI9W7E/2203.html:text/html}
}