Compare commits


No commits in common. "biblio" and "master" have entirely different histories.

58 changed files with 1983 additions and 10444 deletions

.editorconfig

@@ -5,8 +5,11 @@ root = true
[*]
indent_style = space
indent_size = 2
indent_size = 4
end_of_line = lf
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
[*.{json,toml,yaml,yml}]
indent_size = 2

.envrc

@@ -1 +0,0 @@
use flake

.gitattributes vendored Normal file

@@ -0,0 +1,30 @@
# https://github.com/alexkaratarakis/gitattributes/blob/master/Python.gitattributes
# Basic .gitattributes for a python repo.
# Source files
# ============
*.pxd text diff=python
*.py text diff=python
*.py3 text diff=python
*.pyw text diff=python
*.pyx text diff=python
*.pyz text diff=python
*.pyi text diff=python
# Binary files
# ============
*.db binary
*.p binary
*.pkl binary
*.pickle binary
*.pyc binary export-ignore
*.pyo binary export-ignore
*.pyd binary
# Jupyter notebook
*.ipynb text
# Note: .db, .p, and .pkl files are associated
# with the python modules ``pickle``, ``dbm.*``,
# ``shelve``, ``marshal``, ``anydbm``, & ``bsddb``
# (among others).

.gitignore vendored

@@ -1,304 +1,167 @@
.direnv
dataset_*
lightning_logs
# https://github.com/github/gitignore/blob/main/TeX.gitignore
## Core latex/pdflatex auxiliary files:
*.aux
*.lof
*.jpg
*.png
# https://github.com/github/gitignore/blob/main/Python.gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
*.lot
*.fls
*.out
*.toc
*.fmt
*.fot
*.cb
*.cb2
.*.lb
## Intermediate documents:
*.dvi
*.xdv
*-converted-to.*
# these rules might exclude image files for figures etc.
# *.ps
# *.eps
# *.pdf
## Generated if empty string is given at "Please type another file name for output:"
.pdf
## Bibliography auxiliary files (bibtex/biblatex/biber):
*.bbl
*.bcf
*.blg
*-blx.aux
*-blx.bib
*.run.xml
## Build tool auxiliary files:
*.fdb_latexmk
*.synctex
*.synctex(busy)
*.synctex.gz
*.synctex.gz(busy)
*.pdfsync
## Build tool directories for auxiliary files
# latexrun
latex.out/
## Auxiliary and intermediate files from other packages:
# algorithms
*.alg
*.loa
# achemso
acs-*.bib
# amsthm
*.thm
# beamer
*.nav
*.pre
*.snm
*.vrb
# changes
*.soc
# comment
*.cut
# cprotect
*.cpt
# elsarticle (documentclass of Elsevier journals)
*.spl
# endnotes
*.ent
# fixme
*.lox
# feynmf/feynmp
*.mf
*.mp
*.t[1-9]
*.t[1-9][0-9]
*.tfm
#(r)(e)ledmac/(r)(e)ledpar
*.end
*.?end
*.[1-9]
*.[1-9][0-9]
*.[1-9][0-9][0-9]
*.[1-9]R
*.[1-9][0-9]R
*.[1-9][0-9][0-9]R
*.eledsec[1-9]
*.eledsec[1-9]R
*.eledsec[1-9][0-9]
*.eledsec[1-9][0-9]R
*.eledsec[1-9][0-9][0-9]
*.eledsec[1-9][0-9][0-9]R
# glossaries
*.acn
*.acr
*.glg
*.glo
*.gls
*.glsdefs
*.lzo
*.lzs
*.slg
*.slo
*.sls
# uncomment this for glossaries-extra (will ignore makeindex's style files!)
# *.ist
# gnuplot
*.gnuplot
*.table
# gnuplottex
*-gnuplottex-*
# gregoriotex
*.gaux
*.glog
*.gtex
# htlatex
*.4ct
*.4tc
*.idv
*.lg
*.trc
*.xref
# hyperref
*.brf
# knitr
*-concordance.tex
# TODO Uncomment the next line if you use knitr and want to ignore its generated tikz files
# *.tikz
*-tikzDictionary
# listings
*.lol
# luatexja-ruby
*.ltjruby
# makeidx
*.idx
*.ilg
*.ind
# minitoc
*.maf
*.mlf
*.mlt
*.mtc[0-9]*
*.slf[0-9]*
*.slt[0-9]*
*.stc[0-9]*
# minted
_minted*
*.pyg
# morewrites
*.mw
# newpax
*.newpax
# nomencl
*.nlg
*.nlo
*.nls
# pax
*.pax
# pdfpcnotes
*.pdfpc
# sagetex
*.sagetex.sage
*.sagetex.py
*.sagetex.scmd
# scrwfile
*.wrt
# svg
svg-inkscape/
# sympy
*.sout
*.sympy
sympy-plots-for-*.tex/
# pdfcomment
*.upa
*.upb
# pythontex
*.pytxcode
pythontex-files-*/
# tcolorbox
*.listing
# thmtools
*.loe
# TikZ & PGF
*.dpth
*.md5
*.auxlock
# titletoc
*.ptc
# todonotes
*.tdo
# vhistory
*.hst
*.ver
# easy-todo
*.lod
# xcolor
*.xcp
# xmpincl
*.xmpi
# xindy
*.xdy
# xypic precompiled matrices and outlines
*.xyc
*.xyd
# endfloat
*.ttt
*.fff
# Latexian
TSWLatexianTemp*
## Editors:
# WinEdt
*.bak
*.sav
# Texpad
.texpadtmp
# LyX
*.lyx~
# Kile
*.backup
# gummi
.*.swp
# KBibTeX
*~[0-9]*
# TeXnicCenter
*.tps
# auto folder when using emacs and auctex
./auto/*
*.el
# expex forward references with \gathertags
*-tags.tex
# standalone packages
*.sta
# Makeindex log files
*.lpz
# xwatermark package
*.xwm
# REVTeX puts footnotes in the bibliography by default, unless the nofootinbib
# option is specified. Footnotes are the stored in a file with suffix Notes.bib.
# Uncomment the next line to have this generated file ignored.
#*Notes.bib
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

.vscode/extensions.json

@@ -1,5 +0,0 @@
{
"recommendations": [
"james-yu.latex-workshop"
]
}

.vscode/launch.json vendored Normal file

@@ -0,0 +1,36 @@
{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Python: Current File",
"type": "python",
"request": "launch",
"program": "${workspaceFolder}/src/main.py",
// "program": "${workspaceFolder}/src/spheres.py",
// "program": "${workspaceFolder}/src/datamodule.py",
"args": [
// "fit",
"predict",
// "--ckpt_path",
// "${workspaceFolder}/lightning_logs/version_264/checkpoints/epoch=9-st&ep=1000.ckpt",
"--data.num_workers",
"1",
"--trainer.benchmark",
"false",
"--trainer.num_sanity_val_steps",
"0",
"--data.persistent_workers",
"false",
"--data.batch_size",
"1",
"--trainer.val_check_interval",
"1"
],
"console": "integratedTerminal",
"justMyCode": false
}
]
}

.vscode/settings.json vendored

@@ -1,6 +1,27 @@
{
"explorer.excludeGitIgnore": true,
"latex-workshop.latex.recipe.default": "latexmk (lualatex)",
"gitlens.codeLens.authors.enabled": false,
"gitlens.codeLens.recentChange.enabled": false,
}
"python.analysis.typeCheckingMode": "off",
"python.formatting.provider": "black",
"editor.formatOnSave": true,
"python.linting.enabled": true,
"python.linting.lintOnSave": true,
"python.linting.flake8Enabled": true,
"python.linting.mypyEnabled": true,
"python.linting.banditEnabled": true,
"python.languageServer": "Pylance",
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.codeActionsOnSave": {
"source.organizeImports": true
}
},
"files.exclude": {
"**/.git": true,
"**/.svn": true,
"**/.hg": true,
"**/CVS": true,
"**/.DS_Store": true,
"**/Thumbs.db": true,
"**/__pycache__": true,
"**/.mypy_cache": true,
},
}

LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2022 Laurent Fainsin
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md Normal file

@@ -0,0 +1,50 @@
# Neural sphere detection in images for lighting calibration
## Installation
Clone the repository:
```bash
git clone https://github.com/Laurent2916/REVA-DETR.git
cd REVA-DETR/
```
Install and activate the environment:
```bash
micromamba install -f environment.yml
micromamba activate qcav
```
## Usage
Everything is managed through the [Lightning CLI](https://lightning.ai/docs/pytorch/latest/api/lightning.pytorch.cli.LightningCLI.html#lightning.pytorch.cli.LightningCLI)!
Start a training:
```bash
python src/main.py fit
```
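Any model, data, or trainer argument exposed by the CLI can also be overridden on the command line. A sketch (values are illustrative; the flags mirror those used in `.vscode/launch.json`):
```bash
python src/main.py fit --data.batch_size 4 --data.num_workers 1 --trainer.max_steps 1000
```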
Start inference on images:
```bash
python src/main.py predict --ckpt_path <path_to_checkpoint>
```
Quick and dirty way to export to `.onnx`:
```python
>>> from src.module import DETR
>>> checkpoint = "<path_to_checkpoint>"
>>> model = DETR.load_from_checkpoint(checkpoint)
>>> model.net.save_pretrained("hugginface_checkpoint")
```
```bash
python -m transformers.onnx --model=hugginface_checkpoint onnx_export/
```
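To sanity-check the exported graph, an ONNX Runtime session can be opened on it. A minimal sketch, assuming `onnxruntime`, `numpy` and `Pillow` are installed, that the export landed at `onnx_export/model.onnx`, and that `example.jpg` is an arbitrary test image; the preprocessing only approximates the DETR feature extractor, and the exact input names and dtypes depend on the export:
```python
import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession("onnx_export/model.onnx")

# Rough preprocessing: resize, scale to [0, 1], move channels first, add a batch dim.
# The real DetrFeatureExtractor also normalizes with ImageNet mean/std.
image = Image.open("example.jpg").convert("RGB").resize((800, 800))
pixel_values = (np.asarray(image, dtype=np.float32) / 255.0).transpose(2, 0, 1)[None]

# Feed every declared input; some exports also expect a pixel mask.
inputs = {}
for inp in session.get_inputs():
    if "mask" in inp.name:
        inputs[inp.name] = np.ones((1, 800, 800), dtype=np.int64)
    else:
        inputs[inp.name] = pixel_values

outputs = session.run(None, inputs)
print([out.shape for out in outputs])
```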
## License
Distributed under the [MIT](https://choosealicense.com/licenses/mit/) license. \
See [`LICENSE`](https://github.com/Laurent2916/REVA-DETR/blob/master/LICENSE) for more information.
## Contact
Laurent Fainsin _[loʁɑ̃ fɛ̃zɛ̃]_ \
\<[laurent@fainsin.bzh](mailto:laurent@fainsin.bzh)\>

(Roughly two dozen binary files, mostly removed images between 20 KiB and 9.7 MiB, are not shown; one additional diff was suppressed because its lines are too long.)
environment.yml Normal file

@@ -0,0 +1,35 @@
name: qcav
channels:
- nodefaults
- pytorch
- nvidia
- conda-forge
dependencies:
# basic python
- rich
# science
- numpy
- scipy
- opencv
# pytorch
- pytorch
- torchvision
- torchaudio
- pytorch-cuda
- lightning # currently broken, install manually with pip
# deep learning libraries
- transformers
- datasets
- timm
# dev tools
- ruff
- black
- isort
- mypy
- pre-commit
# logging
- tensorboard
# visualization
- matplotlib
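The `lightning` entry above is flagged as broken in conda at the time of writing, so the workaround is a manual install inside the activated environment. A sketch, with no particular version pin implied:
```bash
micromamba activate qcav
pip install lightning
```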

flake.lock

@@ -1,43 +0,0 @@
{
"nodes": {
"flake-utils": {
"locked": {
"lastModified": 1667395993,
"narHash": "sha256-nuEHfE/LcWyuSWnS8t12N1wc105Qtau+/OdUAjtQ0rA=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "5aed5285a952e0b949eb3ba02c12fa4fcfef535f",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1673796341,
"narHash": "sha256-1kZi9OkukpNmOaPY7S5/+SlCDOuYnP3HkXHvNDyLQcc=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "6dccdc458512abce8d19f74195bb20fdb067df50",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-unstable",
"repo": "nixpkgs",
"type": "github"
}
},
"root": {
"inputs": {
"flake-utils": "flake-utils",
"nixpkgs": "nixpkgs"
}
}
},
"root": "root",
"version": 7
}

flake.nix

@@ -1,17 +0,0 @@
{
description = "Biblio proj long";
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
flake-utils.url = "github:numtide/flake-utils";
};
outputs = { self, nixpkgs, flake-utils }:
flake-utils.lib.eachDefaultSystem (system:
let pkgs = nixpkgs.legacyPackages.${system};
in {
devShell = pkgs.mkShell {
buildInputs = with pkgs; [ texlive.combined.scheme-full ];
};
});
}

pyproject.toml Normal file

@@ -0,0 +1,23 @@
[tool.ruff]
line-length = 120
select = ["E", "F", "I"]
[tool.black]
exclude = '''
/(
\.git
\.venv
)/
'''
include = '\.pyi?$'
line-length = 120
target-version = ["py310"]
[tool.isort]
multi_line_output = 3
profile = "black"
[tool.mypy]
python_version = "3.10"
warn_return_any = true
warn_unused_configs = true
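These sections configure the dev tools listed in `environment.yml`. A typical local check might look like this (a sketch; running the tools against `src/` is an assumption):
```bash
ruff check src/
isort src/
black src/
mypy src/
```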

src/datamodule/DETR.py Normal file

@@ -0,0 +1,274 @@
import datasets
import torch
from lightning.pytorch import LightningDataModule
from lightning.pytorch.utilities import CombinedLoader
from torch.utils.data import DataLoader
from torchvision.transforms import AugMix
from transformers import DetrFeatureExtractor
class DETRDataModule(LightningDataModule):
"""PyTorch Lightning data module for DETR."""
def __init__(
self,
num_workers: int = 8,
batch_size: int = 6,
prefetch_factor: int = 2,
model_name: str = "facebook/detr-resnet-50",
persistent_workers: bool = True,
):
"""Constructor.
Args:
num_workers (int, optional): Number of workers.
batch_size (int, optional): Batch size.
prefetch_factor (int, optional): Prefetch factor.
val_split (float, optional): Validation split.
model_name (str, optional): Model name.
"""
super().__init__()
# save params
self.num_workers = num_workers
self.batch_size = batch_size
self.prefetch_factor = prefetch_factor
self.persistent_workers = persistent_workers
# get feature extractor
self.feature_extractor = DetrFeatureExtractor.from_pretrained(model_name)
def prepare_data(self):
"""Download data and prepare for training."""
# load datasets
self.illumination = datasets.load_dataset("src/dataset/multi_illumination.py", split="train")
self.render = datasets.load_dataset("src/dataset/synthetic.py", split="train")
self.real = datasets.load_dataset("src/dataset/antoine_laurent.py", split="train")
# split datasets
self.illumination = self.illumination.train_test_split(test_size=0.01)
self.render = self.render.train_test_split(test_size=0.01)
self.real = self.real.train_test_split(test_size=0.1)
# print some info
print(f"illumination: {self.illumination}")
print(f"render: {self.render}")
print(f"real: {self.real}")
# other datasets
self.predict_ds = datasets.load_dataset("src/dataset/predict.py", split="train")
# define AugMix transform
self.mix = AugMix()
# useful mappings
self.labels = self.real["test"].features["objects"][0]["category_id"].names
self.id2label = {k: v for k, v in enumerate(self.labels)}
self.label2id = {v: k for k, v in enumerate(self.labels)}
def train_transform(self, batch):
"""Training transform.
Args:
batch (dict): Batch precollated by HuggingFace datasets.
Structure is similar to the following:
{
"image": list[PIL.Image],
"image_id": list[int],
"objects": [
{
"bbox": list[float, 4],
"category_id": int,
}
]
}
Returns:
dict: Augmented and processed batch.
Structure is similar to the following:
{
"pixel_values": TensorType["batch", "canal", "width", "height"],
"pixel_mask": TensorType["batch", 1200, 1200],
"labels": List[Dict[str, TensorType["batch", "num_boxes", "num_labels"]]],
}
"""
# extract images, ids and objects from batch
images = batch["image"]
ids = batch["image_id"]
objects = batch["objects"]
# apply AugMix transform
images_mixed = [self.mix(image) for image in images]
# build targets for feature extractor
targets = [
{
"image_id": id,
"annotations": object,
}
for id, object in zip(ids, objects)
]
# process images and targets with feature extractor for DETR
processed = self.feature_extractor(
images=images_mixed,
annotations=targets,
return_tensors="pt",
)
return processed
def val_transform(self, batch):
"""Validation transform.
Just like Training transform, but without AugMix.
"""
# extract images, ids and objects from batch
images = batch["image"]
ids = batch["image_id"]
objects = batch["objects"]
# build targets for feature extractor
targets = [
{
"image_id": id,
"annotations": object,
}
for id, object in zip(ids, objects)
]
processed = self.feature_extractor(
images=images,
annotations=targets,
return_tensors="pt",
)
return processed
def predict_transform(self, batch):
"""Prediction transform.
Just like val_transform, but with images.
"""
processed = self.val_transform(batch)
# add images to dict
processed["images"] = batch["image"]
return processed
def collate_fn(self, examples):
"""Collate function.
Convert list of dicts to dict of Tensors.
"""
return {
"pixel_values": torch.stack([data["pixel_values"] for data in examples]),
"pixel_mask": torch.stack([data["pixel_mask"] for data in examples]),
"labels": [data["labels"] for data in examples],
}
def collate_fn_predict(self, examples):
"""Collate function.
Convert list of dicts to dict of Tensors.
"""
return {
"pixel_values": torch.stack([data["pixel_values"] for data in examples]),
"pixel_mask": torch.stack([data["pixel_mask"] for data in examples]),
"labels": [data["labels"] for data in examples],
"images": [data["images"] for data in examples],
}
def train_dataloader(self):
"""Training dataloader."""
loaders = {
"illumination": DataLoader(
self.illumination["train"].with_transform(self.val_transform),
shuffle=True,
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
"render": DataLoader(
self.render["train"].with_transform(self.val_transform),
shuffle=True,
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
"real": DataLoader(
self.real["train"].with_transform(self.val_transform),
shuffle=True,
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
}
return CombinedLoader(loaders, mode="max_size_cycle")
def val_dataloader(self):
"""Validation dataloader."""
loaders = {
"illumination": DataLoader(
self.illumination["test"].with_transform(self.val_transform),
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
"render": DataLoader(
self.render["test"].with_transform(self.val_transform),
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
"real": DataLoader(
self.real["test"].with_transform(self.val_transform),
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
}
return CombinedLoader(loaders, mode="max_size_cycle")
def predict_dataloader(self):
"""Prediction dataloader."""
return DataLoader(
self.predict_ds.with_transform(self.predict_transform),
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn_predict,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
)
if __name__ == "__main__":
# load data
dm = DETRDataModule()
dm.prepare_data()
ds = dm.train_dataloader()
for batch in ds:
print(batch)

src/datamodule/FasterRCNN.py Normal file

@@ -0,0 +1,314 @@
import datasets
import torch
from lightning.pytorch import LightningDataModule
from lightning.pytorch.utilities import CombinedLoader
from torch.utils.data import DataLoader
from torchvision.transforms import AugMix
from transformers import DetrFeatureExtractor
class FasterRCNNDataModule(LightningDataModule):
"""PyTorch Lightning data module for Faster RCNN."""
def __init__(
self,
num_workers: int = 8,
batch_size: int = 5,
prefetch_factor: int = 2,
model_name: str = "facebook/detr-resnet-50",
persistent_workers: bool = True,
):
"""Constructor.
Args:
num_workers (int, optional): Number of workers.
batch_size (int, optional): Batch size.
prefetch_factor (int, optional): Prefetch factor.
val_split (float, optional): Validation split.
model_name (str, optional): Model name.
"""
super().__init__()
# save params
self.num_workers = num_workers
self.batch_size = batch_size
self.prefetch_factor = prefetch_factor
self.persistent_workers = persistent_workers
# get feature extractor
self.feature_extractor = DetrFeatureExtractor.from_pretrained(model_name)
def prepare_data(self):
"""Download data and prepare for training."""
# load datasets
self.illumination = datasets.load_dataset("src/dataset/multi_illumination.py", split="train")
self.render = datasets.load_dataset("src/dataset/synthetic.py", split="train")
self.real = datasets.load_dataset("src/dataset/antoine_laurent.py", split="train")
# split datasets
self.illumination = self.illumination.train_test_split(test_size=0.01)
self.render = self.render.train_test_split(test_size=0.01)
self.real = self.real.train_test_split(test_size=0.1)
# print some info
print(f"illumination: {self.illumination}")
print(f"render: {self.render}")
print(f"real: {self.real}")
# other datasets
self.predict_ds = datasets.load_dataset("src/dataset/predict.py", split="train")
# define AugMix transform
self.mix = AugMix()
# useful mappings
self.labels = self.real["test"].features["objects"][0]["category_id"].names
self.id2label = {k: v for k, v in enumerate(self.labels)}
self.label2id = {v: k for k, v in enumerate(self.labels)}
def train_transform(self, batch):
"""Training transform.
Args:
batch (dict): Batch precollated by HuggingFace datasets.
Structure is similar to the following:
{
"image": list[PIL.Image],
"image_id": list[int],
"objects": [
{
"bbox": list[float, 4],
"category_id": int,
}
]
}
Returns:
dict: Augmented and processed batch.
Structure is similar to the following:
{
"pixel_values": TensorType["batch", "canal", "width", "height"],
"pixel_mask": TensorType["batch", 1200, 1200],
"labels": List[Dict[str, TensorType["batch", "num_boxes", "num_labels"]]],
}
"""
# extract images, ids and objects from batch
images = batch["image"]
ids = batch["image_id"]
objects = batch["objects"]
# apply AugMix transform
images_mixed = [self.mix(image) for image in images]
# build targets for feature extractor
targets = [
{
"image_id": id,
"annotations": object,
}
for id, object in zip(ids, objects)
]
# process images and targets with feature extractor for DETR
processed = self.feature_extractor(
images=images_mixed,
annotations=targets,
return_tensors="pt",
)
for label in processed["labels"]:
# renamed "class_labels" to "labels"
# add 1 since 0 is reserved for background
label["labels"] = label["class_labels"] + 1
del label["class_labels"]
# format boxes from [xc, yc, w, h] to [x1, y1, x2, y2]
width_height = label["boxes"][:, 2:]
label["boxes"][:, :2] = label["boxes"][:, :2] - width_height / 2
label["boxes"][:, 2:] = label["boxes"][:, :2] + width_height / 2
# convert from normalized to absolute coordinates
label["boxes"][:, 0] *= label["size"][1]
label["boxes"][:, 1] *= label["size"][0]
label["boxes"][:, 2] *= label["size"][1]
label["boxes"][:, 3] *= label["size"][0]
return processed
def val_transform(self, batch):
"""Validation transform.
Just like Training transform, but without AugMix.
"""
# extract images, ids and objects from batch
images = batch["image"]
ids = batch["image_id"]
objects = batch["objects"]
# build targets for feature extractor
targets = [
{
"image_id": id,
"annotations": object,
}
for id, object in zip(ids, objects)
]
processed = self.feature_extractor(
images=images,
annotations=targets,
return_tensors="pt",
)
for label in processed["labels"]:
# renamed "class_labels" to "labels"
# add 1 since 0 is reserved for background
label["labels"] = label["class_labels"] + 1
del label["class_labels"]
# format boxes from [xcenter, ycenter, w, h] to [x1, y1, x2, y2]
center_x, center_y, width, height = label["boxes"].unbind(-1)
label["boxes"] = torch.stack(
# top left x, top left y, bottom right x, bottom right y
[
(center_x - 0.5 * width),
(center_y - 0.5 * height),
(center_x + 0.5 * width),
(center_y + 0.5 * height),
],
dim=-1,
)
# convert from normalized to absolute coordinates
label["boxes"][:, 0] *= label["size"][1]
label["boxes"][:, 1] *= label["size"][0]
label["boxes"][:, 2] *= label["size"][1]
label["boxes"][:, 3] *= label["size"][0]
return processed
def predict_transform(self, batch):
"""Prediction transform.
Just like val_transform, but with images.
"""
processed = self.val_transform(batch)
# add images to dict
processed["images"] = batch["image"]
return processed
def collate_fn(self, examples):
"""Collate function.
Convert list of dicts to dict of Tensors.
"""
return {
"pixel_values": torch.stack([data["pixel_values"] for data in examples]),
"labels": [data["labels"] for data in examples],
}
def collate_fn_predict(self, examples):
"""Collate function.
Convert list of dicts to dict of Tensors.
"""
return {
"pixel_values": torch.stack([data["pixel_values"] for data in examples]),
"labels": [data["labels"] for data in examples],
"images": [data["images"] for data in examples],
}
def train_dataloader(self):
"""Training dataloader."""
loaders = {
"illumination": DataLoader(
self.illumination["train"].with_transform(self.val_transform),
shuffle=True,
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
"render": DataLoader(
self.render["train"].with_transform(self.val_transform),
shuffle=True,
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
"real": DataLoader(
self.real["train"].with_transform(self.val_transform),
shuffle=True,
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
}
return CombinedLoader(loaders, mode="max_size_cycle")
def val_dataloader(self):
"""Validation dataloader."""
loaders = {
"illumination": DataLoader(
self.illumination["test"].with_transform(self.val_transform),
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
"render": DataLoader(
self.render["test"].with_transform(self.val_transform),
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
"real": DataLoader(
self.real["test"].with_transform(self.val_transform),
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
),
}
return CombinedLoader(loaders, mode="max_size_cycle")
def predict_dataloader(self):
"""Prediction dataloader."""
return DataLoader(
self.predict_ds.with_transform(self.predict_transform),
pin_memory=True,
persistent_workers=self.persistent_workers,
collate_fn=self.collate_fn_predict,
batch_size=self.batch_size,
num_workers=self.num_workers,
prefetch_factor=self.prefetch_factor,
)
if __name__ == "__main__":
# load data
dm = FasterRCNNDataModule()
dm.prepare_data()
ds = dm.train_dataloader()
for batch in ds:
print(batch)

src/datamodule/__init__.py Normal file

@@ -0,0 +1,2 @@
from .DETR import DETRDataModule
from .FasterRCNN import FasterRCNNDataModule

src/dataset/antoine_laurent.py Normal file

@@ -0,0 +1,226 @@
import json
import pathlib
import cv2
import datasets
import numpy as np
prefix = "/data/local-files/?d=spheres/"
dataset_path = pathlib.Path("./dataset_antoine_laurent/")
annotation_path = dataset_path / "annotations.json" # from labelstudio
_VERSION = "2.0.0"
_DESCRIPTION = ""
_HOMEPAGE = ""
_LICENSE = ""
_NAMES = [
"Matte",
"Shiny",
"Chrome",
]
class SphereAntoineLaurent(datasets.GeneratorBasedBuilder):
def _info(self):
return datasets.DatasetInfo(
description=_DESCRIPTION,
version=_VERSION,
homepage=_HOMEPAGE,
license=_LICENSE,
features=datasets.Features(
{
"image_id": datasets.Value("int64"),
"image": datasets.Image(),
"width": datasets.Value("int32"),
"height": datasets.Value("int32"),
"objects": [
{
"category_id": datasets.ClassLabel(names=_NAMES),
"image_id": datasets.Value("int64"),
"id": datasets.Value("string"),
"area": datasets.Value("float32"),
"bbox": datasets.Sequence(datasets.Value("float32"), length=4),
"iscrowd": datasets.Value("bool"),
}
],
}
),
)
def _split_generators(self, dl_manager):
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
gen_kwargs={
"dataset_path": dataset_path,
"annotation_path": annotation_path,
},
),
]
def _generate_examples(self, dataset_path: pathlib.Path, annotation_path: pathlib.Path):
"""Generate images and labels for splits."""
with open(annotation_path, "r") as f:
tasks = json.load(f)
index = 0
for task in tasks:
image_id = task["id"]
image_name = task["data"]["img"]
image_name = image_name[len(prefix) :]
image_name = pathlib.Path(image_name)
# check image_name exists
assert (dataset_path / image_name).is_file()
# create annotation groups
annotation_groups: dict[str, list[dict]] = {}
for annotation in task["annotations"][0]["result"]:
id = annotation["id"]
if "parentID" in annotation:
parent_id = annotation["parentID"]
if parent_id not in annotation_groups:
annotation_groups[parent_id] = []
annotation_groups[parent_id].append(annotation)
else:
if id not in annotation_groups:
annotation_groups[id] = []
annotation_groups[id].append(annotation)
# check all annotations have same width and height
width = task["annotations"][0]["result"][0]["original_width"]
height = task["annotations"][0]["result"][0]["original_height"]
for annotation in task["annotations"][0]["result"]:
assert annotation["original_width"] == width
assert annotation["original_height"] == height
# check all childs of group have same label
labels = {}
for group_id, annotations in annotation_groups.items():
label = annotations[0]["value"]["keypointlabels"][0]
for annotation in annotations:
assert annotation["value"]["keypointlabels"][0] == label
# convert labels
if label == "White":
label = "Matte"
elif label == "Black":
label = "Shiny"
elif label == "Red":
label = "Shiny"
labels[group_id] = label
# compute bboxes
bboxes = {}
for group_id, annotations in annotation_groups.items():
# convert points to numpy array
points = np.array(
[
[
annotation["value"]["x"] / 100 * width,
annotation["value"]["y"] / 100 * height,
]
for annotation in annotations
],
dtype=np.float32,
)
# fit ellipse from points
ellipse = cv2.fitEllipse(points)
# extract ellipse parameters
x_C = ellipse[0][0]
y_C = ellipse[0][1]
a = ellipse[1][0] / 2
b = ellipse[1][1] / 2
theta = ellipse[2] * np.pi / 180
# sample ellipse points
t = np.linspace(0, 2 * np.pi, 100)
x = x_C + a * np.cos(t) * np.cos(theta) - b * np.sin(t) * np.sin(theta)
y = y_C + a * np.cos(t) * np.sin(theta) + b * np.sin(t) * np.cos(theta)
# get bounding box
xmin = np.min(x)
xmax = np.max(x)
ymin = np.min(y)
ymax = np.max(y)
w = xmax - xmin
h = ymax - ymin
# bbox to COCO format
# https://github.com/huggingface/transformers/blob/main/src/transformers/models/detr/image_processing_detr.py#L295
bboxes[group_id] = [xmin, ymin, w, h]
# compute areas
areas = {group_id: w * h for group_id, (_, _, w, h) in bboxes.items()}
# generate data
data = {
"image_id": image_id,
"image": str(dataset_path / image_name),
"width": width,
"height": height,
"objects": [
{
# "category_id": "White",
"category_id": labels[group_id],
"image_id": image_id,
"id": group_id,
"area": areas[group_id],
"bbox": bboxes[group_id],
"iscrowd": False,
}
for group_id in annotation_groups
],
}
yield index, data
index += 1
if __name__ == "__main__":
from PIL import ImageDraw
# load dataset
dataset = datasets.load_dataset("src/spheres.py", split="train")
print("dataset loaded")
labels = dataset.features["objects"][0]["category_id"].names
id2label = {k: v for k, v in enumerate(labels)}
label2id = {v: k for k, v in enumerate(labels)}
print(f"labels: {labels}")
print(f"id2label: {id2label}")
print(f"label2id: {label2id}")
print()
idx = 0
while True:
image = dataset[idx]["image"]
if "DSC_4234" in image.filename:
break
idx += 1
print(f"image path: {image.filename}")
print(f"data: {dataset[idx]}")
draw = ImageDraw.Draw(image)
for obj in dataset[idx]["objects"]:
bbox = (
obj["bbox"][0],
obj["bbox"][1],
obj["bbox"][0] + obj["bbox"][2],
obj["bbox"][1] + obj["bbox"][3],
)
draw.rectangle(bbox, outline="red", width=3)
draw.text(bbox[:2], text=id2label[obj["category_id"]], fill="black")
# save image
image.save("example_antoine_laurent.jpg")

src/dataset/multi_illumination.py Normal file

@@ -0,0 +1,164 @@
import json
import pathlib
import datasets
dataset_path_train = pathlib.Path("./dataset_illumination/")
_VERSION = "2.0.0"
_DESCRIPTION = ""
_HOMEPAGE = ""
_LICENSE = ""
_NAMES = [
"Matte",
"Shiny",
"Chrome",
]
class SphereIllumination(datasets.GeneratorBasedBuilder):
def _info(self):
return datasets.DatasetInfo(
description=_DESCRIPTION,
version=_VERSION,
homepage=_HOMEPAGE,
license=_LICENSE,
features=datasets.Features(
{
"image_id": datasets.Value("int64"),
"image": datasets.Image(),
"width": datasets.Value("int32"),
"height": datasets.Value("int32"),
"objects": [
{
"category_id": datasets.ClassLabel(names=_NAMES),
"image_id": datasets.Value("int64"),
"id": datasets.Value("string"),
"area": datasets.Value("float32"),
"bbox": datasets.Sequence(datasets.Value("float32"), length=4),
"iscrowd": datasets.Value("bool"),
}
],
}
),
)
def _split_generators(self, dl_manager):
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
gen_kwargs={
"dataset_path": dataset_path_train,
},
),
]
def _generate_examples(self, dataset_path: pathlib.Path):
"""Generate images and labels for splits."""
width = 1500
height = 1000
original_width = 6020
original_height = 4024
# create jpg iterator
object_index = 0
jpgs = dataset_path.rglob("*.jpg")
for index, jpg in enumerate(jpgs):
# filter out probe images
if "probes" in jpg.parts:
continue
# filter out thumbnails
if "thumb" in jpg.stem:
continue
# open corresponding json file
json_file = jpg.parent / "meta.json"
# read json
with open(json_file, "r") as f:
meta = json.load(f)
gray = (
(
meta["gray"]["bounding_box"]["x"] / original_width * width,
meta["gray"]["bounding_box"]["y"] / original_height * height,
meta["gray"]["bounding_box"]["w"] / original_width * width,
meta["gray"]["bounding_box"]["h"] / original_height * height,
),
"Matte",
)
chrome = (
(
meta["chrome"]["bounding_box"]["x"] / original_width * width,
meta["chrome"]["bounding_box"]["y"] / original_height * height,
meta["chrome"]["bounding_box"]["w"] / original_width * width,
meta["chrome"]["bounding_box"]["h"] / original_height * height,
),
"Chrome",
)
# generate data
data = {
"image_id": index,
"image": str(jpg),
"width": width,
"height": height,
"objects": [
{
"category_id": category,
"image_id": index,
"id": (object_index := object_index + 1),
"area": bbox[2] * bbox[3],
"bbox": bbox,
"iscrowd": False,
}
for bbox, category in [gray, chrome]
],
}
yield index, data
if __name__ == "__main__":
from PIL import ImageDraw
# load dataset
dataset = datasets.load_dataset("src/spheres_illumination.py", split="train")
dataset = dataset.shuffle()
labels = dataset.features["objects"][0]["category_id"].names
id2label = {k: v for k, v in enumerate(labels)}
label2id = {v: k for k, v in enumerate(labels)}
print(f"labels: {labels}")
print(f"id2label: {id2label}")
print(f"label2id: {label2id}")
print()
for idx in range(10):
image = dataset[idx]["image"]
print(f"image path: {image.filename}")
print(f"data: {dataset[idx]}")
draw = ImageDraw.Draw(image)
for obj in dataset[idx]["objects"]:
bbox = (
obj["bbox"][0],
obj["bbox"][1],
obj["bbox"][0] + obj["bbox"][2],
obj["bbox"][1] + obj["bbox"][3],
)
draw.rectangle(bbox, outline="red", width=3)
draw.text(bbox[:2], text=id2label[obj["category_id"]], fill="black")
# save image
image.save(f"example_illumination_{idx}.jpg")

src/dataset/predict.py Normal file

@@ -0,0 +1,95 @@
import pathlib
import datasets
dataset_path = pathlib.Path("./dataset_predict/")
_VERSION = "2.0.2"
_DESCRIPTION = ""
_HOMEPAGE = ""
_LICENSE = ""
_NAMES = [
"Matte",
"Shiny",
"Chrome",
]
class SpherePredict(datasets.GeneratorBasedBuilder):
def _info(self):
return datasets.DatasetInfo(
description=_DESCRIPTION,
version=_VERSION,
homepage=_HOMEPAGE,
license=_LICENSE,
features=datasets.Features(
{
"image_id": datasets.Value("int64"),
"image": datasets.Image(),
"objects": [
{
"category_id": datasets.ClassLabel(names=_NAMES),
"image_id": datasets.Value("int64"),
"id": datasets.Value("string"),
"area": datasets.Value("float32"),
"bbox": datasets.Sequence(datasets.Value("float32"), length=4),
"iscrowd": datasets.Value("bool"),
}
],
}
),
)
def _split_generators(self, dl_manager):
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
gen_kwargs={
"dataset_path": dataset_path,
},
)
]
def _generate_examples(self, dataset_path: pathlib.Path):
"""Generate images and labels for splits."""
# create jpg iterator
jpgs = dataset_path.rglob("*.jpg")
for index, jpg in enumerate(jpgs):
print(index, jpg, 2)
# generate data
data = {
"image_id": index,
"image": str(jpg),
"objects": [],
}
yield index, data
if __name__ == "__main__":
# load dataset
dataset = datasets.load_dataset("src/spheres_predict.py", split="train")
labels = dataset.features["objects"][0]["category_id"].names
id2label = {k: v for k, v in enumerate(labels)}
label2id = {v: k for k, v in enumerate(labels)}
print(f"labels: {labels}")
print(f"id2label: {id2label}")
print(f"label2id: {label2id}")
print()
for idx in range(10):
image = dataset[idx]["image"]
print(f"image path: {image.filename}")
print(f"data: {dataset[idx]}")
# save image
image.save(f"example_predict_{idx}.jpg")

src/dataset/synthetic.py Normal file

@@ -0,0 +1,169 @@
import pathlib
import datasets
dataset_path = pathlib.Path("./dataset_render/")
_VERSION = "2.0.0"
_DESCRIPTION = ""
_HOMEPAGE = ""
_LICENSE = ""
_NAMES = [
"Matte",
"Shiny",
"Chrome",
]
class SphereSynth(datasets.GeneratorBasedBuilder):
def _info(self):
return datasets.DatasetInfo(
description=_DESCRIPTION,
version=_VERSION,
homepage=_HOMEPAGE,
license=_LICENSE,
features=datasets.Features(
{
"image_id": datasets.Value("int64"),
"image": datasets.Image(),
"width": datasets.Value("int32"),
"height": datasets.Value("int32"),
"objects": [
{
"category_id": datasets.ClassLabel(names=_NAMES),
"image_id": datasets.Value("int64"),
"id": datasets.Value("string"),
"area": datasets.Value("float32"),
"bbox": datasets.Sequence(datasets.Value("float32"), length=4),
"iscrowd": datasets.Value("bool"),
}
],
}
),
)
def _split_generators(self, dl_manager):
return [
datasets.SplitGenerator(
name=datasets.Split.TRAIN,
gen_kwargs={
"dataset_path": dataset_path,
},
),
]
def _generate_examples(self, dataset_path: pathlib.Path):
"""Generate images and labels for splits."""
# create png iterator
width = 1200
height = 675
object_index = 0
pngs = dataset_path.glob("*.png")
for index, png in enumerate(pngs):
# open corresponding csv file
csv = dataset_path / (png.stem + ".csv")
# read csv lines
with open(csv, "r") as f:
lines = f.readlines()
lines = [line.strip().split(",") for line in lines]
lines = [
(
float(line[0]),
1 - float(line[1]),
float(line[2]),
1 - float(line[3]),
line[4].strip(),
)
for line in lines
]
bboxes = [
(
line[0] * width,
line[3] * height,
(line[2] - line[0]) * width,
(line[1] - line[3]) * height,
)
for line in lines
]
categories = []
for line in lines:
category = line[4]
if category == "White":
category = "Matte"
elif category == "Black":
category = "Shiny"
elif category == "Grey":
category = "Matte"
elif category == "Red":
category = "Shiny"
elif category == "Chrome":
category = "Chrome"
elif category == "Cyan":
category = "Shiny"
categories.append(category)
# generate data
data = {
"image_id": index,
"image": str(png),
"width": width,
"height": height,
"objects": [
{
"category_id": category,
"image_id": index,
"id": (object_index := object_index + 1),
"area": bbox[2] * bbox[3],
"bbox": bbox,
"iscrowd": False,
}
for bbox, category in zip(bboxes, categories)
],
}
yield index, data
if __name__ == "__main__":
from PIL import ImageDraw
# load dataset
dataset = datasets.load_dataset("src/spheres_synth.py", split="train")
labels = dataset.features["objects"][0]["category_id"].names
id2label = {k: v for k, v in enumerate(labels)}
label2id = {v: k for k, v in enumerate(labels)}
print(f"labels: {labels}")
print(f"id2label: {id2label}")
print(f"label2id: {label2id}")
print()
for idx in range(10):
image = dataset[idx]["image"]
print(f"image path: {image.filename}")
print(f"data: {dataset[idx]}")
draw = ImageDraw.Draw(image)
for obj in dataset[idx]["objects"]:
bbox = (
obj["bbox"][0],
obj["bbox"][1],
obj["bbox"][0] + obj["bbox"][2],
obj["bbox"][1] + obj["bbox"][3],
)
draw.rectangle(bbox, outline="red", width=3)
draw.text(bbox[:2], text=id2label[obj["category_id"]], fill="black")
# save image
image.save(f"example_synth_{idx}.jpg")

@@ -1,124 +0,0 @@
% CVPR 2022 Paper Template
% based on the CVPR template provided by Ming-Ming Cheng (https://github.com/MCG-NKU/CVPR_Template)
% modified and extended by Stefan Roth (stefan.roth@NOSPAMtu-darmstadt.de)
\documentclass[10pt,twocolumn,a4paper]{article}
%%%%%%%%% PAPER TYPE - PLEASE UPDATE FOR FINAL VERSION
%\usepackage[review]{cvpr} % To produce the REVIEW version
\usepackage{cvpr} % To produce the CAMERA-READY version
%\usepackage[pagenumbers]{cvpr} % To force page numbers, e.g. for an arXiv version
% Include other packages here, before hyperref.
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{booktabs}
\usepackage[a4paper, hmargin=2cm, vmargin=3cm]{geometry}
% It is strongly recommended to use hyperref, especially for the review version.
% hyperref with option pagebackref eases the reviewers' job.
% Please disable hyperref *only* if you encounter grave issues, e.g. with the
% file validation for the camera-ready version.
%
% If you comment hyperref and then uncomment it, you should delete
% ReviewTempalte.aux before re-running LaTeX.
% (Or just hit 'q' on the first LaTeX run, let it finish, and you
% should be clear).
\usepackage[pagebackref,breaklinks,colorlinks]{hyperref}
% Support for easy cross-referencing
\usepackage[capitalize]{cleveref}
\crefname{section}{Sec.}{Secs.}
\Crefname{section}{Section}{Sections}
\Crefname{table}{Table}{Tables}
\crefname{table}{Tab.}{Tabs.}
\begin{document}
%%%%%%%%% TITLE
\title{Neural sphere detection in images for lighting calibration}
\author{
Laurent Fainsin\\
ENSEEIHT\\
{\tt\small laurent@fainsin.bzh}
}
\maketitle
%%%%%%%%% ABSTRACT
\begin{abstract}
We present a method for the automatic recognition of spherical markers in images using deep learning. The markers are used for precise lighting calibration required for photometric 3D-vision techniques such as RTI or photometric stereo. We use the Mask R-CNN model for instance segmentation, and train it on a dataset of synthetically generated images. We demonstrate that our method can accurately detect markers in real images and that it outperforms traditional methods.
\end{abstract}
\begin{keywords}
Sphere detection, Instance segmentation, Neural network, Mask R-CNN, Lighting calibration.
\end{keywords}
%%%%%%%%% BODY TEXT
\section{Introduction}
\label{sec:intro}
During my 2022 summer internship, as part of my engineering curriculum, I chose a research-oriented technical internship to discover this aspect of computer science engineering. Halfway through the year, I received an offer from one of my professors to work with them in the REVA team at the Research Institute in Computer Science of Toulouse. As a research intern in computer vision, I worked on the automatic recognition of spherical markers in images using deep learning, for the precise lighting calibration required by photometric stereo.
\section{Account of the work}
\label{sec:work}
The work mainly consisted of improving a previous marker-detection method, which was not based on a deep learning model but relied only on traditional algorithms and involved a very manual process.
\subsection{Model}
For this purpose, many papers were analyzed to get an idea of the state of the art: multiple deep learning models were investigated, and their performance and flexibility were compared. We decided to use Mask R-CNN, as it is a well-established model with a good standard implementation for instance segmentation. As in any deep learning project, most of the work was spent training and fine-tuning hyper-parameters in order to detect the markers as well as possible.
\subsection{Dataset}
The work also consisted of investigating new ways to generate the training data, as there were no datasets available for our specific application. Clean photos of the kind used in photometric stereo are unsurprisingly rare. Our final training image set consisted of synthetic images generated via compositing. We used the 2017 COCO unlabelled images dataset, containing 123287 images, in which we embedded spherical markers. These markers originated from photographs of spheres in situ under various illuminations, and from synthetic renders made in Blender. We present an example of such a picture in Figure \ref{fig:train}. Combined with various data augmentation transformations, this allowed us to easily obtain an image set of considerable size with the associated ground truth.
\begin{figure}[t]
\centering
%\fbox{\rule{0pt}{2in} \rule{0.9\linewidth}{0pt}}
\includegraphics[width=\linewidth]{image.jpg}
\caption{Example of synthetic data from our dataset: picture from COCO with composited spheres on top}
\label{fig:train}
\end{figure}
\begin{figure}[t]
\centering
%\fbox{\rule{0pt}{2in} \rule{0.9\linewidth}{0pt}}
\includegraphics[width=\linewidth]{RESULTAT2.png}
\caption{Inference output of a test image}
\label{fig:test}
\end{figure}
\subsection{Deploying}
The final task was to deploy the trained model to production. The model was first converted to the popular ONNX format. The popular open-source 3D reconstruction software Meshroom, by AliceVision, was then modified to run the model through ONNX Runtime. Scientists and archaeologists are now able to automatically compute the light direction in their images when reconstructing scenes in which a white sphere is present, just like in Figure \ref{fig:test}.
\section{Feedback analysis}
\label{sec:feedback}
This internship was my first experience in applied research, and I learned a lot about working in a research environment. In particular, I learned how to use various deep learning frameworks (PyTorch, Weights \& Biases, etc.), how to train and fine-tune models, and how to evaluate their performances. I also learned how to generate synthetic data, which is a very important skill in the field of computer vision, where real data is often sparse.
I also learned a lot about the process of research itself, from the formulation of the problem to the publication of the results. In particular, I learned how to write a scientific paper, which is a very valuable skill for any computer scientist.
Finally, I learned how to work in a team of researchers, and how to communicate my work to other people. This is a very important skill for any computer scientist, as research is often a very collaborative effort.
\section{Conclusion}
\label{sec:conclusion}
Overall, I had a very positive experience during my internship. I learned a lot of new skills, and gained a better understanding of the research process. I would definitely recommend this type of internship to any computer science student who is interested in research. This internship was a very valuable experience for me, and I am very grateful to have had the opportunity to work in such a stimulating environment.
\subsection{Acknowledgement}
I would like to thank my supervisors, Jean Mélou and Jean-Denis Durou, for their guidance and support during my internship. I would also like to thank the REVA team, and the Research Institute in Computer Science of Toulouse, for their hospitality and for providing me with the resources I needed to complete my work.
%%%%%%%%% REFERENCES
{\small
\bibliographystyle{ieee_fullname}
\bibliography{egbib}
\nocite{*}
}
\end{document}

src/main.py Normal file

@@ -0,0 +1,42 @@
from datamodule import DETRDataModule, FasterRCNNDataModule # noqa: F401
from lightning.pytorch.callbacks import (
ModelCheckpoint,
RichModelSummary,
RichProgressBar,
)
from lightning.pytorch.cli import LightningCLI
from module import DETR, FasterRCNN # noqa: F401
class MyLightningCLI(LightningCLI):
"""Custom Lightning CLI to define default arguments."""
def add_arguments_to_parser(self, parser):
"""Add arguments to parser."""
parser.set_defaults(
{
"trainer.max_steps": 5000,
"trainer.max_epochs": 1,
"trainer.accelerator": "gpu",
"trainer.devices": "[0]",
"trainer.strategy": "auto",
"trainer.log_every_n_steps": 25,
"trainer.val_check_interval": 200,
"trainer.num_sanity_val_steps": 10,
"trainer.benchmark": True,
"trainer.callbacks": [
RichProgressBar(),
RichModelSummary(max_depth=2),
ModelCheckpoint(mode="min", monitor="val_loss_real"),
ModelCheckpoint(save_on_train_epoch_end=True),
],
}
)
if __name__ == "__main__":
cli = MyLightningCLI(
model_class=DETR,
datamodule_class=DETRDataModule,
seed_everything_default=69420,
)
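Because the defaults above are registered on the parser, they can be inspected and overridden like any other LightningCLI option. A sketch using standard LightningCLI flags:
```bash
# dump the fully resolved configuration, including the defaults set in MyLightningCLI
python src/main.py fit --print_config > config.yaml

# run with an edited copy of that configuration, overriding a single value on top
python src/main.py fit --config config.yaml --trainer.max_steps 1000
```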

src/module/DETR.py Normal file

@@ -0,0 +1,191 @@
import torch
from lightning.pytorch import LightningModule
from PIL import ImageDraw
from transformers import (
DetrForObjectDetection,
get_cosine_with_hard_restarts_schedule_with_warmup,
)
class DETR(LightningModule):
"""PyTorch Lightning module for DETR."""
def __init__(
self,
lr: float = 1e-4,
lr_backbone: float = 1e-5,
weight_decay: float = 1e-4,
num_queries: int = 100,
warmup_steps: int = 0,
num_labels: int = 3,
prediction_threshold: float = 0.9,
):
"""Constructor.
Args:
lr (float, optional): Learning rate.
lr_backbone (float, optional): Learning rate for backbone.
weight_decay (float, optional): Weight decay.
num_queries (int, optional): Number of queries.
warmup_steps (int, optional): Number of warmup steps.
num_labels (int, optional): Number of labels.
prediction_threshold (float, optional): Prediction threshold.
"""
super().__init__()
# get DETR model
self.net = DetrForObjectDetection.from_pretrained(
"facebook/detr-resnet-50",
ignore_mismatched_sizes=True,
num_queries=num_queries,
num_labels=num_labels,
)
torch.compile(self.net)
# cf https://github.com/PyTorchLightning/pytorch-lightning/pull/1896
self.lr = lr
self.lr_backbone = lr_backbone
self.weight_decay = weight_decay
self.warmup_steps = warmup_steps
self.prediction_threshold = prediction_threshold
self.save_hyperparameters()
def forward(self, pixel_values, pixel_mask, **kwargs):
"""Forward pass."""
return self.net(
pixel_values=pixel_values,
pixel_mask=pixel_mask,
**kwargs,
)
def common_step(self, batchs, batch_idx):
"""Common step for training and validation.
Args:
batch (dict): Batch from dataloader (after collate_fn).
Structure is similar to the following:
{
"pixel_values": TensorType["batch", "canal", "width", "height"],
"pixel_mask": TensorType["batch", 1200, 1200],
"labels": List[Dict[str, TensorType["batch", "num_boxes", "num_labels"]]], # TODO: check this type
}
batch_idx (int): Batch index.
Returns:
tuple: Loss and loss dict.
"""
# initialize outputs
outputs = {k: {"loss": None, "loss_dict": None} for k in batchs.keys()}
# for each dataloader
for dataloader_name, batch in batchs.items():
# extract pixel_values, pixel_mask and labels from batch
pixel_values = batch["pixel_values"]
pixel_mask = batch["pixel_mask"]
labels = [{k: v.to(self.device) for k, v in t.items()} for t in batch["labels"]]
# forward pass
model_output = self(pixel_values=pixel_values, pixel_mask=pixel_mask, labels=labels)
# get loss
outputs[dataloader_name] = {
"loss": model_output.loss,
"loss_dict": model_output.loss_dict,
}
return outputs
def training_step(self, batch, batch_idx):
"""Training step."""
outputs = self.common_step(batch, batch_idx)
# logs metrics for each training_step
loss = 0
for dataloader_name, output in outputs.items():
loss += output["loss"]
self.log(f"train_loss_{dataloader_name}", output["loss"])
for k, v in output["loss_dict"].items():
self.log(f"train_loss_{k}_{dataloader_name}", v.item())
self.log("lr", self.optimizers().param_groups[0]["lr"])
self.log("lr_backbone", self.optimizers().param_groups[1]["lr"])
return loss
def validation_step(self, batch, batch_idx, dataloader_idx=None):
"""Validation step."""
outputs = self.common_step(batch, batch_idx)
# logs metrics for each validation_step
loss = 0
for dataloader_name, output in outputs.items():
loss += output["loss"]
self.log(f"val_loss_{dataloader_name}", output["loss"])
for k, v in output["loss_dict"].items():
self.log(f"val_loss_{k}_{dataloader_name}", v.item())
return loss
def predict_step(self, batch, batch_idx, dataloader_idx=None):
"""Predict step."""
# extract pixel_values and pixelmask from batch
pixel_values = batch["pixel_values"]
pixel_mask = batch["pixel_mask"]
images = batch["images"]
from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50")
# forward pass
outputs = self(pixel_values=pixel_values, pixel_mask=pixel_mask)
# postprocess outputs
sizes = torch.tensor([image.size[::-1] for image in images], device=self.device)
processed_outputs = image_processor.post_process_object_detection(
outputs, threshold=self.prediction_threshold, target_sizes=sizes
)
for i, image in enumerate(images):
# create ImageDraw object to draw on image
draw = ImageDraw.Draw(image)
# draw predicted bboxes
for bbox, label, score in zip(
processed_outputs[i]["boxes"].cpu().detach().numpy(),
processed_outputs[i]["labels"].cpu().detach().numpy(),
processed_outputs[i]["scores"].cpu().detach().numpy(),
):
if label == 0:
outline = "red"
elif label == 1:
outline = "blue"
else:
outline = "green"
draw.rectangle(bbox, outline=outline, width=5)
draw.text((bbox[0], bbox[1]), f"{score:0.4f}", fill="black")
# save the annotated image to disk using PIL
image.save(f"image2_{batch_idx}_{i}.jpg")
def configure_optimizers(self):
"""Configure optimizers."""
param_dicts = [
{
"params": [p for n, p in self.named_parameters() if "backbone" not in n and p.requires_grad],
},
{
"params": [p for n, p in self.named_parameters() if "backbone" in n and p.requires_grad],
"lr": self.lr_backbone,
},
]
optimizer = torch.optim.AdamW(param_dicts, lr=self.lr, weight_decay=self.weight_decay)
scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
optimizer,
num_warmup_steps=self.warmup_steps,
num_training_steps=self.trainer.estimated_stepping_batches,
)
return [optimizer], [{"scheduler": scheduler, "interval": "step"}]

114
src/module/FasterRCNN.py Normal file
View file

@ -0,0 +1,114 @@
import torch
import torchvision
from lightning.pytorch import LightningModule
from PIL import ImageDraw
from torchvision.models.detection.faster_rcnn import FasterRCNN_ResNet50_FPN_Weights, FastRCNNPredictor
def get_model_instance_segmentation(n_classes: int):
"""Returns a Torchvision FasterRCNN model for finetunning.
Args:
n_classes (int): number of classes the model should predict, background excluded
"""
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT,
box_detections_per_img=10, # cap numbers of detections, else oom
)
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, n_classes + 1)
return model
class FasterRCNN(LightningModule):
"""Faster R-CNN Pytorch Lightning Module, encapsulating common PyTorch functions."""
def __init__(
self,
lr: float = 1e-4,
weight_decay: float = 1e-4,
num_labels: int = 3,
):
"""Constructor, build model, save hyperparameters."""
super().__init__()
# get Faster R-CNN model
self.net = get_model_instance_segmentation(num_labels)
# hyperparameters
self.lr = lr
self.weight_decay = weight_decay
self.num_labels = num_labels
self.save_hyperparameters()
def forward(self, imgs, **kwargs):
"""Forward pass."""
return self.net(imgs, **kwargs)
def common_step(self, batchs, batch_idx):
"""Common step for training and validation; returns a dict mapping each dataloader name to its loss and loss dict."""
# initialize outputs
outputs = {}
# for each dataloader
for dataloader_name, batch in batchs.items():
# extract pixel_values and labels from batch
images = batch["pixel_values"]
targets = batch["labels"]
# forward pass
model_output = self(images, targets=targets)
# get loss
outputs[dataloader_name] = {
"loss": sum(model_output.values()),
"loss_dict": model_output,
}
return outputs
def training_step(self, batch, batch_idx):
outputs = self.common_step(batch, batch_idx)
# logs metrics for each training_step
loss = 0
for dataloader_name, output in outputs.items():
loss += output["loss"]
self.log(f"train_loss_{dataloader_name}", output["loss"])
for k, v in output["loss_dict"].items():
self.log(f"train_loss_{k}_{dataloader_name}", v.item())
self.log("lr", self.optimizers().param_groups[0]["lr"])
return loss
def validation_step(self, batch, batch_idx):
# torchvision detection models only return their losses in train mode,
# so temporarily switch to train mode (without gradients) to compute them
self.train()
with torch.no_grad():
outputs = self.common_step(batch, batch_idx)
self.eval()
# logs metrics for each validation_step
loss = 0
for dataloader_name, output in outputs.items():
loss += output["loss"]
self.log(f"val_loss_{dataloader_name}", output["loss"])
for k, v in output["loss_dict"].items():
self.log(f"val_loss_{k}_{dataloader_name}", v.item())
return loss
def configure_optimizers(self):
"""PyTorch optimizers and Schedulers.
Returns:
dictionary for the PyTorch Lightning optimizer/scheduler configuration
"""
optimizer = torch.optim.AdamW(self.net.parameters(), lr=self.lr, weight_decay=self.weight_decay)
return {
"optimizer": optimizer,
}

2
src/module/__init__.py Normal file
View file

@ -0,0 +1,2 @@
from .DETR import DETR
from .FasterRCNN import FasterRCNN

Binary file not shown.

View file

@ -1,312 +0,0 @@
\documentclass[
12pt,
a4paper
]{article}
% Packages
\usepackage{fontspec}
\usepackage{libertinus-otf}
\usepackage[a4paper, hmargin=2cm, vmargin=3cm]{geometry}
\usepackage{graphicx}
\usepackage{microtype}
\usepackage{amsmath}
\usepackage[numbers]{natbib}
% pdfx loads both hyperref and xcolor internally
% \usepackage{hyperref}
% \usepackage{xcolor}
\usepackage[a-3u]{pdfx}
% We use \hypersetup to pass options to hyperref
\hypersetup{
colorlinks = true,
breaklinks = true,
}
\setlength{\parindent}{0cm}
\graphicspath{{../assets/}}
\usepackage{lastpage}
\usepackage{fancyhdr}
\pagestyle{fancy}
\renewcommand{\headrulewidth}{0pt}
\fancyhead{}
\cfoot{}
\rfoot{\hypersetup{hidelinks}\thepage/\pageref{LastPage}}
\title{
\vspace{5cm}
\textbf{Bibliographie de projet long}
}
\author{
Laurent Fainsin \\
{\tt\small laurent@fainsin.bzh}
}
\date{
\vspace{10.5cm}
Département Sciences du Numérique \\
Troisième année \\
2022 — 2023
}
\begin{document}
\begin{figure}[t]
\centering
\includegraphics[width=5cm]{inp_n7.jpg}
\end{figure}
\maketitle
\thispagestyle{empty}
\newpage
{
\hypersetup{hidelinks}
\tableofcontents
}
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
The field of 3D reconstruction techniques in photography, such as Reflectance Transformation Imaging (RTI)~\cite{giachetti2018} and Photometric Stereo~\cite{durou2020}, often requires a precise understanding of the lighting conditions in the scene being captured. One common method for calibrating the lighting is to include one or more spheres in the scene, as shown in the left example of Figure~\ref{fig:intro}. However, manually outlining these spheres can be tedious and time-consuming, especially in the field of visual effects where the presence of chrome spheres is prevalent~\cite{jahirul_grey_2021}. This task can be made more efficient by using deep learning methods for detection. The goal of this project is to develop a neural network that can accurately detect both matte and shiny spheres in a scene, which could then be integrated into standard pipelines such as AliceVision Meshroom~\cite{alicevision2021}.
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.3\linewidth]{previous_work/matte.jpg} &
\includegraphics[height=0.3\linewidth]{previous_work/shiny.jpg}
\end{tabular}
\caption{Left: a scene with matte spheres. Right: a scene with a shiny sphere.}
\label{fig:intro}
\end{figure}
\section{Previous work}
Previous work by Laurent Fainsin et al. in~\cite{spheredetect} attempted to address this problem by using a neural network called Mask R-CNN~\cite{MaskRCNN} for instance segmentation of spheres in images. However, this approach is limited in its ability to detect shiny spheres, as demonstrated in the right image of Figure~\ref{fig:previouswork}. The network was trained on images of matte spheres and was unable to generalize to shiny spheres, which highlights the need for further research in this area.
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.3\linewidth]{previous_work/matte_inference.png} &
\includegraphics[height=0.3\linewidth]{previous_work/shiny_inference.jpg}
\end{tabular}
\caption{Mask R-CNN~\cite{MaskRCNN} inferences from~\cite{spheredetect} on Figure~\ref{fig:intro}.}
\label{fig:previouswork}
\end{figure}
In the field of deep learning, the specialized task of automatically detecting or segmenting spheres in scenes lacks a direct solution. Despite this, findings from studies in unrelated areas~\cite{dror_recognition_2003,qiu_describing_2021} indicate that deep neural networks may possess the capability to perform this task, offering hope for a performant solution.
\section{Datasets}
In~\cite{spheredetect}, it is explained that clean photographs with spherical markers, as used in 3D reconstruction techniques, are unsurprisingly rare. To address this issue, the authors of the paper created a dataset for training their model using custom Python and Blender scripts. This involved compositing known spherical markers (real or synthetic) onto background images from the COCO dataset~\cite{COCO}. The resulting dataset can be seen in Figure~\ref{fig:spheredetect_dataset}.
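As an illustration, a minimal compositing step of this kind could look like the following Python sketch; the \texttt{composite\_marker} helper and the file paths are purely illustrative (not taken from~\cite{spheredetect}), and it assumes the marker crops come with an alpha channel:
\begin{verbatim}
from PIL import Image
import random

def composite_marker(background_path: str, marker_path: str):
    """Paste an RGBA sphere crop onto a background image and
    return the composite together with its bounding box."""
    background = Image.open(background_path).convert("RGB")
    marker = Image.open(marker_path).convert("RGBA")
    # random position at which the whole marker fits inside the background
    x = random.randint(0, background.width - marker.width)
    y = random.randint(0, background.height - marker.height)
    background.paste(marker, (x, y), mask=marker)  # alpha channel as paste mask
    bbox = (x, y, x + marker.width, y + marker.height)
    return background, bbox
\end{verbatim}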
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.3\linewidth]{previous_work/bear.jpg} &
\includegraphics[height=0.3\linewidth]{previous_work/plush.jpg}
\end{tabular}
\caption{Example of the synthetic dataset used in~\cite{spheredetect}.}
\label{fig:spheredetect_dataset}
\end{figure}
Additionally, synthetic images of chrome spheres can also be generated using free (CC0 1.0 Universal Public Domain Dedication) environment maps from PolyHaven~\cite{polyhaven}. These environment maps provide a wide range of realistic lighting conditions and can be used to simulate different lighting scenarios, such as different times of day, weather conditions, or indoor lighting setups. This can help to further increase the diversity of the dataset and make the model more robust to different lighting conditions, which is crucial for the task of detecting chrome sphere markers.
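As a rough sketch of how such a render could be scripted with Blender's Python API (the node and object names below follow Blender's defaults, while the material settings, file paths and render parameters are illustrative assumptions):
\begin{verbatim}
import bpy

# add a chrome-like sphere at the origin
bpy.ops.mesh.primitive_uv_sphere_add(radius=0.1, location=(0.0, 0.0, 0.0))
sphere = bpy.context.active_object
material = bpy.data.materials.new("Chrome")
material.use_nodes = True
bsdf = material.node_tree.nodes["Principled BSDF"]
bsdf.inputs["Metallic"].default_value = 1.0
bsdf.inputs["Roughness"].default_value = 0.0
sphere.data.materials.append(material)

# light the scene with a PolyHaven HDRI environment map
world = bpy.context.scene.world
world.use_nodes = True
environment = world.node_tree.nodes.new("ShaderNodeTexEnvironment")
environment.image = bpy.data.images.load("/path/to/polyhaven_hdri.exr")
world.node_tree.links.new(
    environment.outputs["Color"],
    world.node_tree.nodes["Background"].inputs["Color"],
)

# render the image to disk
bpy.context.scene.render.filepath = "/tmp/chrome_sphere.png"
bpy.ops.render.render(write_still=True)
\end{verbatim}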
\subsection{Antoine Laurent}
Antoine Laurent, a PhD candidate at INP of Toulouse, is working in the field of 3D reconstruction techniques in photography with the REVA team (IRIT) and on the preservation of archaeological sites with the TRACES PSH team. He is also an active member of the scientific team for the Chauvet cave project, where he travels around France to take high-resolution photographs of cave paintings, menhir statues, and other historical monuments.
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.3\linewidth]{antoine_laurent/cheveaux.jpg} &
\includegraphics[height=0.3\linewidth]{antoine_laurent/mammouths.jpg}
\end{tabular}
\caption{Example of clean photographs with 3D spherical markers from Antoine Laurent.}
\label{fig:antoine_laurent_dataset}
\end{figure}
He has compiled a dataset consisting of 400+ photographs, all of which contain 3D spherical markers, which are used to calibrate the lighting conditions and aid in the 3D reconstruction of these historical sites. These images, shown in Figure~\ref{fig:antoine_laurent_dataset}, will form the basis of our dataset.
\newpage
\subsection{DeepLight}
DeepLight~\cite{legendre_deeplight_2019} is a research paper from Google that presents a deep learning-based approach for estimating the lighting conditions in mixed reality (MR) scenes captured by mobile devices. The goal of this research is to enhance the realism of MR by providing accurate estimates of the lighting conditions in the real-world scene.
\begin{figure}[ht]
\centering
\includegraphics[height=0.4\linewidth]{deeplight/Prober_Crop_small.jpg}
\includegraphics[height=0.4\linewidth]{deeplight/NAVID_20181022_104053_1393_frame_small.jpg}
\includegraphics[height=0.4\linewidth]{deeplight/Prober_figure_small.jpg}
\caption{Dataset acquisition technique from~\cite{legendre_deeplight_2019}.}
\label{fig:deeplight_dataset}
\end{figure}
The authors propose a deep learning-based model called DeepLight, which takes an RGB image captured by a mobile device as input and estimates the lighting conditions in the scene, including the color and direction of the light sources. The model is trained on a dataset of real-world images captured in various lighting conditions and the direction of lights are extracted from spherical markers as shown in Figure~\ref{fig:deeplight_dataset}. The authors demonstrated that the model can estimate the lighting conditions in new unseen images with high accuracy. This dataset could be useful for training our model to detect chrome spheres in images as it contains a wide range of lighting conditions.
\subsection{Multi-Illumination Images in the Wild}
In the paper "A Dataset of Multi-Illumination Images in the Wild"~\cite{murmann_dataset_2019}, the authors present a dataset containing over 1000 real-world scenes and their corresponding panoptic segmentation, captured under 25 different lighting conditions. This dataset can be used as a valuable resource for various computer vision tasks such as relighting, image recognition, object detection and image segmentation. The dataset, which is composed of a wide variety of lighting conditions, can be useful in training models to detect chrome spheres in images, as it would allow the model to be robust to different scenarios, improving its performance in real-world applications.
\begin{figure}[ht]
\centering
\begin{tabular}{cc}
\includegraphics[height=0.3\linewidth]{mip/dir_7_mip2.jpg} &
\includegraphics[height=0.3\linewidth]{mip/materials_mip2.png}
\end{tabular}
\caption{Example data from~\cite{murmann_dataset_2019}.}
\label{fig:murmann_dataset}
\end{figure}
\subsection{Labelling \& Versionning}
Label Studio~\cite{Label_Studio} is an open-source web-based annotation tool that allows multiple annotators to label data simultaneously and provides a user-friendly interface for creating annotation tasks. It also makes it possible to manage annotation projects, assign tasks to different annotators, and monitor the progress of the annotation process, while supporting data versioning and a variety of annotation formats.
The output of such annotation tools can be integrated with the HuggingFace Datasets~\cite{lhoest-etal-2021-datasets} library, which makes it possible to load, preprocess, share and version datasets, and to easily reproduce experiments. This library has built-in support for a wide range of datasets and can handle different file formats, making it easy to work with data from multiple sources. By integrating these tools, one obtains a powerful pipeline for annotating, versioning, and sharing datasets, which can improve reproducibility and collaboration in computer vision research and development.
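For instance, a locally exported and annotated image folder could be loaded and versioned as follows (the directory layout and repository name are purely illustrative):
\begin{verbatim}
from datasets import load_dataset

# "imagefolder" expects images plus an optional metadata.jsonl file holding
# per-image records (here, the converted Label Studio annotations)
dataset = load_dataset("imagefolder", data_dir="exports/spheres")

# optional: push a versioned copy to the HuggingFace Hub for sharing
dataset.push_to_hub("username/sphere-markers", private=True)
\end{verbatim}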
\section{Models}
Computer vision encompasses a range of tasks, including classification, classification with localization, object detection, semantic segmentation, instance segmentation, and panoptic segmentation, as illustrated in Figure~\ref{fig:tasks}.
Each of these tasks involves different objectives and challenges, and advances in these areas have greatly improved the ability of computers to understand and interpret visual information. For example, classification tasks aim to identify the class of an object in an image, while object detection tasks seek to locate and classify multiple objects within an image. Semantic segmentation and instance segmentation focus on understanding the relationships between objects and their parts, and panoptic segmentation seeks to merge these tasks into a single comprehensive solution. We will examine a variety of models for our computer vision problem.
\begin{figure}[ht]
\centering
\includegraphics[height=0.35\linewidth]{tasks.png}
\caption{The different types of tasks in Computer Vision.}
\label{fig:tasks}
\end{figure}
\subsection{Mask R-CNN}
In~\cite{spheredetect}, the authors use Mask R-CNN~\cite{MaskRCNN} as a base model for their task. Mask R-CNN is a neural network that is able to perform instance segmentation, which is the task of detecting and segmenting objects in an image.
\begin{figure}[ht]
\centering
\includegraphics[height=0.3\linewidth]{MaskRCNN.pdf}
\caption{The Mask-RCNN~\cite{MaskRCNN} architecture.}
\label{fig:maskrcnn}
\end{figure}
The network is composed of two parts: a backbone network and a region proposal network (RPN). The backbone network is a convolutional neural network that is used to extract features from the input image. The RPN is a fully convolutional network that is used to generate region proposals, which are bounding boxes that are used to crop the input image. The RPN is then used to generate a mask for each region proposal, which is used to segment the object in the image.
The network is trained using a loss function that is composed of three terms: the classification loss, the bounding box regression loss, and the mask loss. The classification loss is used to train the network to classify each region proposal as either a sphere or not a sphere. The bounding box regression loss is used to train the network to regress the bounding box of each region proposal. The mask loss is used to train the network to generate a mask for each region proposal. The original network was trained using the COCO dataset~\cite{COCO}.
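For reference, a COCO-pretrained Mask R-CNN can be obtained directly from TorchVision; the snippet below is only a starting point and not the exact setup of~\cite{spheredetect}:
\begin{verbatim}
import torchvision
from torchvision.models.detection import MaskRCNN_ResNet50_FPN_Weights

# off-the-shelf Mask R-CNN with COCO-pretrained weights
model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    weights=MaskRCNN_ResNet50_FPN_Weights.DEFAULT
)
model.eval()
\end{verbatim}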
The authors of the paper~\cite{spheredetect} achieved favorable results using the network on matte spheres; however, its performance declined when shiny spheres were introduced. This can be attributed to the fact that convolutional neural networks typically extract local features from images. Observing non-local features, such as the interior and exterior of a chrome sphere as delineated by its ``distortion'' effect, may be necessary to accurately identify it.
\subsection{Ellipse R-CNN}
To detect spheres in images, it is sufficient to estimate the center and radius of their projected circles. However, due to the perspective nature of photographs, the circles are often distorted and appear as ellipses.
\begin{figure}[ht]
\centering
\includegraphics[height=0.3\linewidth]{EllipseRCNN.png}
\caption{The Ellipse R-CNN~\cite{dong_ellipse_2021} architecture.}
\label{fig:ellipsercnn}
\end{figure}
Ellipse R-CNN~\cite{dong_ellipse_2021} is a modified version of Mask R-CNN~\cite{MaskRCNN} that can detect ellipses in images. It addresses this issue by adding a branch to the network that predicts the axes and orientation of each ellipse, which allows for more accurate localization. By predicting a segmentation mask for each ellipse, it can also handle overlapping and occluded objects. This makes it a strong candidate for detecting spheres in real-world images with complex backgrounds and variable lighting conditions.
\subsection{GPN}
Gaussian Proposal Networks (GPNs) are a novel extension of Region Proposal Networks (RPNs) for detecting lesion bounding ellipses. The main goal of the original paper~\cite{li_detecting_2019} was to improve lesion detection systems that are commonly used in computed tomography (CT) scans, as lesions are often elliptical objects. RPNs are widely used in lesion detection, but they only propose bounding boxes without fully leveraging the elliptical geometry of lesions.
\begin{figure}[ht]
\centering
\includegraphics[height=0.4\linewidth]{GPN.png}
\caption{The GPN~\cite{li_detecting_2019} architecture.}
\label{fig:gpn}
\end{figure}
GPNs represent bounding ellipses as 2D Gaussian distributions on the image plane and minimize the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground truth Gaussian for object localization.
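For reference, the KL divergence between two multivariate Gaussians has the standard closed form (with $k = 2$ on the image plane):
\[
D_{\mathrm{KL}}\big(\mathcal{N}(\mu_0,\Sigma_0)\,\|\,\mathcal{N}(\mu_1,\Sigma_1)\big)
= \frac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_1^{-1}\Sigma_0\right)
+ (\mu_1-\mu_0)^{\top}\Sigma_1^{-1}(\mu_1-\mu_0)
- k + \ln\frac{\det\Sigma_1}{\det\Sigma_0}\right].
\]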
GPNs could be an alternative to Ellipse R-CNN~\cite{dong_ellipse_2021} for detecting ellipses in images, but their architecture is more complex and could be tricky to implement and deploy to production.
\subsection{DETR \& DINO}
DETR (DEtection TRansformer)~\cite{carion_end--end_2020} is a new method proposed by Facebook that views object detection as a direct set prediction problem. The main goal of DETR is to streamline the detection pipeline by removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode prior knowledge about the task.
DETR uses a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture, as seen in Figure~\ref{fig:detr}. Given a fixed small set of learned object queries, the model reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. This makes the model conceptually simple and does not require a specialized library, unlike many other modern detectors.
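The bipartite matching step itself can be illustrated with the Hungarian algorithm from SciPy; the cost matrix below is random (in DETR it combines classification and box terms):
\begin{verbatim}
import numpy as np
from scipy.optimize import linear_sum_assignment

# toy cost matrix between 100 predicted queries and 5 ground-truth objects
cost = np.random.rand(100, 5)
query_idx, target_idx = linear_sum_assignment(cost)
# each ground-truth object is matched to exactly one query;
# all remaining queries are supervised as "no object"
\end{verbatim}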
DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster R-CNN~\cite{ren_faster_2016} baseline on the challenging COCO~\cite{COCO} object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner and it significantly outperforms competitive baselines.
\begin{figure}[ht]
\centering
\includegraphics[height=0.2\linewidth]{DETR.pdf}
\caption{The DETR~\cite{carion_end--end_2020} architecture.}
\label{fig:detr}
\end{figure}
DINO (DETR with Improved deNoising anchOr boxes)~\cite{zhang_dino_2022} is a state-of-the-art object detector that improves on the performance and efficiency of previous DETR-like models. It utilizes a contrastive denoising training method, mixed query selection for anchor initialization, and a look-forward twice scheme for box prediction. DINO achieves a significant improvement in performance compared to the previous best DETR-like model DN-DETR~\cite{li_dn-detr_2022}. Additionally, it scales well both in terms of model size and data size compared to other models on the leaderboard.
\begin{figure}[ht]
\centering
\includegraphics[height=0.3\linewidth]{DINO.pdf}
\caption{The DINO~\cite{zhang_dino_2022} architecture.}
\label{fig:dino}
\end{figure}
\subsection{Mask2Former}
Mask2Former~\cite{cheng_masked-attention_2022} is a recent development in object detection and instance segmentation tasks. It leverages the strengths of two popular models in this field: Transformer-based architectures, such as DETR~\cite{carion_end--end_2020}, and fully convolutional networks (FCN), like Mask R-CNN~\cite{MaskRCNN}.
\begin{figure}[ht]
\centering
\includegraphics[height=0.4\linewidth]{Mask2Former.pdf}
\caption{The Mask2Former~\cite{cheng_masked-attention_2022} architecture.}
\label{fig:mask2former}
\end{figure}
Similar to DETR, Mask2Former views object detection as a direct set prediction problem, streamlining the detection pipeline and removing the need for hand-designed components like non-maximum suppression and anchor generation. Unlike DETR, however, Mask2Former also uses a fully convolutional network to perform instance segmentation, outputting a mask for each detected object. This combination of a transformer-based architecture and an FCN provides a balance between the speed and accuracy of both models.
Compared to Mask R-CNN, Mask2Former has a simpler architecture, with fewer components and a more straightforward pipeline. This simplicity leads to improved efficiency, making Mask2Former a good choice for real-time applications. The use of a transformer-based architecture also provides an advantage in handling complex scenes, where objects may have arbitrary shapes and sizes.
\section{Training}
For the training process, we plan to utilize PyTorch Lightning~\cite{Falcon_PyTorch_Lightning_2019}, a high-level library for PyTorch~\cite{NEURIPS2019_9015}, and the HuggingFace Transformers~\cite{wolf-etal-2020-transformers} library for our transformer model. The optimizer we plan to use is AdamW~\cite{loshchilov_decoupled_2019}, a variation of the Adam~\cite{kingma_adam_2017} optimizer that is well-suited for training deep learning models. We aim to ensure reproducibility by using Nix~\cite{nix} for our setup and we will use Poetry~\cite{poetry} for managing Python dependencies. This combination of tools is expected to streamline the training process and ensure reliable results.
\subsection{Loss functions}
In computer vision models such as Faster R-CNN~\cite{ren_faster_2016} and Mask R-CNN~\cite{MaskRCNN}, loss functions play a crucial role in the training process. They define the objective that the model aims to minimize during training, and the optimization of the loss function leads to the convergence of the model to a desired performance.
Faster R-CNN uses two loss functions: a classification loss and a regression loss. The classification loss measures the difference between the predicted object class and the ground truth class. It is usually calculated using the cross-entropy loss function. The regression loss measures the difference between the predicted bounding box and the ground truth bounding box. It is usually calculated using the smooth L1 loss function, which is a differentiable approximation of the L1 loss function.
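For reference, the smooth L1 loss applied to a box regression residual $x$ is commonly written (with the standard threshold of 1) as:
\[
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2 & \text{if } |x| < 1,\\
|x| - 0.5 & \text{otherwise.}
\end{cases}
\]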
Mask R-CNN, on the other hand, adds a segmentation loss to the losses used in Faster R-CNN. The segmentation loss measures the difference between the predicted segmentation mask and the ground truth mask. It is usually calculated using the binary cross-entropy loss function. The binary cross-entropy loss function measures the difference between the predicted binary mask and the ground truth binary mask.
DETR uses a bipartite matching loss for training. The bipartite matching loss measures the difference between the predicted set of detections and the ground truth set of objects. It is calculated as the sum of pairwise distances between the predicted and ground truth detections, where the distance between two detections is defined as the negative IoU between their bounding boxes. The bipartite matching loss is designed to handle the permutation invariance of the detections, which is important for detecting an arbitrary number of objects in any order.
\subsection{Metrics}
In object detection and instance segmentation tasks, metrics such as DICE, IoU, or mAP are commonly used to evaluate the performance of a computer vision model.
Mean Average Precision (mAP) is a widely used metric in object detection. It represents the average of the Average Precision (AP) values for each object class in a dataset. The AP is the area under the Precision-Recall curve, which is a graphical representation of the precision and recall values of a model at different thresholds. mAP provides a comprehensive measure of the overall performance of a model in detecting objects of different classes in a dataset.
Intersection over Union (IoU), also known as Jaccard index, is another widely used metric in object detection. It measures the similarity between the predicted bounding box and the ground truth bounding box by calculating the ratio of the area of their intersection to the area of their union. A high IoU value indicates a well-aligned predicted bounding box with the ground truth.
In instance segmentation, the Dice Coefficient (DICE) is a widely used metric. It measures the similarity between the predicted segmentation mask and the ground truth mask by calculating the ratio of twice the area of their intersection to the sum of their areas. A DICE value of 1 indicates a perfect match between the predicted and ground truth masks.
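In set notation, with $A$ the predicted region and $B$ the ground truth, these two metrics read:
\[
\mathrm{IoU}(A,B) = \frac{|A \cap B|}{|A \cup B|},
\qquad
\mathrm{DICE}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}.
\]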
These metrics are available in the TorchMetrics~\cite{TorchMetrics_2022} library and provide valuable insights into the performance of object detection and instance segmentation models, enabling the identification of areas for improvement and guiding further development.
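As an example, box mAP can be computed with TorchMetrics as follows (the boxes, scores and labels below are dummy values):
\begin{verbatim}
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision()  # COCO-style box mAP
preds = [{
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
targets = [{
    "boxes": torch.tensor([[12.0, 11.0, 48.0, 52.0]]),
    "labels": torch.tensor([0]),
}]
metric.update(preds, targets)
print(metric.compute()["map"])
\end{verbatim}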
\subsection{Experiment tracking}
To keep track of our experiments and their results, we will utilize Weights \& Biases (W\&B)~\cite{wandb} or Aim~\cite{Arakelyan_Aim_2020}. W\&B is a popular experiment tracking tool that provides a simple interface for logging and visualizing metrics, models, and artifacts. Aim is a collaborative machine learning platform that provides a unified way to track, compare, and explain experiments across teams and tools. By utilizing these tools, we aim to efficiently track our experiments and compare results. This will allow us to make data-driven decisions and achieve better results if we have enough time.
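A minimal W\&B logging loop looks as follows (the project name and logged values are placeholders):
\begin{verbatim}
import wandb

run = wandb.init(project="sphere-detection", config={"lr": 1e-4})
for step in range(10):
    wandb.log({"train_loss": 1.0 / (step + 1)}, step=step)
run.finish()
\end{verbatim}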
\section{Deployment}
For deployment, we plan to use the ONNX~\cite{ONNX} format. This format provides a standard for interoperability between different AI frameworks and helps ensure compatibility with a wide range of deployment scenarios. To ensure the deployment process is seamless, we will carefully choose an architecture that is exportable, though most popular architectures are compatible with ONNX. Our model will be run in production using ONNXRuntime~\cite{ONNX_Runtime_2018}, a framework that allows for efficient inference using ONNX models. This combination of tools and formats will ensure that our model can be deployed quickly and easily in a variety of production environments such as AliceVision Meshroom~\cite{alicevision2021}.
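Once the model has been exported (e.g. with \texttt{torch.onnx.export}), running it with ONNX Runtime is straightforward; a minimal sketch, with an illustrative file name and input shape, is given below:
\begin{verbatim}
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "sphere_detector.onnx", providers=["CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name
image = np.random.rand(1, 3, 800, 800).astype(np.float32)
outputs = session.run(None, {input_name: image})  # e.g. boxes, labels, scores
\end{verbatim}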
\section{Conclusion}
In conclusion, the detection of matte spheres has been explored and is possible; however, the automatic detection of chrome spheres has not been fully investigated. The initial step towards this goal would be to evaluate the capabilities of transformer-based architectures, such as DETR, in detecting chrome spheres. If successful, further improvements can include the prediction of bounding ellipses instead of just bounding boxes (modifications to the architecture already allow the detection of angled bounding boxes~\cite{dai_ao2-detr_2022}), exporting the model to the ONNX format, and deploying it inside the AliceVision Meshroom software.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newpage
\addcontentsline{toc}{section}{References}
\bibliography{zotero,qcav,softs}
\bibliographystyle{plainnat}
\end{document}

View file

@ -1,14 +0,0 @@
\Author{Laurent Fainsin}
\Title{
Bibliographie de projet long
}
\Language{English}
\Keywords{}
\Publisher{Self-Published}
\Subject{
Bibliography
}
\Date{2023-01-24}
\PublicationType{Bibliography}
\Source{}
\URLlink{}

View file

@ -1,57 +0,0 @@
@inproceedings{MaskRCNN,
author = {He, Kaiming and Gkioxari, Georgia and Dollár, Piotr and Girshick, Ross},
booktitle = {Proceedings of ICCV},
title = {{Mask R-CNN}},
year = {2017},
doi = {10.1109/ICCV.2017.322}
}
@inproceedings{CoCo,
author = {Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro
and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C. Lawrence},
title = {{Microsoft COCO: Common Objects in Context}},
booktitle = {Proceedings of ECCV},
year = {2014}
}
@inproceedings{girshick2015fast,
author = {Girshick, Ross},
title = {{Fast R-CNN}},
booktitle = {Proceedings of ICCV},
year = {2015}
}
@incollection{durou2020,
author = {Durou, Jean-Denis and Falcone, Maurizio and Qu{\'e}au, Yvain and Tozza, Silvia},
title = {{A Comprehensive Introduction to Photometric 3D-reconstruction}},
booktitle = {{Advances in Photometric 3D-Reconstruction}},
pages = {1--29},
publisher = {{Springer}},
collection = {{Advances in Computer Vision and Pattern Recognition}},
year = {2020}
}
@article{giachetti2018,
author = {Giachetti, Andrea and Ciortan, Irina Mihaela and Daffara, Claudia and Marchioro, Giacomo and Pintus, Ruggero and Gobbetti, Enrico},
title = {A novel framework for highlight reflectance transformation imaging},
journal = {CVIU},
volume = {168},
pages = {118-131},
year = {2018}
}
@inproceedings{spheredetect,
author = {Laurent Fainsin and Jean Mélou and Lilian Calvet and Antoine Laurent and Axel Carlier and Jean-Denis Durou},
title = {Neural sphere detection in images for lighting calibration},
booktitle = {Proceedings of QCAV},
year = {2023}
}
@inproceedings{alicevision2021,
title = {{A}liceVision {M}eshroom: An open-source {3D} reconstruction pipeline},
author = {Carsten Griwodz and Simone Gasparini and Lilian Calvet and Pierre Gurdjos and Fabien Castan and Benoit Maujean and Gregoire De Lillo and Yann Lanthony},
booktitle = {Proceedings of the 12th ACM Multimedia Systems Conference - {MMSys '21}},
doi = {10.1145/3458305.3478443},
publisher = {ACM Press},
year = {2021}
}

View file

@ -1,94 +0,0 @@
\documentclass[]{spie} %>>> use for US letter paper
%\documentclass[a4paper]{spie} %>>> use this instead for A4 paper
%\documentclass[nocompress]{spie} %>>> to avoid compression of citations
\renewcommand{\baselinestretch}{1.0} % Change to 1.65 for double spacing
\usepackage{amsmath,amsfonts,amssymb}
\usepackage{graphicx}
\usepackage[colorlinks=true, allcolors=blue]{hyperref}
\title{Neural sphere detection in images for lighting calibration}
\author{Laurent \textsc{Fainsin}}
\author{Jean \textsc{M\'elou}}
\author{Lilian \textsc{Calvet}}
\author{Axel \textsc{Carlier}}
\author{Jean-Denis \textsc{Durou}}
\affil{IRIT, UMR CNRS 5505, Universit{\'e} de Toulouse, France}
% Option to view page numbers
\pagestyle{empty} % change to \pagestyle{plain} for page numbers
\setcounter{page}{301} % Set start page numbering at e.g. 301
\begin{document}
\maketitle
\begin{abstract}
The detection of spheres in images is useful for photometric 3D-vision techniques such as RTI~\cite{giachetti2018} or photometric stereo~\cite{durou2020}, for which a precise calibration of the lighting is required. We propose to train a neural network called Mask R-CNN for this task, and show that the segmentation of any number of spheres in an image using this network is at least as accurate, and much faster, than manual segmentation.
\end{abstract}
% Include a list of keywords after the abstract
\keywords{Sphere detection, Instance segmentation, Neural network, Mask R-CNN, Lighting calibration.}
\section{Methodology}
\label{sec:methodo}
Our training dataset consists of synthetic images generated via compositing. We used the 2017 COCO~\cite{CoCo} unlabelled images dataset, containing 123287 images in which we have embedded spherical markers. These markers originated from photographs of spheres in situ under various illuminations, and from synthetic renders from Blender. We present an example of such a picture in Figure~\ref{fig:train}. Combined with various data augmentation transformations, this allowed us to easily obtain an image set of considerable size with the associated ground truth.
The Mask R-CNN~\cite{MaskRCNN} neural network is particularly well-suited to our problem since it aims at an instance segmentation, which will allow us to perform different treatments on each of the detected spheres. Indeed, detection networks like Faster R-CNN have two outputs: the class of the detected object and its bounding box. Mask R-CNN adds a third branch (see Figure~\ref{fig:maskRCNN}) which allows us to obtain the mask of the object.
\begin{figure}[!h]
\centering
\includegraphics[width=0.5\linewidth]{Figures/MaskRCNN.png}
\caption{Third branch of Mask R-CNN, which allows instance segmentation (image extracted from~\cite{MaskRCNN}).}
\label{fig:maskRCNN}
\end{figure}
We chose the official PyTorch implementation of Mask R-CNN from the TorchVision module. This implementation performs additional transforms to our images before feeding them to the model. Our images are thus resized and normalized appropriately. Indeed, some of the images are authentic archaeological images that are used for metrological purposes and are therefore very large.
We used the original Mask R-CNN loss function: $L = L_\text{cls} + L_\text{box} + L_\text{mask}$. As $L_\text{cls}$ concerns classification and is a log-loss, it is not of much interest to us for the moment since we currently have only one class. On the other hand, $L_\text{box}$ robustly measures the adequacy of the estimated bounding box with respect to the ground truth~\cite{girshick2015fast} via a smooth L1 loss, whereas $L_\text{mask}$ evaluates the resulting mask using an average binary cross-entropy loss.
We used the mean Average Precision (mAP) as our main metric. It is a classical metric in object detection, based on the principle of Intersection over Union (IoU). The network was trained using an Adam optimizer with a learning rate of $10^{-3}$, a train batch size of $6$, and an unlimited number of epochs as we opted for an early stopping strategy on the mAP, with a patience of $5$ and minimum delta of $0.01$. We ultimately obtain a bounding box mAP of about 0.8, which indicates a good detection of our spheres.
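An equivalent early-stopping setup in PyTorch Lightning could be written as follows (the monitored metric name is illustrative):
\begin{verbatim}
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_map", mode="max",
                           patience=5, min_delta=0.01)
trainer = Trainer(callbacks=[early_stop], max_epochs=-1)  # no fixed epoch budget
\end{verbatim}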
\begin{figure}[!h]
\centering
\begin{tabular}{ccc}
\includegraphics[width=0.3\linewidth]{Figures/Train/1/image.jpg} &
\includegraphics[width=0.3\linewidth]{Figures/Train/1/MASK.PNG} &
\includegraphics[width=0.3\linewidth]{Figures/Train/1/result.png}
\end{tabular}
\caption{Example of synthetic data from our dataset. From left to right: picture from COCO with composited spheres on top; generated ground truth mask of the spheres (each color denotes an instance); inference output of our network.}
\label{fig:train}
\end{figure}
\section{Results}
\label{sec:results}
Once segmented, the silhouette of a sphere can indeed give us a lot of information about the luminous environment of the 3D-scene. In the particular case where the sphere is matte, the brightest point is the one where the normal points towards the light source. The left image in Figure \ref{fig:results} shows an example of a capture made in a painted cave, where such a sphere has been placed near the wall, in order to implement photometric stereo.
\begin{figure}[!h]
\centering
\begin{tabular}{ccc}
\includegraphics[height=0.25\linewidth]{Figures/Test/RESULTAT2.png} &
\includegraphics[height=0.25\linewidth]{Figures/Results/4.png} &
\includegraphics[height=0.25\linewidth]{Figures/Results/2b.png}
\end{tabular}
\caption{Inference outputs of three test images: the chrome sphere in the right image has not been detected.}
\label{fig:results}
\end{figure}
\section{Conclusion and Perspectives}
\label{sec:conclusion}
In this paper, we present a new method for the detection of calibration spheres using deep learning, which is necessary for several 3D-reconstruction techniques such as RTI or photometric stereo. This is a rather simple task (a Hough transform does the trick), but problems arise when a robust detection is required, as cast shadows or any circular patterns create false positives. We therefore propose a neural network based approach, which is much faster than manual detection, and even more accurate, in practice, when shadows are located near the silhouette boundary.
We deliberately put aside the classification allowed by the Mask R-CNN neural network. We therefore hope to be able to use this aspect to detect more types of spheres, especially the chrome spheres that are used in the post-production industry to collect a complete mapping of the light environment (such a sphere has not been detected in the right image in Figure \ref{fig:results}).
\bibliography{biblio} % bibliography data in report.bib
\bibliographystyle{spiebib} % makes bibtex use spiebib.bst
\end{document}

View file

@ -1,160 +0,0 @@
@inproceedings{wolf-etal-2020-transformers,
title = {Transformers: State-of-the-Art Natural Language Processing},
author = {Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush},
booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
month = oct,
year = {2020},
address = {Online},
publisher = {Association for Computational Linguistics},
url = {https://www.aclweb.org/anthology/2020.emnlp-demos.6},
pages = {38--45}
}
@inproceedings{lhoest-etal-2021-datasets,
title = {Datasets: A Community Library for Natural Language Processing},
author = {Lhoest, Quentin and
Villanova del Moral, Albert and
Jernite, Yacine and
Thakur, Abhishek and
von Platen, Patrick and
Patil, Suraj and
Chaumond, Julien and
Drame, Mariama and
Plu, Julien and
Tunstall, Lewis and
Davison, Joe and
{\v{S}}a{\v{s}}ko, Mario and
Chhablani, Gunjan and
Malik, Bhavitvya and
Brandeis, Simon and
Le Scao, Teven and
Sanh, Victor and
Xu, Canwen and
Patry, Nicolas and
McMillan-Major, Angelina and
Schmid, Philipp and
Gugger, Sylvain and
Delangue, Cl{\'e}ment and
Matussi{\`e}re, Th{\'e}o and
Debut, Lysandre and
Bekman, Stas and
Cistac, Pierric and
Goehringer, Thibault and
Mustar, Victor and
Lagunas, Fran{\c{c}}ois and
Rush, Alexander and
Wolf, Thomas},
booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
month = nov,
year = {2021},
address = {Online and Punta Cana, Dominican Republic},
publisher = {Association for Computational Linguistics},
url = {https://aclanthology.org/2021.emnlp-demo.21},
pages = {175--184},
abstract = {The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as for internet-scale corpora. The design of the library incorporates a distributed, community-driven approach to adding datasets and documenting usage. After a year of development, the library now includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks. The library is available at https://github.com/huggingface/datasets.},
eprint = {2109.02846},
archiveprefix = {arXiv},
primaryclass = {cs.CL}
}
@software{ONNX,
title = {{ONNX}: Open Neural Network Exchange},
url = {https://github.com/onnx/onnx},
license = {Apache-2.0},
version = {1.13.0},
author = {{ONNX community}},
year = {2018-2023}
}
@software{ONNX_Runtime_2018,
author = {ONNX Runtime developers},
license = {MIT},
month = {11},
title = {{ONNX Runtime}},
url = {https://github.com/microsoft/onnxruntime},
year = {2018}
}
@software{Arakelyan_Aim_2020,
author = {Arakelyan, Gor and Soghomonyan, Gevorg and {The Aim team}},
doi = {10.5281/zenodo.6536395},
license = {Apache-2.0},
month = {6},
title = {{Aim}},
url = {https://github.com/aimhubio/aim},
version = {3.9.3},
year = {2020}
}
@software{Label_Studio,
title = {{Label Studio}: Data labeling software},
url = {https://github.com/heartexlabs/label-studio},
license = {Apache-2.0},
version = {1.7.1},
author = {{Maxim Tkachenko} and {Mikhail Malyuk} and {Andrey Holmanyuk} and {Nikolai Liubimov}},
year = {2020-2022}
}
@software{wandb,
title = {{Weights \& Biases}: Track, visualize, and share your machine learning experiments},
url = {https://github.com/wandb/wandb},
license = {MIT},
version = {0.13.9},
author = {{Wandb team}},
year = {2023}
}
@software{Falcon_PyTorch_Lightning_2019,
author = {Falcon, William and {The PyTorch Lightning team}},
doi = {10.5281/zenodo.3828935},
license = {Apache-2.0},
month = {3},
title = {{PyTorch Lightning}},
url = {https://github.com/Lightning-AI/lightning},
version = {1.4},
year = {2019}
}
@software{TorchMetrics_2022,
author = {{Nicki Skafte Detlefsen} and {Jiri Borovec} and {Justus Schock} and {Ananya Harsh} and {Teddy Koker} and {Luca Di Liello} and {Daniel Stancl} and {Changsheng Quan} and {Maxim Grechkin} and {William Falcon}},
doi = {10.21105/joss.04101},
license = {Apache-2.0},
month = {2},
title = {{TorchMetrics - Measuring Reproducibility in PyTorch}},
url = {https://github.com/Lightning-AI/metrics},
year = {2022}
}
@incollection{NEURIPS2019_9015,
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
booktitle = {Advances in Neural Information Processing Systems 32},
pages = {8024--8035},
year = {2019},
publisher = {Curran Associates, Inc.},
url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
}
@software{polyhaven,
title = {{Poly Haven}: 3D models for everyone},
url = {https://polyhaven.com/},
license = {CC-BY-NC-4.0},
author = {{Poly Haven team}},
year = {2021}
}
@software{nix,
title = {Nix: The purely functional package manager},
url = {https://nixos.org/},
author = {Eelco Dolstra and NixOS Foundation},
license = {MIT},
year = {2013-2023}
}
@software{poetry,
title = {Poetry: Python dependency management and packaging made easy},
url = {https://github.com/python-poetry/poetry/},
author = {Sébastien Eustace},
license = {MIT},
year = {2018-2023}
}

View file

@ -1,479 +0,0 @@
@misc{van_strien_training_2022,
title = {Training an object detection model using {Hugging} {Face}},
url = {https://danielvanstrien.xyz/huggingface/huggingface-datasets/transformers/2022/08/16/detr-object-detection.html},
abstract = {training a Detr object detection model using Hugging Face transformers and datasets},
language = {en},
urldate = {2023-01-17},
journal = {Daniel van Strien},
author = {Van Strien, Daniel},
month = aug,
year = {2022},
file = {Snapshot:/home/laurent/Zotero/storage/DXQJISMX/detr-object-detection.html:text/html},
}
@article{dror_recognition_2003,
title = {Recognition of {Surface} {Reflectance} {Properties} from a {Single} {Image} under {Unknown} {Real}-{World} {Illumination}},
abstract = {This paper describes a machine vision system that classifies reflectance properties of surfaces such as metal, plastic, or paper, under unknown real-world illumination. We demonstrate performance of our algorithm for surfaces of arbitrary geometry. Reflectance estimation under arbitrary omnidirectional illumination proves highly underconstrained. Our reflectance estimation algorithm succeeds by learning relationships between surface reflectance and certain statistics computed from an observed image, which depend on statistical regularities in the spatial structure of real-world illumination. Although the algorithm assumes known geometry, its statistical nature makes it robust to inaccurate geometry estimates.},
language = {en},
author = {Dror, Ron O and Adelson, Edward H and Willsky, Alan S},
year = {2003},
file = {Dror et al. - Recognition of Surface Reflectance Properties from .pdf:/home/laurent/Zotero/storage/HJXFDDT6/Dror et al. - Recognition of Surface Reflectance Properties from .pdf:application/pdf},
}
@article{legendre_deeplight_2019,
title = {{DeepLight}: {Learning} {Illumination} for {Unconstrained} {Mobile} {Mixed} {Reality}},
abstract = {We present a learning-based method to infer plausible high dynamic range (HDR), omnidirectional illumination given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field of view (FOV). For training data, we collect videos of various reflective spheres placed within the cameras FOV, leaving most of the background unoccluded, leveraging that materials with diverse reflectance functions reveal different lighting cues in a single exposure. We train a deep neural network to regress from the LDR background image to HDR lighting by matching the LDR ground truth sphere images to those rendered with the predicted illumination using imagebased relighting, which is differentiable. Our inference runs at interactive frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed reality. Training on automatically exposed and white-balanced videos, we improve the realism of rendered objects compared to the state-of-the art methods for both indoor and outdoor scenes.},
language = {en},
author = {LeGendre, Chloe and Ma, Wan-Chun and Fyffe, Graham and Flynn, John and Charbonnel, Laurent and Busch, Jay and Debevec, Paul},
year = {2019},
file = {LeGendre et al. - DeepLight Learning Illumination for Unconstrained.pdf:/home/laurent/Zotero/storage/7FGL25G5/LeGendre et al. - DeepLight Learning Illumination for Unconstrained.pdf:application/pdf},
}
@misc{tazi_fine-tuning_nodate,
title = {Fine-tuning {DETR} for license plates detection},
url = {https://kaggle.com/code/nouamane/fine-tuning-detr-for-license-plates-detection},
abstract = {Explore and run machine learning code with Kaggle Notebooks {\textbar} Using data from multiple data sources},
language = {en},
urldate = {2023-01-17},
author = {Tazi, Nouamane},
file = {Snapshot:/home/laurent/Zotero/storage/WHFVB3QC/fine-tuning-detr-for-license-plates-detection.html:text/html},
}
@inproceedings{murmann_dataset_2019,
address = {Seoul, Korea (South)},
title = {A {Dataset} of {Multi}-{Illumination} {Images} in the {Wild}},
isbn = {978-1-72814-803-8},
url = {https://ieeexplore.ieee.org/document/9008252/},
doi = {10.1109/ICCV.2019.00418},
abstract = {Collections of images under a single, uncontrolled illumination [42] have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation [26, 43, 18]. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. The data simply does not contain the necessary supervisory signals. Multiillumination datasets are notoriously hard to capture, so the data is typically collected at small scale, in controlled environments, either using multiple light sources [10, 53], or robotic gantries [8, 20]. This leads to image collections that are not representative of the variety and complexity of real-world scenes. We introduce a new multi-illumination dataset of more than 1000 real scenes, each captured in high dynamic range and high resolution, under 25 lighting conditions. We demonstrate the richness of this dataset by training state-of-the-art models for three challenging applications: single-image illumination estimation, image relighting, and mixed-illuminant white balance.},
language = {en},
urldate = {2023-01-17},
booktitle = {2019 {IEEE}/{CVF} {International} {Conference} on {Computer} {Vision} ({ICCV})},
publisher = {IEEE},
author = {Murmann, Lukas and Gharbi, Michael and Aittala, Miika and Durand, Fredo},
month = oct,
year = {2019},
pages = {4079--4088},
file = {Murmann et al. - 2019 - A Dataset of Multi-Illumination Images in the Wild.pdf:/home/laurent/Zotero/storage/KH9HA9SQ/Murmann et al. - 2019 - A Dataset of Multi-Illumination Images in the Wild.pdf:application/pdf},
}
@misc{arora_annotated_2021,
title = {The {Annotated} {DETR}},
url = {https://amaarora.github.io/2021/07/26/annotateddetr.html},
abstract = {This is a place where I write freely and try to uncomplicate the complicated for myself and everyone else through Python code.},
language = {en},
urldate = {2023-01-17},
journal = {Committed towards better future},
author = {Arora, Aman},
month = jul,
year = {2021},
file = {Snapshot:/home/laurent/Zotero/storage/G78PSBHE/annotateddetr.html:text/html},
}
@misc{carion_end--end_2020,
title = {End-to-{End} {Object} {Detection} with {Transformers}},
url = {http://arxiv.org/abs/2005.12872},
doi = {10.48550/arXiv.2005.12872},
abstract = {We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive baselines. Training code and pretrained models are available at https://github.com/facebookresearch/detr.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey},
month = may,
year = {2020},
note = {arXiv:2005.12872 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/KBRPD4CU/Carion et al. - 2020 - End-to-End Object Detection with Transformers.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/6445LQV5/2005.html:text/html},
}
@misc{li_detecting_2019,
title = {Detecting {Lesion} {Bounding} {Ellipses} {With} {Gaussian} {Proposal} {Networks}},
url = {http://arxiv.org/abs/1902.09658},
doi = {10.48550/arXiv.1902.09658},
abstract = {Lesions characterized by computed tomography (CT) scans, are arguably often elliptical objects. However, current lesion detection systems are predominantly adopted from the popular Region Proposal Networks (RPNs) that only propose bounding boxes without fully leveraging the elliptical geometry of lesions. In this paper, we present Gaussian Proposal Networks (GPNs), a novel extension to RPNs, to detect lesion bounding ellipses. Instead of directly regressing the rotation angle of the ellipse as the common practice, GPN represents bounding ellipses as 2D Gaussian distributions on the image plain and minimizes the Kullback-Leibler (KL) divergence between the proposed Gaussian and the ground truth Gaussian for object localization. We show the KL divergence loss approximately incarnates the regression loss in the RPN framework when the rotation angle is 0. Experiments on the DeepLesion dataset show that GPN significantly outperforms RPN for lesion bounding ellipse detection thanks to lower localization error. GPN is open sourced at https://github.com/baidu-research/GPN},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Li, Yi},
month = feb,
year = {2019},
note = {arXiv:1902.09658 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/IB8AWGHV/Li - 2019 - Detecting Lesion Bounding Ellipses With Gaussian P.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/ZGKBBB98/1902.html:text/html},
}
@misc{noauthor_detr_nodate,
title = {{DETR}},
url = {https://huggingface.co/docs/transformers/model_doc/detr},
abstract = {We're on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/2AQYDSL3/detr.html:text/html},
}
@misc{noauthor_opencv_nodate,
title = {{OpenCV}: {Camera} {Calibration}},
url = {https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html},
urldate = {2023-01-17},
file = {OpenCV\: Camera Calibration:/home/laurent/Zotero/storage/7C3DT2WU/tutorial_py_calibration.html:text/html},
}
@misc{jahirul_grey_2021,
title = {The {Grey}, the {Chrome} and the {Macbeth} {Chart} {CAVE} {Academy}},
url = {https://caveacademy.com/wiki/onset-production/data-acquisition/data-acquisition-training/the-grey-the-chrome-and-the-macbeth-chart/},
language = {en-US},
urldate = {2023-01-17},
author = {Jahirul, Amin},
month = jul,
year = {2021},
file = {Snapshot:/home/laurent/Zotero/storage/TM2TJKMH/the-grey-the-chrome-and-the-macbeth-chart.html:text/html},
}
@misc{doppenberg_lunar_2022,
title = {Lunar {Orbit} {Navigation} {Using} {Ellipse} {R}-{CNN} and {Crater} {Pattern} {Matching}},
copyright = {MIT},
url = {https://github.com/wdoppenberg/crater-detection},
abstract = {Autonomous Lunar Orbit Navigation Using Ellipse R-CNN and Crater Pattern Matching},
urldate = {2023-01-17},
author = {Doppenberg, Wouter},
month = aug,
year = {2022},
note = {original-date: 2020-10-19T16:32:29Z},
keywords = {crater-detection, ellipse-rcnn, faster-rcnn, space-engineering},
}
@misc{doppenberg_ellipse_2022,
title = {Ellipse {R}-{CNN}},
copyright = {MIT},
url = {https://github.com/wdoppenberg/ellipse-rcnn},
abstract = {A PyTorch implementation of Ellipse R-CNN},
urldate = {2023-01-17},
author = {Doppenberg, Wouter},
month = dec,
year = {2022},
note = {original-date: 2021-06-25T09:21:44Z},
keywords = {ellipse-rcnn, deep-learning, pytorch, pytorch-lightning, region-based},
}
@misc{wok_finetune_2022,
title = {Finetune {DETR}},
copyright = {MIT},
url = {https://github.com/woctezuma/finetune-detr},
abstract = {Fine-tune Facebook's DETR (DEtection TRansformer) on Colaboratory.},
urldate = {2023-01-17},
author = {Wok},
month = dec,
year = {2022},
note = {original-date: 2020-08-03T17:17:35Z},
keywords = {balloon, balloons, colab, colab-notebook, colaboratory, detr, facebook, finetune, finetunes, finetuning, google-colab, google-colab-notebook, google-colaboratory, instance, instance-segmentation, instances, segementation, segment},
}
@misc{rogge_transformers_2020,
title = {Transformers {Tutorials}},
copyright = {MIT},
url = {https://github.com/NielsRogge/Transformers-Tutorials},
abstract = {This repository contains demos I made with the Transformers library by HuggingFace.},
urldate = {2023-01-17},
author = {Rogge, Niels},
month = sep,
year = {2020},
doi = {10.5281/zenodo.1234},
}
@misc{noauthor_recommendations_2020,
title = {Recommendations for training {Detr} on custom dataset? · {Issue} \#9 · facebookresearch/detr},
shorttitle = {Recommendations for training {Detr} on custom dataset?},
url = {https://github.com/facebookresearch/detr/issues/9},
abstract = {Very impressed with the all new innovative architecture in Detr! Can you clarify recommendations for training on a custom dataset? Should we build a model similar to demo and train, or better to us...},
language = {en},
urldate = {2023-01-17},
journal = {GitHub},
month = may,
year = {2020},
file = {Snapshot:/home/laurent/Zotero/storage/G2S6584X/9.html:text/html},
}
@misc{noauthor_auto_nodate,
title = {Auto {Classes}},
url = {https://huggingface.co/docs/transformers/model_doc/auto},
abstract = {We're on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
}
@misc{noauthor_swin_nodate,
title = {Swin {Transformer}},
url = {https://huggingface.co/docs/transformers/v4.24.0/en/model_doc/swin},
abstract = {We're on a journey to advance and democratize artificial intelligence through open source and open science.},
urldate = {2023-01-17},
file = {Snapshot:/home/laurent/Zotero/storage/K2NDEY49/swin.html:text/html},
}
@misc{rajesh_pytorch_2022,
title = {{PyTorch} {Implementations} of various state of the art architectures.},
url = {https://github.com/04RR/SOTA-Vision},
abstract = {Implementation of various state of the art architectures used in computer vision.},
urldate = {2023-01-17},
author = {Rajesh, Rohit},
month = sep,
year = {2022},
note = {original-date: 2021-05-02T03:32:10Z},
keywords = {deep-learning, pytorch, deep-learning-algorithms, pytorch-implementation, transformer-architecture},
}
@misc{mmdetection_contributors_openmmlab_2018,
title = {{OpenMMLab} {Detection} {Toolbox} and {Benchmark}},
copyright = {Apache-2.0},
url = {https://github.com/open-mmlab/mmdetection},
abstract = {OpenMMLab Detection Toolbox and Benchmark},
urldate = {2023-01-17},
author = {{MMDetection Contributors}},
month = aug,
year = {2018},
note = {original-date: 2018-08-22T07:06:06Z},
}
@misc{noauthor_awesome_2023,
title = {Awesome {Detection} {Transformer}},
url = {https://github.com/IDEA-Research/awesome-detection-transformer},
abstract = {Collect some papers about transformer for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)},
urldate = {2023-01-17},
publisher = {IDEA-Research},
month = jan,
year = {2023},
note = {original-date: 2022-03-09T05:11:49Z},
}
@misc{noauthor_miscellaneous_nodate,
title = {Miscellaneous {Transformations} and {Projections}},
url = {http://paulbourke.net/geometry/transformationprojection/},
urldate = {2023-01-17},
file = {Miscellaneous Transformations and Projections:/home/laurent/Zotero/storage/WP7ZDCKF/transformationprojection.html:text/html},
}
@article{jun-fang_wu_nonmetric_2010,
title = {Nonmetric calibration of camera lens distortion using concentric circles pattern},
url = {http://ieeexplore.ieee.org/document/5535290/},
doi = {10.1109/MACE.2010.5535290},
abstract = {A method of distortion calibration for camera is proposed. The distortion center and distortion coefficients are estimated separately. The planar concentric circles are used as the calibration pattern. By analyzing the geometrical and projective characters of concentric circles, we deduce that the line connecting the centroids of distorted concentric circles must go through the distortion center. This is utilized to compute the distortion parameters and the solution in the sense of least square are obtained. The proposed approach is entirely noniterative, therefore it keeps away from the procedure of iterative optimization. On the other hand, it is nonmetric, thus it is low cost. Experiments on both synthetic and real image data are reported. The results show our method behaves excellently. Moreover, the capability of our method to resist noise is satisfying.},
urldate = {2023-01-17},
journal = {2010 International Conference on Mechanic Automation and Control Engineering},
author = {Wu, Jun-Fang and Liu, Gui-Xiong},
month = jun,
year = {2010},
note = {Conference Name: 2010 International Conference on Mechanic Automation and Control Engineering (MACE). ISBN: 9781424477371. Place: Wuhan, China. Publisher: IEEE},
pages = {3338--3341},
annote = {[TLDR] The proposed approach is entirely noniterative, therefore it keeps away from the procedure of iterative optimization and is nonmetric, thus it is low cost and the capability of the method to resist noise is satisfying.},
}
@misc{qiu_describing_2021,
title = {Describing and {Localizing} {Multiple} {Changes} with {Transformers}},
url = {http://arxiv.org/abs/2103.14146},
doi = {10.48550/arXiv.2103.14146},
abstract = {Change captioning tasks aim to detect changes in image pairs observed before and after a scene change and generate a natural language description of the changes. Existing change captioning studies have mainly focused on a single change. However, detecting and describing multiple changed parts in image pairs is essential for enhancing adaptability to complex scenarios. We solve the above issues from three aspects: (i) We propose a simulation-based multi-change captioning dataset; (ii) We benchmark existing state-of-the-art methods of single change captioning on multi-change captioning; (iii) We further propose Multi-Change Captioning transformers (MCCFormers) that identify change regions by densely correlating different regions in image pairs and dynamically determines the related change regions with words in sentences. The proposed method obtained the highest scores on four conventional change captioning evaluation metrics for multi-change captioning. Additionally, our proposed method can separate attention maps for each change and performs well with respect to change localization. Moreover, the proposed framework outperformed the previous state-of-the-art methods on an existing change captioning benchmark, CLEVR-Change, by a large margin (+6.1 on BLEU-4 and +9.7 on CIDEr scores), indicating its general ability in change captioning tasks.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Qiu, Yue and Yamamoto, Shintaro and Nakashima, Kodai and Suzuki, Ryota and Iwata, Kenji and Kataoka, Hirokatsu and Satoh, Yutaka},
month = sep,
year = {2021},
note = {arXiv:2103.14146 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
annote = {Comment: Accepted by ICCV2021. 18 pages, 15 figures, project page: https://cvpaperchallenge.github.io/Describing-and-Localizing-Multiple-Change-with-Transformers/},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/6GLDC5C7/Qiu et al. - 2021 - Describing and Localizing Multiple Changes with Tr.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/4ZUPCEKT/2103.html:text/html},
}
@misc{lahoud_3d_2022,
title = {{3D} {Vision} with {Transformers}: {A} {Survey}},
shorttitle = {{3D} {Vision} with {Transformers}},
url = {http://arxiv.org/abs/2208.04309},
doi = {10.48550/arXiv.2208.04309},
abstract = {The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field. The transformer has been used as a replacement for the widely used convolution operators, due to its ability to learn long-range dependencies. This replacement was proven to be successful in numerous tasks, in which several state-of-the-art methods rely on transformers for better learning. In computer vision, the 3D field has also witnessed an increase in employing the transformer for 3D convolution neural networks and multi-layer perceptron networks. Although a number of surveys have focused on transformers in vision in general, 3D vision requires special attention due to the difference in data representation and processing when compared to 2D vision. In this work, we present a systematic and thorough review of more than 100 transformers methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose estimation, and others. We discuss transformer design in 3D vision, which allows it to process data with various 3D representations. For each application, we highlight key properties and contributions of proposed transformer-based methods. To assess the competitiveness of these methods, we compare their performance to common non-transformer methods on 12 3D benchmarks. We conclude the survey by discussing different open directions and challenges for transformers in 3D vision. In addition to the presented papers, we aim to frequently update the latest relevant papers along with their corresponding implementations at: https://github.com/lahoud/3d-vision-transformers.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Lahoud, Jean and Cao, Jiale and Khan, Fahad Shahbaz and Cholakkal, Hisham and Anwer, Rao Muhammad and Khan, Salman and Yang, Ming-Hsuan},
month = aug,
year = {2022},
note = {arXiv:2208.04309 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/AN3SNSVC/Lahoud et al. - 2022 - 3D Vision with Transformers A Survey.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/6BXWCFI5/2208.html:text/html},
}
@article{dong_ellipse_2021,
title = {Ellipse {R}-{CNN}: {Learning} to {Infer} {Elliptical} {Object} from {Clustering} and {Occlusion}},
volume = {30},
issn = {1057-7149, 1941-0042},
shorttitle = {Ellipse {R}-{CNN}},
url = {http://arxiv.org/abs/2001.11584},
doi = {10.1109/TIP.2021.3050673},
abstract = {Images of heavily occluded objects in cluttered scenes, such as fruit clusters in trees, are hard to segment. To further retrieve the 3D size and 6D pose of each individual object in such cases, bounding boxes are not reliable from multiple views since only a little portion of the object's geometry is captured. We introduce the first CNN-based ellipse detector, called Ellipse R-CNN, to represent and infer occluded objects as ellipses. We first propose a robust and compact ellipse regression based on the Mask R-CNN architecture for elliptical object detection. Our method can infer the parameters of multiple elliptical objects even they are occluded by other neighboring objects. For better occlusion handling, we exploit refined feature regions for the regression stage, and integrate the U-Net structure for learning different occlusion patterns to compute the final detection score. The correctness of ellipse regression is validated through experiments performed on synthetic data of clustered ellipses. We further quantitatively and qualitatively demonstrate that our approach outperforms the state-of-the-art model (i.e., Mask R-CNN followed by ellipse fitting) and its three variants on both synthetic and real datasets of occluded and clustered elliptical objects.},
urldate = {2023-01-17},
journal = {IEEE Transactions on Image Processing},
author = {Dong, Wenbo and Roy, Pravakar and Peng, Cheng and Isler, Volkan},
year = {2021},
note = {arXiv:2001.11584 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics},
pages = {2193--2206},
annote = {Comment: 18 pages, 20 figures, 7 tables},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/QERXUH24/Dong et al. - 2021 - Ellipse R-CNN Learning to Infer Elliptical Object.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/KNUA7S3S/2001.html:text/html},
}
@misc{haven_hdris_nodate,
title = {{HDRIs}},
url = {https://polyhaven.com/hdris/},
abstract = {Hundreds of free HDRI environments, ready to use for any purpose.},
language = {en},
urldate = {2023-01-17},
journal = {Poly Haven},
author = {{Poly Haven}},
}
@misc{zhang_dino_2022,
title = {{DINO}: {DETR} with {Improved} {DeNoising} {Anchor} {Boxes} for {End}-to-{End} {Object} {Detection}},
shorttitle = {{DINO}},
url = {http://arxiv.org/abs/2203.03605},
doi = {10.48550/arXiv.2203.03605},
abstract = {We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive way for denoising training, a mixed query selection method for anchor initialization, and a look forward twice scheme for box prediction. DINO achieves 49.4 AP in 12 epochs and 51.3 AP in 24 epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding a significant improvement of +6.0 AP and +2.7 AP, respectively, compared to DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO val2017 (63.2 AP) and test-dev (63.3 AP). Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results. Our code will be available at https://github.com/IDEACVR/DINO.},
urldate = {2023-01-17},
publisher = {arXiv},
author = {Zhang, Hao and Li, Feng and Liu, Shilong and Zhang, Lei and Su, Hang and Zhu, Jun and Ni, Lionel M. and Shum, Heung-Yeung},
month = jul,
year = {2022},
note = {arXiv:2203.03605 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/NFL7ASJI/Zhang et al. - 2022 - DINO DETR with Improved DeNoising Anchor Boxes fo.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/IJEI9W7E/2203.html:text/html},
}
@article{legendre_supplemental_nodate,
title = {Supplemental {Materials} for {DeepLight}: {Learning} {Illumination} for {Unconstrained} {Mobile} {Mixed} {Reality}},
language = {en},
author = {LeGendre, Chloe and Ma, Wan-Chun and Fyffe, Graham and Flynn, John and Charbonnel, Laurent and Busch, Jay and Debevec, Paul},
file = {LeGendre et al. - Supplemental Materials for DeepLight Learning Ill.pdf:/home/laurent/Zotero/storage/BKVSXXYE/LeGendre et al. - Supplemental Materials for DeepLight Learning Ill.pdf:application/pdf},
}
@misc{noauthor_multi_nodate,
title = {Multi {Illumination} {Dataset}},
url = {https://projects.csail.mit.edu/illumination/databrowser/},
urldate = {2023-01-24},
}
@misc{cheng_masked-attention_2022,
title = {Masked-attention {Mask} {Transformer} for {Universal} {Image} {Segmentation}},
url = {http://arxiv.org/abs/2112.01527},
doi = {10.48550/arXiv.2112.01527},
abstract = {Image segmentation is about grouping pixels with different semantics, e.g., category or instance membership, where each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).},
urldate = {2023-01-25},
publisher = {arXiv},
author = {Cheng, Bowen and Misra, Ishan and Schwing, Alexander G. and Kirillov, Alexander and Girdhar, Rohit},
month = jun,
year = {2022},
note = {arXiv:2112.01527 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, Computer Science - Artificial Intelligence},
annote = {Comment: CVPR 2022. Project page/code/models: https://bowenc0221.github.io/mask2former},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/9XS7V8FP/Cheng et al. - 2022 - Masked-attention Mask Transformer for Universal Im.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/LC5ZEEIC/2112.html:text/html},
}
@misc{dai_ao2-detr_2022,
title = {{AO2}-{DETR}: {Arbitrary}-{Oriented} {Object} {Detection} {Transformer}},
shorttitle = {{AO2}-{DETR}},
url = {http://arxiv.org/abs/2205.12785},
abstract = {Arbitrary-oriented object detection (AOOD) is a challenging task to detect objects in the wild with arbitrary orientations and cluttered arrangements. Existing approaches are mainly based on anchor-based boxes or dense points, which rely on complicated hand-designed processing steps and inductive bias, such as anchor generation, transformation, and non-maximum suppression reasoning. Recently, the emerging transformer-based approaches view object detection as a direct set prediction problem that effectively removes the need for hand-designed components and inductive biases. In this paper, we propose an Arbitrary-Oriented Object DEtection TRansformer framework, termed AO2-DETR, which comprises three dedicated components. More precisely, an oriented proposal generation mechanism is proposed to explicitly generate oriented proposals, which provides better positional priors for pooling features to modulate the cross-attention in the transformer decoder. An adaptive oriented proposal refinement module is introduced to extract rotation-invariant region features and eliminate the misalignment between region features and objects. And a rotation-aware set matching loss is used to ensure the one-to-one matching process for direct set prediction without duplicate predictions. Our method considerably simplifies the overall pipeline and presents a new AOOD paradigm. Comprehensive experiments on several challenging datasets show that our method achieves superior performance on the AOOD task.},
language = {en},
urldate = {2023-01-25},
publisher = {arXiv},
author = {Dai, Linhui and Liu, Hong and Tang, Hao and Wu, Zhiwei and Song, Pinhao},
month = may,
year = {2022},
note = {arXiv:2205.12785 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
file = {Dai et al. - 2022 - AO2-DETR Arbitrary-Oriented Object Detection Tran.pdf:/home/laurent/Zotero/storage/BL5QA9W7/Dai et al. - 2022 - AO2-DETR Arbitrary-Oriented Object Detection Tran.pdf:application/pdf},
}
@misc{mmrotate_contributors_openmmlab_2022,
title = {{OpenMMLab} rotated object detection toolbox and benchmark},
copyright = {Apache-2.0},
url = {https://github.com/open-mmlab/mmrotate},
abstract = {OpenMMLab rotated object detection toolbox and benchmark.},
urldate = {2023-01-25},
author = {{MMRotate Contributors}},
month = feb,
year = {2022},
note = {original-date: 2022-05-26T01:38:15Z},
}
@misc{loshchilov_decoupled_2019,
title = {Decoupled {Weight} {Decay} {Regularization}},
url = {http://arxiv.org/abs/1711.05101},
doi = {10.48550/arXiv.1711.05101},
abstract = {L2 regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is not the case for adaptive gradient algorithms, such as Adam. While common implementations of these algorithms employ L2 regularization (often calling it "weight decay" in what may be misleading due to the inequivalence we expose), we propose a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function. We provide empirical evidence that our proposed modification (i) decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam and (ii) substantially improves Adam's generalization performance, allowing it to compete with SGD with momentum on image classification datasets (on which it was previously typically outperformed by the latter). Our proposed decoupled weight decay has already been adopted by many researchers, and the community has implemented it in TensorFlow and PyTorch; the complete source code for our experiments is available at https://github.com/loshchil/AdamW-and-SGDW},
urldate = {2023-01-29},
publisher = {arXiv},
author = {Loshchilov, Ilya and Hutter, Frank},
month = jan,
year = {2019},
note = {arXiv:1711.05101 [cs, math]},
keywords = {Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Mathematics - Optimization and Control},
annote = {Comment: Published as a conference paper at ICLR 2019},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/JJ33N7CY/Loshchilov and Hutter - 2019 - Decoupled Weight Decay Regularization.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/R3Y868LM/1711.html:text/html},
}
@misc{kingma_adam_2017,
title = {Adam: {A} {Method} for {Stochastic} {Optimization}},
shorttitle = {Adam},
url = {http://arxiv.org/abs/1412.6980},
doi = {10.48550/arXiv.1412.6980},
abstract = {We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.},
urldate = {2023-01-29},
publisher = {arXiv},
author = {Kingma, Diederik P. and Ba, Jimmy},
month = jan,
year = {2017},
note = {arXiv:1412.6980 [cs]},
keywords = {Computer Science - Machine Learning},
annote = {Comment: Published as a conference paper at the 3rd International Conference for Learning Representations, San Diego, 2015},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/EQ38Q4BJ/Kingma and Ba - 2017 - Adam A Method for Stochastic Optimization.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/JSNDPECJ/1412.html:text/html},
}
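The two optimizer entries above (Loshchilov and Hutter; Kingma and Ba) together describe Adam's bias-corrected moment updates and the decoupled weight decay variant commonly known as AdamW. Below is a minimal NumPy sketch of a single parameter update, with a flag marking where the decay term enters in each formulation; the function, defaults, and toy values are illustrative and not taken from either paper's code.

import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2, decoupled=True):
    """One Adam update on weights w with gradient g; m and v are the running moments."""
    if not decoupled:
        g = g + wd * w                      # L2 regularization: decay flows through the moments
    m = b1 * m + (1 - b1) * g               # first-moment estimate
    v = b2 * v + (1 - b2) * g ** 2          # second-moment estimate
    m_hat = m / (1 - b1 ** t)               # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * wd * w                 # AdamW: decay applied directly to the weights
    return w, m, v

# Toy usage: three updates with a constant gradient.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 4):
    w, m, v = adam_step(w, np.array([0.1, -0.3]), m, v, t)
print(w)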
@misc{ren_faster_2016,
title = {Faster {R}-{CNN}: {Towards} {Real}-{Time} {Object} {Detection} with {Region} {Proposal} {Networks}},
shorttitle = {Faster {R}-{CNN}},
url = {http://arxiv.org/abs/1506.01497},
doi = {10.48550/arXiv.1506.01497},
abstract = {State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.},
urldate = {2023-02-06},
publisher = {arXiv},
author = {Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
month = jan,
year = {2016},
note = {arXiv:1506.01497 [cs]},
keywords = {Computer Science - Computer Vision and Pattern Recognition},
annote = {Comment: Extended tech report},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/SHZFG4RW/Ren et al. - 2016 - Faster R-CNN Towards Real-Time Object Detection w.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/F3VRI6F7/1506.html:text/html},
}
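As a small aside on the Faster R-CNN entry above: the Region Proposal Network scores a fixed set of anchors at every feature-map position. Below is a minimal sketch of such an anchor grid; the stride, scales, and aspect ratios are arbitrary example values, not the paper's exact configuration.

import numpy as np

def make_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Corner-format (x1, y1, x2, y2) anchors centred on every feature-map cell.

    ratio is taken as height / width; each anchor keeps an area of scale**2.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    h, w = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.asarray(anchors)

print(make_anchors(2, 3).shape)  # (54, 4): 2 * 3 positions, 9 anchors each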
@misc{noauthor_end--end_2023,
title = {End-to-{End} {Detection} {Transformer} ({DETR})},
url = {https://neuralception.com/objectdetection-detr/},
abstract = {A brief explanation of how the detection transformer (DETR) and self-attention work.},
language = {en},
urldate = {2023-02-06},
month = feb,
year = {2023},
file = {Snapshot:/home/laurent/Zotero/storage/CQBYUSC4/objectdetection-detr.html:text/html},
}
@misc{li_dn-detr_2022,
title = {{DN}-{DETR}: {Accelerate} {DETR} {Training} by {Introducing} {Query} {DeNoising}},
shorttitle = {{DN}-{DETR}},
url = {http://arxiv.org/abs/2203.01305},
doi = {10.48550/arXiv.2203.01305},
abstract = {We present in this paper a novel denoising training method to speedup DETR (DEtection TRansformer) training and offer a deepened understanding of the slow convergence issue of DETR-like methods. We show that the slow convergence results from the instability of bipartite graph matching which causes inconsistent optimization goals in early training stages. To address this issue, except for the Hungarian loss, our method additionally feeds ground-truth bounding boxes with noises into Transformer decoder and trains the model to reconstruct the original boxes, which effectively reduces the bipartite graph matching difficulty and leads to a faster convergence. Our method is universal and can be easily plugged into any DETR-like methods by adding dozens of lines of code to achieve a remarkable improvement. As a result, our DN-DETR results in a remarkable improvement (+1.9 AP) under the same setting and achieves the best result (AP 43.4 and 48.6 with 12 and 50 epochs of training respectively) among DETR-like methods with ResNet-50 backbone. Compared with the baseline under the same setting, DN-DETR achieves comparable performance with 50\% training epochs. Code is available at https://github.com/FengLi-ust/DN-DETR.},
urldate = {2023-02-06},
publisher = {arXiv},
author = {Li, Feng and Zhang, Hao and Liu, Shilong and Guo, Jian and Ni, Lionel M. and Zhang, Lei},
month = dec,
year = {2022},
note = {arXiv:2203.01305 [cs]},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition},
annote = {Comment: Extended version from CVPR 2022},
file = {arXiv Fulltext PDF:/home/laurent/Zotero/storage/N7NA2XDB/Li et al. - 2022 - DN-DETR Accelerate DETR Training by Introducing Q.pdf:application/pdf;arXiv.org Snapshot:/home/laurent/Zotero/storage/DM3P2FKW/2203.html:text/html},
}
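As an aside on the DN-DETR entry above, whose abstract attributes slow DETR convergence to unstable bipartite matching: below is a toy sketch of that matching step using SciPy's Hungarian solver, with a plain L1 cost standing in for DETR's full class, box, and GIoU cost; the box values are made up for illustration.

import numpy as np
from scipy.optimize import linear_sum_assignment

# Three predicted boxes (queries) and two ground-truth boxes, all in (cx, cy, w, h).
pred_boxes = np.array([[0.20, 0.20, 0.10, 0.10],
                       [0.70, 0.60, 0.20, 0.20],
                       [0.50, 0.50, 0.30, 0.30]])
gt_boxes = np.array([[0.68, 0.62, 0.20, 0.20],
                     [0.21, 0.19, 0.10, 0.10]])

# Pairwise cost matrix (queries x targets); Hungarian matching picks the assignment
# with minimal total cost, so each ground-truth box gets exactly one query.
cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
query_idx, target_idx = linear_sum_assignment(cost)
print(list(zip(query_idx.tolist(), target_idx.tolist())))  # -> [(0, 1), (1, 0)]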