PyTorch chose to make it Any because they expect its users' code
to be "highly dynamic": https://github.com/pytorch/pytorch/pull/104321
It is not the case for us, in Refiners having untyped code
goes contrary to one of our core principles.
Note that there is currently an open PR in PyTorch to
return `Module | Tensor`, but in practice this is not always
correct either: https://github.com/pytorch/pytorch/pull/115074
I also moved Residuals-related code from SD1 to latent_diffusion
because SDXL should not depend on SD1.
And replaced the remaining Sum-Identity layers by Residual.
The tolerance used to compare SAM's ViT models has been tweaked: for
some reasons there is a small difference (in float32) in the neck layer
(first conv2D)
Co-authored-by: Cédric Deltheil <cedric@deltheil.me>