PyTorch chose to make it `Any` because they expect their users' code
to be "highly dynamic": https://github.com/pytorch/pytorch/pull/104321
This is not the case for us: in Refiners, untyped code
goes against one of our core principles.
Note that there is currently an open PR in PyTorch to
return `Module | Tensor`, but in practice this is not always
correct either: https://github.com/pytorch/pytorch/pull/115074
I also moved Residuals-related code from SD1 to latent_diffusion
because SDXL should not depend on SD1.