* use a closure for all optimizers (see the closure sketch below the list)
* rename hook and handle alternating backward passes
* add comment
* fix training loop
* use the closure whenever possible
* training_loop
* add simple tests that count backward calls (see the test sketch below the list)
* fix test to work with closure
* remove debugging statement
* better place
* check grads after backward
* start fixing manual optimization
* skip the optimizer step when the result returned by the closure is None (see the skip sketch below the list)
* fix gradient clipping test to work with closure
* attribute dict result only for automatic optimization
* adjust backward calls in accelerator
* adjust where to call gradient clipping
* adjust backward calls in tests
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* pass kwargs to xla optimizer
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Added __getstate__/__setstate__ methods for torch.save serialization and additional Optional typing to the results object (see the serialization sketch below the list)
* Added tests to ensure torch.save does not fail
* Added flags to ensure a compatible ddp cpu environment
* Removed the torch version check since the minimum supported version is already 1.3; reduced epochs for speed
* Moved tests to separate file
* Updated the accelerator and moved to ddp_spawn to prevent DDP from hanging
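
The closure-related commits above revolve around passing a closure into `optimizer.step()` so that forward, loss, and backward are bundled together and optimizers such as LBFGS can re-evaluate them. A minimal PyTorch sketch of that pattern (the model, criterion, and batch here are illustrative, not Lightning code):

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()
batch = (torch.randn(8, 4), torch.randn(8, 1))

def closure():
    # forward, loss, and backward all live inside the closure so the optimizer can re-run them
    optimizer.zero_grad()
    x, y = batch
    loss = criterion(model(x), y)
    loss.backward()
    return loss

# every torch optimizer's step() accepts an optional closure, so the same call
# works for SGD as well as for LBFGS, which needs to re-evaluate the loss
optimizer.step(closure)
```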
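The skip commit describes not stepping when the closure produced nothing. A hedged sketch of that behaviour, with an illustrative `training_step` helper standing in for the user's hook (not Lightning's internal API):

```python
def step_with_optional_skip(optimizer, training_step, batch):
    """Run the closure once; only call optimizer.step() if it produced a loss."""
    result = {"loss": None}

    def closure():
        loss = training_step(batch)  # user code may return None to skip this batch
        if loss is not None:
            loss.backward()
        result["loss"] = loss
        return loss

    closure()
    if result["loss"] is None:
        return None  # nothing to optimize: skip the optimizer step entirely
    optimizer.step()
    return result["loss"]
```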
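The backward-counting tests can be pictured as patching `torch.Tensor.backward`; this pytest sketch mirrors the idea rather than the repository's actual tests:

```python
import torch

def test_closure_calls_backward_once(monkeypatch):
    calls = {"n": 0}
    original_backward = torch.Tensor.backward

    def counting_backward(self, *args, **kwargs):
        calls["n"] += 1
        return original_backward(self, *args, **kwargs)

    # count every backward() call made while the closure runs
    monkeypatch.setattr(torch.Tensor, "backward", counting_backward)

    model = torch.nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    def closure():
        optimizer.zero_grad()
        loss = model(torch.randn(4, 2)).sum()
        loss.backward()
        return loss

    optimizer.step(closure)
    assert calls["n"] == 1
```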
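For the serialization commits, the pattern is the standard `__getstate__`/`__setstate__` pair so `torch.save` can pickle the results object. The class and attributes below are illustrative stand-ins, not Lightning's `Result` implementation:

```python
import torch

class TrainResult:
    def __init__(self, minimize=None):
        self.minimize = minimize
        self._cached_fn = lambda x: x  # example of a non-picklable attribute

    def __getstate__(self):
        # drop anything pickle cannot handle before torch.save serializes us
        state = self.__dict__.copy()
        state.pop("_cached_fn", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._cached_fn = lambda x: x  # rebuild the dropped attribute on load

result = TrainResult(minimize=torch.tensor(0.5))
torch.save(result, "result.pt")  # would fail without __getstate__ because of the lambda
restored = torch.load("result.pt", weights_only=False)  # custom classes need full unpickling on recent torch
assert torch.equal(restored.minimize, result.minimize)
```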