* added early stop tpu test
* Fixes #2455
* Add Github Action to run TPU tests.
* Trigger new Github Actions run.
* Clean up more comments.
* Use different fixed version of ml-testing-accelerators and update config to match.
* use cluster in us-central1-a
* Run 'gcloud logging read' directly without 'echo' to preserve newlines.
* cat coverage.xml on the TPU VM side and upload xml on the Github Action side
* Use new commit on ml-testing-accelerators so command runs fully.
* Preserve newlines in the xml and use if: always() temporarily to upload codecov
* Use pytorch_lightning for coverage instead of pytorch-lightning
* Remove the debug cat of coverage xml
* Apply suggestions from code review
* jsonnet rename
* name
* add codecov flags
* codecov
* revert codecov
* Clean up after apt-get and remove old TODOs.
* More codefactor cleanups.
* drone
* disable codecov
* cleaning
* docker py versions
* docker py 3.7
* readme
* bash
* docker
* freeze conda
* py3.6
* Stop using apt-get clean.
* Don't rm pytorch-lightning
* Update docker/tpu/Dockerfile
* Longer timeout in the Github Action to wait for GKE to finish.
* job1
* job2
* job3
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* fix and test for ddp block logging rank > 0
* rename
* use the dummy logger
* dummy logger test
* set the logger in model
* decorator for rank zero experiment
* simplify check
* simplify
* fix problem with None in checkpoint path
* revert configure logger
* unused import
* offline
* try rank 0 decorator in checkpoint
* try fix test
* imgs
* add asserts to make sure log zero only saves checkpoints
* fix tpu tests
Co-authored-by: William Falcon <waf2107@columbia.edu>
* Import ipywidgets before importing tqdm.auto to make sure ipywidgets is installed.
* Updated CHANGELOG.md
* Updated ipywidgets import checks per @awaelchli's comments.
Co-authored-by: William Falcon <waf2107@columbia.edu>
* add state_dict for early stopping
* move best attr after monitor_op defined
* improve early stopping and model checkpoint callbacks
* fix formatting
* fix attr init order
* clean up setting of default_root_dir attr
* logger needs default root dir set first
* reorg trainer init
* remove direct references to checkpoint callback
* more fixes
* more bugfixes
* run callbacks at epoch end
* update tests to use on epoch end
* PR cleanup
* address failing tests
* refactor for homogeneity
* fix merge conflict
* separate tests
* tests for early stopping bug regressions
* small fixes
* revert model checkpoint change
* typo fix
* fix tests
* update train loop
* cannot pass an int as default_save_path
* refactor log message
* fix test case
* appease the linter
* fix some doctests
* move config to callback
* fixes from rebase
* chlog
* docs
* reformat
* formatting
* fix
* fixes from rebase
* add new test for patience
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/callbacks/test_early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix formatting
* remove enable_early_stop attribute
* fix test with new epoch indexing
* fix progress bar totals
* fix off-by-one error (see #2289); epoch starts at 0 now
* added missing imports
* fix hpc_save folderpath
* fix formatting
* fix tests
* small fixes from a rebase
* fix
* tmpdir
* wandb
* fix merge conflict
* add back evaluation after training
* test_resume_early_stopping_from_checkpoint TODO
* undo the horovod check
* update changelog
* remove a duplicate test from merge error
* try fix dp_resume test
* add the logger fix from master
* try remove default_root_dir
* try mocking numpy
* try import numpy in docs test
* fix wandb test
* pep 8 fix
* skip if no amp
* don't mock when doctesting
* install extra
* fix the resume ES test
* undo conf.py changes
* revert remove comet pickle from test
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update weights_loading.rst
* renamed flag
* revert the None check in logger experiment name/version
* add the old comments
* _experiment
* test checkpointing on DDP
* skip the ddp test on windows
* cloudpickle
* parentheses for clarity
* apply suggestion max epochs
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
`save_top_k` should be an `int`, but the snippet under 'Saving and Loading Weights' in the docs passed `save_top_k=True`. Changed it to the default value (1) so the docs are consistent with the parameter's type.
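
For context, a minimal sketch of why the old snippet was misleading rather than outright broken: in Python `bool` is a subclass of `int`, so `True` can pass as `1` wherever an integer is expected. The helper `normalize_save_top_k` below is hypothetical and is not part of pytorch_lightning; it only illustrates one way such a value could be checked explicitly.

```python
# Hypothetical helper (an assumption for illustration, not library code):
# accept only real ints for save_top_k and reject bools explicitly.
def normalize_save_top_k(save_top_k):
    """Return save_top_k unchanged if it is a plain int, else raise TypeError."""
    # bool must be checked first, because isinstance(True, int) is True.
    if isinstance(save_top_k, bool):
        raise TypeError("save_top_k must be an int, not a bool")
    if not isinstance(save_top_k, int):
        raise TypeError("save_top_k must be an int")
    return save_top_k


# True compares equal to 1, which is why save_top_k=True can go unnoticed.
assert isinstance(True, int) and True == 1
assert normalize_save_top_k(1) == 1
```

Writing `save_top_k=1` in the docs states the intent directly instead of relying on that coercion.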
Signed-off-by: Kshitij Patil <kshitijpatil98@gmail.com>