* add state_dict for early stopping
* move best attr after monitor_op defined
* improve early stopping and model checkpoint callbacks
* fix formatting
* fix attr init order
* clean up setting of default_root_dir attr
* logger needs default root dir set first
* reorg trainer init
* remove direct references to checkpoint callback
* more fixes
* more bugfixes
* run callbacks at epoch end
* update tests to use on epoch end
* PR cleanup
* address failing tests
* refactor for homogeneity
* fix merge conflict
* separate tests
* tests for early stopping bug regressions
* small fixes
* revert model checkpoint change
* typo fix
* fix tests
* update train loop
* cannot pass an int as default_save_path
* refactor log message
* fix test case
* appease the linter
* fix some doctests
* move config to callback
* fixes from rebase
* fixes from rebase
* chlog
* docs
* reformat
* formatting
* fix
* fix
* fixes from rebase
* add new test for patience
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/callbacks/test_early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix formatting
* remove enable_early_stop attribute
* fix test with new epoch indexing
* fix progress bar totals
* fix off-by-one error (see #2289); epoch starts at 0 now
* added missing imports
* fix hpc_save folderpath
* fix formatting
* fix tests
* small fixes from a rebase
* fix
* tmpdir
* tmpdir
* tmpdir
* wandb
* fix merge conflict
* add back evaluation after training
* test_resume_early_stopping_from_checkpoint TODO
* undo the horovod check
* update changelog
* remove a duplicate test from merge error
* try fix dp_resume test
* add the logger fix from master
* try remove default_root_dir
* try mocking numpy
* try import numpy in docs test
* fix wandb test
* pep 8 fix
* skip if no amp
* don't mock when doctesting
* install extra
* fix the resume ES test
* undo conf.py changes
* revert remove comet pickle from test
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update weights_loading.rst
* Update weights_loading.rst
* Update weights_loading.rst
* renamed flag
* renamed flag
* revert the None check in logger experiment name/version
* add the old comments
* _experiment
* test checkpointing on DDP
* skip the ddp test on windows
* cloudpickle
* renamed flag
* renamed flag
* parentheses for clarity
* apply suggestion max epochs
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* no cov
* no cov
* ReduceOp
* group
* reduce_op.sum
* Update sklearns.py
* formatting
* horovod
* Apply suggestions from code review
* horovod
* horovod
* horovod
* horovod
* ci
* print
* ci
* timeout
* timeout
* time
* fix
* distributed cpu
* pipes
* time
* cpu
* spawn
* spawn
* spawn
* tp
* separate
* os
* os
* npm
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix meta tags creating empty lines
* pyright
* node
* fix httpserver address
* drop tutils.default_trainer_options
* imports
* Better fix for load_from_checkpoint() not working with absolute path on Windows (#2294)
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* drop duplicate
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: airium <airium@outlook.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: AIRIUM <38249940+airium@users.noreply.github.com>
* added tpu params test
* added tests
* removed xla imports
* added test cases for TPU
* fix pep 8 issues
* refactorings and comments
* add message to MisconfigurationException
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* test if device is set correctly
* added TPU device check; removed mark.spawn
* removed device selection
* remove xla_device call
* readded spawn due to test failures
* add TODO for tpu check
* Apply suggestions from code review
* Apply suggestions from code review
* flake8
* added tpu args to cli tests
* added support for tpu_core selection via cli
* fixed flake formatting
* replaced default_save_path with default_root_dir
* added check for data type for tpu_cores
* fixed flake indent
* protected
* protected
* chlog
* fixed tpu cores error
* rebased with latest changes
* flake fix
* Update pytorch_lightning/trainer/distrib_parts.py
added suggestion
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* filter valid args
* error on unknown manual args
* added test
* changelog
* update docs and doctest
* simplify
* doctest
* doctest
* doctest
* better test with mock check for init call
* fstring
* extend test
* skip test on 3.6, where it is not working
Co-authored-by: William Falcon <waf2107@columbia.edu>
* `add_argparse_args` method fixed (argument types added)
* CHANGELOG.md upd
* autopep8 fixes
* --gpus=0 removed from test (for ci tests)
* typo fixed
* reduce on plateau scheduler fixed
* Trainer cli related tests moved to test_trainer_cli.py
* refactored: get_init_arguments_and_types is a public classmethod of the Trainer now
* test_get_init_arguments_and_types added
* autopep8 fixes
* Apply suggestions from code review
* cosmetics
* cosmetics
* Update pytorch_lightning/trainer/trainer.py
Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>
* `Trainer.get_init_arguments_and_types` now returns arg types wrapped in tuples (not in sets)
* deprecated args are now ignored in argparser
* get_deprecated_arg_names small refactor
* get_deprecated_arg_names bug fixed
* Update pytorch_lightning/trainer/trainer.py
Co-Authored-By: Joe Davison <joe@huggingface.co>
* Update pytorch_lightning/trainer/trainer.py
Co-Authored-By: Joe Davison <joe@huggingface.co>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Joe Davison <joe@huggingface.co>
Co-authored-by: William Falcon <waf2107@columbia.edu>