* add state_dict for early stopping
* move best attr after monitor_op defined
* improve early stopping and model checkpoint callbacks
* fix formatting
* fix attr init order
* clean up setting of default_root_dir attr
* logger needs default root dir set first
* reorg trainer init
* remove direct references to checkpoint callback
* more fixes
* more bugfixes
* run callbacks at epoch end
* update tests to use on epoch end
* PR cleanup
* address failing tests
* refactor for homogeneity
* fix merge conflict
* separate tests
* tests for early stopping bug regressions
* small fixes
* revert model checkpoint change
* typo fix
* fix tests
* update train loop
* cannot pass an int as default_save_path
* refactor log message
* fix test case
* appease the linter
* fix some doctests
* move config to callback
* fixes from rebase
* fixes from rebase
* chlog
* docs
* reformat
* formatting
* fix
* fix
* fixes from rebase
* add new test for patience
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update pytorch_lightning/callbacks/model_checkpoint.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update tests/callbacks/test_early_stopping.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix formatting
* remove enable_early_stop attribute
* fix test with new epoch indexing
* fix progress bar totals
* fix off by one error (see #2289); epoch starts at 0 now
* added missing imports
* fix hpc_save folderpath
* fix formatting
* fix tests
* small fixes from a rebase
* fix
* tmpdir
* tmpdir
* tmpdir
* wandb
* fix merge conflict
* add back evaluation after training
* test_resume_early_stopping_from_checkpoint TODO
* undo the horovod check
* update changelog
* remove a duplicate test from merge error
* try fix dp_resume test
* add the logger fix from master
* try remove default_root_dir
* try mocking numpy
* try import numpy in docs test
* fix wandb test
* pep 8 fix
* skip if no amp
* dont mock when doctesting
* install extra
* fix the resume ES test
* undo conf.py changes
* revert remove comet pickle from test
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Update weights_loading.rst
* Update weights_loading.rst
* Update weights_loading.rst
* renamed flag
* renamed flag
* revert the None check in logger experiment name/version
* add the old comments
* _experiment
* test checkpointing on DDP
* skip the ddp test on windows
* cloudpickle
* renamed flag
* renamed flag
* parentheses for clarity
* apply suggestion max epochs
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* no cov
* no cov
* ReduceOp
* group
* reduce_op.sum
* Update sklearns.py
* formatting
* horovod
* Apply suggestions from code review
* horovod
* horovod
* horovod
* horovod
* ci
* print
* ci
* timeout
* timeout
* time
* fix
* distributed cpu
* pipes
* time
* cpu
* spawn
* spawn
* spawn
* tp
* separate
* os
* os
* npm
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix meta tags creating empty lines
* pyright
* node
* fix httpserver address
* drop tutils.default_trainer_options
* imports
* Better fix for load_from_checkpoint() not working with absolute path on Windows (#2294)
* Fix load_from_checkpoint() not working with URL on Windows
* Update CHANGELOG
* Update CHANGELOG.md
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* drop duplicate
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: airium <airium@outlook.com>
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: AIRIUM <38249940+airium@users.noreply.github.com>
* deal with NotImplementedError raised by torchtext
* Added tests for dataloader which raise NotImplementedError in __len__()
* Fixed some typos
* enabled tests for dataloader raising NotImplementedError in __len__ and corrected match string for raised exception
* deleted empty line for style compliance
* refactored CustomNotImplementedErrorDataloader to derive from CustomInfDataloader
* enabled reduced number of not_implemented_error dataloader test to reduce runtime for continuous integration
* reduced test number of not_implemented_error dataloader test further to reduce test time
* reduced test number of not_implemented_error dataloader test to one to reduce test time
* disabled all not_implemented_error dataloader test to see if test pass in time
* added __next__ with a reduced number (5) of elements after which CustomNotImplementedErrorDataloader stops, to speed up the test.
* enabling all not_implemented_error dataloader test
* added brief description of change and relation of torchtext
* CustomNotImplementedErrorDataloader reduced number of batches served to 2.
* Update CHANGELOG.md
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Apply suggestions from code review
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Disable parallelism in dataloader
Suspect that it might cause pytest to hang more frequently
* added max_steps=None to Trainer in not_implemented_error dataloader tests
* rearranged not_implemented_error test in file to group them together
* disabled parallel data loading
Reason: testing if that stops the test framework from hanging.
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Thomas Schaaf <tschaaf@cs.cmu.edu>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* added tpu params test
* added tests
* removed xla imports
* added test cases for TPU
* fix pep 8 issues
* refactorings and comments
* add message to MisconfigurationException
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* test if device is set correctly
* added TPU device check
removed mark.spawn
* removed device selection
* remove xla_device call
* readded spawn due to test failures
* add TODO for tpu check
* Apply suggestions from code review
* Apply suggestions from code review
* flake8
* added tpu args to cli tests
* added support for tpu_core selection via cli
* fixed flake formatting
* replaced default_save_path with default_root_dir
* added check for data type for tpu_cores
* fixed flake indent
* protected
* protected
* added tpu params test
* added tests
* removed xla imports
* test if device is set correctly
* added support for tpu_core selection via cli
* replaced default_save_path with default_root_dir
* added check for data type for tpu_cores
* chlog
* fixed tpu cores error
* rebased with latest changes
* flake fix
* Update pytorch_lightning/trainer/distrib_parts.py
added suggestion
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* Init fix num_batches
* Fix num_batches in case of multiple dataloaders
* Apply suggestions from code review
* Changes based on suggestions
* Flake8
* Add test to check num_batches
* generalize dataloader percent check test
* fix formatting
* remove hparams
* tests
* CHANGELOG
* Update CHANGELOG.md
* max_batches can be int
* conflict and rebase
* add back the test
fix
fix message
0.0 works
Revert "fix message"
This reverts commit 839cacf8b8610f4e697e654ef6f3d2501bf23984.
* update changelog
* Update CHANGELOG.md
* Fix num batches in case of multiple dataloaders and percent_check (#1920)
* git conflict
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* missing union
* doc update suggestion by @rohitgr7
* extend test
* changelog
* docs add note about multiple loaders
* update changelog
* remove unused variable
Co-authored-by: rohitgr7 <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Fixed average_precision metric, parentheses were missing. Added a test that failed with the old implementation
* Modified CHANGELOG.md
* Update CHANGELOG.md
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Revert "deprecated: epoch indexing from 1 (#2206)"
This reverts commit f94b919b
* chlog
* grad index
* Apply suggestions from code review
* tests
* fix
* test
* drop train_percent_check
* chlog
* deprecated
* deprecated
* deprecated
* tests
* tests
* Apply suggestions from code review
* tests
* hydra support
* tests
* hydra support
* hydra support
* hydra support
* tests
* typo
* typo
* Update test_dataloaders.py
* docs
* docs
* docs
* docs
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fixed percent check for val/test
* overfit_pct now uses train loaders for val and test and does not shuffle
* add on fit_start on fit_end hooks
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* First attempt at auto-moving data for inference
* Correct my copypaste errors
* Correct for if device is CPU
* Get rid of the WIP code I accidentally added
* Add tests
* Make tests more foolproof
* Make sure we stick with pep8 formatting
* Clarify docs a little
* Apply suggestions from code review
* Get everything working again hopefully
* refactor and added hook
variant a
variant b
add test
revert rename
add changelog
docs
* move changelog entry to top
* Move data transfer to utilities
* Add back in warnings for autotransfer
* Get rid of the test code I ended up accidentally committing again
* Add docs and changelog
* Correct PR number in Changelog
* Correct changelog
* Update data.py
* Update test_cpu.py
* make a decorator
* type hint
* changelog
* changelog
* remove old function
* import
* test for decorator
* fix test
* remove old test
* doctest
* apply decorator directly
* convert doctest to code block
* prevent side effects in tests
* fix merge
* update forward docs
* update docs
* added docs in section "deployment / prediction"
* update changelog
Co-authored-by: Hengjian Jia <henryjia18@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* missed
* format
* math
* req
* notes
* fix CI
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* allow loading checkpoints from urls
* tmpdir_server fixture
* test cases for loading checkpoints from url
* dir => root_dir
* default map_location to None
* test case for resume_from_checkpoint
* changelog
* doc update
* monkeypatch TORCH_HOME to avoid caching
* Use a threading server with random ports so that it is easier to clean up
* test fixes
* pep8 fix
* ThreadingHTTPServer support in 3.6
* pep8 fix
* fix changelog
* separate tests for urls
* typo
Co-authored-by: Peter Yu <2057325+yukw777@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* do not include local vars in auto collection
* add test
* add test for model with "self" renamed to "obj"
* skip decorator
* changelog
* changelog
* update docs
* remove obsolete child collection
* generalize **args, **kwargs names
* docs
* also update varargs passed in
* Revert "also update varargs passed in"
This reverts commit 3d7a30dbee07a513ee13e1cc3e08ca5ccdb85734.
* update test
* black
Added through black.toml; other options are hard so far
No caching for black github action
Moved from black.toml to pyproject.toml
Exclude not only yml but also yaml
Update pyproject.toml
Co-authored-by: Thomas Johansen <thomasjo@gmail.com>
Update .github/workflows/code-formatting-check.yml
mergify
Remove formatting check
E231 error ignored because of black formatting
Updated CONTRIBUTING to the master
* Update .github/workflows/code-formatting-check.yml
* Bump black to 19.10b0 version
* resolved incorrect merge of CONTRIBUTING,
Black skipping string normalization
* Minor fixes in CONTRIBUTING, two typos
* Update setup.cfg
* chlog
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* refactor and added hook
variant a
variant b
add test
revert rename
add changelog
docs
* resolve merge duplication
* overridden typo
* fix test
* tpu id
* raise if TPU not available
* re-use apply_to_collection function for parsing collections
* comment
* make utility function available to user
* documentation
* move changelog entry to top
* fix tpu transfer call
* fix call
* remove hardcoded string
* improve test
* call model hook by default
* Apply suggestions from code review
* rename utility function
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* use parallel loader
* Revert "use parallel loader"
This reverts commit ed6e7583
* select tpu id for pl
* condition if tpu_id is None
* added info to changelog
* Revert "condition if tpu_id is None"
This reverts commit 1fb6e586
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix(wandb): use same logger on multiple training loops
New training loops reset step to 0 which would previously try to overwrite logs
fix #2015
* docs(changelog.md): add reference to PR 2055
* Add an additional attribute to ModelCheckpoint to keep track of the best model's path
Currently, only the best metric value is directly tracked. This new attribute will help in use cases where the trained model needs to be used or tracked right after training.
* Add small description and usage example to docs
* Fix PEP8 issues
* Fix doctest example
* Fix expected output in doctest
* Apply suggestions from code review
* Show example as code block instead of doctest
* Apply suggestions from code review
* Update CHANGELOG.md
* Rename `ModelCheckpoint.best` to `ModelCheckpoint.best_model_score`
Also rename `ModelCheckpoint.best_model` (added in this PR) to `ModelCheckpoint.best_model_path`, for consistency, and `kth_best_model` to `kth_best_model_path`.
* Update pytorch_lightning/trainer/training_io.py
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Add warning when loading checkpoint from an old version
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* fix chlog
* test for #1729
* hist
* update
* Document use case of passing test dataloaders to Trainer.test() (#1992)
* Issue 1990 Doc patch.
* Codeblock directive.
* Update to reflect current state of pytorch-lightning
* Final grammar cleaning. I hope these commits are squashed.
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: authman <uapatira@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* code rule
* Apply suggestions from code review
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* chlog
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
* fixed undesired behaviour due to dict.fromkeys
* a test for log length consistency
* runtime-warn if no schedulers are configured
* chlog
* move
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* filter valid args
* error on unknown manual args
* added test
* changelog
* update docs and doctest
* simplify
* doctest
* doctest
* doctest
* better test with mock check for init call
* fstring
* extend test
* skip test on 3.6 not working
Co-authored-by: William Falcon <waf2107@columbia.edu>
* Fixes PyTorchLightning/pytorch-lightning#490
`EarlyStopping` should check the metric of interest `on_validation_end` rather than `on_epoch_end`.
In a normal scenario, this does not cause a problem, but in combination with `check_val_every_n_epoch>1` in the `Trainer` it results in a warning or in a `RuntimeError` depending on `strict`.
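The interaction described above can be illustrated with a small, self-contained sketch. It is not the library's `EarlyStopping` implementation: `ToyEarlyStopping` and `run` are hypothetical names, and the patience rule used here is an assumption. The point is only that the monitored metric exists on validation epochs, so the check belongs in `on_validation_end`.

```python
class ToyEarlyStopping:
    """Hypothetical stand-in for an early-stopping callback (lower metric is better)."""

    def __init__(self, patience=2):
        self.patience = patience
        self.best = float("inf")
        self.wait = 0
        self.stopped = False

    def on_validation_end(self, metrics):
        # The monitored metric is only produced by a validation run.
        current = metrics["val_loss"]
        if current < self.best:
            self.best = current
            self.wait = 0
        else:
            self.wait += 1
            self.stopped = self.wait >= self.patience


def run(val_losses, check_val_every_n_epoch=2, max_epochs=10, patience=2):
    """Return the index of the last epoch that was trained."""
    stopper = ToyEarlyStopping(patience)
    val_iter = iter(val_losses)
    for epoch in range(max_epochs):
        # ... training for this epoch would happen here ...
        if (epoch + 1) % check_val_every_n_epoch == 0:
            # Validation ran, so the metric exists and the check is safe.
            stopper.on_validation_end({"val_loss": next(val_iter)})
            if stopper.stopped:
                return epoch
        # Checking at every epoch end instead would find no "val_loss"
        # on the epochs where validation was skipped.
    return max_epochs - 1
```

With `check_val_every_n_epoch=2`, the callback is consulted only on every second epoch, which is exactly when a validation metric is available.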
* Highlighted that ES callback runs on val epochs in docstring
* Updated EarlyStopping in rst doc
* Update early_stopping.py
* Update early_stopping.rst
* Update early_stopping.rst
* Update early_stopping.rst
* Update early_stopping.rst
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* Update docs/source/early_stopping.rst
* fix doctest indentation warning
* Train loop calls early_stop.on_validation_end
* chlog
Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* saves model every epoch
* implement test for save_last
* Update CHANGELOG.md
* Update CHANGELOG.md
* changes test description
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
* Allow dataloaders without sampler field present
Sometimes we have a custom dataloader that doesn't have a sampler, so it is better to check that the field is there before reading it.
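A minimal sketch of the kind of guard this describes, assuming a plain attribute lookup is all that is needed. `PlainLoader`, `SampledLoader`, and `get_sampler` are hypothetical names used for illustration, not the code from this change.

```python
class PlainLoader:
    """Hypothetical custom dataloader that defines no `sampler` attribute."""

    def __iter__(self):
        return iter([[1, 2], [3, 4]])


class SampledLoader(PlainLoader):
    """Hypothetical dataloader that does carry a sampler."""

    def __init__(self, sampler):
        self.sampler = sampler


def get_sampler(dataloader):
    # Read the field only if it exists; fall back to None otherwise.
    return getattr(dataloader, "sampler", None)
```

Callers can then branch on `get_sampler(loader) is None` instead of catching an `AttributeError`.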
* chlog
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
* Add flag to `dump_checkpoint` for only including weights
`ModelCheckpoint` then passes `self.save_weights_only` to the save function.
* Fix tests and add changelog entry
* Add check and descriptive message when training state is restored from a weights only checkpoint
Also add a test for making sure `ModelCheckpoint.save_weights_only` works as expected.
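The weights-only flag and the accompanying check could look roughly like the sketch below. The checkpoint keys, the function signatures, and the choice of `KeyError` are assumptions made for illustration; they are not taken from the actual `dump_checkpoint` implementation.

```python
# Assumed names for the training-state entries of a checkpoint.
TRAINING_KEYS = ("optimizer_states", "lr_schedulers")


def dump_checkpoint(model_state, training_state, weights_only=False):
    """Build a checkpoint dict; skip training state when weights_only is set."""
    checkpoint = {"state_dict": dict(model_state)}
    if not weights_only:
        for key in TRAINING_KEYS:
            checkpoint[key] = training_state[key]
    return checkpoint


def restore_training_state(checkpoint):
    """Fail with a descriptive message if the checkpoint holds only weights."""
    missing = [key for key in TRAINING_KEYS if key not in checkpoint]
    if missing:
        raise KeyError(
            "Cannot restore training state: checkpoint holds only model weights "
            f"(missing {missing})."
        )
    return {key: checkpoint[key] for key in TRAINING_KEYS}
```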
* Fix weights-only test to properly match expected exception
* Apply suggestions from code review
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
The changes are quite local and limited in nature -- viz., checking for
some indicator environment variables. We check for (SLURM_LOCALID,
NODE_RANK, GROUP_RANK) in order. If multiple are found set, a warning is
logged.
This patch also fixes a minor bug with comparing the `WORLD_SIZE`
environment variable. This can be a string type.
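As a rough illustration of the lookup order described above, the following sketch checks the three variables in sequence and warns when more than one is set. `determine_node_rank` and `world_size_matches` are hypothetical helpers, and the default of 0 when nothing is set is an assumption.

```python
import os
import warnings

# Indicator variables, in the priority order given in the commit message.
RANK_ENV_VARS = ("SLURM_LOCALID", "NODE_RANK", "GROUP_RANK")


def determine_node_rank(env=None):
    """Return the rank from the first indicator variable that is set."""
    env = os.environ if env is None else env
    found = [(name, env[name]) for name in RANK_ENV_VARS if name in env]
    if not found:
        return 0  # assumed default when no indicator variable is present
    if len(found) > 1:
        names = [name for name, _ in found]
        warnings.warn(f"Multiple rank variables set: {names}; using {found[0][0]}")
    return int(found[0][1])


def world_size_matches(expected, env=None):
    """Compare WORLD_SIZE numerically, since environment values are strings."""
    env = os.environ if env is None else env
    return "WORLD_SIZE" in env and int(env["WORLD_SIZE"]) == expected
```

Converting with `int(...)` before comparing avoids the string-versus-integer mismatch (`"4" == 4` is `False` in Python).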
* Fixed typing annotation by adding the boolean type, so that the Profiler flag is added to argparse.
* Updated CHANGELOG.md
* Updated get_init_arguments_and_types() to pass doctests.
* Added doctest example to add_argparse_parser()
* Option to provide seed to random generators to ensure reproducibility
I added a small function in utilities which imports torch, numpy, and
Python's random, and sets the seed for all of these libraries to ensure
reproducibility of results.
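A seeding helper of this kind might be sketched as follows. `seed_all` is a hypothetical name, and treating NumPy and PyTorch as optional imports is an assumption made so the sketch runs anywhere; it is not the function added by this change.

```python
import random


def seed_all(seed: int) -> int:
    """Seed Python's RNG, plus NumPy and PyTorch when they are importable."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)  # accepts integers in [0, 2**32 - 1]
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass
    return seed
```

Calling `seed_all(1234)` before building the model means that two runs draw the same random numbers from each seeded library.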
* Apply recommendations from core contributors on seeding
1. Moved the seeding code to another file
2. Make deterministic as a parameter for trainer class
3. Add assertions for seeding numpy
4. Added warnings
5. torch.manual_seed should be enough for seeding torch
* Revert "Apply recommendations from core contributors on seeding"
This reverts commit a213c8e6882eec8a9e7408b9418926d2db7c5461.
* Revert "Revert "Apply recommendations from core contributors on seeding""
This reverts commit 59b2da53c62878de7aab0aa3feb3115e105eea06.
* Change in test, for correct seeding
* Allow seed equal to 0
* Allow seed to be uint32.max
* Added deterministic to benchmarks
* Cuda manual seed as in benchmark seeding
* Seeding should be done before model initialization
* cuda manual_seed is not necessary
* Fixing seed test_cpu_lbfgs
With some seeds, lbfgs does not seem to converge,
so I fixed the seed during testing.
* rebasing issue with old reproducibility.py
* Improved documentation and ability to seed before initializing the Trainer
class
* Change in docs
* Removed seed from trainer, update for documentation
* Typo in the docs
* Added seed_everything to __all__
* Fixing old changes
* Model initialization should be earlier than Trainer
* Update pytorch_lightning/trainer/__init__.py
From Example to testcode
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
* Fixing according to the contributors' suggestions
* Moving horovod deterministic to Trainer class
* deterministic flag affects horovod docs update
* Improved static typing
* Added deterministic to test runners of horovod
It is failing on some versions, not very predictable
* static seeds for horovod tests
* Change for reset_seed function in tests
* Seeding horovod using reset_seed from tutils
* Update pytorch_lightning/trainer/__init__.py
* chlog
* Update trainer.py
* change "testcode" to "Example" in trainer init documentation
* Update pytorch_lightning/trainer/seed.py, first line in comment
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: William Falcon <waf2107@columbia.edu>
* missing
* RC
* tol
* Apply suggestions from code review
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* test
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
* update prog. bar metrics on train epoch end
* changelog
* wip test
* more thorough testing
* comments
* update docs
* move test
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
* Fix Horovod backend to disable progress bar on all ranks except 0
* Add join barriers
* Added changelog
* Make protected and add verbosity
* Refactor to disable progress bar callback in train
* Removed verbose setting
* Add cache check for Horovod
* Test run again
* Updated comment
* Always skip cache for Horovod
* Only reinstall when necessary
* Added separate step
* Fixed spacing
* Skip Python 3.8
* params
* drop acc
* Fix Horovod distributed backend to set the root_gpu
* Fixed test
* Fixed tests
* Fixed lint
* Set root_gpu during initialization
* chlog
Co-authored-by: Jirka <jirka.borovec@seznam.cz>
* disable val and test shuffling
* log
* condition
* shuffle
* refactor
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>