Commit Graph

4859 Commits

Author SHA1 Message Date
thomas chaton 1a6dcbd422
[bugfix] Resolve Kineto Profiler for Conda (#7376) 2021-05-05 11:54:16 +00:00
ananthsub 98670c83a9
Deprecate `truncated_bptt_steps` flag on Trainer in favor of the same setting on the LightningModule (#7323)
* deprecate-tbptt-trainer

* Update CHANGELOG.md

* Update lightning.py

* test

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* Update lightning.py

* Update training_loop.py

* Update training_loop.py

* update docs

* Update accelerator.py

* Update accelerator.py

* more docs

* tweaks

* chlog

* comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2021-05-05 11:21:00 +01:00
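A minimal sketch of the migration described in #7323 above, assuming the 1.3-era API: `truncated_bptt_steps` is set on the LightningModule instead of the deprecated `Trainer(truncated_bptt_steps=...)` flag. The model, layer sizes, and shapes below are hypothetical.

```python
import torch
from pytorch_lightning import LightningModule


class SequenceModel(LightningModule):  # hypothetical example model
    def __init__(self):
        super().__init__()
        self.rnn = torch.nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = torch.nn.Linear(64, 1)
        # module-level setting replacing the deprecated Trainer(truncated_bptt_steps=2)
        self.truncated_bptt_steps = 2

    def training_step(self, batch, batch_idx, hiddens):
        # `hiddens` carries the detached hidden state between TBPTT splits
        x, y = batch  # illustrative shapes: x (B, T, 32), y (B, T, 1)
        out, hiddens = self.rnn(x, hiddens)
        loss = torch.nn.functional.mse_loss(self.head(out), y)
        return {"loss": loss, "hiddens": hiddens}
```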
Jirka Borovec 573a5a8a34
update building latest XLA 1.8 (#7359)
* wip

* XLA

* .
2021-05-05 10:01:03 +01:00
William Falcon a4abb62482
Update README.md 2021-05-04 21:54:33 -05:00
Christfried Focke 763a9a9495
Fix Namespace loading in PyYAML 5.4.x (#6673)
* Fix Namespace loading in PyYAML 5.4.x

* Remove OmegaConf reference from PyYAML requirements

* Max allowed version for pyyaml

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 22:56:11 +00:00
Kaushik B e21b7a62d7
Add ddp_find_unused_parameters_false to Registry (#7224) 2021-05-04 22:40:00 +00:00
Jirka Borovec df579a842a
set min PT version for legacy (#7358) 2021-05-04 17:50:12 -04:00
Jirka Borovec bac4656eca
fix readme badges (#7354)
* fix readme badges

* Apply suggestions from code review

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
2021-05-04 16:37:26 -04:00
Carlos Mocholí 374ff750f5
Pass `current_epoch`/`global_step` as monitor candidates [1/2] (#7344)
* Pass `current_epoch`/`global_step` as monitor candidates

* Formatting

* Fix deprecated test

* Update CHANGELOG
2021-05-04 16:05:40 -04:00
Jirka Borovec bc06623ff0
temp suspend NVIDIA CI build (#7350)
* temp suspend NVIDIA CI build

* just skip

* todo

* if: false
2021-05-04 15:22:02 -04:00
Jirka Borovec 839b206164
add CI event published (#7353) 2021-05-04 14:32:16 -04:00
Louis Taylor b64aea637c
CI: move azure-pipelines config to separate directory (#7276)
* CI: move azure pipelines to separate directory

This removes some extra clutter in the top level as we add more
pipelines.

* rename

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 10:50:16 -04:00
Ethan Harris 2a740ebe77
Fix support for dataloader with None batches (#7342)
* Fix Dataloader None batch

* Fix Dataloader None batch

* Update CHANGELOG.md

* Fix breaking test

* Address comments
2021-05-04 12:24:03 +00:00
ramonemiliani93 5db832f181
Fix auto scaling mode when calling tune method on trainer. (#7321)
* Add test for a non-existing mode; the test should fail if something other than `power` or `binsearch` is passed.

* Add newline.

* Apply fix

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update tests/tuner/test_scale_batch_size.py

* Update pytorch_lightning/tuner/batch_size_scaling.py

Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-05-04 12:03:51 +00:00
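For context on #7321 above, a short sketch of how the batch size tuner is typically invoked with the 1.3-era `Trainer` API; `MyModel` is a placeholder LightningModule that exposes a `batch_size` attribute or hyperparameter. Only `power` and `binsearch` are valid modes, which is what the new test guards.

```python
from pytorch_lightning import Trainer

model = MyModel()  # placeholder: must define batch_size as an attribute or hparam

# "power" doubles the batch size until an OOM is hit; "binsearch" then
# binary-searches for the largest working value
trainer = Trainer(auto_scale_batch_size="binsearch")
trainer.tune(model)  # any mode other than "power" or "binsearch" is rejected
```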
ananthsub 69cf63e2fd
Update trainer.py (#7340) 2021-05-04 11:11:27 +00:00
Carlos Mocholí 8c0ea92af2
`TrainerState` refactor [5/5] (#7173)
* `TrainerState` refactor

* flake8

* Update finished check

* Test cleanup

* Fix tests

* Fixes

* Reorder

* flake8

* Update CHANGELOG

* Better docs

* Better docs

* Remove default

* Update tests

* Bad merge
2021-05-04 12:50:56 +02:00
Adrian Wälchli a6aa1a0f82
make gpus=str in Trainer consistent with command line parsing of string (#6388)
* string gpu input

* update docs

* deprecation warning

* Revert "update docs"

This reverts commit c5f3893413.

* deprecation

* add changelog

* update parser

* update warning

* implement v1.5 behavior ahead of time

* formatting

* set accelerator in test to avoid different warning

* add warning

* remove todo warn

* Update pytorch_lightning/utilities/device_parser.py

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>

* resolve flake8

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: tchaton <thomas@grid.ai>
2021-05-04 09:56:27 +00:00
Boris Dayma 2a20102321
fix(wandb): allow custom init args (#6989)
* feat(wandb): allow custom init args

* style: pep8

* fix: get dict args

* refactor: simplify init args

* test: test init args

* style: pep8

* docs: update CHANGELOG

* test: check default resume value

* fix: default value of anonymous

* fix: respect order of parameters

* feat: use look-up table for anonymous

* yapf formatting

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 09:45:36 +00:00
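A hedged sketch of what #6989 above enables: keyword arguments that `WandbLogger` does not define itself are forwarded to `wandb.init`. The project, entity, and group values are placeholders.

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

# Explicit arguments keep their meaning; unrecognized kwargs (entity, group,
# notes, ...) are passed through to wandb.init
wandb_logger = WandbLogger(
    project="my-project",          # placeholder
    entity="my-team",              # placeholder, forwarded to wandb.init
    group="baseline-runs",         # forwarded to wandb.init
    notes="custom init args demo",
)
trainer = Trainer(logger=wandb_logger)
```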
Hemil Desai 82c19e1444
Update LR schedulers only when their corresponding Optimizer is being… (#4868)
* Update LR schedulers only when their corresponding Optimizer is being used.

When optimizer frequencies are specified, the LR scheduler corresponding to
a particular optimizer is stepped only while that optimizer is actively used
in the training loop or epoch (see the sketch after this entry).

* pep8speak fixes

* Fix failing tests

* Add docs

* PR Feedback

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* formatting fix

* PR Feedback - part 2

* More PR feedback

* Apply suggestions from code review

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Add typing imports

* Stronger tests and fixes related to that

* Add more tests plus PR feedback

* Make optimizer_freq_cumsum a cached property

@cached_property is only available from Python 3.8 onwards, so the caching had to be done manually.

* Fix tests

* Apply suggestions from code review

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>

* Avoid mutable defaults

* Parametrize lr scheduling tests

* PR feedback

* Apply suggestions from code review

* spell

* Apply suggestions from code review

* flake8

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2021-05-04 09:37:40 +00:00
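A minimal sketch of the optimizer-frequency setup that #4868 above affects, assuming a two-optimizer LightningModule (the `generator`/`discriminator` submodule names are hypothetical). With this change, each scheduler only steps while its own optimizer is active.

```python
import torch


def configure_optimizers(self):  # method on the LightningModule
    gen_opt = torch.optim.Adam(self.generator.parameters(), lr=1e-4)
    dis_opt = torch.optim.Adam(self.discriminator.parameters(), lr=1e-4)
    return (
        {
            "optimizer": gen_opt,
            "lr_scheduler": torch.optim.lr_scheduler.StepLR(gen_opt, step_size=10),
            "frequency": 1,  # use this optimizer for 1 batch ...
        },
        {
            "optimizer": dis_opt,
            "lr_scheduler": torch.optim.lr_scheduler.StepLR(dis_opt, step_size=10),
            "frequency": 5,  # ... then this one for 5 batches, and repeat
        },
    )
```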
Adrian Wälchli b780af51be
update test for resume_from_checkpoint on missing file (#7255) 2021-05-04 09:16:34 +00:00
Louis Taylor d413bab5ac
Add initial IPU CI job (#7251)
This adds an azure-pipelines job so we can verify the runners are
connected correctly. Since the IPU branch isn't merged, it won't yet
give any actual IPU test coverage.
2021-05-04 08:19:41 +00:00
Carlos Mocholí 3fdb61ac1b
Replace `_DataModuleWrapper` with `__new__` [1/2] (#7289)
* Remove `_DataModuleWrapper`

* Update pytorch_lightning/core/datamodule.py

* Update pytorch_lightning/core/datamodule.py

* Replace `__reduce__` with `__getstate__`
2021-05-04 08:00:24 +00:00
Leonard Lausen 597b309f2e
Fix `Trainer.plugins` type declaration (#7288)
* Fix trainer.plugins type declaration

* Don't ClusterEnvironment(Plugin)

* fix import error, yapf formatter

* Add test

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-04 08:42:57 +02:00
SpontaneousDuck f135debb6a
Clarify logger flag (#7190)
* Clarify logger flag

Clarify the behavior of boolean values for the `logger` flag on the Trainer (see the sketch after this entry).

* Update docs/source/common/trainer.rst

* doc

Co-authored-by: Ethan Harris <ewah1g13@soton.ac.uk>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2021-05-04 00:21:28 +00:00
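The boolean behavior clarified in #7190 above, as a short sketch assuming the default save paths:

```python
from pytorch_lightning import Trainer

# logger=True (the default): use the default TensorBoardLogger under lightning_logs/
trainer = Trainer(logger=True)

# logger=False: disable logging entirely
trainer = Trainer(logger=False)
```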
Daniel Mesejo-León 6da747e775
Deprecate `LightningModule.datamodule` reference in favor of the trainer one (#6929) (#7168)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-04 00:01:41 +00:00
Adrian Wälchli 3e8db4142b
add forgotten test in #7240 (#7283)
^
2021-05-03 23:56:30 +00:00
Carlos Mocholí c6a171b776
Fix requirements/adjust_versions.py (#7149)
Co-authored-by: jirka <jirka.borovec@seznam.cz>
2021-05-04 01:06:28 +02:00
Kaushik B 6d7c6d6403
Update Accelerator Connector for Registry (#7214) 2021-05-03 21:03:21 +00:00
ananthsub b7a444883c
Remove model.trainer call inside of dataloading mixin (#7317)
* Update data_loading.py

* Update data_loading.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-03 13:53:54 -07:00
Mauricio Villegas 78a6fd5588
Example and documentation for LightningCLI linking model and data arguments (#7299)
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-05-03 20:45:46 +00:00
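What the example added in #7299 above covers, sketched against the 1.3 `LightningCLI` API; `MyModel` and `MyDataModule` are placeholders for your own LightningModule and LightningDataModule.

```python
from pytorch_lightning.utilities.cli import LightningCLI


class MyCLI(LightningCLI):
    def add_arguments_to_parser(self, parser):
        # Derive the model's batch_size from the data module's, so it is
        # specified only once on the command line or in the config file
        parser.link_arguments("data.batch_size", "model.batch_size")


cli = MyCLI(MyModel, MyDataModule)  # placeholders
```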
Adrian Wälchli bf1394a472
improve early stopping verbose logging (#6811) 2021-05-03 20:20:48 +00:00
ananthsub 393b252ef0
Update CODEOWNERS (#7302)
* Update CODEOWNERS

* @carmocca

* @borda

* Update CODEOWNERS

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2021-05-03 14:17:27 -05:00
ananthsub 14c552bb92
[bugfix] Fix dataloading for iterable datasets and limit_train_batches (#7306)
* bugfix-dataloading

* rm-logs

* Update CHANGELOG.md

* Update test_dataloaders.py

* Update test_dataloaders.py

* Update training_loop.py

* Update test_dataloaders.py

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update test_dataloaders.py

* Update training_loop.py

* Update training_loop.py

* comments

* address comments

* more tests

* Update progress.py

* Update test_dataloaders.py

* Update test_dataloaders.py

* Update training_loop.py

* Update training_loop.py

* test ckpt fix?

* update again
2021-05-03 19:50:26 +01:00
Adrian Wälchli 7636d422fa
Update DeepSpeed version requirement in Dockerfile (#7326)
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
2021-05-03 20:21:19 +02:00
ananthsub 39274273a4
Update accelerator.py (#7318) 2021-05-03 11:17:26 -04:00
Carlos Mocholí badd0bba30
Move trainer functions (#7295) 2021-05-03 09:26:38 -04:00
Adrian Wälchli e0c64f0ef6
Fix Adagrad optimizer not working with DDP/GPU (#7277)
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: thomas chaton <thomas@grid.ai>
2021-05-03 03:57:17 +05:30
William Falcon 29357ba94e
Update README.md 2021-05-01 13:55:07 -04:00
Kaushik B 490cc57809
Device updates for TPU Pod (#7243) 2021-04-30 23:14:06 +05:30
thomas chaton 16d6c9828d
[bugfix] Apex never instantiated. (#7274)
* update

* update

* update apex

* update

* update

* update

* remove test.py

* update

* update

* update on comments

* update changelog

* update

* update

* typo
2021-04-30 13:16:28 -04:00
ananthsub 44fd01734c
Move grad_norm to a dedicated utilities file (#7292)
* rm-grad-norm-mixin

* Update grads.py

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update docstrings

* Update __init__.py

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:19:22 -07:00
ananthsub e407edba36
[fix] Attach train+val dataloaders to trainer in trainer loop (#7207)
* Update training_loop.py

* Update test_dataloaders.py

* changelog

* delay reload

* go back

* comments

* Update training_loop.py

* Update test_dataloaders.py

* Update tests/trainer/test_dataloaders.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2021-04-30 09:01:31 -07:00
thomas chaton 80b9ca0e38
[bugfix] Add reloading support using BaseFinetuning (#7253)
* update

* wip

* update

* update

* update

* update

* resolve bug

* update on comments

* update on comments

* update

* update

* formatting

* add comments

* update on comments

* update

* Update pytorch_lightning/callbacks/base.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* update

* update

* Typing and minor changes

* Refactor

* Fix deprecated test

* Broken commit

* Fix broken commit

* flake8

* Update CHANGELOG

* update on comments

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Carlos Mocholi <carlossmocholi@gmail.com>
2021-04-30 11:14:43 -04:00
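For #7253 above, a hedged sketch of a `BaseFinetuning` callback roughly as it looked around 1.3; the milestone value and the `backbone` attribute on the LightningModule are assumptions for illustration.

```python
from pytorch_lightning.callbacks import BaseFinetuning


class MilestoneFinetuning(BaseFinetuning):
    """Freeze the backbone at the start of training, unfreeze it at a chosen epoch."""

    def __init__(self, unfreeze_at_epoch: int = 10):
        super().__init__()
        self.unfreeze_at_epoch = unfreeze_at_epoch

    def freeze_before_training(self, pl_module):
        # assumes the LightningModule exposes a `backbone` submodule
        self.freeze(pl_module.backbone)

    def finetune_function(self, pl_module, current_epoch, optimizer, optimizer_idx):
        if current_epoch == self.unfreeze_at_epoch:
            self.unfreeze_and_add_param_group(
                modules=pl_module.backbone,
                optimizer=optimizer,
            )
```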
Carlos Mocholí 5af086ab9f
Attach data refactor and tuner bugs [4/n] (#7258)
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 13:54:58 +00:00
Adrian Wälchli ea2287e723
update training type plugin docs regarding result caching (#7261)
* add docs

* typo

* update
2021-04-30 13:03:10 +00:00
Adrian Wälchli b9b3fa371f
fix case where an IterableDataset doesn't produce a batch for an epoch (#7294)
* wip

* fix

* add test

* refactor + test

* rm

* formatting

* update changelog

* doc

* docstring

* remove unused import

* Update CHANGELOG.md

Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 12:45:55 +00:00
ananthsub 969e857690
Rename `trainer._launch` to `trainer._run` (#7265)
* rename-run

* fix
2021-04-30 13:39:02 +01:00
Adrian Wälchli 8232de427a
fix save_hyperparameters(container) if container is empty (#7268)
* fix

* add tests

* changelog

* fix test
2021-04-30 13:38:42 +01:00
PythicCoder 8bffa4f0ca
Updated docs to fix typo and update grid status (#7270)
* Updated docs to fix typo and update grid status

* Update docs/source/starter/new-project.rst

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update docs/source/starter/new-project.rst

* Update docs/source/starter/new-project.rst

Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Kaushik B <45285388+kaushikb11@users.noreply.github.com>
2021-04-30 12:45:17 +01:00
Kaushik B ac92b57e2b
No need to warn when saved callback_states is None (#7293) 2021-04-30 10:48:53 +00:00