3889 Commits

Author SHA1 Message Date
Jirka Borovec e1955e3c89
isolate PL debugger in tests (#4643)
* isolate PL debugger in tests

* miss

Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-14 11:22:56 +00:00
Justus Schock e04e7c9ecc
Makes automatic optimization a model attribute (#4602)
* Makes automatic optimization a model attribute

* Update trainer.py

* remove setting property in model

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update pytorch_lightning/trainer/trainer.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Update trainer.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-14 11:13:42 +06:30
Justus Schock 144a5c9913
Increase parity to match logging refactor (#4651)
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-14 10:33:30 +06:30
Espen Haugsdal fa88905af0
Fix docs typo: train_batch => val_batch (#4659)
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-14 08:23:11 +06:30
ananthsub d096a2ea6d
Fix setup callback hook to pass LightningModule through (#4608)
* Fix setup callback hook

* Update CHANGELOG.md

* Update test_trainer.py

* Update test_trainer.py

* Update test_trainer.py

* fix chlog

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-13 19:34:46 -05:00
Nathan Painchaud 2d78d9b84a
CI: Added isort import check for the code on pull-request (#4242)
* added isort CI job and updated isort config

* changed CI check output from files to full diff

* added isort pre-commit hook

* Added missing first party and restricted files affected by isort

* Applied isort to root-level, docs and benchmarks

* Apply suggestions from code review

Co-authored-by: Nathan Painchaud <nathanpainchaud@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-13 22:57:46 +01:00
Jeff Yang baa8558cc0
logger docs and api docs (#3950)
* logger and api docs

* remove gpu_usage_logger, lr_logger

* update docstring

* fix wandb example

* remove step result

* charts

* add some charts info

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-13 20:35:54 +05:30
Jirka Borovec 7940ea5aaf
CI: TPU drop install horovod (#4622)
Co-authored-by: chaton <thomas@grid.ai>
2020-11-13 11:33:52 +01:00
chaton 4018237c30
[FEAT] Add lambda closure to manual_optimizer_step (#4618)
* added lambda_closure

* move to types

* add 2 new tests

* make example more complex

* add complex example to doc

* added more tests

* resolve doc

* typo

* update

* update tpu optimizer_step

* Apply suggestions from code review

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-12 19:22:06 +00:00
Sean Naren bacabaebaf
Sharded Accelerator 1/n: Expose clip gradients to plugins via abstract class (#4639)
* Added abstract precision plugin to expose clip_gradients function, use within accelerator to clip gradients

* Exclude model from override, keep optimizer (needed for sharded clip gradients), add override for O2 support apex

* Fix doc

* Applied codereview changes

* Refactored clip function to encapsulate tpu changes with tpu accelerator. Default to standard clip function for vanilla torch

* Pass correct grad clip val

* Moved var to property

* Apply code review suggestions
2020-11-12 17:18:09 +00:00
chaton 4a01fd048c
[FIX] Average Pbar Metrics (#4534)
* wip

* update

* normalize loss

* update test

* resolve bug

* update test and add TODO

* make sure it can be sync

* add TODO

* update sol
2020-11-12 15:59:01 +00:00
Jirka Borovec bd6c413829
Conda: PT 1.8 (#3833)
* PT 1.8

* unfreeze PT

* drop nightly from full

* add PT 1.8 to workflow

* readme table

* cuda

* skip cuda

* test 1.8

* unfreeze torch vision

Co-authored-by: ydcjeff <ydcjeff@outlook.com>
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-12 15:03:43 +01:00
chaton 35f00df176
[FEAT] Add pytest section to Contribution how to ? (#4633)
* update contributing

* formatting
2020-11-12 11:48:54 +00:00
Jeff Yang 79fc92647c
[make] Create Makefile (#4620)
* [make] Create Makefile

* exclude makefile

* contributing info

* rm .run_local_test.sh
2020-11-12 09:25:31 +00:00
Jirka Borovec 396a18eb78
update changelog after 1.0.6 (#4624)
* update changelog after 1.0.6

* fix formatting
2020-11-12 09:21:57 +01:00
Marc Ferradou bff99ee159
Small typo correction on CONTRIBUTING.md (#4625)
* Update CONTRIBUTING.md

Small typo correction.

* Update .github/CONTRIBUTING.md

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
2020-11-12 08:59:33 +01:00
Sean Naren 33470ba605
Prevent crash if sync_dist=True on CPU (#4626)
* Added test/fix for sync_dist raising NotImplementedError

* Fixed comments/formatting

* Revert base class change, enforce sync tensors across accelerators, added GPU test
2020-11-11 22:04:05 +00:00
chaton 3d202f9ecc
[FEAT] Refactor logging 3/3 [v1] (#4552)
* wip

* wip check how many tests break

* wip

* resolve some bugs

* resolve more bugs

* resolve 2 bugs

* resolve

* temp fix

* update

* remove useless code

* remove result

* try to resolve bug

* update changelog

* formatting

* remove pl

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-11 17:05:24 +00:00
chaton 514cb22bd7
[Fix] Move log value to cpu. (#4592)
* move value to cpu to save memory

* update

* move to cpu

* try something

* update

* update

* add back out_dict.update({k: v})

* add move_metrics_to_cpu

* update

* Update pytorch_lightning/utilities/memory.py

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>

* resolve comments

* Update pytorch_lightning/core/step_result.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Justus Schock <12886177+justusschock@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-10 21:13:41 +00:00
chaton 7e08b0d710
[bug-fix] DDP and automatic_optimization=False (#4485)
* resolve bug

* add self._running_manual_optim

* update

* update tests

* update lightning module

* resolve bug

* update tests

* update

* resolve pep8

* update

* replace by `ddp_spawn`

* temporary fix

* update

* update

* move update to training_loop

* make both ddp_spawn

* introduce `manual_optimizer_step`

* update changelog

* added changelog wrong place

* add force_optimizer_step

* update docstring for tests

* update optimizer_step

* update zero_grad

* resolve flake8

* move update into manual_optimizer_step

* add zero_grad

* remove zero_grad tests

* remove manual_backward in AMP, it doesn't help

* update

* loosen tests

* update

* update doc

* add TODO

* Removed unnecessary get model from native amp

* Remove try except with pytest raise

* Add seed, clean up imports, remove try catch to reproduce error

* update code

* update test

* revert back

* formatting

* Update pytorch_lightning/core/lightning.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-10 19:44:51 +00:00
Jirka Borovec abf1d4b992
fix mock pkgs in docs (#4591)
* fix mock pkgs in docs

* sphinx

* CI

Co-authored-by: chaton <thomas@grid.ai>
2020-11-10 14:57:21 +01:00
maxjeblick 343d19fa86
Find parameters which are specified in the LightningDataModule, only (#4347)
* search for attribute in datamodule if not found elsewhere

* add test for datamodule

* add lightning_getattr test for datamodule

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
2020-11-10 14:01:20 +01:00
Diedre Carmo 470e2945fc
fix logged keys in mlflow logger (#4412)
* [#4411] fix gpu_log_memory with mlflow logger

* sanitize parenthesis instead of removing for all loggers

* apply regex for mlflow key sanitization

* replace ',' with '.' typo

* add single warning and test

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-10 17:20:25 +05:30
Roger Shieh 11415faade
[req] Set min version for skimage for tests (#4598)
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-10 17:16:37 +06:30
Kai Zhang 30ad3e2ad3
Replace a MisconfigurationException with warning in ModelCheckpoint callback (#4560)
* replace MisconfigurationException with warning

* update test

* check raising UserWarning

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-10 10:44:43 +01:00
Nicki Skafte 465ec752f8
Metric ddp bugfix (#4482)
* changes

* fix spelling

* small note

* trying to fix ddp test

* fix ddp

* fix for test

* suggestion

* CHANGELOG

* Update pytorch_lightning/metrics/metric.py

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Sean Naren <sean@grid.ai>
2020-11-10 09:16:31 +01:00
Nicki Skafte 4f3160ba2e
Skip tuner algorithms on fast dev (#3903)
* skip on fast dev

* fix error

* changelog

* fix recursive issue

* combine tests

* pep8

* move logic to base funcs

* fix mistake

* Update pytorch_lightning/tuner/lr_finder.py

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* pep

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Nicki Skafte <nugginea@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-10 00:34:42 +01:00
tarepan 41c9bee4f0
Fix load disparity between normal and hpc (#4526)
* Add missing load functionality in hpc

* Add general file load for hpc

* Add mark in CHANGELOG

* Fix Typo Li**hg**tning

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Refactor line separation

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>

* Fix entangled fixation commit

* Fix naming of restore_model_states

* Fix amp restore place

Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: chaton <thomas@grid.ai>
2020-11-09 17:26:38 +00:00
Jeff Yang 23719e3c05
[dockers] install nvidia-dali-cudaXXX (#4532)
* [dockers] install nvidia-dali-cuda100

* Apply suggestions from code review

* build DALI

* build DALI

* build DALI

* dali from source

* dali from source

* use binaries

* qq

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-09 21:18:24 +06:30
Stef | ステフ 4a6721af25
Missing TorchScript trace's update (#4586)
Co-authored-by: stef-ubuntu <stef@webempath.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
2020-11-09 15:01:13 +01:00
Akihiro Nitta 45a695969a
Fix docstring (#4585)
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-09 19:52:47 +06:30
Jan Beitner e01190e919
Adding pytorch-forecasting to community examples (#4575)
PyTorch Forecasting is a new library that is designed for time series forecasting practitioners and researchers alike.
It is based on the awesome work on PyTorch Lightning. Thanks a lot for creating such an asset!

Have a look at the documentation for more information.

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-09 12:33:44 +01:00
Nicki Skafte 01a925d333
[Docs] Note on running metric in dp (#4494)
* note

* Update docs/source/metrics.rst

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jeff Yang <ydcjeff@outlook.com>
2020-11-09 11:30:28 +01:00
William Falcon ee35907170
Accelerator docs (#4583)
* accelerator docs

* accelerator docs
2020-11-08 17:24:41 -05:00
William Falcon 3ba48d3bc4
ref: unify slurm and TE under backendPlugin 5/n (#4582)
* ref: unify slurm and TE under backendPlugin 4/n

* ref: unify slurm and TE under backendPlugin 5/n
2020-11-08 16:20:19 -05:00
William Falcon 624f5b5938
ref: unify slurm and TE under backendPlugin 3/n (#4581)
2020-11-08 15:32:37 -05:00
William Falcon bfaf014096
ref: unify slurm and TE under backendPlugin 2/n (#4580)
2020-11-08 15:07:16 -05:00
William Falcon 0f64f15f52
ref: unify slurm and TE under backendPlugin 1/n (#4578)
* ref: unify slurm and TE under backendPlugin

* ref: unify slurm and TE under backendPlugin
2020-11-08 14:28:55 -05:00
William Falcon 09a51697ed
Adds shortcut for path to log (#4573)
* added log_dir shortcut to trainer properties for writing logs

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut

* added log_dir shortcut
2020-11-08 12:16:22 -05:00
William Falcon f63fec9323
updated trainer docs (#4571)
2020-11-07 15:41:02 -05:00
William Falcon e0bdf8124b
updated trainer docs (#4570)
2020-11-07 14:53:04 -05:00
William Falcon bb356a73cb
added trainer api docs (#4569)
2020-11-07 14:18:45 -05:00
chaton 854c13673b
add congratulations at the end of our notebooks (#4555)
* add congratulations at the end of our notebooks

* update image
2020-11-07 12:05:29 +00:00
Indrayana Rustandi 6e5f232f5c
Add Dali MNIST example (#3721)
* add MNIST DALI example, update README.md

* Fix PEP8 warnings

* reformatted using black

* add mnist_dali to test_examples.py

* Add documentation as docstrings

* add nvidia-pyindex and nvidia-dali-cuda100

* replace nvidia-pyindex with --extra-index-url

* mark mnist_dali test as Linux and GPU only

* adjust CUDA docker and examples.txt, fix import error in test_examples.py

* adjust the GPU check

* Exit when DALI is not available

* remove requirements-examples.txt and DALI pip install

* Refactored example, moved to new logging api, added runtime check for test and dali script

* Patch to reflect the mnist example module

* add req.

* Apply suggestions from code review

* Removed requirement as it breaks CPU install, added note in README to install DALI

* add DALI to Drone

* test examples

* Apply suggestions from code review

* imports

* ABC

* cuda

* cuda

* pip DALI

* Move build into init function

Co-authored-by: SeanNaren <sean@grid.ai>
Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-06 14:53:46 +00:00
Jeff Yang f3dfb98444
[ci] tag v1.4.1 for pypa/gh-action-pypi-publish (#4548)
2020-11-06 10:48:27 +00:00
cool425589 5e09fd31e9
show progressbar only on progress_rank 0 on ddp_slurm (#4437)
Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-11-06 01:36:22 +01:00
chaton 9c8701f2e2
[feat] Logging refactor 2/n - train (#4495)
* update logging

* solve more bugs

* replace Mapping by Dict

* update on comments

* resolve pep8

* Apply suggestions from code review

Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* typo

* update for coverage

* update test

* update

* Update tests/models/test_hooks.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* Update tests/models/test_hooks.py

Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>

* update on comments

* remove deepcopy

* remove useless look for

* another small optim

* extra optim

* remove latest optim, can be source of bug

* resolve bug

* add docstring

* optimize coverage

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/connectors/logger_connector/epoch_result_store.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/logging_tests/test_distributed_logging.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/evaluation_loop.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/logging/test_logger_connector.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/trainer/logging_tests/test_train_loop_logging_1_0.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* update on comments

* update

* update on comments

* update parity speed

* get it down to 0.65

* update

* 0.8 max_dif

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: ananthsub <ananth.subramaniam@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-11-05 22:27:04 +00:00
Jirka Borovec 62ea4614f3
update PR template (#4523)
* update PR template

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>

* Apply suggestions from code review

Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>

Co-authored-by: chaton <thomas@grid.ai>
Co-authored-by: Roger Shieh <sh.rog@protonmail.ch>
Co-authored-by: edenlightning <66261195+edenlightning@users.noreply.github.com>
2020-11-05 22:05:27 +01:00
Travis Addair 51cc7a89ee
Horovod: fixed early stopping and added metrics aggregation (#3775)
* Fixed early stopping for Horovod

* Refactored to sync_dist_if_available

* Bump min Horovod version to support hvd.is_initialized

* Changelog

* Added back change for Horovod

* Removed redundant checks for initialization

* Implement metrics gathering for Horovod

* Added test for EvalResult

* Renamed ddp_sync_on_step -> dist_sync_on_step

* Added metric test for Horovod

* Added option pass callable allgather function to metric base class

* Added dist_sync_fn

* Fixed calls to private _sync_dist

* Fixed Horovod test

* Added sync_tensor to the distributed backend

* Skip Windows

* Insert test path

* Removed redundant import

* Updated drone

* Unset HOROVOD_GPU_ALLREDUCE

* Unset

* No cache dir

* No uninstall

* Unset variables

* Uninstall Horovod during initialization

* Replaced more references to ddp_sync_on_step

* Fixed imports

* Fixed attribute

* Added back default

* Lint

* Added back docstring

* Made gather_all_tensors default

* Added whitespace

* Update tests/models/test_horovod.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/metrics/metric.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update CHANGELOG.md

Co-authored-by: Teddy Koker <teddy.koker@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-11-05 12:52:02 -05:00
Jeff Yang e81707ba02
[dockers] use inline cache (#4511)
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Nicki Skafte <skaftenicki@gmail.com>
Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>
2020-11-04 23:08:17 +01:00