2639 Commits

Author SHA1 Message Date
Jirka Borovec 944ffba305
join coverage (#2460)
* join coverage

* full TPU test

* codecov

* typo

* report

* docker

* timeout

* base

* show

* cd dir

* req

* docker

* docker

* docker

* coverage

* upload

* drop main

* report

* report

* python

* upload

* drone

* drone

* drone

* drone

* drone

* drone

* drone

* drone

* drone
2020-07-04 10:22:58 -04:00
William Falcon e5a979990e
Hang (#2488)
* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test
2020-07-03 15:16:45 -04:00
Jirka Borovec fc61c200c0
DDP interpreter (#2482)
* interpreter

* chlog
2020-07-03 13:23:30 -04:00
zcain117 6d9c7bf0b0
Add link to TPU Pods tutorial. (#2477) 2020-07-03 00:57:17 -04:00
William Falcon 020c332ae9
Clean up (#2467)
* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* Fixes #2455

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test

* added early stop tpu test
2020-07-03 00:38:29 -04:00
Jirka Borovec e77add3301
fix gpu example (#2466)
* fix gpu example

* make cpu_template and gpu_template different

Co-authored-by: Adrian Wälchli <adrian.waelchli@inf.unibe.ch>
2020-07-03 00:17:18 -04:00
William Falcon 0697dd306d
Fixes #2455 (#2463) 2020-07-02 07:18:58 -04:00
William Falcon afdfba1dc6
removed auto val reduce (#2462) 2020-07-02 07:04:18 -04:00
zcain117 1a40963d1d
Add Github Action to run TPU tests. (#2376)
* Add Github Action to run TPU tests.

* Trigger new Github Actions run.

* Clean up more comments.

* Use different fixed version of ml-testing-accelerators and update config to match.

* use cluster in us-central1-a

* Run 'gcloud logging read' directly without 'echo' to preserve newlines.

* cat coverage.xml on the TPU VM side and upload xml on the Github Action side

* Use new commit on ml-testing-accelerators so command runs fully.

* Preserve newlines in the xml and use if: always() temporarily to upload codecov

* Use pytorch_lightning for coverage instead of pytorch-lightning

* Remove the debug cat of coverage xml

* Apply suggestions from code review

* jsonnet rename

* name

* add codecov flags

* add codecov flags

* codecov

* codecov

* revert codecov

* Clean up after apt-get and remove old TODOs.

* More codefactor cleanups.

* drone

* drone

* disable codecov

* cleaning

* docker py versions

* docker py 3.7

* readme

* bash

* docker

* freeze conda

* py3.6

* Stop using apt-get clean.

* Dont rm pytorch-lightning

* Update docker/tpu/Dockerfile

* Longer timeout in the Github Action to wait for GKE to finish.

* job1

* job2

* job3

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-07-01 21:44:19 -04:00
Jirka Borovec dcd6000be7
continue (#2450) 2020-07-01 08:35:51 -04:00
Jirka Borovec 7f1eab4cad
try adding coverage (#2441)
* add coverage, test failing

* fix test

* badges

* typo

* freeze conda
2020-07-01 08:00:36 -04:00
Jirka Borovec 695e0514f8
cleaning (#2449) 2020-07-01 07:56:10 -04:00
Adrian Wälchli 927f305f7e
Warn user when IterableDataset has __len__ defined (#2437)
* add warning when checking len

* added test

* changelog

* pep

* do not show warning below 1.4

* try version parse

* comments

* xfail

* Update requirements/base.txt

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/trainer/data_loading.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* version

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
2020-07-01 07:53:19 -04:00
William Falcon 325852c6df
enabled no returns from eval (#2446)
* enabled no returns from eval

* fixed docs

* fixed docs

* fixed docs

* fixed docs

* fixed docs

* fixed docs

* fixed docs

* fixed docs

* fixed docs

* fixed docs

* fixed docs

* fixed docs
2020-07-01 07:38:00 -04:00
Llannelongue fa2233f56f
Corrected typo `python -m pip pre-commit install` (#2447) 2020-07-01 07:02:02 -04:00
Jirka Borovec ded8a56bb3
missing changes in chlog (#2430)
* missing

* miss
2020-06-30 22:45:50 -04:00
Jirka Borovec e268061614
Pure package & base tests (#2418)
* base tests

* pil

* wip

* wip

* wip

* ignore

* ignore

* win

* link

* win

* cpu

* Apply suggestions from code review

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-06-30 19:35:54 -04:00
Adrian Wälchli 145670f893
fix logging on rank 0 only (#2425)
* fix and test for ddp block logging rank > 0

* rename

* use the dummy logger

* dummy logger test

* set the logger in model

* decorator for rank zero experiment

* simplify check

* simplify

* fix problem with None in checkpoint path

* revert configure logger

* unused import

* offline

* try rank 0 decorator in checkpoint

* try fix test

* imgs

* add asserts to make sure log zero only saves checkpoints

* add asserts to make sure log zero only saves checkpoints

* add asserts to make sure log zero only saves checkpoints

* add asserts to make sure log zero only saves checkpoints

* add asserts to make sure log zero only saves checkpoints

* fix tpu tests

* fix tpu tests

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-30 18:09:16 -04:00
William Falcon 04e68f022f fix tpu tests 2020-06-30 17:20:35 -04:00
William Falcon fc26078e39 fix tpu tests 2020-06-30 17:20:18 -04:00
Oliver Neumann 1a54ed6ad9
Check that ipywidgets is installed to ensure tqdm works (#2417)
* Import ipywidgets before importing tqdm.auto to make sure ipywidgets is installed.

* Updated CHANGELOG.md

* Updated ipywidgets importing checks to @awaelchli comments.

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-30 16:59:35 -04:00
William Falcon 309ed75c5d
added reduce ddp results on eval (#2434)
* added reduce ddp results on eval

* added reduce ddp results on eval

* added reduce ddp results on eval

* added reduce ddp results on eval

* added reduce ddp results on eval

* added reduce ddp results on eval

* added reduce ddp results on eval

* added reduce ddp results on eval

* added reduce ddp results on eval

* added reduce ddp results on eval

* added reduce ddp results on eval
2020-06-30 16:15:35 -04:00
William Falcon e8bb4165b7
Fix apex scaling with decoupled backward (#2433)
* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs

* fix outputs
2020-06-30 14:51:39 -04:00
Jirka Borovec d4a02e3bd8
tests: drop CircleCI (#2412)
* drop CircleCI

* add PT testing

* fix

* cpu

* conda

* conda

* req

* base

* conda

* conda

* conda

* conda

* conda

* conda

* conda

* name

* req

* info

* tests

* pt 1.6

* drop 1.6

* info
2020-06-30 10:56:05 -04:00
William Falcon a42a0e16dd
Fixes train outputs (#2428)
* fix outputs

* fix outputs
2020-06-30 10:03:49 -04:00
Jirka Borovec a75398530c
continue (#2416) 2020-06-29 21:00:52 +02:00
Jirka Borovec dec074c2e7
typo (#2415) 2020-06-29 07:36:56 -04:00
Jirka Borovec 02d6045cac
release (#2414) 2020-06-29 07:21:28 -04:00
William Falcon 33b92557f5
Update __init__.py 2020-06-29 06:59:35 -04:00
William Falcon 92d1e75b26 fix batch typo 2020-06-29 06:54:21 -04:00
William Falcon 593837e1da fix amp wrong call 2020-06-29 06:46:19 -04:00
Jirka Borovec 3ff695510e
missing changes (#2283)
* missing

* RC1

* RC1

* format
2020-06-29 06:34:19 -04:00
William Falcon 58f03f3076
Update README.md 2020-06-28 22:44:58 -04:00
William Falcon 8f07b77fc0
Update __init__.py 2020-06-28 22:08:51 -04:00
Adrian Wälchli 25ee51bc57
Continue Jeremy's early stopping PR #1504 (#2391)
* add state_dict for early stopping

* move best attr after monitor_op defined

* improve early stopping and model checkpoint callbacks

* fix formatting

* fix attr init order

* clean up setting of default_root_dir attr

* logger needs default root dir set first

* reorg trainer init

* remove direct references to checkpoint callback

* more fixes

* more bugfixes

* run callbacks at epoch end

* update tests to use on epoch end

* PR cleanup

* address failing tests

* refactor for homogeneity

* fix merge conflict

* separate tests

* tests for early stopping bug regressions

* small fixes

* revert model checkpoint change

* typo fix

* fix tests

* update train loop

* cannot pass an int as default_save_path

* refactor log message

* fix test case

* appease the linter

* fix some doctests

* move config to callback

* fixes from rebase

* fixes from rebase

* chlog

* docs

* reformat

* formatting

* fix

* fix

* fixes from rebase

* add new test for patience

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/callbacks/test_early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix formatting

* remove enable_early_stop attribute

* add state_dict for early stopping

* move best attr after monitor_op defined

* improve early stopping and model checkpoint callbacks

* fix formatting

* fix attr init order

* clean up setting of default_root_dir attr

* logger needs default root dir set first

* reorg trainer init

* remove direct references to checkpoint callback

* more fixes

* more bugfixes

* run callbacks at epoch end

* update tests to use on epoch end

* PR cleanup

* address failing tests

* refactor for homogeneity

* fix merge conflict

* separate tests

* tests for early stopping bug regressions

* small fixes

* revert model checkpoint change

* typo fix

* fix tests

* update train loop

* fix test case

* appease the linter

* fix some doctests

* move config to callback

* fixes from rebase

* fixes from rebase

* chlog

* docs

* reformat

* formatting

* fix

* fix

* fixes from rebase

* add new test for patience

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update pytorch_lightning/callbacks/model_checkpoint.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/callbacks/test_early_stopping.py

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* fix formatting

* remove enable_early_stop attribute

* fix test with new epoch indexing

* fix progress bar totals

* fix off by one error (see #2289) epoch starts at 0 now

* added missing imports

* fix hpc_save folderpath

* fix formatting

* fix tests

* small fixes from a rebase

* fix

* tmpdir

* tmpdir

* tmpdir

* wandb

* fix merge conflict

* add back evaluation after training

* test_resume_early_stopping_from_checkpoint TODO

* undo the horovod check

* update changelog

* remove a duplicate test from merge error

* try fix dp_resume test

* add the logger fix from master

* try remove default_root_dir

* try mocking numpy

* try import numpy in docs test

* fix wandb test

* pep 8 fix

* skip if no amp

* dont mock when doctesting

* install extra

* fix the resume ES test

* undo conf.py changes

* revert remove comet pickle from test

* Update CHANGELOG.md

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* Update weights_loading.rst

* Update weights_loading.rst

* Update weights_loading.rst

* renamed flag

* renamed flag

* revert the None check in logger experiment name/version

* add the old comments

* _experiment

* test checkpointing on DDP

* skip the ddp test on windows

* cloudpickle

* renamed flag

* renamed flag

* parentheses for clarity

* apply suggestion max epochs

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

Co-authored-by: Jeremy Jordan <jtjordan@ncsu.edu>
Co-authored-by: Jirka <jirka@pytorchlightning.ai>
Co-authored-by: Jeremy Jordan <13970565+jeremyjordan@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-28 21:36:46 -04:00
Jirka Borovec 1e16681693
fix loading with hparams (#2403)
* fix #2386

* extra test

* extra case

* extra test

* chlog

* fix test
2020-06-28 20:22:03 -04:00
Adrian Wälchli 058c500300
fix when torchtext not installed (#2402) 2020-06-28 20:03:51 -04:00
Jirka Borovec 861a73be12
fix loading past checkpoints (#2405)
* fix #2334

* chlog
2020-06-28 17:20:33 -04:00
William Falcon 66ffbaddf5
updates teardown to account for ddp (#2389)
* remove warnings

* remove warnings

* added doc lines

* added doc lines
2020-06-28 07:01:04 -04:00
Adrian Wälchli d910cc5200
docs: dont mock imports when running sphinx doctest (#2396)
* skip if no amp

* dont mock when doctesting

* install extra
2020-06-27 23:31:06 -04:00
Jirka Borovec 75f0a2062c
move torchtext as optional (#2395)
* torchtext

* Update pytorch_lightning/utilities/apply_func.py

Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>

* Update apply_func.py

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>
2020-06-27 20:15:10 -04:00
Jirka Borovec 51711c265a
fix loading model with kwargs (#2387)
* test

* fix

* fix
2020-06-27 16:38:03 -04:00
Mateusz Pieniak e82d9cdb66
Support torchtext on a single GPU (#2379)
* Handle torchtext.data.Batch on GPU

* Update CHANGELOG.md

* Apply code review requests

* Correct the docs

* Change requirements
2020-06-27 16:36:45 -04:00
Jirka Borovec 73a78a13c7
CI: partial move from CircleCI (#2378)
* move from CircleCI

* req

* tex

* tex

* sudo

* extra

* recom

* pic

* dvipng
2020-06-27 16:25:33 -04:00
William Falcon 90f641af0d
fixes logger crash on ddp (#2388)
* remove warnings

* remove warnings

* remove warnings

* remove warnings

* remove warnings

* remove warnings

* remove warnings

* remove warnings

* remove warnings

* remove warnings
2020-06-27 15:08:22 -04:00
Jirka Borovec 41f5df18a4
move Trains logger to Bolts (#2384)
* move Trains logger

* chlog
2020-06-27 09:14:05 -04:00
Jirka Borovec 4e13e419ea
add CLI test for examples (#2285)
* cli examples

* ddp

* CI

* CI

* req

* tests

* skip DDP

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-06-27 09:13:29 -04:00
Jirka Borovec 6673fc9a0b
fix docker builds (#2383) 2020-06-27 08:49:19 -04:00
Jirka Borovec 2f739f5977
fix key typo (#2374) 2020-06-26 21:46:08 -04:00
Kshitij09 20d0f53896
Fix ModelCheckpoint example (#2321)
`save_top_k` should be an `int`, but the snippet in the 'Saving and Loading Weights' docs passed `save_top_k=True`. Changed it to the default value (1) for consistency.

Signed-off-by: Kshitij Patil <kshitijpatil98@gmail.com>
2020-06-26 21:45:41 -04:00