lightning

Commit Graph

Author	SHA1	Message	Date
Jirka Borovec	9dd04028d5	tests for legacy checkpoints (#5223) * wip * generate * clean * tests * copy * download * download * download * download * download * download * download * download * download * download * download * flake8 * extend * aws * extension * pull * pull * pull * pull * pull * pull * pull * try * try * try * got it * Apply suggestions from code review (cherry picked from commit `72525f0a83`)	2021-01-26 14:27:56 +01:00
chaton	56437e98a6	[bug-fix] Trainer.test points to latest best_model_path (#5161) * resolve bug * update code * add set -e * Update pytorch_lightning/callbacks/model_checkpoint.py Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> * update test * Update tests/checkpointing/test_trainer_checkpoint.py Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> * Update tests/checkpointing/test_trainer_checkpoint.py Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> * update on comments * resolve test * convert to set * update * add error triggering * update * update on comments * update * resolve import * update * update * Update pytorch_lightning/plugins/rpc_plugin.py Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * update Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-62-109.ec2.internal> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> (cherry picked from commit `d5b367871f`)	2021-01-06 15:14:10 +01:00
Adrian Wälchli	3b0197fce5	reduce verbosity level in drone ci (#5190) * reduce verbosity level in drone * verbosity	2021-01-05 09:58:37 +01:00
Ganesh Anand	a5b2392652	update DALIClassificationLoader to not use deprecated arguments (#4925) * update DALIClassificationLoader to not use deprecated arguments * fix line length * dali version check added and changed args accordingly * versions * checking version using disutils.version.LooseVersion now * . * ver * import Co-authored-by: Jirka Borovec <jirka.borovec@seznam.cz> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2021-01-05 09:58:37 +01:00
chaton	02152c1729	Simplify optimization Logic (#4984) * Rely on ddp plugin for blocking sync behaviour, and skip if we're using manual optimization * debug * Revert "debug" This reverts commit `ccca6b6b` * Expose manual reduce for automatic optimization * Add input arguments * Enable parity test * clean imports * Expose hook after to ensure we reset * Fix naming * add * fix test * uniformize optimizer logic * resolve test * resovle flake8 * resolve amp bug * update tests * remove bug * remove optimizer_step in accelerators * typo * update lightning optimizer * set doesn't work with ddp_spawn * resolve flake8 * update threshold * ignore pyright * correct codeFactor * remove useless if * remove zer_grad function * simplify step * remove typo * resolve bug * Apply suggestions from code review * update on comments * resolve bugs * remove tests * Update pytorch_lightning/trainer/configuration_validator.py Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com> * simplify testing * add more tests Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>	2020-12-07 12:55:49 +00:00
Jirka Borovec	add387c6a7	CI cleaning (#4941) * set * cut * env * oonce * env * env * env	2020-12-02 10:00:05 +00:00
Jirka Borovec	42b9a387df	freeze DALI (#4922) * freeze DALI * todos * only CI * Update .drone.yml * string * speed Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>	2020-11-30 21:21:59 +00:00
Jirka Borovec	bddc6cd77a	pytest default color (#4703) * pytest default color * time Co-authored-by: chaton <thomas@grid.ai>	2020-11-18 10:53:44 +00:00
Jirka Borovec	9a5d40aff4	test PL examples (#4551) * test PL examples * minor formatting * skip failing * skip failing * args * mnist datamodule * refactor tests * refactor tests * skip * skip * drop DM * drop DM Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>	2020-11-17 19:35:17 +01:00
Indrayana Rustandi	6e5f232f5c	Add Dali MNIST example (#3721 ) * add MNIST DALI example, update README.md * Fix PEP8 warnings * reformatted using black * add mnist_dali to test_examples.py * Add documentation as docstrings * add nvidia-pyindex and nvidia-dali-cuda100 * replace nvidia-pyindex with --extra-index-url * mark mnist_dali test as Linux and GPU only * adjust CUDA docker and examples.txt, fix import error in test_examples.py * adjust the GPU check * Exit when DALI is not available * remove requirements-examples.txt and DALI pip install * Refactored example, moved to new logging api, added runtime check for test and dali script * Patch to reflect the mnist example module * add req. * Apply suggestions from code review * Removed requirement as it breaks CPU install, added note in README to install DALI * add DALI to Drone * test examples * Apply suggestions from code review * imports * ABC * cuda * cuda * pip DALI * Move build into init function Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: Jirka Borovec <jirka@pytorchlightning.ai> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Sean Naren <sean.narenthiran@gmail.com>	2020-11-06 14:53:46 +00:00
Jirka Borovec	fc78ffa622	extend release testing (#4506) * extend release testing * Drone * also PR to release * actions versions	2020-11-04 09:08:37 +00:00
Jeff Yang	ee414d25be	Switch to PyTorch 1.6 in Drone CI (#4393 ) * switch to 1.6 * readme * 1.7 * back to normal [ci skip] * horovodrun --verbose * try with apex * add apex test * change base * description * test with 1.7 * back to 1.6 * no gradient_clip_val * re-add gradient_clip_val * no amp * temp skip torch.cuda.amp + horovod test * Apply suggestion from code review Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> * Fix formatting * ddp * Moved extended model outside of function to prevent pickling issue for drone * typo * resolve bug * extract automatic_automization Co-authored-by: Sean Naren <sean.narenthiran@gmail.com> Co-authored-by: SeanNaren <sean@grid.ai> Co-authored-by: chaton <thomas@grid.ai>	2020-11-03 18:01:51 +00:00
Jirka Borovec	ce8abd6255	Drone: use nightly build cuda docker images (#3658 ) * upgrade PT version * update docker * docker * try 1.5 * badge * fix typo: dor -> for (#3918) * prune * prune * env * echo * try * notes * env * env * env * notes * docker * prune * maintainer * CI * update * just 1.5 * CI * CI * CI * CI * CI * CI * CI * CI * CI * CI * CI * docker * CI * CI * CI * CI * CI * CI * CI * CI * CI * push * try * prune * CI * CI * CI * CI Co-authored-by: Klyukin Valeriy <mr.clyukin@gmail.com> Co-authored-by: Jeff Yang <ydcjeff@outlook.com>	2020-10-26 10:47:09 +00:00
chaton	829d90b257	activated color in all pytest runs (#4254) * activated color in all pytest runs * Update .drone.yml Co-authored-by: Jeff Yang <ydcjeff@outlook.com> Co-authored-by: Jeff Yang <ydcjeff@outlook.com>	2020-10-20 16:38:17 +02:00
Jirka Borovec	e4237963d7	hotfix Drone install Horovod (#4038) * hotfix Drone install Horovod * notes	2020-10-09 20:46:27 -04:00
Jirka Borovec	1160270882	fix path in CI for release & python version in all dockers & duplicated badges (#3765) * typo * path * check * trigger * fix conda * pip ver * fix cuda * fix XLA * fix xla * ci * docker * BIULD * unBIULD * update * py 3.8 * apex * apex	2020-10-02 05:26:21 -04:00
Jirka Borovec	cbc4f6f8a4	add CI for building dockers (#3383) * rename * fix badges * add docker build * mergify * update * env * ci * times * CI * name * comment	2020-09-10 18:38:29 -04:00
William Falcon	f43028f3ae	added copyright notices (#3062)	2020-08-19 22:03:22 -04:00
Jirka Borovec	448be60701	update GPU to PT 1.5 (#2779) * update gpu PT 1.6 * fix docker * use PT 1.5 * Update tests/install_AMP.sh Co-authored-by: Nathan Raw <nxr9266@g.rit.edu> Co-authored-by: Nathan Raw <nxr9266@g.rit.edu>	2020-08-02 08:14:53 -04:00
Jirka Borovec	3772601cd6	update CI testing with pip upgrade (#2380) * try pt1.5 * cpu * upgrade * tpu * user * [blocked by #2380] freeze GPU PT 1.4 (#2780) * freeze * user	2020-07-31 14:50:06 -04:00
Jirka Borovec	bc7a08fbe0	test dockers & add AMP in pt-1.6 (#1584) * exist images * names * images * args * pt 1.6 dev * circleci * update * refactor * build * fix * MKL	2020-07-31 08:23:13 -04:00
Adrian Wälchli	7ef73f242a	try remove pr (#2543)	2020-07-07 15:26:58 -04:00
Jirka Borovec	977df6ed31	Docker: building XLA base image (#2494) * refactor * add TPU base * wip * builds * typo * extras * simple * unzip * rename	2020-07-06 14:21:36 -04:00
Jirka Borovec	39a6435726	Revert "Revert "join coverage (#2460)" (#2499)" (#2500) This reverts commit `355918af8d`.	2020-07-04 11:31:12 -04:00
William Falcon	355918af8d	Revert "join coverage (#2460)" (#2499) This reverts commit `944ffba305`.	2020-07-04 10:29:50 -04:00
Jirka Borovec	944ffba305	join coverage (#2460) * join coverage * full TPU test * codecov * typo * report * docker * timeout * base * show * cd dir * req * docker * docker * docker * coverage * upload * drop main * report * report * python * upload * drone * drone * drone * drone * drone * drone * drone * drone * drone	2020-07-04 10:22:58 -04:00
zcain117	1a40963d1d	Add Github Action to run TPU tests. (#2376 ) * Add Github Action to run TPU tests. * Trigger new Github Actions run. * Clean up more comments. * Use different fixed version of ml-testing-accelerators and update config to match. * use cluster in us-central1-a * Run 'gcloud logging read' directly without 'echo' to preserve newlines. * cat coverage.xml on the TPU VM side and upload xml on the Github Action side * Use new commit on ml-testing-accelerators so command runs fully. * Preserve newlines in the xml and use if: always() temporarily to upload codecov * Use pytorch_lightning for coverage instead of pytorch-lightning * Remove the debug cat of coverage xml * Apply suggestions from code review * jsonnet rename * name * add codecov flags * add codecov flags * codecov * codecov * revert codecov * Clean up after apt-get and remove old TODOs. * More codefactor cleanups. * drone * drone * disable codecov * cleaning * docker py versions * docker py 3.7 * readme * bash * docker * freeze conda * py3.6 * Stop using apt-get clean. * Dont rm pytorch-lightning * Update docker/tpu/Dockerfile * Longer timeout in the Github Action to wait for GKE to finish. * job1 * job2 * job3 Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: Jirka <jirka@pytorchlightning.ai>	2020-07-01 21:44:19 -04:00
Jirka Borovec	4e13e419ea	add CLI test for examples (#2285) * cli examples * ddp * CI * CI * req * tests * skip DDP Co-authored-by: William Falcon <waf2107@columbia.edu>	2020-06-27 09:13:29 -04:00
Jirka Borovec	bfaabd7b7f	clean requirements (#2128) * clean requirements * missing * missing * req * min * default >> base * base.txt	2020-06-13 10:15:22 -04:00
Jirka Borovec	2674976f2c	remove deprecated API for v0.8 (#2073) * remove deprecated API * chlog * times * missed * formatting check * missing * missing * miss * fix docs build error * fix pep whitespace error * docs * wip * amp_level * amp_level Co-authored-by: Adrian Wälchli <aedu.waelchli@gmail.com>	2020-06-12 14:37:52 -04:00
Jirka Borovec	c438d0dd90	increase acc (#2039) * increase acc * try 0.45 * @pytest * @pytest * try .50 * duration * pytest	2020-06-03 08:28:19 -04:00
Adrian Wälchli	a6de1b8d75	doctest for .rst files (#1511 ) * add doctest to circleci * Revert "add doctest to circleci" This reverts commit c45b34ea911a81f87989f6c3a832b1e8d8c471c6. * Revert "Revert "add doctest to circleci"" This reverts commit 41fca97fdcfe1cf4f6bdb3bbba75d25fa3b11f70. * doctest docs rst files * Revert "doctest docs rst files" This reverts commit b4a2e83e3da5ed1909de500ec14b6b614527c07f. * doctest only rst * doctest debugging.rst * doctest apex * doctest callbacks * doctest early stopping * doctest for child modules * doctest experiment reporting * indentation * doctest fast training * doctest for hyperparams * doctests for lr_finder * doctests multi-gpu * more doctest * make doctest drone * fix label build error * update fast training * update invalid imports * fix problem with int device count * rebase stuff * wip * wip * wip * intro guide * add missing code block * circleci * logger import for doctest * test if doctest runs on drone * fix mnist download * also run install deps for building docs * install cmake * try sudo * hide output * try pip stuff * try to mock horovod * Tranfer -> Transfer * add torchvision to extras * revert pip stuff * mlflow file location * do not mock torch * torchvision * drone extra req. * try higher sphinx version * Revert "try higher sphinx version" This reverts commit 490ac28e46d6fd52352640dfdf0d765befa56988. * try coverage command * try coverage command * try undoc flag * newline * undo drone * report coverage * review Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> * remove torchvision from extras * skip tests only if torchvision not available * fix testoutput torchvision Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-05-04 22:16:54 -04:00
William Falcon	29ebe92208	support for native amp (#1561 ) * adding native amp suppport * adding native amp suppport * adding native amp suppport * adding native amp suppport * autocast * autocast * autocast * autocast * autocast * autocast * removed comments * removed comments * added state saving * added state saving * try install amp again * added state saving * drop Apex reinstall Co-authored-by: J. Borovec <jirka.borovec@seznam.cz> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>	2020-04-23 14:47:08 -04:00
Jirka Borovec	0b22b64a10	Tests/docker (#1573) * devel image * try parallel * new image	2020-04-23 12:52:59 -04:00
Travis Addair	7024177f7d	Added Horovod distributed backend (#1529 ) * Initial commit of Horovod distributed backend implementation * Update distrib_data_parallel.py * Update distrib_data_parallel.py * Update tests/models/test_horovod.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Update tests/models/test_horovod.py Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com> * Fixed tests * Added six * tests * Install tox for GitHub CI * Retry tests * Catch all exceptions * Skip cache * Remove tox * Restore pip cache * Remove the cache * Restore pip cache * Remove AMP Co-authored-by: William Falcon <waf2107@columbia.edu> Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com> Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>	2020-04-22 17:39:08 -04:00
Jirka Borovec	724b787cd1	faster CI testing (#1323 ) * MNIST digits * increase test acc * smaller parity * drone builds * increase GH action timeout * drone format * fix paths * drone cache * circle cache * fix test * lower nb epochs * circleCI * user orb * fix test * fix test * circle cache * circle cache * circle cache * comment caches * benchmark batch size * cache dataset * smaller dataset * smaller dataset * fix nb samples * batch size * fix test	2020-04-02 12:28:44 -04:00
William Falcon	18d055a390	Parity test (#1284 ) * adding test * adding test * added base parity model * added base parity model * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * added parity test * move parity to benchmark * formatting * fixed gradient acc sched * move parity to benchmark * formatting * fixed gradient acc sched * skip for CPU * call last Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>	2020-03-30 18:16:32 -04:00
Jirka Borovec	61177cd1c8	system info (#1234) * system info * update big info * test script * update config * rename script * import path	2020-03-27 08:45:52 -04:00
Jirka Borovec	45d671a4a8	CI: split tests-examples (#990 ) * CI: split tests-examples * tests without template * comment depends * CircleCI typo * add doctest * update test req. * CI tests * setup macOS * longer train * lover pred acc * fix model * rename default model * lower tests acc * typo * imports * fix test optimizer * update calls * fix Win * lower Drone image * fix call * pytorch image * fix test * add dev image * add dev image * update image * drone volume * lint * update test notes * rename tests/models >> tests/base * group models * conftest * optim imports * typos * fix import * fix tests * install AMP * tests * fix import	2020-03-25 07:46:27 -04:00
Jirka Borovec	22a7264e9a	improve partial Codecov (#1172 ) * ignore in setup * show report * abs imports * abstract pass * cover loggers * doctest trains * locals * pass * revert tensorboard * use tensorboardX * revert tensorboardX * fix trains * Add TrainsLogger.set_credentials (#1179) * Add TrainsLogger.set_credentials to control trains server configuration and authentication from code. Sync trains package version. Fix CI Trains tests * Add global TrainsLogger set_bypass_mode (#1187) * Add global TrainsLogger set_bypass_mode skips all external communication Co-authored-by: bmartinn <> * rm some no-cov Co-authored-by: Martin.B <51887611+bmartinn@users.noreply.github.com>	2020-03-19 09:14:29 -04:00
Jirka Borovec	f6a7a5278a	enable Codecov (#1133) * update config * try Drone cache * drop Drone cache * move import * remove token	2020-03-14 13:01:57 -04:00
Jirka Borovec	5691ffb160	add Drone CI (#1115) * add Drone config * update Drone config * add Drone config * list GPUs * add type * native torch * native torch * fix image * update * SLURM_LOCALID * add badge * simple test	2020-03-11 15:39:59 -04:00

42 Commits