Commit Graph

2208 Commits

Author SHA1 Message Date
William Falcon a258d3d31b fixed val_loss for early stopping 2020-04-26 12:27:19 -04:00
William Falcon f3369666e6
Update README.md 2020-04-26 11:27:50 -04:00
William Falcon e8c087ed89
Update README.md 2020-04-26 11:16:48 -04:00
William Falcon 2c88e01736
Update README.md 2020-04-26 11:16:12 -04:00
William Falcon 1823c6997d
Update README.md 2020-04-26 11:14:56 -04:00
William Falcon d290b818d0
Update __init__.py 2020-04-26 11:08:00 -04:00
William Falcon 198d7715ee docs clean up 2020-04-26 11:06:36 -04:00
William Falcon d2b94ca81b
clean up docs (#1614)
* fixed hparams section

* docs clean up
2020-04-26 10:57:26 -04:00
William Falcon 4755ded863
Clean up Argparse interface with trainer (#1606)
* fixed distutil parsing

* fixed distutil parsing

* Apply suggestions from code review

* log

* fixed distutil parsing

* fixed distutil parsing

* fixed distutil parsing

* fixed distutil parsing

* doctest

* fixed hparams section

* fixed hparams section

* fixed hparams section

* formatting

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-04-26 09:20:06 -04:00
Justus Schock 13bf772d96
Create GH action for automated docker builds on releases (#1559)
* Create docker_builds.yml

* Update docker_builds.yml

* Update docker_builds.yml

* Update docker_builds.yml

* Update docker_builds.yml
2020-04-26 08:01:55 -04:00
William Falcon 17bce62e5f
Update __init__.py 2020-04-25 19:04:39 -04:00
William Falcon b620d86c54
diable val and test shuffling (#1600)
* diable val and test shuffling

* diable val and test shuffling

* diable val and test shuffling

* diable val and test shuffling

* log

* condition

* shuffle

* refactor

Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-04-25 16:45:20 -04:00
William Falcon 791ba91dec
slurm job id (#1605) 2020-04-25 16:01:15 -04:00
Justus Schock d1279afff8
Create Dockerfile (#1569)
* Create Dockerfile

* add readme

* Update MANIFEST.in

* Update Dockerfile

Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-04-25 14:17:09 -04:00
William Falcon e684fdf60b
Clean docs (#1604)
* spacing

* slurm docs
2020-04-25 13:21:53 -04:00
William Falcon 1e2c9eaf89 updated docs 2020-04-25 13:04:34 -04:00
William Falcon cbd088bd13
multi processing warnings (#1602)
* multi processing warnings

* multi processing warnings

* multi processing warnings

* multi processing warnings

* multi processing warnings

* multi processing warnings
2020-04-25 10:03:02 -04:00
William Falcon f531ab957b
Update __init__.py 2020-04-24 17:21:52 -04:00
Jirka Borovec 58a467dd68
model checkpint on rank_zero_only & global rank state (#1408)
* try delete in async or DDP us0-ecase

* changelog

* add model chekpoint rank

* simple delete

* flake8

* use global rank

* chnagelog

* fix review

* fix import

* proposal

* proposal

* proposal

* improve proposal (fix problems with method call self)

* cleaning

Co-authored-by: Adrian Wälchli <adrian.waelchli@students.unibe.ch>
Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-04-24 17:21:00 -04:00
William Falcon d0faf97893
fixed dataset stuff + docs (#1599)
* Fixed dataset docs and disabled auto-sampler for iterable dataset
2020-04-24 16:51:26 -04:00
Adrian Wälchli d56b3e5e69
update contributers list (#1597) 2020-04-24 14:46:17 -04:00
Jirka Borovec 570b2c7aeb
fix depreated call (#1596)
* fix parity

* update deprecated call
2020-04-24 14:45:43 -04:00
William Falcon f07176da9b
Update __init__.py 2020-04-24 10:33:26 -04:00
Jirka Borovec e0e67685d7
missing change (#1591) 2020-04-24 10:30:33 -04:00
Boris Dayma f3d139e90f
fix(wandb): allow use of sweeps (#1512)
* fix(wandb): allow use of sweeps

overwrite run config parameters due to precision error

fix #1290

* docs(wandb): update changelog

* test(wandb): update config test

Co-authored-by: William Falcon <waf2107@columbia.edu>
2020-04-24 10:29:24 -04:00
William Falcon cd15bfc3ce
fixed new amp bugs (#1593) 2020-04-24 09:29:39 -04:00
William Falcon 67d5f4dc39
Update __init__.py 2020-04-23 21:02:19 -04:00
William Falcon 890458fdbd
Fixes automatic parser bug (#1585)
* fixes gpu parsing

* fixes gpu parsing
2020-04-23 21:00:41 -04:00
Adrian Wälchli 3e8f2d99a9
Progress bar callback (#1450)
* squash and rebase

sanity check hooks

sanity check callback hook finish

moved core progress bar functionality into callback

wip

remove duplicate merge

clean up

imports

docs

sanity check progress bar main

sanity

move callback calls

init progrss bar callback

configuration and docs

changelog

rate decorator

pass process_position

disable on rank > 0

position index

is_enabled

remove decorator

refactor init tqdm bars

callback method ordering

cannot reset when disabled

sequence -> list

default values

fix has no attr _time()

move on_val_end to proper place

fix the pickle issue

update warning

properties

check for None

remove old comment

switch order

pull out non-tqdm functionality into base class

documentation for the base class

docs

fix refresh rate issue in validation

restrict type hint of trainer arg

more docs

update trainer docs

rst docs

fix lines too long

fix test

add missing type hints

fix typo

move docstring to __init__ solves doctest failures

remove doctest :(( can't fix the pickle error

fix example

simplify by saving trainer reference

fix docs errors

move docstring

initial value

multiple val checks per epoch

simpler handling of inf dataset sizes

update inf docs

renamed training_tqdm_dict

rename get_tqdm_dict

rename occurences of tqdm

update changelog

fix doctest

fix formatting errors

added callback tests

progress bar on off test

more tests for progress bar

weird test fix?

add ignored property

disable default progress bar in LR finder

change enable/disable behavior

trying doctest in CI again

undo doctest pickle error

undo doctest pickle error :((

remove progress_bar_callback Trainer arg and fix tests

restore progress bar after auto lr find

update docs

fix rebase

fix wrong negation

* fix fast dev run total

* more thorough testing

* remove old args

* fix merge

* fix merge

* separate tests

* type hint total batches

* reduce if

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* is_disabled

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* is_enabled

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* rename enabled/disabled

* move deprecated api

* remove duplicated test from merge

* fix rename is_disabled

* newline

* test also testprogress for fast dev run

Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-04-23 20:46:18 -04:00
Guy Davidson fe2b6666e0
Fixing a small issue in trainer logging (#1563)
* The epoch was being logged to metrics, which isn't read, rather than to current_metrics.

* Updated the tests to account for the epoch arriving at the logger.
2020-04-23 17:52:41 -04:00
Jirka Borovec 7989ca844c
test deprecation warnings (#1470)
* check deprecation warnings

* extend warning test

* try

* unimport modules

* update
2020-04-23 17:34:47 -04:00
Alexey Karnachev edb8d7a23c
Nested metrics dictionaries now can be passed to the loggers (#1582)
* now func merge_dicts works with nested dictionaries

* CHANGELOG.md upd
2020-04-23 17:32:36 -04:00
Jirka Borovec 94e53444c6
fix changelog (#1583) 2020-04-23 16:57:37 -04:00
William Falcon 5ab5084f7b
Update __init__.py 2020-04-23 15:32:40 -04:00
William Falcon 47629536e2
Amp2 (#1580)
* fixed new amp bugs

* fixed new amp bugs
2020-04-23 15:24:02 -04:00
William Falcon 68ca577919
why copy? (#1579) 2020-04-23 15:03:39 -04:00
William Falcon 29ebe92208
support for native amp (#1561)
* adding native amp suppport

* adding native amp suppport

* adding native amp suppport

* adding native amp suppport

* autocast

* autocast

* autocast

* autocast

* autocast

* autocast

* removed comments

* removed comments

* added state saving

* added state saving

* try install amp again

* added state saving

* drop Apex reinstall

Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
2020-04-23 14:47:08 -04:00
karlinjf 41b6cbb3ca
Don't copy the batch when training on a single gpu (#1576)
* fix

* whitespace

Co-authored-by: Josh Karlin <karlinjf@gmail.com>
2020-04-23 14:28:20 -04:00
Jirka Borovec 0b22b64a10
Tests/docker (#1573)
* devel image

* try parallel

* new image
2020-04-23 12:52:59 -04:00
Nicki Skafte e977d1cde5
Default value for ModelCheckpoint filepath (#1548)
* allow determine of filepath at runtime

* typing

Co-authored-by: Nicki Skafte <nugginea@gmail.com>
2020-04-23 11:50:58 -04:00
Ferdinand Schlatt 545b38ec5f
fix boolean argparse (#1571)
* fix boolean argparse #1570

* update change log
2020-04-23 11:44:18 -04:00
William Falcon 759557050a
Update __init__.py 2020-04-23 11:04:57 -04:00
Lezwon Castelino 831842972f
check for kaggle env variable (#1568)
* check for kaggle env variable

* added changelog
2020-04-23 07:12:54 -04:00
William Falcon 990fd22488
Update __init__.py 2020-04-22 20:16:04 -04:00
Travis Addair 7024177f7d
Added Horovod distributed backend (#1529)
* Initial commit of Horovod distributed backend implementation

* Update distrib_data_parallel.py

* Update distrib_data_parallel.py

* Update tests/models/test_horovod.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Update tests/models/test_horovod.py

Co-Authored-By: Jirka Borovec <Borda@users.noreply.github.com>

* Fixed tests

* Added six

* tests

* Install tox for GitHub CI

* Retry tests

* Catch all exceptions

* Skip cache

* Remove tox

* Restore pip cache

* Remove the cache

* Restore pip cache

* Remove AMP

Co-authored-by: William Falcon <waf2107@columbia.edu>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-04-22 17:39:08 -04:00
Jirka Borovec 4d24032ea5
tests for pytorch 1.5 (#1552)
* tests for pytorch 1.5

* up Win

* win

* win

* win

* win

* win

* win
2020-04-22 10:10:23 -04:00
Jirka Borovec c1c6e3b6c9
default test logger (#1478)
* default test logger

* fix tests

* spawn

* try

* simplify tests

* simplify tests

* formatting

* loggers

* loggers

* revert to TestTube

* default

* default

* wraps

* world size

* optim imports
2020-04-21 20:33:10 -04:00
Kevin Chen bafdeca42f
Replace GPU device idx with current process index (#1541) 2020-04-21 14:29:15 -04:00
Justus Schock 29c7d2f195
Revert namespace package search to normal package search (#1545)
* Revert this

* typos

* version++

Co-authored-by: J. Borovec <jirka.borovec@seznam.cz>
2020-04-21 08:26:47 -04:00
Justus Schock 8035c10f37
Prepare Namespace package (#1543)
* Update __init__.py

* Update setup.py
2020-04-21 07:12:02 -04:00