diff --git a/docs/source/advanced/amp.rst b/docs/source/advanced/amp.rst
index d42f1c8c29..2c25f9e7f9 100644
--- a/docs/source/advanced/amp.rst
+++ b/docs/source/advanced/amp.rst
@@ -48,9 +48,6 @@ To use 16-bit precision, do two things:
 
     .. code-block:: bash
 
-        $ git clone https://github.com/NVIDIA/apex
-        $ cd apex
-
         # ------------------------
         # OPTIONAL: on your cluster you might need to load CUDA 10 or 9
         # depending on how you installed PyTorch
@@ -65,7 +62,7 @@ To use 16-bit precision, do two things:
 
         # make sure you've loaded a GCC version > 4.0 and < 7.0
        module load gcc-6.1.0
 
-        $ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
+        $ pip install --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" git+https://github.com/NVIDIA/apex.git
 
 .. warning:: NVIDIA Apex and DDP have instability problems. We recommend native 16-bit in PyTorch 1.6+
diff --git a/requirements/install_Apex.sh b/requirements/install_Apex.sh
deleted file mode 100644
index 0c70e0bc34..0000000000
--- a/requirements/install_Apex.sh
+++ /dev/null
@@ -1,10 +0,0 @@
-#!/usr/bin/env bash
-
-ROOT=$PWD
-git clone https://github.com/NVIDIA/apex
-cd apex
-pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
-# If build with extensions fails, you can run this line to build without extensions
-# pip install -v --no-cache-dir ./
-cd $ROOT
-rm -rf apex
diff --git a/requirements/install_ONNX.sh b/requirements/install_ONNX.sh
deleted file mode 100644
index d6784fa373..0000000000
--- a/requirements/install_ONNX.sh
+++ /dev/null
@@ -1,41 +0,0 @@
-#!/usr/bin/env bash
-
-ROOT=$PWD
-
-# python -m pip install protobuf
-# git clone --recursive https://github.com/onnx/onnx.git
-# cd onnx
-# python setup.py bdist_wheel
-# pip install --upgrade dist/*.whl
-# cd $ROOT
-# rm -rf onnx
-
-
-# https://github.com/microsoft/onnxruntime/blob/master/BUILD.md
-git clone --recursive https://github.com/Microsoft/onnxruntime
-cd onnxruntime
-export ONNX_ML=1
-pip install setuptools wheel numpy
-
-if [[ "$OSTYPE" == "linux-gnu"* ]]; then
-    ./build.sh --config RelWithDebInfo --build_shared_lib --build_wheel --parallel
-elif [[ "$OSTYPE" == "darwin"* ]]; then
-    # Mac OSX
-    ./build.sh --config RelWithDebInfo --build_shared_lib --build_wheel --parallel --use_xcode
-elif [[ "$OSTYPE" == "cygwin" ]]; then
-    # POSIX compatibility layer and Linux environment emulation for Windows
-    ./build.sh --config RelWithDebInfo --build_shared_lib --build_wheel --parallel
-elif [[ "$OSTYPE" == "msys" ]]; then
-    # Lightweight shell and GNU utilities compiled for Windows (part of MinGW)
-    .\build.bat --config RelWithDebInfo --build_shared_lib --build_wheel --parallel
-elif [[ "$OSTYPE" == "win32" ]]; then
-    .\build.bat --config RelWithDebInfo --build_shared_lib --build_wheel --parallel
-else
-    echo $OSTYPE # Unknown.
-fi
-
-find . -name "*.whl"
-pip install --upgrade $(find . -name "*.whl")
-
-cd $ROOT
-rm -rf onnxruntime
diff --git a/tests/README.md b/tests/README.md
index 0b0563a3ae..b9158358fd 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -4,21 +4,10 @@
 This provides testing for most combinations of important settings.
 The tests expect the model to perform to a reasonable degree of testing accuracy to pass.
 
 ## Running tests
-The automatic travis tests ONLY run CPU-based tests. Although these cover most of the use cases,
-run on a 2-GPU machine to validate the full test-suite.
-
-
-To run all tests do the following:
-
-Install [Open MPI](https://www.open-mpi.org/) or another MPI implementation. Learn how to install Open MPI [on this page](https://www.open-mpi.org/faq/?category=building#easy-build>).
-
 ```bash
 git clone https://github.com/PyTorchLightning/pytorch-lightning
 cd pytorch-lightning
 
-# install AMP support
-bash requirements/install_Apex.sh
-
 # install dev deps
 pip install -r requirements/devel.txt
@@ -27,11 +16,11 @@
 py.test -v
 ```
 
 To test models that require GPU make sure to run the above command on a GPU machine.
-The GPU machine must have:
-1. At least 2 GPUs.
-2. [NVIDIA-apex](https://github.com/NVIDIA/apex#linux) installed.
-3. [Horovod with NCCL](https://horovod.readthedocs.io/en/stable/gpus_include.html) support: `HOROVOD_GPU_OPERATIONS=NCCL pip install horovod`
+The GPU machine must have at least 2 GPUs to run distributed tests.
+Note that this setup will not run tests that require specific packages to be installed,
+such as Horovod, FairScale, NVIDIA/apex, or NVIDIA/DALI.
+You can rely on our CI to make sure all these tests pass.
 
 ## Running Coverage
 Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.
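
For anyone trying the updated install path by hand, here is a minimal sketch of the pip-only Apex install that replaces `requirements/install_Apex.sh`, plus a quick import check. It assumes a Linux machine with a CUDA toolkit and GCC compatible with the installed PyTorch build; the `python -c` smoke test is illustrative and not part of the repository.

```bash
# Sketch only: install NVIDIA Apex straight from GitHub (no local clone needed).
# Assumes CUDA and a compatible GCC are already on PATH, matching the PyTorch build.
pip install --no-cache-dir \
    --global-option="--cpp_ext" --global-option="--cuda_ext" \
    git+https://github.com/NVIDIA/apex.git

# If the extension build fails, the Python-only install still provides apex.amp:
# pip install --no-cache-dir git+https://github.com/NVIDIA/apex.git

# Quick smoke test that apex.amp is importable.
python -c "from apex import amp; print('apex.amp OK')"
```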
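
A companion sketch of the simplified test workflow from `tests/README.md` on a 2-GPU machine. The coverage invocation at the end is an assumed coverage.py/pytest pattern for the "Running Coverage" section, not a command taken verbatim from the README.

```bash
# Sketch: run the full suite on a machine with at least 2 GPUs.
git clone https://github.com/PyTorchLightning/pytorch-lightning
cd pytorch-lightning
pip install -r requirements/devel.txt

# Tests needing optional packages (Horovod, FairScale, apex, DALI, ...) will not run
# without those packages installed; CI covers them.
py.test -v

# Assumed coverage invocation (coverage.py + pytest).
python -m coverage run --source pytorch_lightning -m pytest tests -v
python -m coverage report -m
```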