/storage and h5py/PyTables issues (e.g. Keras save weights)? Here's why, and how to solve them




Long version:

On all Gradient jobs that run on images based on Ubuntu 16.04 or later, which includes all recent nvidia/cuda images, you will have trouble writing and even reading HDF5 files on /storage, with errors such as “No locks available”.

Docker images based on Ubuntu 16.04 don’t run the daemons needed for NFS file locking, which recent versions of HDF5 rely on. /storage, however, is mounted over NFS. You will notice the issue with ufoym/py36-all (but not with ufoym/py36-all-jupyter, which is an old, unmaintained tag; ufoym/py36-jupyter-all is the current version and fails when opening HDF5 files).
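If you want to check whether a given directory is affected, a small probe like the one below can help. It is only a sketch: it assumes HDF5 takes an flock-style lock on the file, and the function name locking_works is made up for this example. It tries to lock a temporary file and reports whether the system answers with ENOLCK, which is the error code behind “No locks available”.

```python
import errno
import fcntl
import os
import tempfile


def locking_works(directory):
    """Return True if an flock-style lock can be taken on a file in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Non-blocking exclusive lock, then release it again.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOLCK:  # "No locks available"
            return False
        raise  # any other error is not the locking problem described here
    finally:
        os.close(fd)
        os.remove(path)
```

Calling locking_works("/storage") inside a job should return False if the container is hit by the problem described above, while a local directory such as /tmp should normally return True.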

The HDF5 library can run without file locking, which is activated by setting the environment variable HDF5_USE_FILE_LOCKING to the literal FALSE. This is sparsely documented, but it solves the issue.
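In a Python script, this could look roughly like the sketch below. It assumes h5py is installed; the helper save_array and the file name are made up for illustration. The one thing that matters is that the variable is set before any HDF5 file is opened, so the safest place is the very top of the script, before h5py, PyTables or Keras is imported.

```python
import os

# Must be the literal string "FALSE". Set it before h5py, PyTables or Keras
# is imported, so it is in place before any HDF5 file is opened.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


def save_array(path, values):
    """Hypothetical helper: write `values` to an HDF5 file at `path`."""
    import h5py  # imported only after the variable has been set

    with h5py.File(path, "w") as f:
        f.create_dataset("values", data=values)


# Example call (the file name is made up):
# save_array("/storage/check.h5", [1, 2, 3])
```

Keep in mind that with locking disabled, nothing stops two processes from writing to the same file at once, so make sure only one job writes to a given HDF5 file.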

I hope this helps. Good luck.


Note: you can set the environment variable in your Docker image, in your Python script, or in both; either one works.