Loading images from /datasets with fast.ai/PyTorch DataLoader fails

I'm trying to load images from /datasets/celebA and it fails, each time on a different image, with this error:
PIL.UnidentifiedImageError: cannot identify image file '/datasets/celebA/027395.jpg'

Here is a full notebook with repro:
https://n9y0uibw.gradient.paperspace.com/notebooks/Repro%20PIL.UnidentifiedImageError%20issue.ipynb

It’s a standard fastai/PyTorch setup. It fails in fastai.learner.Learner.lr_find(), which triggers the underlying DataLoader to actually load images from disk. It’s worth pointing out that this image loading is parallelized.

Since it fails on a different image each time, I think it’s some disk-access/filesystem problem on Paperspace/Gradient.
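
One way to narrow this down without fastai or the parallel DataLoader is to scan the files directly. The following is only a rough sketch, not a confirmed diagnosis: it assumes the dataset files use the .jpg extension and relies on the fact that a JPEG stream starts with the bytes FF D8. The function name find_suspect_jpegs is made up for illustration. It flags files that are empty or do not start with that marker, which are the kind of files an image library typically cannot identify.

```python
from pathlib import Path

# Every JPEG stream begins with the Start Of Image marker FF D8.
JPEG_SOI = b"\xff\xd8"


def find_suspect_jpegs(root):
    """Return sorted paths of .jpg files under root that are empty
    or do not start with the JPEG Start Of Image marker.

    Hypothetical helper for illustration; it only inspects the first
    two bytes, so it will not catch damage later in the file.
    """
    suspects = []
    for path in sorted(Path(root).rglob("*.jpg")):
        with open(path, "rb") as f:
            header = f.read(2)  # an empty file yields b""
        if header != JPEG_SOI:
            suspects.append(path)
    return suspects
```

Calling find_suspect_jpegs("/datasets/celebA") twice and comparing the two results could help tell the cases apart: the same files flagged on both runs would point to damaged files on disk, while differing results would point to flaky reads.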

Any ideas would be appreciated. Thank you.

Thank you for identifying this bug @vojtajina. We’ve addressed the issue with the affected images, so you can re-run your workload at your convenience.

Oh, wow, that was fast! Let me try again.

It’s working now! Thank you!