Why is Gradient P6000 slower than Colab K80?

I train a small CNN using the same PyTorch code on a free Colab K80 and on a paid Gradient P6000 instance.

As far as I know, the P6000 should outperform the K80, but when I measure the model's training time, the K80 needs only ~110 s to train the model for 20 epochs, while the P6000 needs ~138 s.
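The original timing code isn't shown here, so this is only a minimal sketch of how such a measurement is typically done (the `train_one_epoch` function is a hypothetical stand-in for the real training step). One caveat worth checking: CUDA kernels launch asynchronously, so when timing GPU code you should call `torch.cuda.synchronize()` before reading the clock, or the measurement can be misleading.

```python
import time

def train_one_epoch():
    # Hypothetical placeholder for the real PyTorch training step.
    total = 0
    for i in range(100_000):
        total += i * i
    return total

start = time.perf_counter()
for epoch in range(20):
    train_one_epoch()
    # With a real GPU model you would call torch.cuda.synchronize()
    # here so that queued kernels finish before the clock is read.
elapsed = time.perf_counter() - start
print(f"trained 20 epochs in {elapsed:.2f}s")
```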

Why does this happen?

Hi @rianrajagede, I would definitely check the batch size. The K80 has significantly less usable GPU memory, so it will be bottlenecked moving data in and out of the GPU. Since the P6000 has 24 GB of memory, you can load much larger batches, which should shorten your training time. You can do some back-of-the-envelope math to figure out the ideal batch size relative to the GPU's memory, or just try increasing it incrementally. Give that a shot and let us know how it goes.
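The back-of-the-envelope math can be sketched roughly like this. All the numbers here are assumptions for illustration: the per-sample memory footprint (activations plus gradients) depends entirely on your model, and the `reserve_gb` headroom for weights, optimizer state, and CUDA overhead is a guess.

```python
def max_batch_size(gpu_mem_gb, per_sample_mb, reserve_gb=2.0):
    """Rough upper bound on batch size: usable GPU memory
    divided by the estimated per-sample footprint."""
    usable_mb = (gpu_mem_gb - reserve_gb) * 1024
    return int(usable_mb // per_sample_mb)

# Illustrative comparison, assuming ~8 MB per sample:
# Colab exposes roughly 12 GB of the K80, vs 24 GB on the P6000.
print(max_batch_size(12, 8))
print(max_batch_size(24, 8))
```

In practice, people usually just double the batch size until they hit an out-of-memory error, then back off; the estimate above only tells you which ballpark to start in.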