Constantly running out of memory with free GPUs

Hi good people from Paperspace,

I have been training some models on the free GPUs in Jupyter notebooks to figure out whether this is a service I want to pay for and use.

It worked fine at first; however, at some point the GPU memory started spiking to 100% immediately, even for the easiest and smallest tasks, so I cannot really use any of the training capabilities any more.
Shutting down the kernel resets the GPU memory, but as soon as I run a script it spikes back to 100% and I receive the following error:

RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 15.90 GiB total capacity; 15.11 GiB already allocated; 91.50 MiB free; 15.17 GiB reserved in total by PyTorch)
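
For anyone checking the same thing, a snippet like this (a minimal sketch using torch.cuda's reporting functions) shows how much memory PyTorch is holding at any point:

```python
import torch

device = torch.device("cuda:0")

# How much memory PyTorch's caching allocator currently holds on GPU 0.
print(f"allocated: {torch.cuda.memory_allocated(device) / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved(device) / 1024**3:.2f} GiB")

# Detailed breakdown of the allocator state (active/inactive blocks, etc.).
print(torch.cuda.memory_summary(device))
```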

Is this something that happens often? And how can I resolve it?


Thanks!
Jonas

PS: It seems Paperspace does not accept German credit cards. I tried a couple of times to pay for an account and was denied every time.

Hi @jonas-nothnagel, 9 times out of 10, if you are running out of memory it is because the batch size (which is loaded into GPU memory in full) is too large for the GPU, and you need to scale it back a bit. It shouldn’t have anything to do with the compute instance itself.
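
For example, something like this (just a rough sketch, with a dummy dataset standing in for your real data) is usually the first thing to try:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the real training data (1,000 samples, 128 features).
train_dataset = TensorDataset(torch.randn(1000, 128), torch.randint(0, 2, (1000,)))

# Lowering the batch size directly reduces the activation memory held per step,
# which is usually what blows past a 16 GiB card like the one in the error above.
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
```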

We have thousands of customers in Germany and we definitely accept German credit cards – it’s just a one-off issue 🙂 Please open a ticket here and we’ll take care of you:
https://support.paperspace.com/hc/en-us/requests/new

Hi @Daniel! Thanks for your answer.
I had to contact support to unflag my credit card, and it is working now.

The batch size explanation makes sense to me; however, I opened this issue because I have the feeling the batch size is not the whole problem. It seems a bit random: sometimes I can train for 10 epochs without lowering the batch size at all, and sometimes I run out of memory after 2 seconds with the exact same parameters, or even with a lower batch size.

This is the case for both free and paid instances.
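
Would something like this be enough to release the memory between attempts instead of restarting the kernel? (Just a sketch; `model` and `optimizer` stand in for my actual objects.)

```python
import gc
import torch

# Drop references to whatever the previous run created that still holds GPU tensors.
# `model` and `optimizer` are placeholders for my actual objects.
del model, optimizer
gc.collect()

# Release cached blocks back to the driver so the memory shows up as free again.
torch.cuda.empty_cache()

print(f"allocated after cleanup: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
```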

What information can I give you to help you look into this further?

Thanks!