Running paperspace-python behind a proxy


#1

Hi, everyone. I have been trying to use paperspace-python's run interface to create a job with the workspace being a zip file. However, I am behind my institute's proxy. Just to check that everything works and to see how to save files, etc., I made a simple Python script that copies data from one text file and saves it to another one in /storage; the file run.sh (below) then moves it from /storage to /artifacts:

python test.py
mv /storage/* /artifacts

And when I run the following, my job starts perfectly:

paperspace-python run --command "bash run.sh"

However, when I now try to do the same while also uploading my actual training data, I get the following error:

{
  "error": true,
  "message": "HTTPSConnectionPool(host='api.paperspace.io', port=443): Max retries exceeded with url: /jobs/createJob?project=Splice+Site+Prediction&workspaceFileName=autoencoder_train.zip.zip&container=paperspace%2Ftensorflow-python&machineType=P5000&name=AE+train&command=pip2+install+-r+requirements.txt%0Abash+run.sh (Caused by ProxyError('Cannot connect to proxy.', error(\"(110, 'ETIMEDOUT')\",)))"
}

Can anyone please help me with this?
The command used in the second case:

paperspace-python run --command "bash run.sh" --workspace autoencoder_train.zip --req ../requirements.txt --project "Splice Site Prediction" --name "AE train"
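
For reference, paperspace-python appears to sit on top of the requests library (the HTTPSConnectionPool/ProxyError trace above comes from requests/urllib3), and requests normally honors the standard proxy environment variables. This is roughly how I export them before launching the command; the proxy address and port below are placeholders for my institute's proxy:

export HTTP_PROXY="http://proxy.my-institute.example:3128"
export HTTPS_PROXY="http://proxy.my-institute.example:3128"
paperspace-python run --command "bash run.sh" --workspace autoencoder_train.zip --req ../requirements.txt --project "Splice Site Prediction" --name "AE train"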

#2

Hmm, let's look into what's happening. I'll try to reproduce the issue locally and see what happens.


#3

One thing that is unfortunately not well documented is that the /artifacts directory has a size limit and might be getting overwhelmed. But it looks like you are getting that error before the job even runs, so it might be something else.


#4

@dte Thanks for your reply. Yes, the problem is that it's not even running. I think the dataset size might be a problem, although I guess 3 GB should not be that large. It could also be that my firewall is blocking the request locally, since the upload is too large for it.


#5

Unfortunately, job workspaces are limited to 100 MB at this time. If you have a 3 GB dataset, your best bet is to run a job with a curl command that puts the dataset into your persistent drive (located at /storage). Then, on subsequent runs, you can run any code against that data. Does that make sense?
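
Something along these lines should work as a one-off "seed the storage" job; the dataset URL below is just a placeholder for wherever your 3 GB archive actually lives:

# one-off job: download the dataset straight onto the persistent /storage drive
paperspace-python run --command "curl -L -o /storage/dataset.tar.gz https://example.com/dataset.tar.gz" --project "Splice Site Prediction"

# later jobs can then train against /storage/dataset.tar.gz without uploading it as a workspace
paperspace-python run --command "bash run.sh" --workspace autoencoder_train.zip --req ../requirements.txt --project "Splice Site Prediction" --name "AE train"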


#6

@dte Yup, that makes perfect sense. Got the job running. Thanks for the advice 🙂


#7

Just one small thing: there are no logs on the website after 2,000 lines. So how can I check the progress of my code using paperspace-python?


#8

Hi there, that is a current limitation of the GUI that we are fixing ASAP. In the meantime, you can grab the full logs using the CLI: paperspace jobs logs --jobId "j123abc"
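
For example, to pull the complete log stream down to a local file (the output filename is just a placeholder; use the job ID shown for your own run):

paperspace jobs logs --jobId "j123abc" > full_logs.txt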