A few questions about Gradient

Hi, I am new here and I have been reading about Gradient today. Until now, I would spin up my VM, launch a Jupyter notebook, open it in the browser and work there.
There are some disadvantages to this, however: the time you spend coding is uptime (i.e. paid time). I want to make sure I understand this correctly:

  1. Gradient is a CLI module that lets you send only the execution commands to an ad-hoc VM (jobs). For example, if I write code to train a neural network in a notebook, I can then send just the train method through Gradient, and I only pay for the time the network actually trains (the time the job takes to run), not for the entire time I spend coding. Is this correct?

  2. Can I recover the saved trained models and the training history after the job has finished?

  3. Can I monitor the net’s training with TensorBoard?

  4. What are the job limits for the various subscriptions? Does 10 jobs/month mean I can only train my model (or run some script) 10 times per month?

Finally, is there a fully worked example of training a network with Gradient and then recovering the model?



@qubix12 All good questions!

  1. That is correct. You could work in a local Notebook or you could run a Gradient Notebook on a low-cost instance type like a C2 which is less than a penny an hour. Then you can execute your training code as a Job or just clone the Notebook on a powerful instance type. Jobs are handy because, as you mentioned, they can be automatically stopped when the training is complete.

  2. Just write your model to the /artifacts directory (downloadable from the CLI/UI) or to /storage (accessible to other Notebooks/Jobs).

  3. Yes – you can access TensorBoard from a Notebook or a Job.

  4. The Gradient limits are total and concurrent (not monthly). See this article for more info.
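A minimal sketch of the kind of script you might run as a Job, assuming (per answer 2) that files written to /artifacts inside a Job can be downloaded once it finishes. The toy linear-regression "network", the file names model.json and history.json, and the local artifacts fallback folder are all hypothetical, chosen so the script runs with only the Python standard library. The command for actually submitting it as a Job is not shown; check the Gradient CLI documentation for that.

```python
import json
import os
from pathlib import Path

# Assumption: /artifacts exists inside a Gradient Job (per the forum answer).
# Outside a Job, fall back to a local "artifacts" folder so the sketch still runs.
DEFAULT_OUTPUT_DIR = "/artifacts" if os.path.isdir("/artifacts") else "artifacts"


def train(xs, ys, epochs=1000, lr=0.05):
    """Fit y = w*x + b by plain gradient descent; return params and loss history."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        # Mean squared error at the start of this epoch, and its gradients.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(loss)
    return {"w": w, "b": b}, history


def save_outputs(params, history, output_dir=DEFAULT_OUTPUT_DIR):
    """Write the trained parameters and the per-epoch loss history as JSON files."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "model.json").write_text(json.dumps(params))
    (out / "history.json").write_text(json.dumps(history))
    return out


def load_model(output_dir=DEFAULT_OUTPUT_DIR):
    """Read the parameters back, e.g. after downloading the Job's artifacts."""
    return json.loads((Path(output_dir) / "model.json").read_text())


def main():
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
    params, history = train(xs, ys)
    out = save_outputs(params, history)
    print(f"saved model to {out}, final loss {history[-1]:.6f}")


if __name__ == "__main__":
    main()
```

In a real project, `train` would be your framework's own fit call; the point is that everything worth keeping (weights, loss history) is written to the output directory before the Job exits. Pointing `output_dir` at /storage instead would, per the answer, make the files available to later Notebooks and Jobs.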

There are several examples on our GitHub, as Public Jobs, and as blog posts.

@Daniel, thanks for the detailed reply. The only thing I am still a bit confused about is number 4 (even after reading the suggested article). Let’s say I am on the Gradient 0 plan. I can run:

  • 1 concurrent job: this one is clear, you can only submit one job at a time.
  • 1 concurrent notebook: ok, so you can only start and work in one notebook at a time.
  • 10 jobs: this is not so clear. What does 10 total jobs mean? Let’s say I work in a notebook and send a job to train my network. Then I modify the code and send another job, and so on. After I finish 10 jobs, what happens? When can I submit a new job?

@qubix12 The 10-Job limit is simply how many Jobs you can have in your account at any given time. Once a Job is complete, it remains in your account for your reference, though you can delete it at any time. So, if you have 10 and want to run a new one, either upgrade your Gradient subscription or delete a previous Job.
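To make the "total vs. concurrent" distinction concrete, here is a toy model of the Gradient 0 limits as described in this reply (10 Jobs stored in the account, 1 running at a time). It only illustrates the rule; it is not Paperspace's implementation, and the `Job`/`Account` classes and their methods are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False: Jobs compare by identity, so delete() removes exactly this one
class Job:
    name: str
    running: bool = True


@dataclass
class Account:
    """Toy model of the limits described in the reply; not Paperspace's implementation."""
    max_jobs: int = 10       # total Jobs kept in the account (running or completed)
    max_concurrent: int = 1  # Jobs allowed to run at the same time
    jobs: List[Job] = field(default_factory=list)

    def can_submit(self) -> bool:
        running = sum(1 for job in self.jobs if job.running)
        return len(self.jobs) < self.max_jobs and running < self.max_concurrent

    def submit(self, name: str) -> Job:
        if not self.can_submit():
            raise RuntimeError("limit reached: delete an old Job or upgrade the plan")
        job = Job(name)
        self.jobs.append(job)
        return job

    def finish(self, job: Job) -> None:
        # A completed Job stops counting toward max_concurrent but stays
        # in the account, so it still counts toward max_jobs.
        job.running = False

    def delete(self, job: Job) -> None:
        self.jobs.remove(job)


account = Account()
for i in range(10):
    account.finish(account.submit(f"train-{i}"))
print(account.can_submit())  # False: 10 Jobs stored, even though none is running
account.delete(account.jobs[0])
print(account.can_submit())  # True: deleting an old Job frees a slot
```

Nothing in this model resets monthly: the only ways to free a slot are deleting a stored Job or raising `max_jobs`, which corresponds to upgrading the plan.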