I’m running a transfer-learning model (ResNet-50) on a specific dataset on an AWS EC2 instance. More specifically, I’m using one of the standard Amazon community AMIs for deep learning on a p3.8xlarge GPU compute instance.
When I SSH into my instance, I source activate the deep learning conda environment. From there, I launch a Jupyter notebook and run my code in the Python 3 kernel.
When I first start running my code, it runs normally. Below is the CPU utilization %:
At some point in the code, the connection to the notebook fails. This is the only information I’m getting from the terminal:
packet_write_wait: Connection to X.X.X.X IP address port 22: Broken pipe
How do I fix this?