Machine Learning on Amazon AWS GPU Instances

Machine learning algorithms regularly use GPUs to parallelize computations, and Amazon AWS GPU Instances provide cheap, on-demand access to capable virtual servers with NVIDIA GPUs.

GPU Instances come in two flavors, g2.2xlarge and g2.8xlarge:

Model       GPUs  vCPU  Mem (GiB)  SSD Storage (GB)
g2.2xlarge  1     8     15         1 x 60
g2.8xlarge  4     32    60         2 x 120

The GPU instances feature Intel Xeon E5-2670 (Sandy Bridge) Processors and NVIDIA GPUs with 1,536 CUDA cores and 4GB of video memory each.


Tips & Tricks

Several machine learning frameworks, such as Torch and Theano, as well as Amazon itself, provide AMIs (Amazon Machine Images) with pre-installed dependencies and NVIDIA kernel drivers.

Use spot instances - they’re much cheaper for GPU instances!

  • Pick a price that has been steady for a while
  • $0.10/hr often gets you a g2.2xlarge instance, even for a few days continuously
  • You can view price graphs for the instance type in the AWS console.
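One way to judge whether a price has been steady is to summarize recent spot price history in code. The following is a minimal sketch, not an official AWS tool: `summarize_spot_prices` and `is_steady` are hypothetical helpers, the 0.01 USD tolerance is an arbitrary assumption, and the records are assumed to be dicts with a "SpotPrice" string, like the entries boto3's `describe_spot_price_history` returns. Fetching the records is left out.

```python
from statistics import mean


def summarize_spot_prices(records):
    """Summarize a list of spot price records.

    Each record is assumed to be a dict with a "SpotPrice" string,
    shaped like the entries in the "SpotPriceHistory" list returned by
    boto3's describe_spot_price_history.
    """
    prices = [float(r["SpotPrice"]) for r in records]
    if not prices:
        raise ValueError("no price records given")
    return {
        "min": min(prices),
        "max": max(prices),
        "mean": mean(prices),
        # A small spread means the price barely moved over the window.
        "spread": max(prices) - min(prices),
    }


def is_steady(records, bid, tolerance=0.01):
    """Return True if every observed price stayed at or below the bid
    and the spread stayed within the tolerance (assumed, in USD)."""
    summary = summarize_spot_prices(records)
    return summary["max"] <= bid and summary["spread"] <= tolerance
```

For example, calling `is_steady(records, bid=0.10)` on a few days of g2.2xlarge history gives a rough yes/no answer before you place the bid.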

Spot instances get a two-minute notice before being shut down. The scheduled shutdown time is published in the instance metadata, and you can use boto (an AWS SDK for Python) to read that timestamp.
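As a rough illustration of checking for the shutdown notice, the sketch below reads the `spot/termination-time` path of the instance metadata service directly with Python's standard library instead of boto. It assumes the metadata service answers plain unauthenticated requests; instances configured to require session tokens would need an extra token request first. The function names are made up for this example.

```python
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Instance metadata path that holds the scheduled termination time.
TERMINATION_URL = (
    "http://169.254.169.254/latest/meta-data/spot/termination-time"
)


def fetch_termination_time(url=TERMINATION_URL, timeout=2):
    """Return the raw timestamp string, or None if the path answers 404
    (no termination is scheduled). Only works from inside an instance."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def parse_termination_time(raw):
    """Parse a timestamp such as 2015-01-05T18:02:00Z into an aware
    UTC datetime. Returns None for empty or unparseable input."""
    if not raw:
        return None
    try:
        parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)


def seconds_until_termination(raw, now=None):
    """Return the seconds left before shutdown, or None if not scheduled."""
    when = parse_termination_time(raw)
    if when is None:
        return None
    now = now or datetime.now(timezone.utc)
    return (when - now).total_seconds()
```

A training loop could call `fetch_termination_time()` every few seconds and write a final snapshot as soon as it returns a timestamp.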

Make sure to snapshot your models, otherwise you might lose training time and have to start over. You can save the snapshots to S3 (depending on the size of the model).
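Snapshotting can be sketched roughly as follows. This is an illustrative example rather than the method of any particular framework: the helper names are hypothetical, the model state is assumed to be picklable, and the S3 upload uses boto3 (the successor to boto) with a bucket and key you would supply yourself.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to path atomically, so an instance shut down
    mid-save never leaves a half-written snapshot behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(state, handle)
        # os.replace swaps the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or None if no snapshot exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as handle:
        return pickle.load(handle)


def upload_checkpoint(path, bucket, key):
    """Copy a local snapshot to S3. Requires boto3 and AWS credentials;
    bucket and key are placeholders chosen by the caller."""
    import boto3  # imported lazily, only needed for the upload

    boto3.client("s3").upload_file(path, bucket, key)
```

On startup, a training script can call `load_checkpoint` and resume from the returned state if it is not None, then call `save_checkpoint` every few epochs.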

Create an AMI with all dependencies pre-installed so you don’t waste time installing those when the instance spins up.

For very large datasets, use Amazon's Elastic Block Store (EBS). It's basically an on-demand SSD you can attach to instances when they spin up.


If you have suggestions, feedback or ideas, reach out to @metachris!