Convolutional networks and Transformers: Tensor Cores > FLOPs > Memory Bandwidth > 16-bit capability
Recurrent networks: Memory Bandwidth > 16-bit capability > Tensor Cores > FLOPs
A simple and effective way to think about matrix multiplication AB = C is that it is memory-bandwidth bound: copying the memory of A and B onto the chip is more costly than performing the computation of AB itself. This means memory bandwidth is the most important feature of a GPU if you want to use LSTMs and other recurrent networks that do lots of small matrix multiplications. The smaller the matrix multiplications, the more important memory bandwidth becomes.
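To make this concrete, here is a minimal back-of-the-envelope sketch in plain Python (no GPU required). The 100 TFLOPs and 1 TB/s figures are illustrative assumptions, not the specs of any particular card: a matmul is compute bound only when its arithmetic intensity (FLOPs per byte moved) exceeds the ratio of the GPU's compute rate to its memory bandwidth.

```python
def arithmetic_intensity(n, bytes_per_element=2):  # 2 bytes per 16-bit float
    flops = 2 * n**3                             # multiply-adds in an n x n matmul
    bytes_moved = 3 * n**2 * bytes_per_element   # read A and B, write C
    return flops / bytes_moved

# Hypothetical GPU: 100 TFLOPs of 16-bit compute, 1 TB/s of memory bandwidth.
machine_balance = 100e12 / 1e12  # FLOPs the GPU can do per byte it can move

for n in (64, 256, 1024, 4096):
    ai = arithmetic_intensity(n)
    bound = "compute bound" if ai > machine_balance else "bandwidth bound"
    print(f"n={n:5d}  intensity={ai:7.1f} FLOPs/byte  -> {bound}")
```

For small matrices the intensity is low, so the GPU spends most of its time waiting on memory rather than computing, which is exactly the regime recurrent networks live in.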
In contrast, convolution is bound by computation speed. Thus, TFLOPs on a GPU is the best indicator for the performance of ResNets and other convolutional architectures. Tensor Cores can increase the available FLOPs dramatically.
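If you want to see the Tensor Core effect directly, a rough sketch (assuming PyTorch and a CUDA GPU with Tensor Cores; the actual speedup varies by hardware) is to time the same large matmul in 32-bit and 16-bit:

```python
import torch

def time_matmul(dtype, n=4096, iters=50):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    a @ b  # warm-up so one-time kernel launch cost is excluded
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per matmul

if torch.cuda.is_available():
    print(f"32-bit: {time_matmul(torch.float32):6.2f} ms")
    print(f"16-bit: {time_matmul(torch.float16):6.2f} ms  (Tensor Core path)")
```

On Tensor Core hardware the 16-bit run is typically several times faster, since the cores provide a dedicated fast path for 16-bit matrix math.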