The results are quite impressive! We compared against compression algorithms on MNIST, where sparse momentum outperforms most other methods. This is a pretty good result given that compression methods start from a dense network and usually retrain repeatedly, while we train a sparse network from scratch! Another impressive result is that we can match or even exceed the performance of dense networks while using only 20% of the weights (80% sparsity). On CIFAR-10, we compare against Single-shot Network Pruning, which is designed for simplicity rather than performance, so it is not surprising that sparse momentum does better. What is more interesting is that we can train both VGG16-D (a version of VGG16 with two fully connected layers) and Wide Residual Network (WRN) 16-10 (a WRN that is 16 layers deep and very wide) to dense performance levels with just 5% of the weights. For other networks, sparse momentum comes close to dense performance levels. Furthermore, as I will show later, with an optimized sparse convolution algorithm we would be able to train a variety of networks to the same performance levels while training between 3.0x and 5.6x faster!
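
To make the sparsity budgets concrete, here is a minimal sketch, assuming PyTorch, of what "training with 5% of the weights" looks like: binary masks keep a fixed fraction of each weight tensor and are re-applied after every optimizer step. This is not the sparse momentum algorithm itself; the random masks, layer sizes, and hyperparameters below are illustrative assumptions only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_masks(model, density=0.05):
    """Create a random binary mask per weight tensor, keeping ~`density` of entries.
    (Sparse momentum allocates weights across layers using momentum statistics;
    random masks are used here only to illustrate a fixed sparsity budget.)"""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() > 1:  # mask weight matrices/filters, leave biases dense
            masks[name] = (torch.rand_like(param) < density).float()
    return masks

def apply_masks(model, masks):
    """Zero out pruned weights so only the sparse subset is effectively trained."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

# Usage sketch on dummy MNIST-shaped data: re-apply masks after each step,
# since gradients can make pruned entries nonzero again.
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = make_masks(model, density=0.05)  # 5% of weights = 95% sparsity
apply_masks(model, masks)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x, y = torch.randn(64, 784), torch.randint(0, 10, (64,))
loss = F.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
apply_masks(model, masks)  # keep the network sparse throughout training
```

In sparse momentum itself the mask is not fixed: during training, small-magnitude weights are pruned and new weights are grown in the layers and positions where momentum magnitudes are largest, while the overall weight budget stays constant.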