WeightGrad: Geo-Distributed Data Analysis Using Quantization for Faster Convergence and Better Accuracy
Syeda Nahida Akter: Bangladesh University of Engineering and Technology; Muhammad Abdullah Adnan: University of California San Diego
The high network communication cost of synchronizing weights and gradients in geo-distributed data analysis erodes the benefits of advances in computation and optimization techniques. Many quantization methods for weights, gradients, or both have been proposed in recent years. Weight-quantized models suffer from an error related to the weight dimension, while gradient-quantized methods converge more slowly by a factor related to the gradient quantization resolution and the gradient dimension. All of these methods have proved infeasible for distributed training across multiple data centers around the world. Moreover, recent studies show that communicating over WANs can degrade DNN model performance by up to 53.7x because WAN bandwidth is limited and unstable. Our goal in this work is to design a geo-distributed deep learning system that (1) ensures efficient and fast communication over both LAN and WAN and (2) maintains accuracy and convergence for complex DNNs with billions of parameters. In this paper, we introduce WeightGrad, which acknowledges the limitations of quantization. For local convergence, it provides loss-aware weight-quantized networks with quantized gradients; for global convergence, it dynamically eliminates insignificant communication between data centers while still guaranteeing the correctness of DNN models. Experiments with our WeightGrad prototypes, running across 3 Amazon EC2 global regions and on a cluster that emulates EC2 WAN bandwidth, show that WeightGrad provides a 1.06% gain in top-1 accuracy, a 5.36x speedup over the baseline, and a 1.4x-2.26x speedup over four state-of-the-art distributed ML systems.
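The two ideas named in the abstract, quantized gradients and skipping insignificant cross-data-center updates, can be illustrated with a minimal sketch. The abstract does not specify WeightGrad's quantizer or its significance test, so everything below is an assumption made for illustration: `quantize_gradient` uses generic stochastic uniform quantization, `significant_updates` uses a simple relative-magnitude threshold, and both function names and the `threshold` parameter are hypothetical.

```python
import random
from typing import List


def quantize_gradient(grad: List[float], bits: int, rng: random.Random) -> List[float]:
    """Stochastically round each entry onto a uniform grid over [-scale, scale].

    Assumption: a generic stochastic uniform quantizer, not necessarily
    the scheme used by WeightGrad.
    """
    scale = max((abs(g) for g in grad), default=0.0)
    if scale == 0.0:
        return [0.0] * len(grad)
    levels = 2 ** bits - 1            # number of grid intervals
    step = 2.0 * scale / levels       # spacing between grid points
    out = []
    for g in grad:
        pos = (g + scale) / step      # position on the grid, in [0, levels]
        low = min(int(pos), levels)   # index of the grid point at or below g
        frac = pos - low              # distance to that point, in grid units
        # Round up with probability frac so the result is unbiased in expectation.
        idx = min(low + (1 if rng.random() < frac else 0), levels)
        out.append(-scale + idx * step)
    return out


def significant_updates(weights: List[float], accumulated: List[float],
                        threshold: float) -> List[int]:
    """Return indices whose accumulated update is large relative to the weight.

    Assumption: a relative-magnitude test with a hypothetical threshold;
    only these indices would be sent to remote data centers.
    """
    return [i for i, (w, u) in enumerate(zip(weights, accumulated))
            if abs(u) > threshold * abs(w)]
```

With 2 bits and a maximum gradient magnitude of 1.0, every quantized entry falls on one of the four grid points -1, -1/3, 1/3, or 1. In `significant_updates`, an accumulated update of exactly zero is never selected, so unchanged parameters generate no WAN traffic in this sketch.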