KDD Papers

Small Batch or Large Batch: Gaussian Walk with Rebound Can Teach

Peifeng Yin (IBM Research Almaden);Ping Luo (Institute of Computing Technology, CAS);Taiga Nakamura (IBM Research Almaden)


Abstract

Efficiency of large-scale learning is a hot topic in both academic and industry. The \emph{stochastic gradient descent (SGD)} algorithm, and its extension \emph{mini-batch SGD}, allow the model to be updated without scanning the whole data set. However, the use of approximate gradient leads to the uncertainty issue, slowing down the decreasing of objective function. Furthermore, such uncertainty may result in a high frequency of meaningless update on the model, causing a communication issue in parallel learning environment. In this work, we develop a \emph{batch-adaptive stochastic gradient descent (BA-SGD)} algorithm, which can dynamically choose a proper batch size as learning proceeds. Particularly on the basis of Taylor extension and central limit theorem, it models the decrease of objective value as a random walk game with a Gaussian dice. In this game, a heuristic strategy of determining batch size is adopted to maximize the utility of each incremental sampling. By evaluation on multiple real data sets, we demonstrate that by smartly choosing the batch size, the BA-SGD not only conserves the fast convergence of SGD algorithm but also avoids too frequent model updates.


Comments