Rich Information is Affordable: A Systematic Performance Analysis of Second-order Optimization Using K-FAC
Yuichiro Ueno: Tokyo Institute of Technology; Kazuki Osawa: Tokyo Institute of Technology; Yohei Tsuji: Tokyo Institute of Technology; Akira Naruse: NVIDIA; Rio Yokota: Tokyo Institute of Technology
Rich information matrices from first and second-order derivatives have many potential applications in both theoretical and practical problems in deep learning. However, computing these information matrices is extremely expensive and this enormous cost is currently limiting its application to important problems regarding generalization, hyperparameter tuning, and optimization of deep neural networks. One of the most challenging use cases of information matrices is their use as a preconditioner for the optimizers, since the information matrices need to be updated every step. In this work, we conduct a step-by-step performance analysis when computing the Fisher information matrix during training of ResNet-50 on ImageNet, and show that the overhead can be reduced to the same amount as the cost of performing a single SGD step. We also show that the resulting Fisher preconditioned optimizer can converge in 1/3 the number of epochs compared to SGD, while achieving the same Top-1 validation accuracy. This is the first work to achieve such accuracy with K-FAC while reducing the training time to match that of SGD.
How can we assist you?
We'll be updating the website as information becomes available. If you have a question that requires immediate attention, please feel free to contact us. Thank you!
Please enter the word you see in the image below: