Dual coordinate descent method is one of the most effective approaches for large-scale linear classification. However, its sequential design makes the parallelization difficult. In this work, we target at the parallelization in a multi-core environment. After pointing out difficulties faced in some existing approaches, we propose a new framework to parallelize the dual coordinate descent method. The key idea is to make the majority of all operations (gradient calculation here) parallelizable. The proposed framework is shown to be theoretically sound. Further, we demonstrate through experiments that the new framework is robust and efficient in a multi-core environment.

Filed under: Big Data | Large Scale Machine Learning Systems