Isolation Kernel and Its Effect on SVM
Kai Ming Ting (Federation University Australia); Yue Zhu (Nanjing University); Zhi-Hua Zhou (Nanjing University)
This paper investigates data-dependent kernels that are derived directly from data. This has been an outstanding issue for about two decades, and it has hampered the development of kernel-based methods. We introduce Isolation Kernel, which depends solely on the data distribution; it requires neither class information nor explicit learning. In contrast, existing data-dependent kernels rely heavily on class information and explicit learning to produce a classifier. We show that, under a uniform density distribution, Isolation Kernel closely approximates a data-independent kernel function, the Laplacian kernel. Given this result, Isolation Kernel can be viewed as a data-dependent kernel that adapts a data-independent kernel to the structure of a dataset. We also explain why the proposed data-dependent kernel enables SVM (which employs a kernel through other means) to improve its predictive accuracy. Finally, we discuss the key differences between the Random Forest kernel and Isolation Kernel to examine why the latter is the more successful tree-based kernel.
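The abstract does not spell out how Isolation Kernel is computed, so the following is only a minimal pure-Python sketch under stated assumptions. It treats the kernel value κ(x, y) as the fraction of t random partitionings in which x and y fall into the same cell, and it builds each partitioning as an iForest-style isolation tree grown to full depth on a random subsample of ψ points. The names `build_itree`, `leaf_path`, and `IsolationKernelSketch`, and the parameters `psi`, `t`, and `seed`, are hypothetical, and the paper's exact construction may differ. The usage at the end compares two pairs of points at the same distance, one pair in a dense region and one in a sparse region, to illustrate the data dependence described above: in this sketch the dense pair should receive the lower similarity.

```python
import random


def build_itree(points, rng):
    """Hypothetical helper: grow an isolation tree on `points` until every
    leaf holds a single distinct point. Returns None for a leaf, otherwise
    a tuple (dim, split, left_subtree, right_subtree)."""
    if len(points) <= 1 or all(p == points[0] for p in points):
        return None
    # Only split on dimensions where the points in this node actually differ.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    d = rng.choice(dims)
    lo = min(p[d] for p in points)
    hi = max(p[d] for p in points)
    split = rng.uniform(lo, hi)  # random split value inside the node's range
    left = [p for p in points if p[d] < split]
    right = [p for p in points if p[d] >= split]
    return (d, split, build_itree(left, rng), build_itree(right, rng))


def leaf_path(tree, x):
    """Return the left/right path that routes point x to its leaf (its cell)."""
    path = []
    node = tree
    while node is not None:
        d, split, left, right = node
        if x[d] < split:
            path.append("L")
            node = left
        else:
            path.append("R")
            node = right
    return "".join(path)


class IsolationKernelSketch:
    """Assumed form: kappa(x, y) = fraction of t trees in which x and y share a leaf."""

    def __init__(self, data, psi=16, t=200, seed=0):
        rng = random.Random(seed)
        # Each tree is built from only psi sampled points, so cells come out
        # small where data are dense and large where data are sparse.
        self.trees = [build_itree(rng.sample(data, psi), rng) for _ in range(t)]

    def __call__(self, x, y):
        same = sum(leaf_path(tree, x) == leaf_path(tree, y) for tree in self.trees)
        return same / len(self.trees)


# Usage: a dense block on [0, 1] and a sparse block on [2, 12] (1-D points).
data_rng = random.Random(42)
dense = [(data_rng.uniform(0.0, 1.0),) for _ in range(500)]
sparse = [(data_rng.uniform(2.0, 12.0),) for _ in range(50)]
data = dense + sparse

ik = IsolationKernelSketch(data, psi=16, t=200, seed=0)
k_dense = ik((0.25,), (0.75,))   # pair at distance 0.5 in the dense region
k_sparse = ik((6.75,), (7.25,))  # pair at distance 0.5 in the sparse region
print(f"dense pair: {k_dense:.2f}, sparse pair: {k_sparse:.2f}")
```

A data-independent kernel such as the Laplacian kernel would give both pairs the same value, because it sees only their distance; here the random cells shrink where samples are dense, so the same distance yields a lower similarity there.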