Deploying Analytics with the Portable Format for Analytics (PFA)
Jim Pivarski, Open Data Group Inc.; Collin Bennett, Open Data Group Inc.; Robert Grossman*, University of Chicago
We introduce a new language for deploying analytic models into products, services and operational systems called the Portable Format for Analytics (PFA). PFA is an example of what is sometimes called a model interchange format, a language for describing analytic models that is independent of speciﬁc tools, applications or systems. Model interchange formats allow one application (the model producer) to ex-port models and another application (the model consumer or scoring engine) to import models. The core idea behind PFA is to support the safe execution of statistical functions, mathematical functions, and machine learning algorithms and their compositions within a safe execution environment. With this approach, the common analytic models used in data science can be implemented, as well as the data transformations and data aggregations required for pre- and post-processing data. PFA compliant scoring engines can be extended by adding new user deﬁned functions described in PFA. We describe the design of PFA. A Data Mining Group (DMG) Working Group is developing the PFA standard. The current version is 0.8.1 and contains many of the commonly used statistical and machine learning models, including regression, clustering, support vector machines, neural networks, etc. We also describe two implementations of Hadrian, one in Scala and one in Python. We discuss four case studies that use PFA and Hadrian to specify analytic models, including two that are deployed in operations at client sites.
Filed under: Big Data