Active learning and transfer learning at scale with R and Python
John-Mark Agosta, Olga Liakhovich, Robert Horton, Mario Inchiosa, Justin Ormont, Vanja Paunić, Siddarth Ramesh, Tomas Singliar, Ali-Kazim Zaidi, and Hang Zhang (Microsoft)
Many organizations have access to large amounts of data but find it challenging to train supervised machine learning models, because labeling the training cases is laborious or expensive. In this hands-on tutorial we will work through two simplified examples of active learning: one with text classification and one with image classification. By iteratively labeling small numbers of cases and using the resulting models to help select which cases to label next, we can build much higher-performing models from a given labeling budget than we would likely achieve by choosing cases to label at random. For both the text and image classifiers, we will use a form of transfer learning to represent each case as a vector of floating-point numbers, which serve as features for conventional machine learning algorithms. Organizations often use both R and Python, and data scientists and engineers fluent in one of these languages would often benefit from being able to apply their knowledge across the language gap. Our exercises emphasize approaches to interoperability between the two languages so that both environments can be used toward a common goal.
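The iterative label-then-select loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration using scikit-learn and uncertainty sampling (the tutorial itself does not prescribe these exact libraries or the selection rule); the synthetic feature vectors stand in for the transfer-learning features mentioned above.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# Assumptions: scikit-learn, synthetic data standing in for transfer-learning
# features, and an "oracle" that already knows the pool labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Pretend these feature vectors came from a pre-trained featurizer.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_pool, y_pool = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]

# Seed the labeled set with five examples of each class.
labeled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

for _ in range(5):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_pool[labeled], y_pool[labeled])
    # Score the unlabeled pool; query the cases the model is least sure about.
    proba = model.predict_proba(X_pool[unlabeled])
    uncertainty = 1 - proba.max(axis=1)
    query = np.argsort(uncertainty)[-10:]          # 10 most uncertain cases
    newly_labeled = [unlabeled[i] for i in query]
    labeled.extend(newly_labeled)                  # the oracle "labels" them
    unlabeled = [i for i in unlabeled if i not in newly_labeled]

print(f"accuracy with {len(labeled)} labels:",
      accuracy_score(y_test, model.predict(X_test)))
```

In practice the queried cases would go to a human labeler rather than an oracle, and the uncertainty rule could be swapped for other selection strategies (margin sampling, query by committee, and so on).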
Although the R and Python ecosystems both offer a rich set of packages and functions for machine learning, when it comes to scaling and operationalization, many practitioners are hindered by the limits of available functions for handling big data efficiently. Our examples will use a variety of computing environments to scale scripts from a single node to elastic, distributed cloud services, including integration with Spark.
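One lightweight way to bridge the two languages, complementing in-process bridges such as reticulate or rpy2, is to exchange data through a language-neutral file format. This is a hypothetical sketch (the tutorial may use different mechanisms) showing the Python side writing featurized cases that an R session could read back with `read.csv`; columnar formats such as Feather or Parquet work the same way with better performance.

```python
# Hypothetical interop sketch: pass a featurized dataset between Python and R
# via a language-neutral CSV file. Only the Python half is shown here.
import csv
import os
import tempfile

rows = [
    {"id": 1, "feat_1": 0.12, "label": "pos"},
    {"id": 2, "feat_1": -0.40, "label": "neg"},
]

path = os.path.join(tempfile.mkdtemp(), "features.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "feat_1", "label"])
    writer.writeheader()
    writer.writerows(rows)

# An R session could now load the same file with: read.csv(path)
with open(path, newline="") as f:
    back = list(csv.DictReader(f))
print(back[0]["label"])  # -> pos
```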
Time and location will be posted when available.