Online controlled experiments, also called A/B testing, have been established as the mantra for data-driven decision making in many web-facing companies. In recent years, there are emerging research works focusing on building the platform and scaling it up [34], best practices and lessons learned to obtain trustworthy results [19; 20; 23; 26], and experiment design techniques and various issues related to statistical inference and testing [6; 7; 8]. However, despite playing a central role in online controlled experiments, there is little published work on treating metric development itself as a data-driven process. In this paper, we focus on the topic of how to develop meaningful and useful metrics for online services in their online experiments, and show how data-driven techniques and criteria can be applied in metric development process. In particular, we emphasize two fundamental qualities for the goal metrics (or Overall Evaluation Criteria) of any online service: directionality and sensitivity. We share lessons on why these two qualities are critical, how to measure these two qualities of metrics of interest, how to develop metrics with clear directionality and high sensitivity by using approaches based on user behavior models and data-driven calibration, and how to choose the right goal metrics for the entire online services.

Filed under: Big Data