Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix
Huizhi Xie*, Netflix; Juliette Aurisset, Netflix
Controlled experiments are widely regarded as the most scientific way to establish a true causal relationship between product changes and their impact on business metrics. Many technology companies rely on such experiments as their main data-driven decision-making tool. The sensitivity of a controlled experiment refers to its ability to detect differences in business metrics due to product changes. At Netflix, with tens of millions of users, increasing the sensitivity of controlled experiments is critical, as failure to detect a small effect, either positive or negative, can have a substantial revenue impact. This paper focuses on methods to increase sensitivity by reducing the sampling variance of business metrics. We define Netflix business metrics and share context around the critical need for improved sensitivity. We review popular variance reduction techniques that are broadly applicable to any type of controlled experiment and metric. We describe an innovative implementation of stratified sampling at Netflix, where users are assigned to experiments in real time, and discuss some surprising challenges with the implementation. We conduct case studies comparing these variance reduction techniques on several Netflix datasets. Based on the empirical results, we recommend using post-assignment variance reduction techniques such as post-stratification and CUPED rather than at-assignment techniques such as stratified sampling in large-scale controlled experiments.
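To make the post-assignment idea concrete, here is a minimal, illustrative sketch of the CUPED adjustment on synthetic data (not Netflix data, and not the paper's implementation): the in-experiment metric Y is adjusted by a pre-experiment covariate X, which preserves the metric's mean while shrinking its sampling variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, illustrative data: a pre-experiment covariate X
# (e.g. prior engagement) correlated with the in-experiment metric Y.
n = 100_000
x = rng.normal(10.0, 3.0, n)             # pre-experiment covariate
y = 0.8 * x + rng.normal(0.0, 1.0, n)    # in-experiment metric

# CUPED adjustment: Y_cv = Y - theta * (X - mean(X)),
# where theta = Cov(X, Y) / Var(X) minimizes Var(Y_cv).
theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

# The adjusted metric keeps the same mean but has lower variance,
# so smaller treatment effects become detectable at the same sample size.
print(y.var(), y_cuped.var())
```

Because the adjustment subtracts a mean-zero term, treatment-vs-control comparisons of the adjusted metric remain unbiased; the gain in sensitivity grows with the squared correlation between X and Y.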