Dynamic Bike Reposition: A Spatio-Temporal Reinforcement Learning Approach
Yexin Li (The Hong Kong University of Science and Technology); Yu Zheng (Urban Computing Business Unit, JD Finance); Qiang Yang (The Hong Kong University of Science and Technology)
Bike-sharing systems are widely deployed in many major cities, while the jammed and empty stations in them lead to severe customer loss. Currently, operators try to constantly reposition bikes among stations when the system is operating. However, how to efficiently reposition to minimize the customer loss in a long period remains unsolved. We propose a spatio-temporal reinforcement learning based bike reposition model to deal with this problem. Firstly, an inter-independent inner-balance clustering algorithm is proposed to cluster stations into groups. Clusters obtained have two properties, i.e. each cluster is inner-balanced and independent from the others. As there are many trikes repositioning in a very large system simultaneously, clustering is necessary to reduce the problem complexity. Secondly, we allocate multiple trikes to each cluster to conduct inner-cluster bike reposition. A spatio-temporal reinforcement learning model is designed for each cluster to learn a reposition policy in it, targeting at minimizing its customer loss in a long period. To learn each model, we design a deep neural network to estimate its optimal long-term value function, from which the optimal policy can be easily inferred. Besides formulating the model in a multi-agent way, we further reduce its training complexity by two spatio-temporal pruning rules. Thirdly, we design a system simulator based on two predictors to train and evaluate the reposition model. Experiments on real-world datasets from Citi Bike are conducted to confirm the effectiveness of our model.