A Deep Value-network Based Approach for Multi-Driver Order Dispatching
Course URL: http://videolectures.net/kdd2019_tang_qin_zhang/
Lecturer: Xiaocheng Tang
Institution: DiDi Chuxing Technology Co. (滴滴出行科技有限公司)
Date: 2020-03-02
Language: English
Abstract: Recent work on ride-sharing order dispatching has highlighted the importance of accounting for both spatial and temporal dynamics in the dispatching process to improve transportation system efficiency. At the same time, deep reinforcement learning has advanced to the point where it achieves superhuman performance in a number of fields. In this work, we propose a deep reinforcement learning based solution for order dispatching, and we conduct large-scale online A/B tests on DiDi's ride-dispatching platform to show that the proposed method achieves significant improvements in both total driver income and user-experience metrics. In particular, we model the ride-dispatching problem as a semi-Markov decision process to account for the temporal aspect of dispatching actions. To improve the stability of value iteration with nonlinear function approximators such as neural networks, we propose Cerebellar Value Networks (CVNet) with a novel distributed state representation layer. We further derive a regularized policy evaluation scheme for CVNet that penalizes a large Lipschitz constant of the value network, for additional robustness against adversarial perturbations and noise. Finally, we adapt various transfer learning methods to CVNet for increased learning adaptability and efficiency across multiple cities. We conduct extensive offline simulations based on real dispatching data, as well as online A/B tests on DiDi's platform. Results show that CVNet consistently outperforms other recently proposed dispatching methods. We finally show that performance can be further improved through the efficient use of transfer learning.
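As an illustration of what modelling dispatching as a semi-Markov decision process implies for value updates, the following is a minimal tabular sketch; it is not the CVNet method described in the talk. It assumes a trip that lasts `k` time steps and pays a total reward, so the successor state's value is discounted by `gamma**k` rather than by `gamma`. All names (`discounted_trip_reward`, `smdp_td_update`, the state keys, the learning rate) are hypothetical, and spreading the reward evenly over the `k` steps is an assumption made for illustration.

```python
def discounted_trip_reward(total_reward, k, gamma):
    """Discounted sum of a trip reward spread evenly over k steps.

    Assumption for illustration: each of the k steps earns total_reward / k.
    """
    per_step = total_reward / k
    return sum(per_step * gamma**t for t in range(k))


def smdp_td_update(values, state, next_state, total_reward, k, gamma=0.9, lr=0.5):
    """One temporal-difference update for a trip that spans k time steps.

    `values` is a plain dict mapping hypothetical state keys to value
    estimates; unseen states default to 0.0.
    """
    # The successor value is discounted by gamma**k because k steps elapse
    # before the driver becomes available again.
    target = (discounted_trip_reward(total_reward, k, gamma)
              + gamma**k * values.get(next_state, 0.0))
    old = values.get(state, 0.0)
    values[state] = old + lr * (target - old)
    return values[state]


if __name__ == "__main__":
    values = {}
    # A 2-step trip from hypothetical zone "A" to zone "B" paying 10.0 in total.
    print(smdp_td_update(values, "A", "B", total_reward=10.0, k=2, gamma=0.5, lr=1.0))
```

With `k = 1` the update reduces to the ordinary one-step temporal-difference rule, so the only change the semi-Markov view introduces here is the duration-dependent discount.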
Keywords: data science; deep value network; multi-driver order dispatching; transfer learning
Source: VideoLectures.NET
Data collected: 2022-09-16:cyh
Last reviewed: 2022-09-19:cyh