Deep Embedding Forest: Forest-based Serving with Deep Embedding Features
Course URL: http://videolectures.net/kdd2017_shan_deep_embedding_forest/
Lecturer: Ying Shan
Institution: Microsoft
Date: 2017-10-09
Language: English
Course description: Deep Neural Networks (DNN) have demonstrated a superior ability to extract high-level embedding vectors from low-level features. Despite this success, serving time remains a bottleneck because of the expensive run-time computation of multiple layers of dense matrices. GPGPU-, FPGA-, or ASIC-based serving systems require additional hardware that is not part of the mainstream design of most commercial applications. In contrast, tree- or forest-based models are widely adopted because of their low serving cost, but they depend heavily on carefully engineered features. This work proposes a Deep Embedding Forest model that offers the best of both worlds. The model consists of a number of embedding layers and a forest/tree layer. The former map high-dimensional (hundreds of thousands to millions) and heterogeneous low-level features to lower-dimensional (thousands) vectors, and the latter ensures fast serving. Built on top of a representative DNN model called Deep Crossing and two forest/tree-based models, XGBoost and LightGBM, a two-step Deep Embedding Forest algorithm is shown to achieve on-par or slightly better performance compared with its DNN counterpart, with only a fraction of the serving time on conventional hardware. A comparison with a joint optimization algorithm called partial fuzzification, also proposed in this paper, leads to the conclusion that the two-step Deep Embedding Forest achieves near-optimal performance. Experiments on large-scale data sets (up to 1 billion samples) from a major sponsored search engine demonstrate the efficacy of the proposed model.
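Illustrative sketch: the two-step structure described above (learn embeddings first, then freeze them and fit a forest on the embedded vectors) might look roughly like the toy Python below. This is not the authors' implementation. The data, the two categorical features, the sizes, and all function names are hypothetical. A single logistic scoring layer stands in for the Deep Crossing network, and bagged decision stumps stand in for XGBoost or LightGBM.

```python
import math
import random

rng = random.Random(0)
EMB_DIM = 2   # embedding size per feature (toy value, an assumption)
N_IDS = 10    # vocabulary size per categorical feature (toy value, an assumption)

# Hypothetical toy data: two categorical ids per sample; label is 1 when their sum is large.
data = [(rng.randrange(N_IDS), rng.randrange(N_IDS)) for _ in range(200)]
labels = [1 if q + a >= N_IDS else 0 for q, a in data]

# One embedding table per feature, plus a stand-in scoring layer (logistic regression).
tables = [[[rng.uniform(-0.1, 0.1) for _ in range(EMB_DIM)] for _ in range(N_IDS)]
          for _ in range(2)]
weights = [rng.uniform(-0.1, 0.1) for _ in range(2 * EMB_DIM)]

def embed(sample):
    """Stacking step: concatenate the embedding vector of each feature."""
    return [v for table, idx in zip(tables, sample) for v in table[idx]]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well within range
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: train embeddings and scoring layer jointly with plain SGD on log loss.
for _ in range(30):
    for sample, y in zip(data, labels):
        x = embed(sample)
        grad = sigmoid(sum(w * v for w, v in zip(weights, x))) - y  # dLoss/dLogit
        for f, idx in enumerate(sample):
            for d in range(EMB_DIM):
                k = f * EMB_DIM + d
                w_old = weights[k]
                weights[k] -= 0.1 * grad * x[k]
                tables[f][idx][d] -= 0.1 * grad * w_old

# Step 2: freeze the embeddings and fit a small forest on the embedded vectors.
X = [embed(s) for s in data]

def fit_stump(rows, ys):
    """Pick the (dimension, threshold) split with the fewest training errors."""
    best = None
    for d in range(len(rows[0])):
        for t in sorted({r[d] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[d] <= t]
            right = [y for r, y in zip(rows, ys) if r[d] > t]
            lv = int(2 * sum(left) >= len(left)) if left else 0    # majority label
            rv = int(2 * sum(right) >= len(right)) if right else 0
            err = sum(y != lv for y in left) + sum(y != rv for y in right)
            if best is None or err < best[0]:
                best = (err, d, t, lv, rv)
    return best[1:]

forest = []
for _ in range(15):
    idxs = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
    forest.append(fit_stump([X[i] for i in idxs], [labels[i] for i in idxs]))

def serve(sample):
    """Serving path: table lookups and threshold comparisons, no dense layers."""
    x = embed(sample)
    votes = [lv if x[d] <= t else rv for d, t, lv, rv in forest]
    return sum(votes) / len(votes)  # fraction of stumps voting for label 1

print(serve((3, 7)), serve((1, 2)))
```

The point of the sketch is the serving path: `serve` performs only embedding lookups and a handful of comparisons, which is the property the abstract credits for the low serving cost on conventional hardware.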
Keywords: neural networks; embedding features; dense matrices
Source: VideoLectures.NET
Data collected: 2023-05-29: chenxin01
Last reviewed: 2023-05-29: chenxin01
Views: 32