AnnexML: Approximate Nearest Neighbor Search for Extreme Multilabel Classification |
|
Course URL: | http://videolectures.net/kdd2017_tagami_annexML/ |
Lecturer: | Yukihiro Tagami |
Institution: | Yahoo! Research Japan |
Date: | 2017-10-09 |
Language: | English |
Course description: | Extreme multi-label classification methods have been widely used in Web-scale classification tasks such as Web page tagging and product recommendation. In this paper, we present a novel graph embedding method called AnnexML. In the training step, AnnexML constructs a k-nearest neighbor graph of the label vectors and attempts to reproduce the graph structure in the embedding space. Prediction uses an approximate nearest neighbor search method that efficiently explores the learned k-nearest neighbor graph in the embedding space. We conducted evaluations on several large-scale real-world data sets and compared our method with recent state-of-the-art methods. Experimental results show that AnnexML can significantly improve prediction accuracy, especially on data sets that have a larger label space. In addition, AnnexML improves the trade-off between prediction time and accuracy. At the same level of accuracy, the prediction time of AnnexML was up to 58 times faster than that of SLEEC, which is a state-of-the-art embedding-based method. |
Keywords: | label classification; Web page tagging; product recommendation |
Source: | VideoLectures.NET |
Data collected: | 2023-05-15:chenxin01 |
Last reviewed: | 2023-05-18:chenxin01 |
Views: | 28 |
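
The course description mentions two steps: building a k-nearest neighbor graph over label vectors, and predicting labels from nearest neighbors in an embedding space. The sketch below illustrates both ideas under loud assumptions and is not the authors' implementation. The function names `knn_graph` and `predict_labels` are hypothetical, cosine similarity is an assumed choice of similarity, the embedding is taken as already given (the learning step that reproduces the graph in the embedding space is omitted), and an exact brute-force search stands in for the approximate nearest neighbor search.

```python
import numpy as np


def _normalize_rows(X):
    """Scale each row to unit L2 norm, guarding against all-zero rows."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def knn_graph(Y, k):
    """For each row of the label matrix Y, return the indices of the k
    most cosine-similar other rows (hypothetical helper)."""
    Yn = _normalize_rows(np.asarray(Y, dtype=float))
    sim = Yn @ Yn.T
    np.fill_diagonal(sim, -np.inf)  # a point is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]


def predict_labels(Z_train, Y_train, z_query, k, top):
    """Score labels by a similarity-weighted sum of the label vectors of
    the k nearest training points in the embedding space.

    Exact brute-force search is used here; it is a stand-in for an
    approximate nearest neighbor index (assumption).
    """
    Zn = _normalize_rows(np.asarray(Z_train, dtype=float))
    q = np.asarray(z_query, dtype=float)
    q = q / max(np.linalg.norm(q), 1e-12)
    sim = Zn @ q
    nearest = np.argsort(-sim)[:k]
    scores = sim[nearest] @ np.asarray(Y_train, dtype=float)[nearest]
    return np.argsort(-scores)[:top]


# Toy data (made up for illustration): 4 training points, 4 labels.
Y = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]])
# Assumed 2-D embeddings of the same 4 training points.
Z = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])

graph = knn_graph(Y, k=1)
predicted = predict_labels(Z, Y, z_query=[1.0, 0.05], k=2, top=2)
```

On real data the brute-force similarity matrix in `knn_graph` grows quadratically with the number of training points, which is one reason an approximate search structure is attractive at this scale.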