FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning |
|
Course URL: | http://videolectures.net/kdd2014_prabhu_fast_xml/ |
Lecturer: | Yashoteja Prabhu |
Institution: | Indian Institute of Technology |
Date: | 2014-10-08 |
Language: | English |
Course description: | The objective in extreme multi-label classification is to learn a classifier that can automatically tag a data point with the most relevant subset of labels from a large label set. Extreme multi-label classification is an important research problem: it not only enables applications with many labels to be tackled, but also allows ranking problems to be reformulated with certain advantages over existing formulations. Our objective, in this paper, is to develop an extreme multi-label classifier that is faster to train and more accurate at prediction than the state-of-the-art Multi-label Random Forest (MLRF) algorithm [2] and the Label Partitioning for Sub-linear Ranking (LPSR) algorithm [35]. MLRF and LPSR learn a hierarchy to deal with the large number of labels, but they learn it by optimizing task-independent measures such as the Gini index or clustering error. Our proposed FastXML algorithm achieves significantly higher accuracies by directly optimizing an nDCG-based ranking loss function. We also develop an alternating minimization algorithm for efficiently optimizing the proposed formulation. Experiments reveal that FastXML can be trained on problems with more than a million labels on a standard desktop in eight hours using a single core and in an hour using multiple cores. |
Keywords: | randomized algorithms; multi-label classification |
Source: | VideoLectures.NET |
Data collected: | 2020-11-23:zyk |
Last reviewed: | 2020-12-15:chenxin |
Views: | 114 |
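
The course description says FastXML directly optimizes an nDCG-based ranking loss. As a rough illustration of the quantity involved, the sketch below computes nDCG@k for a single data point with binary label relevance. It is not taken from the lecture or the FastXML code: the function names, the dict-of-scores input format, and the choice of a base-2 logarithm are all assumptions made for this example.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked relevance values."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(scores, relevant, k):
    """nDCG@k for one data point with binary relevance.

    scores   -- dict mapping label -> predicted score (hypothetical format)
    relevant -- set of ground-truth labels for the data point
    """
    # Rank labels by predicted score, highest first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    gains = [1.0 if label in relevant else 0.0 for label in ranked]

    # The ideal ranking places every relevant label (up to k) at the top.
    ideal_dcg = dcg_at_k([1.0] * min(k, len(relevant)), k)
    if ideal_dcg == 0.0:
        return 0.0  # no relevant labels: define nDCG as 0
    return dcg_at_k(gains, k) / ideal_dcg


# Hypothetical example: three candidate labels, two of them relevant.
example_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
print(ndcg_at_k(example_scores, {"a", "c"}, k=2))
```

Because the metric discounts gains by rank position, placing a relevant label lower in the ranking costs more than it would under a rank-insensitive measure such as the Gini index, which is the contrast the description draws with MLRF and LPSR.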