Dirichlet Aggregation: Unsupervised Learning towards an Optimal Metric for Proportional Data
Course URL: http://videolectures.net/icml07_wang_daul/
Lecturer: Hua-Yan Wang
Institution: Peking University
Date: 2007-06-23
Language: Simplified Chinese
Course description: Proportional data (normalized histograms) occur frequently in many areas and can be abstracted mathematically as points in a geometric simplex. A proper distance metric on this simplex is important in many applications, including classification and information retrieval. In this paper, we develop a novel framework for learning an optimal metric on the simplex. Major features of our approach include: 1) flexibility in handling correlations among bins/dimensions; 2) broad applicability that is not limited to ad hoc backgrounds; and 3) a "real" global solution, in contrast to existing traditional local approaches. The technical essence of our approach is to fit a parametric distribution to the empirical data observed in the simplex. The distribution is parameterized by affinities between simplex vertices, which are learned by maximizing the likelihood of the observed data. These affinities then induce a metric on the simplex, defined as the earth mover's distance equipped with ground distances derived from the simplex vertex affinities.
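To make the final step of the description concrete, the sketch below computes an earth mover's distance between two normalized histograms under a given ground-distance matrix. It is only an illustration, not the authors' implementation: the function name `earth_movers_distance`, the min-cost-flow solver, and the hand-picked `ground` matrix are all assumptions. In the lecture's framework the ground distances would instead be derived from the learned vertex affinities, a step not reproduced here.

```python
from math import inf

def earth_movers_distance(p, q, ground, eps=1e-12):
    """EMD between histograms p and q (each summing to 1).

    ground[i][j] is the cost of moving one unit of mass from bin i to bin j.
    Solved as a min-cost flow by successive shortest paths (illustrative,
    intended for small numbers of bins only).
    """
    n = len(p)
    # Node layout: 0 = source, 1..n = bins of p, n+1..2n = bins of q, 2n+1 = sink.
    num_nodes = 2 * n + 2
    source, sink = 0, 2 * n + 1
    edges = []                       # [to, residual capacity, cost]; edge k ^ 1 is its reverse
    adj = [[] for _ in range(num_nodes)]

    def add_edge(u, v, cap, cost):
        adj[u].append(len(edges)); edges.append([v, cap, cost])
        adj[v].append(len(edges)); edges.append([u, 0.0, -cost])

    for i in range(n):
        add_edge(source, 1 + i, float(p[i]), 0.0)      # supply of bin i
        add_edge(n + 1 + i, sink, float(q[i]), 0.0)    # demand of bin i
        for j in range(n):
            add_edge(1 + i, n + 1 + j, inf, float(ground[i][j]))

    total_cost = 0.0
    while True:
        # Bellman-Ford shortest path from source in the residual graph.
        dist = [inf] * num_nodes
        prev_edge = [-1] * num_nodes
        dist[source] = 0.0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == inf:
                    continue
                for k in adj[u]:
                    v, cap, cost = edges[k]
                    if cap > eps and dist[u] + cost < dist[v] - eps:
                        dist[v] = dist[u] + cost
                        prev_edge[v] = k
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            break                    # no augmenting path left: all mass is moved

        # Find the bottleneck capacity along the path, then push that much flow.
        flow, v = inf, sink
        while v != source:
            k = prev_edge[v]
            flow = min(flow, edges[k][1])
            v = edges[k ^ 1][0]
        v = sink
        while v != source:
            k = prev_edge[v]
            edges[k][1] -= flow
            edges[k ^ 1][1] += flow
            v = edges[k ^ 1][0]
        total_cost += flow * dist[sink]
    return total_cost

# Hypothetical ground distances: bins 0 and 1 are treated as similar, bin 2 as distant.
ground = [[0.0, 0.1, 1.0],
          [0.1, 0.0, 1.0],
          [1.0, 1.0, 0.0]]
near = earth_movers_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], ground)
far = earth_movers_distance([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], ground)
print(near, far)
```

With this matrix, moving all mass between the two similar bins costs far less than moving it to the distant bin, whereas a bin-wise metric such as L1 would rate both pairs of histograms as equally far apart. This is the sense in which cross-bin ground distances let the metric account for correlations among bins.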
Keywords: proportional data; parametric distribution; information retrieval
Source: VideoLectures.NET
Last reviewed: 2019-12-05 (lxf)