Evaluating Similarity Measures for Emergent Semantics of Social Tagging |
|
Course URL: | http://videolectures.net/www09_markines_esst/ |
Lecturers: | Benjamin Markines, Ciro Cattuto, Dominik Benz, Filippo Menczer, Andreas Hotho, Gerd Stumme |
Institution: | University of Kassel |
Date: | 2009-05-20 |
Language: | English |
Course description: | Social bookmarking systems are becoming increasingly important data sources for bootstrapping and maintaining Semantic Web applications. Their emergent information structures have become known as folksonomies. A key question for harvesting semantics from these systems is how to extend and adapt traditional notions of similarity to folksonomies, and which measures are best suited for applications such as community detection, navigation support, semantic search, user profiling and ontology learning. Here we build an evaluation framework to compare various general folksonomy-based similarity measures, which are derived from several established information-theoretic, statistical, and practical measures. Our framework deals generally and symmetrically with users, tags, and resources. For evaluation purposes we focus on similarity between tags and between resources and consider different methods to aggregate annotations across users. After comparing the ability of several tag similarity measures to predict user-created tag relations, we provide an external grounding by user-validated semantic proxies based on WordNet and the Open Directory Project. We also investigate the issue of scalability. We find that mutual information with distributional micro-aggregation across users yields the highest accuracy, but is not scalable; per-user projection with collaborative aggregation provides the best scalable approach via incremental computations. The results are consistent across resource and tag similarity. |
Keywords: | social; bookmarking systems; information structure |
Course source: | VideoLectures.NET |
Last reviewed: | 2020-05-16: 杨雨 (volunteer course editor) |
Views: | 55 |
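
The course description refers to aggregating annotations across users before measuring similarity between tags. As a rough illustration only, the sketch below builds one vector per tag over resources, weighting each entry by the number of distinct users who applied that tag to that resource, and compares tags by cosine similarity. The toy data, the function names `tag_vectors` and `cosine`, and the choice of cosine are assumptions made for this example; the description does not define its measures at this level of detail, and this is not the mutual-information measure it reports as most accurate.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy folksonomy: (user, resource, tag) annotations.
triples = [
    ("alice", "r1", "python"),
    ("alice", "r1", "code"),
    ("bob", "r1", "python"),
    ("bob", "r2", "code"),
    ("carol", "r2", "python"),
    ("carol", "r3", "news"),
]


def tag_vectors(annotations):
    """Build one resource vector per tag.

    The weight of (tag, resource) is the number of distinct users who
    applied that tag to that resource (an assumed aggregation scheme).
    """
    vectors = defaultdict(Counter)
    for user, resource, tag in set(annotations):  # ignore repeated triples
        vectors[tag][resource] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[resource] for resource, weight in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = tag_vectors(triples)
sim_python_code = cosine(vectors["python"], vectors["code"])
sim_python_news = cosine(vectors["python"], vectors["news"])
print(f"python ~ code: {sim_python_code:.3f}")
print(f"python ~ news: {sim_python_news:.3f}")
```

In this toy data, "python" and "code" share resources and so score high, while "python" and "news" share none and score zero. Swapping the roles of tags and resources in `tag_vectors` would give resource-resource similarity in the same way.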