BioSnowball: Automated Population of Wikis
Course URL: http://videolectures.net/kdd2010_nie_bsap/
Lecturer: Zaiqing Nie
Institution: Microsoft
Date: 2010-10-01
Language: English
Course description: Internet users regularly need to find biographies and facts about people of interest. Wikipedia has become the first stop for celebrity biographies and facts. However, Wikipedia can only provide information for celebrities because of its neutral point of view (NPOV) editorial policy. In this paper we propose an integrated bootstrapping framework named BioSnowball that automatically summarizes the Web to generate Wikipedia-style pages for any person with a modest web presence. In BioSnowball, biography ranking and fact extraction are performed together in a single integrated training and inference process, using Markov Logic Networks (MLNs) as the underlying statistical model. The bootstrapping framework starts with only a small number of seeds and iteratively finds new facts and biographies. Because biography paragraphs on the Web are composed of the most important facts, our joint summarization model can improve the accuracy of both fact extraction and biography ranking compared to the decoupled methods in the literature. Empirical results on both a small labeled data set and a real Web-scale data set show the effectiveness of BioSnowball and confirm that it outperforms the decoupled methods.
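The seed-driven iteration described above can be illustrated with a minimal sketch. It is only a toy, pattern-based bootstrapping loop for a single relation and does not reproduce BioSnowball's MLN-based joint model of fact extraction and biography ranking. The function name `bootstrap`, the `(person, value)` fact format, and the capitalized-word entity heuristic are all assumptions made for illustration.

```python
import re

# Hypothetical entity heuristic: one or more capitalized words in a row.
ENTITY = r"([A-Z]\w+(?: [A-Z]\w+)*)"


def bootstrap(sentences, seeds, max_iters=5):
    """Toy snowball loop: seed facts -> text patterns -> new facts."""
    facts = set(seeds)   # known (person, value) pairs
    patterns = set()     # connecting text seen between a known pair
    for _ in range(max_iters):
        # Step 1: learn patterns from sentences that mention a known fact.
        for person, value in facts:
            for sentence in sentences:
                m = re.search(
                    re.escape(person) + r"(.+?)" + re.escape(value), sentence
                )
                if m:
                    patterns.add(m.group(1))
        # Step 2: apply every learned pattern to propose new facts.
        proposed = set()
        for pattern in patterns:
            rx = re.compile(ENTITY + re.escape(pattern) + ENTITY)
            for sentence in sentences:
                for m in rx.finditer(sentence):
                    proposed.add((m.group(1), m.group(2)))
        # Stop once an iteration adds nothing new.
        if proposed <= facts:
            break
        facts |= proposed
    return facts, patterns
```

Called with a handful of sentences and a single seed pair, the loop learns the text that connects the seed's two entities and reuses it to pick up other pairs phrased the same way. A real system would also need to score patterns and facts to limit drift, which this sketch omits.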
Keywords: Internet users; Wikipedia; Markov Logic Networks; bootstrapping framework
Source: VideoLectures.NET
Last reviewed: 2020-04-13:chenxin