SmartMiner: A New Framework for Mining Large Scale Web Usage Data
Course URL: http://videolectures.net/www09_alibayir_sm/
Lecturers: Ahmet Cosar; Ismail Hakki Toroslu; Guven Fidan; Murat Ali Bayir
Institution: University at Buffalo
Date: 2009-05-20
Language: English
Course description: In this paper, we propose a novel framework called Smart-Miner for the web usage mining problem, which uses link information to produce accurate user sessions and frequent navigation patterns. In the time-based and navigation-based approaches, a session is simply a sequence of web pages requested from the server or viewed in the browser. In Smart-Miner, by contrast, a session is a set of paths traversed in the web graph that correspond to a user's navigation among web pages. We have modeled session reconstruction as a new graph problem and use a new algorithm, Smart-SRA, to solve it efficiently. For the pattern discovery phase, we have developed an efficient version of the Apriori-All technique that uses the structure of the web graph to improve performance. In experiments on both real and simulated data, we observed that Smart-Miner produces web usage patterns that are at least 30% more accurate than those of other approaches, including previous session construction methods. We have also studied the effect of having referrer information in the web server logs, to show that different versions of Smart-SRA produce similar results. A further novel contribution is a distributed version of the Smart-Miner framework, implemented with the Map-Reduce paradigm, which enables processing of very large web server logs belonging to multiple web sites. To the best of our knowledge, this paper is the first attempt to propose such a large-scale framework for the web usage mining problem. We conclude that our scalable framework can efficiently process terabytes of web server logs belonging to multiple web sites.
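The idea that a session is a path in the web graph, rather than a plain time-ordered sequence of requests, can be illustrated with a small sketch. This is not the Smart-SRA algorithm from the paper: the function name, the input shapes, the 10-minute gap threshold, and the greedy splitting rule are all assumptions made for illustration. The sketch only shows how hyperlink information can constrain which consecutive requests are placed in the same path.

```python
from typing import Dict, List, Set, Tuple


def split_into_link_paths(
    requests: List[Tuple[str, float]],  # (page, timestamp in seconds), time-ordered
    links: Dict[str, Set[str]],         # hypothetical web graph: page -> pages it links to
    max_gap: float = 600.0,             # assumed threshold, not a value from the paper
) -> List[List[str]]:
    """Greedily split one user's requests into paths that follow hyperlinks.

    A request extends the current path only if the previous page links to it
    and the time gap is within max_gap; otherwise a new path is started.
    """
    paths: List[List[str]] = []
    current: List[str] = []
    last_time = 0.0
    for page, ts in requests:
        linked = bool(current) and page in links.get(current[-1], set())
        in_time = bool(current) and (ts - last_time) <= max_gap
        if linked and in_time:
            current.append(page)
        else:
            if current:
                paths.append(current)
            current = [page]
        last_time = ts
    if current:
        paths.append(current)
    return paths


if __name__ == "__main__":
    # Toy web graph and request log, both invented for this example.
    toy_links = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"A"}}
    toy_requests = [("A", 0.0), ("B", 30.0), ("D", 60.0), ("A", 90.0), ("B", 5000.0)]
    print(split_into_link_paths(toy_requests, toy_links))
```

A purely time-based method would group the first four requests into a single session. Here the request for D starts a new path because B does not link to D, and the final request for B starts another because the time gap exceeds the threshold.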
Keywords: Computer Science; Web Mining; Framework; Session Construction
Source: VideoLectures.NET
Last reviewed: 2019-10-29 (lxf)
Views: 72