Taxonomy-Driven Lumping for Sequence Mining
Course URL: http://videolectures.net/ecmlpkdd09_gionis_tdl/
Lecturer: Aris Gionis
Institution: Yahoo!
Date: 2009-10-20
Language: English
Abstract: Given a taxonomy of events and a dataset of sequences of these events, we study the problem of finding efficient and effective ways to produce a compact representation of the sequences. We model sequences with Markov models whose states correspond to nodes in the provided taxonomy, and each state represents the events in the subtree under the corresponding node. By lumping observed events to states that correspond to internal nodes in the taxonomy, we allow more compact models that are easier to understand and visualize, at the expense of a decrease in the data likelihood. We formally define and characterize our problem, and propose a scalable search method for finding a good trade-off between two conflicting goals: maximizing the data likelihood, and minimizing the model complexity. We implement these ideas in Taxomo, a taxonomy-driven modeler, which we apply in two different domains, query-log mining and mining of trajectories.
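The lumping trade-off described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the lecture. The taxonomy is a child-to-parent map. A model is a "cut" of the taxonomy, meaning a set of nodes that covers every leaf event exactly once. Each lumped state emits the leaf events under it at their empirical frequencies. Model complexity is simply the number of states, weighted by a hypothetical penalty `lam`. The greedy bottom-up search is a stand-in, not the scalable search method used in Taxomo. The toy taxonomy (`PARENT`) and the sequences are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); "root" is the only node without a parent.
PARENT = {
    "sports": "root", "news": "root",
    "soccer": "sports", "tennis": "sports",
    "politics": "news", "weather": "news",
}


def children(parent_map):
    """Invert the child -> parent map into parent -> list of children."""
    kids = {}
    for child, parent in parent_map.items():
        kids.setdefault(parent, []).append(child)
    return kids


def state_of(event, cut, parent_map):
    """Lump a leaf event into its unique ancestor-or-self that lies in the cut."""
    node = event
    while node not in cut:
        node = parent_map[node]  # raises KeyError if the cut does not cover the event
    return node


def log_likelihood(seqs, cut, parent_map):
    """Maximum-likelihood log-likelihood of a first-order Markov chain over the
    lumped states, where each state emits the leaf events under it with their
    empirical frequencies (an assumption of this sketch). Sequences must be non-empty."""
    init, trans, out_of, emit, visits = Counter(), Counter(), Counter(), Counter(), Counter()
    for seq in seqs:
        states = [state_of(e, cut, parent_map) for e in seq]
        init[states[0]] += 1
        for e, s in zip(seq, states):
            emit[(s, e)] += 1
            visits[s] += 1
        for a, b in zip(states, states[1:]):
            trans[(a, b)] += 1
            out_of[a] += 1
    ll = sum(n * math.log(n / len(seqs)) for n in init.values())
    ll += sum(n * math.log(n / out_of[a]) for (a, _), n in trans.items())
    ll += sum(n * math.log(n / visits[s]) for (s, _), n in emit.items())
    return ll


def score(seqs, cut, parent_map, lam):
    """Trade-off objective: likelihood minus a penalty of `lam` per state (hypothetical)."""
    return log_likelihood(seqs, cut, parent_map) - lam * len(cut)


def greedy_lump(seqs, parent_map, lam):
    """Bottom-up greedy search (a stand-in, not the paper's method): start from
    the leaves and repeatedly apply the best merge of all children of a node
    into that node, stopping when no merge improves the score."""
    kids = children(parent_map)
    cut = set(parent_map) - set(kids)  # the leaves: nodes that are nobody's parent
    best = score(seqs, cut, parent_map, lam)
    while True:
        move = None
        for node, ch in kids.items():
            if all(c in cut for c in ch):
                cand = (cut - set(ch)) | {node}
                s = score(seqs, cand, parent_map, lam)
                if s > best:
                    best, move = s, cand
        if move is None:
            return cut, best
        cut = move


seqs = [
    ["soccer", "tennis", "politics", "weather"],
    ["tennis", "soccer", "weather", "politics"],
    ["soccer", "soccer", "politics"],
]
cut, s = greedy_lump(seqs, PARENT, lam=2.0)
print(sorted(cut), round(s, 3))
```

In this sketch, merging the children of a node restricts the model, so the maximum likelihood can only stay the same or drop. The penalty `lam` decides how much likelihood the search is willing to give up for a smaller model. With `lam=0` it keeps every leaf as its own state, and with a very large `lam` it lumps everything into `root`.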
Keywords: dataset; compact representation; node
Source: VideoLectures.NET
Last edited: 2020-04-07 (chenxin)