An Association Analysis Approach to Biclustering |
|
Course URL: | http://videolectures.net/kdd09_pandey_aaab/ |
Lecturer: | Gaurav Pandey |
Institution: | University of Minnesota |
Date: | 2009-09-14 |
Language: | English |
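Summary (translated from the Chinese): | Discovering biclusters, groups of items that show coherent values across a subset of the transactions in a data set, is an important analysis for real-valued data in fields such as biology. Several algorithms have been proposed to find different types of biclusters, but none can search the space of all possible biclusters exhaustively. Pattern mining algorithms from association analysis also produce biclusters in effect, because each pattern consists of items supported by a subset of the transactions. Their main limitation is that they handle only binary or categorical variables, so applying them to real-valued data usually requires a lossy transformation such as discretization or binarization. The paper proposes a new association analysis framework for exhaustively and efficiently mining "range support" patterns from real-valued data. The framework reduces the information loss of binarization- and discretization-based approaches and makes exhaustive discovery of coherent biclusters possible. Its performance is compared with two standard biclustering algorithms by evaluating the functional similarity of the genes in the patterns each method derives from microarray data. The experiments show that the real-valued patterns found by the framework are better enriched by small, biologically interesting functional classes. Specific examples also show that the RAP framework finds functionally enriched patterns that the commonly used biclustering algorithm ISA misses. |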
Course description: | The discovery of biclusters, which denote groups of items that show coherent values across a subset of all the transactions in a data set, is an important type of analysis performed on real-valued data sets in various domains, such as biology. Several algorithms have been proposed to find different types of biclusters in such data sets. However, these algorithms are unable to search the space of all possible biclusters exhaustively. Pattern mining algorithms in association analysis also essentially produce biclusters as their result, since the patterns consist of items that are supported by a subset of all the transactions. However, a major limitation of the numerous techniques developed in association analysis is that they are only able to analyze data sets with binary and/or categorical variables, and their application to real-valued data sets often involves some lossy transformation such as discretization or binarization of the attributes. In this paper, we propose a novel association analysis framework for exhaustively and efficiently mining "range support" patterns from such a data set. On one hand, this framework reduces the loss of information incurred by the binarization- and discretization-based approaches, and on the other, it enables the exhaustive discovery of coherent biclusters. We compared the performance of our framework with two standard biclustering algorithms through the evaluation of the similarity of the cellular functions of the genes constituting the patterns/biclusters derived by these algorithms from microarray data. These experiments show that the real-valued patterns discovered by our framework are better enriched by small biologically interesting functional classes. Also, through specific examples, we demonstrate the ability of the RAP framework to discover functionally enriched patterns that are not found by the commonly used biclustering algorithm ISA. |
Keywords: | biclustering; mining algorithms; source code and data sets |
Source: | VideoLectures.NET |
Last reviewed: | 2021-12-23:liyy |
Views: | 106 |