A Spectral Clustering Approach to Optimally Combining Numerical Vectors with a Modular Network |
|
Course URL: | http://videolectures.net/kdd07_shiga_asca/ |
Lecturer: | Motoki Shiga |
Institution: | Kyoto University |
Date: | 2007-09-14 |
Language: | Japanese |
Course description: | We address the problem of clustering numerical vectors together with a network. The problem setting is essentially equivalent to constrained clustering by Wagstaff and Cardie [20] and semi-supervised clustering by Basu et al. [2], but our focus is on optimally combining two heterogeneous data sources. One application is web pages, which can be vectorized numerically by their contents (e.g., term frequencies) and which form a network through their hyperlinks. Another typical application is genes, whose behavior can be measured numerically and for which a gene network is available from another data source. We first define a new graph clustering measure, which we call normalized network modularity, by balancing the cluster sizes in the original modularity. We then propose a new clustering method that integrates the cost of clustering numerical vectors with the cost of maximizing the normalized network modularity into a single spectral relaxation problem. Our learning algorithm is based on spectral clustering, which turns the problem into an eigenvalue problem, and uses k-means for the final cluster assignments. A significant advantage of our method is that the weight parameter balancing the two costs can be optimized from the given data by choosing the value that gives the minimum total cost. We evaluated the proposed method on a variety of datasets, including synthetic data and real-world data from molecular biology. The experimental results showed that our method clusters effectively using both numerical vectors and a network. |
Keywords: | network clustering; heterogeneous data sources; network modularity |
Source: | VideoLectures.NET |
Last reviewed: | 2019-05-09:lxf |
Views: | 37 |
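
The pipeline outlined in the course description (combine a vector-clustering cost with a network-modularity cost, relax to an eigenvalue problem, then run k-means) can be illustrated with a rough sketch. This is not the lecturer's implementation. It assumes plain Newman modularity in place of the normalized network modularity, which the description does not define in detail, and it takes the weight `w` as a fixed input rather than selecting it by minimum total cost. The function names, the Frobenius-norm scaling of the two terms, and the farthest-point k-means initialisation are all hypothetical choices made for illustration.

```python
import numpy as np


def modularity_matrix(A):
    """Newman modularity matrix B = A - d d^T / (2m) for a symmetric adjacency A."""
    d = A.sum(axis=1)
    two_m = d.sum()
    return A - np.outer(d, d) / two_m


def kmeans(Y, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def combined_spectral_clustering(X, A, k, w=0.5):
    """Cluster n items from vectors X (n x d) and adjacency A (n x n).

    Assumption: the vector term is the centered Gram matrix (the usual spectral
    relaxation of the k-means objective) and the network term is the plain
    modularity matrix; both are scaled to unit Frobenius norm before mixing.
    """
    Xc = X - X.mean(axis=0)
    G = Xc @ Xc.T
    B = modularity_matrix(A)
    G = G / np.linalg.norm(G)  # Frobenius norm for 2-D input
    B = B / np.linalg.norm(B)
    M = w * G + (1.0 - w) * B
    # eigh returns eigenvalues in ascending order, so the last k columns
    # are the eigenvectors of the k largest eigenvalues.
    _, vecs = np.linalg.eigh(M)
    Y = vecs[:, -k:]
    # Row-normalise the spectral embedding before k-means.
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    return kmeans(Y, k)
```

Setting `w=1.0` uses only the numerical vectors and `w=0.0` uses only the network. One way to approximate the data-driven weight selection mentioned in the description would be to run this function over a grid of `w` values and compare a total cost across runs, but the exact cost used in the lecture is not given here.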