Capacity Control for Partially Ordered Feature Sets
Course URL: http://videolectures.net/ecmlpkdd09_ruckert_ccpofs/
Lecturer: Ulrich Rückert
Institution: University of California, Berkeley
Date: 2009-10-20
Language: English
Course description: Partially ordered feature sets appear naturally in classification settings with structured instances. For example, when the instances are graphs and the features represent subgraph occurrence checks, the features can be partially ordered by an “is subgraph of” relation. We investigate how the redundancy in such datasets affects the capacity control behavior of linear classification methods. While the capacity does not decrease in general, we derive better capacity bounds for distributions that assign lower probabilities to instances in the lower levels of the feature hierarchy. For itemsets, subsequences, and subtrees, the capacity is finite even for data with an infinite number of features. We validate these results empirically and show that the limited capacity of linear classifiers makes underfitting, rather than overfitting, the more prominent capacity control problem. To avoid underfitting, we propose substructure classes with “elastic edges”, and we demonstrate how such broad feature classes can be used with large datasets.
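The partial order mentioned in the description can be illustrated with itemset features, where the order is plain set inclusion. The sketch below is not code from the lecture; the function names (`itemset_features`, `feature_vector`, `is_downward_closed`) and the toy items are hypothetical. It only shows the redundancy the abstract refers to: whenever an itemset feature fires on an instance, all of its sub-itemsets fire as well, so many binary vectors can never occur.

```python
from itertools import combinations


def itemset_features(items, max_size=2):
    """Enumerate all non-empty itemsets over `items` up to `max_size` items."""
    items = sorted(items)
    return [
        frozenset(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(items, size)
    ]


def feature_vector(instance, features):
    """Occurrence check: 1 if the itemset is contained in the instance, else 0."""
    instance = frozenset(instance)
    return [int(feature <= instance) for feature in features]


def is_downward_closed(vector, features):
    """True if every active feature also has all of its sub-features active."""
    active = {f for f, value in zip(features, vector) if value}
    return all(g in active for f in active for g in features if g <= f)


# Toy example (assumed data): three items, itemsets of size 1 and 2.
features = itemset_features({"a", "b", "c"}, max_size=2)
x = feature_vector({"a", "b"}, features)

# A vector produced by a real instance respects the partial order ...
realizable = is_downward_closed(x, features)
# ... whereas activating only {a, b} without {a} and {b} cannot occur.
impossible = is_downward_closed([0, 0, 0, 1, 0, 0], features)
```

Here `realizable` is true and `impossible` is false. Restricting attention to vectors that respect the order is one informal way to see why such feature sets carry less freedom than their raw dimension suggests.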
Keywords: structured instances; datasets; linear classification
Source: VideoLectures.NET
Last reviewed: 2019-03-27:lxf
Views: 43