

Entropy Properties of a Decision Rule Class in Connection with Machine Learning Abilities
Course URL: http://videolectures.net/ida07_chervonenkis_ep/
Lecturer: Alexey Chervonenkis
Institution: University College London
Date: 2007-10-08
Language: English
Course description: Many methods of machine learning are based on the idea of empirical risk minimisation: find the decision rule or model, from some given set, that best fits the data in the training set. This idea rests on the law of large numbers: empirical risk converges to the true risk if the training set is large enough. But if the class of decision rules or models is too large (in a certain sense), one meets the problem of overfitting: the model corresponds perfectly to the training data but shows large errors on new data. This is because only uniform convergence of the empirical risk to the true risk guarantees that the behaviour of the optimal model on the training set is close to its behaviour on new data. We introduce the notion of the entropy of a decision rule class over a fixed sample sequence as the log of the number of possible classifications of the sequence by the rules of the class. The maximum entropy over sequences of a fixed length l determines a sufficient condition for uniform convergence and the corresponding estimates. But only the behaviour of the average entropy H(l) determines a necessary and sufficient condition for uniform convergence: H(l)/l (the average entropy per symbol) should go to zero as the sequence length goes to infinity. If the condition does not hold, then there exists a set of objects with non-zero probability measure such that almost all sequences of arbitrary finite length drawn from this set can be divided in all possible ways by the rules of the class. One can easily see that in this case overfitting is inevitable. Similar results hold for real-valued dependencies in place of decision rules.
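As an illustrative sketch (not from the lecture itself): the entropy of a rule class on a length-l sample is the log of the number of distinct labelings the class can produce on that sample. The helper names below (`dichotomies_threshold`, `entropy_per_symbol`) and the choice of a simple 1-D threshold class are assumptions for illustration only. A threshold class yields only l + 1 labelings, so H(l)/l → 0 and uniform convergence holds; an unrestricted class yields all 2^l labelings, so H(l)/l stays at 1 and overfitting is unavoidable.

```python
import math

def dichotomies_threshold(points):
    """Distinct labelings of 1-D points induced by threshold rules x >= t.

    For l distinct points there are exactly l + 1 such labelings: we try one
    threshold below all points, one between each consecutive pair, and one above.
    """
    xs = sorted(points)
    thresholds = ([xs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])
    return {tuple(x >= t for x in xs) for t in thresholds}

def entropy_per_symbol(n_labelings, l):
    """H(l)/l: log2 of the labeling count, divided by the sample length."""
    return math.log2(n_labelings) / l

for l in (10, 100, 1000):
    n = len(dichotomies_threshold(range(l)))
    print(f"l={l}: thresholds H(l)/l = {entropy_per_symbol(n, l):.4f}, "
          f"unrestricted H(l)/l = {entropy_per_symbol(2 ** l, l):.4f}")
```

The printed ratios show the threshold class's entropy per symbol shrinking toward zero as l grows, while the unrestricted class stays pinned at 1 — the dividing line the abstract describes between classes that do and do not admit uniform convergence.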
Keywords: machine learning; empirical risk minimisation; law of large numbers; non-zero probability measure
Source: VideoLectures.NET
Last reviewed: 2020-10-22 (chenxin)
Views: 53