What is the Optimal Number of Features? A learning theoretic perspective
Course URL: http://videolectures.net/slsfs05_navot_wonfl/
Lecturer: Amir Navot
Institution: The Hebrew University of Jerusalem
Date: 2007-02-25
Language: English
Abstract: In this paper we discuss the problem of feature selection for supervised learning from the standpoint of statistical machine learning. We ask which subset of features will lead to the best classification accuracy. It is clear that if the statistical model is known, or if there is an unlimited number of training samples, any additional feature can only improve the accuracy. However, we show explicitly that when the training set is finite, using all the features may be suboptimal, even if all the features are independent and carry information about the label. We analyze one setting analytically and show how feature selection can increase accuracy. We also find the optimal number of features as a function of the training set size for a few specific examples. This perspective on feature selection differs from the common approach, which focuses on the probability that a specific algorithm will pick a completely irrelevant or redundant feature.
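The short simulation below is a sketch of the phenomenon the abstract describes, not code from the lecture or paper: the Gaussian model, the decaying relevances mu[i], the plug-in linear classifier, and all sample sizes are illustrative assumptions. Every feature is independent and carries some information about the label, yet with a small training set the measured test accuracy typically peaks at some k well below d.

import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: d independent unit-variance Gaussian features; feature i has
# mean y * mu[i] for label y in {-1, +1}, with relevance mu[i] decaying so
# that later features carry less information about the label.
d, m_train, m_test, trials = 50, 20, 2000, 200
mu = 1.0 / np.arange(1, d + 1)

def sample(n):
    """Draw n labeled examples from the assumed Gaussian model."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, d))
    return x, y

acc = np.zeros(d)  # acc[k-1] = mean test accuracy using the first k features
for _ in range(trials):
    x_tr, y_tr = sample(m_train)
    x_te, y_te = sample(m_test)
    # Plug-in linear classifier: estimate each feature's relevance from the
    # training set (E[y * x_i] = mu_i), predict the sign of the weighted sum.
    mu_hat = (y_tr[:, None] * x_tr).mean(axis=0)
    for k in range(1, d + 1):
        pred = np.sign(x_te[:, :k] @ mu_hat[:k])
        acc[k - 1] += np.mean(pred == y_te)
acc /= trials

best_k = int(np.argmax(acc)) + 1
print(f"best k = {best_k}: accuracy {acc[best_k - 1]:.3f}; "
      f"all {d} features: accuracy {acc[-1]:.3f}")

With only 20 training samples, the estimated weights of the weakly relevant features are dominated by estimation noise, so a run of this sketch typically peaks at a small k; increasing m_train shifts the optimum toward using all d features, matching the abstract's claim that the optimal number of features depends on the training set size.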
Keywords: machine learning; feature subset; training set
Source: VideoLectures.NET
Last reviewed: 2019-09-21 (cwx)
Views: 78