Chronological Sampling for Email Filtering
Course URL: http://videolectures.net/um05_silver_csef/
Lecturer: Daniel Silver
Institution: Acadia University
Date: 2007-02-25
Language: English
Course description: User models for email filtering should be developed from appropriate training and test sets. In the literature, k-fold cross-validation is commonly presented as a way of mixing old and new messages to produce these data sets. We show that this yields overly optimistic estimates of an email filter’s accuracy in classifying future messages, because the test set is more likely to contain messages similar to those in the training set. We propose k-fold chronological cross-validation, a method that preserves the chronology of the email messages in the test set.
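The description does not spell out how the chronological folds are built, so the following is only a minimal sketch of one plausible reading: messages are sorted by timestamp, cut into k + 1 contiguous blocks, and each fold trains on the older blocks and tests on the next newer block. The function name `chronological_folds` and the forward-chaining scheme are assumptions made for illustration, not the lecturer's published procedure.

```python
from typing import List, Sequence, Tuple


def chronological_folds(
    timestamps: Sequence[float], k: int
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs in which every test
    message is newer than every training message.

    Hypothetical helper: an assumed forward-chaining split, not taken
    from the lecture itself.
    """
    n = len(timestamps)
    if k < 1 or n < k + 1:
        raise ValueError("need k >= 1 and at least k + 1 messages")

    # Order message indices from oldest to newest.
    order = sorted(range(n), key=lambda i: timestamps[i])

    # Boundaries of k + 1 contiguous, non-empty blocks of the ordered messages.
    bounds = [j * n // (k + 1) for j in range(k + 2)]

    folds = []
    for fold in range(1, k + 1):
        train = order[: bounds[fold]]                   # all older blocks
        test = order[bounds[fold] : bounds[fold + 1]]   # the next newer block
        folds.append((train, test))
    return folds
```

By contrast, ordinary k-fold cross-validation would shuffle or interleave messages, so a test fold could contain messages older than some training messages, which is the situation the description says inflates the accuracy estimate.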
Keywords: computer science; chronological sampling; email filtering
Source: VideoLectures.NET
Last reviewed: 2020-06-18:dingaq