Direct Mining of Discriminative Patterns for Classifying Uncertain Data
Course URL: http://videolectures.net/kdd2010_wang_dmdp/
Lecturer: Jianyong Wang
Institution: Tsinghua University
Date: 2010-10-01
Language: English
Abstract: Classification is one of the most essential tasks in data mining. Unlike other methods, associative classification tries to find all frequent patterns in the input categorical data that satisfy a user-specified minimum support and/or other discrimination measures, such as minimum confidence or information gain. After a feature selection procedure, which usually tries in various ways to cover as many input instances as possible with the most discriminative patterns, these patterns are used either as rules for a rule-based classifier or as training features for a support vector machine (SVM) classifier. Several algorithms have also been proposed to mine the most discriminative patterns directly, without costly feature selection. Previous empirical results show that associative classification can achieve better classification accuracy on many datasets.

Recently, many studies have addressed uncertain data, in which the fields of uncertain attributes no longer hold certain values. Instead, probability distribution functions represent the possible values and their corresponding probabilities. The uncertainty is usually caused by noise, measurement limits, or other factors. Several algorithms have recently been proposed for classifying uncertain data, for example by extending traditional rule-based classifiers and decision trees to handle it.

In this paper, we propose a novel algorithm, uHARMONY, which mines discriminative patterns directly and effectively from uncertain data for use as classification features or rules, to help train either an SVM or a rule-based classifier. Because the patterns are discovered directly from the input database, the feature selection step, which usually takes a great deal of time, can be avoided entirely. We also propose an effective method for computing the expected confidence of the mined patterns, which serves as the discrimination measure.
Empirical results show that, with an SVM classifier, uHARMONY significantly outperforms state-of-the-art uncertain data classification algorithms, improving average accuracy by 4% to 10% on 30 categorical datasets under varying degrees of uncertainty and numbers of uncertain attributes.
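To make "expected confidence" concrete, here is a minimal Python sketch that computes the expected support and expected confidence of a pattern on a toy uncertain dataset. It rests on assumptions that are not stated in the abstract: each uncertain attribute holds an independent discrete distribution over values, tuples are independent of one another, class labels are certain, and confidence is taken as 0 in possible worlds where the pattern does not occur. All names (`containment_prob`, `count_distribution`, `expected_support`, `expected_confidence`) and the data are hypothetical, and this is not presented as uHARMONY's actual algorithm. It computes the exact expectation over possible worlds by building the distribution of pattern counts (a Poisson-binomial distribution) instead of enumerating every world.

```python
from typing import Dict, List, Tuple

# Hypothetical data model (an assumption, not taken from the paper):
# each attribute of a tuple maps to a discrete distribution {value: probability};
# a certain attribute is simply a distribution with a single value of probability 1.
UncertainTuple = Dict[str, Dict[str, float]]
Item = Tuple[str, str]                      # (attribute, value)
Dataset = List[Tuple[UncertainTuple, str]]  # (tuple, certain class label)


def containment_prob(t: UncertainTuple, pattern: List[Item]) -> float:
    """P(t satisfies every item of the pattern), assuming the attributes of a
    tuple are independent and the pattern uses each attribute at most once."""
    p = 1.0
    for attr, value in pattern:
        p *= t.get(attr, {}).get(value, 0.0)
    return p


def count_distribution(probs: List[float]) -> List[float]:
    """Poisson-binomial distribution: dist[k] = P(exactly k of the independent
    tuples contain the pattern), built up one tuple at a time."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - p)  # this tuple does not contain the pattern
            new[k + 1] += q * p      # this tuple contains the pattern
        dist = new
    return dist


def expected_support(data: Dataset, pattern: List[Item]) -> float:
    """Expected number of tuples containing the pattern (linearity of expectation)."""
    return sum(containment_prob(t, pattern) for t, _ in data)


def expected_confidence(data: Dataset, pattern: List[Item], label: str) -> float:
    """E[sup(pattern with label) / sup(pattern)] over all possible worlds,
    taking the confidence to be 0 in worlds where sup(pattern) == 0."""
    in_class = [containment_prob(t, pattern) for t, y in data if y == label]
    other = [containment_prob(t, pattern) for t, y in data if y != label]
    dist_in = count_distribution(in_class)  # distribution of sup(pattern with label)
    dist_out = count_distribution(other)    # distribution of the remaining support
    total = 0.0
    for i, p_i in enumerate(dist_in):
        for j, p_j in enumerate(dist_out):
            if i + j > 0:  # only worlds where the pattern occurs at least once
                total += p_i * p_j * i / (i + j)
    return total


if __name__ == "__main__":
    data: Dataset = [
        ({"color": {"red": 0.8, "blue": 0.2}, "size": {"big": 1.0}}, "yes"),
        ({"color": {"red": 0.5, "blue": 0.5}, "size": {"big": 1.0}}, "no"),
        ({"color": {"blue": 1.0}, "size": {"small": 1.0}}, "yes"),
    ]
    pattern = [("color", "red"), ("size", "big")]
    print(round(expected_support(data, pattern), 4))            # 1.3
    print(round(expected_confidence(data, pattern, "yes"), 4))  # 0.6
```

On this toy data the expected confidence of `color=red, size=big` for class `yes` is 0.6, whereas dividing the expected supports would give 0.8 / 1.3 ≈ 0.615. So, under these assumptions, expected confidence generally cannot be obtained by dividing expected supports. The nested loop is quadratic in the number of tuples, which is fine for an illustration but says nothing about uHARMONY's actual cost.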
Keywords: data mining; associative classification; support vector machine
Source: VideoLectures.NET