0


基于层次均值漂移的分类检测

Category Detection Using Hierarchical Mean Shift
课程网址: http://videolectures.net/kdd09_wong_cduhms/  
主讲教师: Weng-Keen Wong
开课单位: 俄勒冈州立大学
开课时间: 2009-08-14
课程语种: 英语
中文简介:
监视,监视,科学发现和数据清理中的许多应用需要识别异常。尽管已经开发了许多方法来识别统计上显着的异常,但更困难的任务是识别既有趣又有统计学意义的异常。类别检测是机器学习的新兴领域,可以使用“人在循环”方法来帮助解决这个问题。在此交互式设置中,算法要求用户在现有类别下标记查询数据点,或者声明查询数据点属于先前未发现的类别。类别检测的目标是在尽可能少的查询中引起用户注意来自数据中每个类别的代表性数据点。在具有不平衡类别的数据集中,主要挑战在于识别罕见类别或异常;因此,该任务通常被称为{\罕见}类别检测。我们提出了一种基于分层均值漂移的稀有类别检测的新方法。在我们的方法中,通过在数据上以不断增加的带宽重复应用均值漂移来创建层次结构。此层次结构允许我们识别不同比例的数据集中的异常,然后将其作为查询提供给用户。该方法优于现有方法的主要优点是,它不需要任何数据集属性的知识,例如类别的总数或类别的先验概率。现实世界数据集的结果表明,我们的分层均值平移方法比以前的技术表现更好。
课程简介: Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning that can help address this issue using a "human-in-the-loop" approach. In this interactive setting, the algorithm asks the user to label a query data point under an existing category or declare the query data point to belong to a previously undiscovered category. The goal of category detection is to bring to the user's attention a representative data point from each category in the data in as few queries as possible. In a data set with imbalanced categories, the main challenge is in identifying the rare categories or anomalies; hence, the task is often referred to as {\it rare} category detection. We present a new approach to rare category detection based on hierarchical mean shift. In our approach, a hierarchy is created by repeatedly applying mean shift with an increasing bandwidth on the data. This hierarchy allows us to identify anomalies in the data set at different scales, which are then posed as queries to the user. The main advantage of this methodology over existing approaches is that it does not require any knowledge of the dataset properties such as the total number of categories or the prior probabilities of the categories. Results on real-world data sets show that our hierarchical mean shift approach performs consistently better than previous techniques.
关 键 词: 识别异常; 统计学; 类别检测
课程来源: 视频讲座网
最后编审: 2019-05-10:lxf
阅读次数: 25