Constrained Logistic Regression for Discriminative Pattern Mining |
|
Course URL: | http://videolectures.net/ecmlpkdd2011_al_stouhi_mining/ |
Lecturer: | Samir Al-Stouhi |
Institution: | Wayne State University, USA |
Date: | 2011-11-29 |
Language: | English |
Abstract: | Analyzing differences in multivariate datasets is a challenging problem. This topic was earlier studied by finding changes in the distribution differences either in the form of patterns representing conjunction of attribute value pairs or univariate statistical analysis for each attribute in order to highlight the differences. All such methods focus only on change in attributes in some form and do not implicitly consider the class labels associated with the data. In this paper, we pose the difference in distribution in a supervised scenario where the change in the data distribution is measured in terms of the change in the corresponding classification boundary. We propose a new constrained logistic regression model to measure such a difference between multivariate data distributions based on the predictive models induced on them. Using our constrained models, we measure the difference in the data distributions using the changes in the classification boundary of these models. We demonstrate the advantages of the proposed work over other methods available in the literature using both synthetic and real-world datasets. |
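Keywords: | multivariate datasets; distribution differences; univariate statistical analysis; predictive models; synthetic datasets; real-world datasets |
Source: | VideoLectures.NET |
Last reviewed: | 2019-08-25:chenxin |
Views: | 82 |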