Relational Data Pre-Processing Techniques for Improved Securities Fraud Detection
Course URL: http://videolectures.net/kdd07_fast_rdppt/
Lecturer: Andrew Fast
Institution: University of Massachusetts
Date: 2007-08-14
Language: English
Abstract: Commercial datasets are often large, relational, and dynamic. They contain many records of people, places, things, events and their interactions over time. Such datasets are rarely structured appropriately for knowledge discovery, and they often contain variables whose meanings change across different subsets of the data. We describe how these challenges were addressed in a collaborative analysis project undertaken by the University of Massachusetts Amherst and the National Association of Securities Dealers (NASD). We describe several methods for data preprocessing that we applied to transform a large, dynamic, and relational dataset describing nearly the entirety of the U.S. securities industry, and we show how these methods made the dataset suitable for learning statistical relational models. To better utilize social structure, we first applied known consolidation and link formation techniques to associate individuals with branch office locations. In addition, we developed an innovative technique to infer professional associations by exploiting dynamic employment histories. Finally, we applied normalization techniques to create a suitable class label that adjusts for spatial, temporal, and other heterogeneity within the data. We show how these pre-processing techniques combine to provide the necessary foundation for learning high-performing statistical models of fraudulent activity.
Keywords: commercial data; data preprocessing; U.S. securities industry
Source: VideoLectures.NET
Last reviewed: 2020-07-13:yumf