

Stream Data Mining: A Big Data Perspective
Course URL: https://videolectures.net/videos/kdd2016_khan_data_perspective  
Lecturer: Latifur Khan
Offered by: KDD 2016 Workshop
Date offered: 2025-02-04
Language: English
Course description: Data streams are continuous flows of data. Examples include network traffic, sensor data, and call center records. Data streams exhibit several unique properties that together match the characteristics of big data (i.e., volume, velocity, variety, and veracity) and make data stream mining challenging. In this talk we present an organized picture of how to apply various data mining techniques to data streams. Most existing data stream classification techniques ignore one important aspect of stream data: the arrival of a novel class. We address this issue and propose a data stream classification technique that integrates a novel class detection mechanism into traditional classifiers, enabling automatic detection of novel classes before the true labels of the novel class instances arrive. The novel class detection problem becomes more challenging in the presence of concept drift, when the underlying data distributions evolve over the stream. We show how to make fast and correct classification decisions under this constraint with limited labeled training data, and we apply the approach to real benchmark data. In addition, we present a number of stream classification applications, such as adaptive malicious code detection, website fingerprinting, evolving insider threat detection, and textual stream classification. This research was funded in part by NSF, NASA, the Air Force Office of Scientific Research (AFOSR), and Raytheon.
Keywords: stream data mining; big data; labeled training data
Source: VideoLectures.NET
Data collected: 2025-04-06:liyq
Last reviewed: 2025-04-06:liyq
Views: 8