Mining Large Multi-Aspect Data: Algorithms and Applications
Course URL: http://videolectures.net/kdd2017_papalexakis_multi_aspect_data/
Lecturer: Evangelos Papalexakis
Institution: University of California, Riverside
Date: 2017-10-09
Language: English
Course description: What does a person’s brain activity look like when they read the word “apple”? How does it differ from the activity of the same person (or even a different one) when reading about an airplane? How can we identify the parts of the human brain that are active for different semantic concepts? In a seemingly unrelated setting, how can we model and mine knowledge on the web (e.g., subject-verb-object triplets) in order to find hidden emerging patterns? Our proposed answer to both problems (and many more) is to bridge signal processing and large-scale multi-aspect data mining. Specifically, language in the brain, like many other real-world processes and phenomena, has different aspects, such as the various semantic stimuli of the brain activity (apple or airplane), the particular person whose activity we analyze, and the measurement technique. In the above example, the brain regions with high activation for “apple” will likely differ from the ones for “airplane”. Nevertheless, each aspect of the activity is a signal of the same underlying physical phenomenon: language understanding in the human brain. Taking all aspects of brain activity into account results in more accurate models that can drive scientific discovery (e.g., identifying semantically coherent brain regions). In addition to the above Neurosemantics application, multi-aspect data appear in numerous other scenarios. One is mining knowledge on the web, where the different aspects of the data include entities in a knowledge base and the links between them, or search engine results for those entities. Another is multi-aspect graph mining, for example in multi-view social networks, where we observe people’s social interactions under different means of communication and use all aspects of the communication to extract communities more accurately.
The main thesis of our work is that many real-world problems, such as those mentioned above, benefit from jointly modeling and analyzing the multi-aspect data associated with the underlying phenomenon we seek to uncover. In this thesis we develop scalable and interpretable algorithms for mining big multi-aspect data, with emphasis on tensor decomposition. We present algorithmic advances in scaling up and parallelizing tensor decomposition and in assessing the quality of its results, which have enabled the analysis of multi-aspect data that the state of the art could not support. Indicatively, our proposed methods speed up the state of the art by up to two orders of magnitude and are able to assess the quality of results for tensors 100 times larger. Furthermore, we present results on multi-aspect data applications focusing on Neurosemantics and on Social Networks and the Web, demonstrating the effectiveness of multi-aspect modeling and mining. We conclude with our future vision for bridging Signal Processing and Data Science in real-world applications.
Keywords: data mining; computer science; data science
Source: VideoLectures.NET
Data collected: 2023-12-25:wujk
Last reviewed: 2024-01-19:liyy
Views: 12