

Introduction to causal discovery: A Bayesian Networks approach
Course URL: http://videolectures.net/ecmlpkdd2011_tsamardinos_discovery/
Lecturer: Ioannis Tsamardinos
Institution: Foundation for Research and Technology
Date: 2011-11-29
Language: English
Course description: The tutorial presents an introduction to the basic assumptions and techniques for causal discovery from observational data using graphs that represent conditional independence models. It first presents the basic theory of causal discovery, such as the Causal Markov Condition, the Faithfulness Condition, and the d-separation criterion, and graphical models for representing causality, such as Causal Bayesian Networks, Maximal Ancestral Graphs, and Partial Ancestral Graphs. It presents prototypical and state-of-the-art algorithms, such as PC, FCI, and HITON, for learning such models (global learning) or parts of such models (local learning) from data. The tutorial also discusses the connections of causality to feature selection and presents causal-based feature selection techniques. Finally, case studies of applications of causal discovery algorithms are presented, with a focus on applications to biomedical data. The tutorial is designed for a wide audience with a general Machine Learning, Data Mining, and Statistics background.

The tutorial aims to:
* Familiarize the audience with the field and increase comprehension of the problem of causal induction as it pertains to everyday data analysis tasks; familiarize the audience with formalisms that represent causal relations among variables and provide a language for thinking about causality and causal discovery.
* Increase understanding of the basic principles of causal induction and familiarity with prototypical and state-of-the-art algorithms in the field; enable the correct interpretation of the output of such algorithms.
* Enable the correct application of causal discovery algorithms in practical data mining, machine learning, or statistical analysis tasks.

More specifically, it aims to clarify the following issues that are important to every researcher and practitioner of data analysis:
* While most machine learning techniques assume independent and identically distributed (i.i.d.) data, in many fields the data often do not follow this assumption. The data may be experimental (e.g., after knocking out a gene) or under selection bias, e.g., in case-control studies. The tutorial helps in understanding the differences and how they arise from the causal structure of the domain.
* It is often the case that the purpose of the analysis is to identify important variables (a.k.a. feature selection), called biomarkers in biology, risk factors in medicine, etc. The tutorial helps in understanding the connection between the selected variables and the causal structure.
* It is often the case that prediction models are not the final goal; instead, the goal is to control a system, e.g., treat a patient, design a drug with desired properties, etc. Causal modeling and induction are necessary to build machine learning models that can predict the outcome in a system that is being manipulated (e.g., under different experimental conditions).
* The tutorial provides a deeper understanding of standard (non-causal) Bayesian Networks, which have proven important in Machine Learning, reasoning with Uncertainty in Artificial Intelligence, and Decision Support Systems for over two decades.
* Causal discovery has already led to important discoveries, so knowledge of these methods and their potential is important for the data analysts of the future.

The tutorial outline is:
1. Representing Causality
2. Inducing Causal Models from Data
3. Case Studies and Practical Issues
Keywords: independence models; maximal ancestral graphs; state-of-the-art algorithms
Source: VideoLectures.NET
Last reviewed: 2019-04-07:cwx
Views: 88
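
As a small illustration of the conditional independence models mentioned in the course description, the sketch below simulates a causal chain X → Y → Z and checks that X and Z are correlated marginally but approximately uncorrelated once Y is controlled for. This is not taken from the tutorial: the linear-Gaussian model, the use of partial correlation as the independence measure, and the names `simulate_chain`, `pearson`, and `partial_corr` are all assumptions made for this example.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def simulate_chain(n=20000, seed=0):
    """Hypothetical linear-Gaussian chain X -> Y -> Z (an assumed toy model)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]  # Y depends only on X
    z = [yi + rng.gauss(0, 1) for yi in y]  # Z depends only on Y
    return x, y, z


x, y, z = simulate_chain()
marginal = pearson(x, z)             # X and Z are dependent through Y
conditional = partial_corr(x, z, y)  # Y blocks the only path from X to Z
print(f"corr(X, Z)     = {marginal:.3f}")
print(f"corr(X, Z | Y) = {conditional:.3f}")
```

In this toy model the marginal correlation should be close to 1/√3 and the partial correlation close to zero, which is the pattern the d-separation criterion predicts for a chain. Constraint-based algorithms such as PC rely on many tests of this kind, typically with a proper statistical test rather than a raw partial correlation.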