Machine Learning in Health Informatics: Making Better Use of Domain Experts
Course URL: http://videolectures.net/kdd2013_wallace_health_informatics/
Lecturer: Byron C. Wallace
Institution: Brown University
Date: 2013-09-27
Language: English
Course description: We present novel machine learning and data mining methods that make real-world learning systems more efficient. We focus on the domain of clinical informatics, an archetypical example of a field overwhelmed with information. Due to properties inherent to clinical informatics tasks (and indeed to many tasks that require specialized domain knowledge), ‘off-the-shelf’ machine learning technologies generally perform poorly in this domain. If machine learning is to be successful in clinical science, novel methods must be developed to: mitigate the effects of class imbalance during model induction; exploit the wealth of domain knowledge that highly skilled domain experts bring to the task; and induce better models with less effort (fewer labels). We present new machine learning methods that address each of these issues, and demonstrate their efficacy in the task of abstract screening. In particular, we develop new theoretical perspectives on class imbalance, novel methods for exploiting dual supervision (i.e., labels on both instances and features), and new active learning techniques that address issues inherent to real-world applications (e.g., exploiting multiple experts in tandem). Each of these contributions aims to squeeze better classification performance out of fewer labels, thereby making better use of domain experts’ time and expertise. The immediate aim of this work is to reduce the workload involved in conducting systematic reviews, and to this end we demonstrate that the developed methods can reduce reviewer workload by more than half without sacrificing the comprehensiveness of reviews (i.e., without missing any relevant published evidence). But this is only an exemplary task; the approaches presented here apply more widely to many real-world learning problems, namely those that require specialized expertise, exhibit class imbalance (and asymmetric costs), and have limited human resources available. We show that the methods we have developed bring substantial improvements over previously existing machine learning approaches in terms of inducing better models with less effort.
Keywords: clinical informatics; machine learning techniques; model induction; data mining
Source: VideoLectures.NET
Data collected: 2021-05-27:zyk
Last reviewed: 2021-05-28:zyk