Learning with a Non-Exhaustive Training Dataset: Detection of Bacteria Cultures Using Optical-Scattering Technology
Course URL: http://videolectures.net/kdd09_dundar_lnetdbcost/
Lecturer: M. Murat Dundar
Institution: Indiana University
Date: 2009-09-14
Language: English
Course description: For a training dataset with a non-exhaustive list of classes, i.e., one in which some classes are not yet known and hence are not represented, the resulting learning problem is ill-defined. In this case a sample from a missing class is incorrectly assigned to one of the existing classes. For some applications the cost of misclassifying a sample may be negligible. However, the significance of the problem becomes clear when one considers the potentially serious consequences of classifying a food pathogen as a nonpathogen. Our research is directed towards the real-time detection of food pathogens using optical-scattering technology. Bacterial colonies consisting of the progeny of a single parent cell scatter light at 635 nm to produce unique forward-scatter signatures. These signatures contain descriptive characteristics of bacterial colonies, which can be used to identify bacteria cultures in real time. One bottleneck that remains to be addressed is the non-exhaustive nature of the training library. It is very difficult, if not impractical, to collect samples from all possible bacteria colonies and construct a digital library with an exhaustive set of scatter signatures. This study deals with the real-time detection of samples from a missing class and the associated problem of learning with a non-exhaustive training dataset. Our proposed method assumes a common prior for the set of all classes, known and missing. The parameters of the prior are estimated from the samples of the known classes. This prior is then used to generate a large number of samples to simulate the space of missing classes. Finally, a Bayesian maximum likelihood classifier is implemented using samples from real as well as simulated classes. Experiments performed with samples collected for 28 bacteria subclasses favor the proposed approach over the state of the art.
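The pipeline outlined in the description (estimate a common prior from the known classes, simulate missing classes from it, then classify by maximum likelihood over real and simulated classes) can be illustrated with a minimal Python sketch. The description does not state the distributional form, so the sketch makes simplifying assumptions of its own: Gaussian classes that share one pooled covariance, and a Gaussian prior over class means. All function names, the `NOVEL` label, and the synthetic 2-D data are hypothetical and only stand in for real scatter-signature features.

```python
import numpy as np

NOVEL = -1  # hypothetical label for "sample belongs to a missing class"

def fit_common_prior(X, y):
    """Estimate per-class means, a pooled covariance, and a Gaussian prior over means."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = np.vstack([X[y == c] - means[i] for i, c in enumerate(classes)])
    d = X.shape[1]
    # Assumption: every class (known or missing) shares this covariance.
    pooled_cov = centered.T @ centered / (len(X) - len(classes)) + 1e-6 * np.eye(d)
    # Assumption: class means are draws from one Gaussian prior.
    prior_mean = means.mean(axis=0)
    prior_cov = np.cov(means, rowvar=False) + 1e-6 * np.eye(d)
    return classes, means, pooled_cov, prior_mean, prior_cov

def simulate_missing_means(prior_mean, prior_cov, n_simulated, rng):
    """Draw means of simulated classes from the estimated prior."""
    return rng.multivariate_normal(prior_mean, prior_cov, size=n_simulated)

def classify(X_new, classes, real_means, sim_means, pooled_cov):
    """Maximum likelihood over real and simulated classes.

    With a shared covariance, the most likely class is the one whose mean has
    the smallest Mahalanobis distance to the sample.
    """
    all_means = np.vstack([real_means, sim_means])
    precision = np.linalg.inv(pooled_cov)
    diff = X_new[:, None, :] - all_means[None, :, :]
    dist = np.einsum("nkd,de,nke->nk", diff, precision, diff)
    best = dist.argmin(axis=1)
    is_known = best < len(real_means)
    # Winners among the simulated classes are reported as NOVEL.
    return np.where(is_known, classes[np.minimum(best, len(classes) - 1)], NOVEL)

# Synthetic stand-in for features of three known classes.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in true_centers])
y = np.repeat(np.arange(3), 50)

classes, means, pooled_cov, prior_mean, prior_cov = fit_common_prior(X, y)
sim_means = simulate_missing_means(prior_mean, prior_cov, 500, rng)

queries = np.array([[0.1, -0.1], [4.0, 4.0]])
labels = classify(queries, classes, means, sim_means, pooled_cov)
print(labels)
```

In this sketch the number of simulated classes controls how readily a sample is flagged as novel: more simulated classes cover the feature space more densely, so fewer samples fall back to a known class. How the lecture itself sets this trade-off is not stated in the description.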
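Keywords: optical-scattering technology; real-time identification of bacteria cultures; Bayesian maximum likelihood classifier
Source: VideoLectures.NET
Last reviewed: 2020-06-28:yumf
Views: 100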