


Latent Space Domain Transfer between High Dimensional Overlapping Distributions
Course URL: http://videolectures.net/www09_xie_lsdt/
Lecturers: Jiangtao Ren; Olivier Verscheure; Jing Peng; Wei Fan; Sihong Xie
Institution: University of Illinois
Date: 2009-05-20
Language: English
Course description: Transferring knowledge from one domain to another is challenging for a number of reasons. Since both the conditional and marginal distributions of the training data and test data are non-identical, a model trained in one domain is usually low in accuracy when applied directly to a different domain. For many applications with large feature sets, such as text documents, sequence data, medical data, image data of different resolutions, etc., two domains usually do not contain exactly the same features, which introduces large numbers of "missing values" when the union of features from both domains is considered. In other words, their marginal distributions are at most overlapping. At the same time, these problems are usually high dimensional, with several thousand features. Thus, the combination of high dimensionality and missing values makes the relationship between the conditional probabilities of the two domains hard to measure and model. To address these challenges, we propose a framework that first brings the marginal distributions of the two domains closer by "filling up" the missing values of disjoint features. Afterwards, it looks for comparable sub-structures in the "latent space" mapped from the expanded feature vectors, where both the marginal and conditional distributions are similar. With these sub-structures in latent space, the proposed approach then finds common concepts that are transferable across domains with high probability. During prediction, unlabeled instances are treated as "queries": the most related labeled instances are retrieved from the out-domain, and the classification is made by weighted voting over the retrieved out-domain examples.
We formally show that importing feature values across domains and latent semantic indexing jointly make the distributions of two related domains easier to measure than in the original feature space, and that the nearest-neighbor method employed to retrieve related out-domain examples has bounded error when predicting in-domain examples.
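The pipeline described above, namely aligning marginal distributions by filling missing values over the union feature set, mapping into a latent space via latent semantic indexing, and classifying in-domain queries by weighted voting over retrieved out-domain neighbors, can be sketched roughly as follows. This is a minimal illustration, not the authors' exact formulation: the zero imputation, the SVD rank, and the inverse-distance-weighted kNN vote are all simplifying assumptions.

```python
import numpy as np

def latent_space_transfer(X_out, y_out, X_in, k=3, rank=2):
    """Sketch of LSI-style domain transfer (illustrative, not the paper's exact method).

    X_out: labeled out-domain data, shape (n_out, d) over the union feature set
    y_out: out-domain labels, shape (n_out,)
    X_in:  unlabeled in-domain queries, shape (n_in, d); NaN marks missing features
    """
    # 1) "Fill up" missing values of disjoint features (zero imputation here).
    X = np.nan_to_num(np.vstack([X_out, X_in]), nan=0.0)

    # 2) Map the expanded feature vectors into a latent space via truncated SVD
    #    (the linear-algebra core of latent semantic indexing).
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :rank] * s[:rank]          # latent coordinates, shape (n, rank)
    Z_out, Z_in = Z[: len(X_out)], Z[len(X_out):]

    # 3) Treat each in-domain instance as a "query": retrieve its k nearest
    #    out-domain neighbors in latent space and take a weighted vote.
    preds = []
    for z in Z_in:
        dist = np.linalg.norm(Z_out - z, axis=1)
        nn = np.argsort(dist)[:k]
        weights = 1.0 / (dist[nn] + 1e-9)  # closer neighbors vote more strongly
        votes = {}
        for i, w in zip(nn, weights):
            votes[y_out[i]] = votes.get(y_out[i], 0.0) + w
        preds.append(max(votes, key=votes.get))
    return np.array(preds)
```

For example, if the out-domain lacks one feature (all NaN in that column) that the in-domain observes, the imputation step still lets both domains live in one space, and labels transfer through latent-space neighborhoods.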
Keywords: knowledge transfer across domains; marginal distribution; high dimensionality and missing values; cross-domain feature-value imputation and latent semantic indexing; data modeling
Source: VideoLectures.NET
Last reviewed: 2020-07-16: yumf
Views: 22