fLDA: Matrix Factorization through Latent Dirichlet Allocation
Course URL: http://videolectures.net/wsdm2010_chen_fmftl/
Lecturer: Bee-Chung Chen
Institution: LinkedIn
Date: 2010-02-22
Language: English
Course description: We propose fLDA, a novel matrix factorization method to predict ratings in recommender system applications where a “bag-of-words” representation for item meta-data is natural. Such scenarios are commonplace in web applications like content recommendation, ad targeting and web search, where items are articles, ads and web pages respectively. Because of data sparseness, regularization is key to good predictive accuracy. Our method works by regularizing both user and item factors simultaneously through user features and the bag of words associated with each item. Specifically, each word in an item is associated with a discrete latent factor often referred to as the topic of the word; item topics are obtained by averaging topics across all words in an item. Then, a user's rating on an item is modeled as the user's affinity to the item's topics, where user affinity to topics (user factors) and topic assignments to words in items (item factors) are learned jointly in a supervised fashion. To avoid overfitting, user and item factors are regularized through Gaussian linear regression and Latent Dirichlet Allocation (LDA) priors respectively. We show our model is accurate, interpretable and handles both cold-start and warm-start scenarios seamlessly through a single model. The efficacy of our method is illustrated on benchmark datasets and a new dataset from Yahoo! Buzz, where fLDA provides superior predictive accuracy in cold-start scenarios and is comparable to state-of-the-art methods in warm-start scenarios. As a by-product, fLDA also identifies interesting topics that explain user-item interactions. Our method also generalizes a recently proposed technique called supervised LDA (sLDA) to collaborative filtering applications. While sLDA estimates item topic vectors in a supervised fashion for a single regression, fLDA incorporates multiple regressions (one for each user) in estimating the item factors.
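The rating model described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: an item's topic vector is the average of its per-word topic assignments, and a rating is the user's affinity to those topics, taken here as a dot product. The function names, the example numbers, and the omission of any bias terms, priors, or the joint fitting procedure are simplifications made for this illustration, not the authors' implementation.

```python
import numpy as np

def item_topic_vector(word_topics, num_topics):
    """Average the one-hot topic assignments of all words in an item.

    word_topics: integer array, one topic index per word (hypothetical input).
    Returns a length-num_topics vector of topic proportions.
    """
    counts = np.bincount(word_topics, minlength=num_topics)
    return counts / counts.sum()

def predict_rating(user_factors, word_topics):
    """Score an item as the user's affinity to the item's averaged topics."""
    z_bar = item_topic_vector(word_topics, len(user_factors))
    return float(user_factors @ z_bar)

# Illustrative values only: one user with affinities to 3 topics,
# and one item whose 4 words were assigned topics 0, 0, 2, 0.
user_factors = np.array([2.0, -1.0, 0.5])
word_topics = np.array([0, 0, 2, 0])

z_bar = item_topic_vector(word_topics, num_topics=3)
rating = predict_rating(user_factors, word_topics)
print(z_bar, rating)
```

In the full method both the user factors and the word-topic assignments are unknown and are estimated jointly from observed ratings; the sketch only shows how a prediction would be formed once they are available.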
Keywords: Computer Science; Semantic Web; Annotation
Source: VideoLectures.NET
Last reviewed: 2020-01-13 (chenxin)
Views: 58