Characterizing Individual Communication Patterns |
|
Course URL: | http://videolectures.net/kdd09_malmgren_cic/ |
Lecturer: | Robert Dean Malmgren |
Institution: | Northwestern University |
Date: | 2009-09-14 |
Language: | English |
Abstract: | The increasing availability of electronic communication data, such as that arising from e-mail exchange, presents social and information scientists with new possibilities for characterizing individual behavior and, by extension, identifying latent structure in human populations. Here, we propose a model of individual e-mail communication that is sufficiently rich to capture meaningful variability across individuals, while remaining simple enough to be interpretable. We show that the model, a cascading non-homogeneous Poisson process, can be formulated as a double-chain hidden Markov model, allowing us to use an efficient inference algorithm to estimate the model parameters from observed data. We then apply this model to two e-mail data sets consisting of 404 and 6,164 users, respectively, that were collected from two universities in different countries and years. We find that the resulting best-estimate parameter distributions for both data sets are surprisingly similar, indicating that at least some features of communication dynamics generalize beyond specific contexts. We also find that variability of individual behavior over time is significantly less than variability across the population, suggesting that individuals can be classified into persistent "types". We conclude that communication patterns may prove useful as an additional class of attribute data, complementing demographic and network data, for user classification and outlier detection, a point that we illustrate with an interpretable clustering of users based on their inferred model parameters. |
Keywords: | electronic communication data; e-mail; data sets |
Source: | VideoLectures.NET |
Last edited: | 2020-06-29 (yumf) |