On Sampling and Modeling Complex Systems
Course URL: http://videolectures.net/itis2013_marsili_complex_systems/
Lecturer: Matteo Marsili
Institution: Abdus Salam International Centre for Theoretical Physics
Date: 2013-12-04
Language: English
Course description: The study of complex systems is limited by the fact that only a few variables are accessible for modeling and sampling, and these are not necessarily the most relevant ones for explaining the system's behavior. In addition, empirical data typically undersample the space of possible states. We study a generic framework in which a complex system is seen as a system of many interacting degrees of freedom, known only in part, that optimize a given function. We show that the underlying distribution with respect to the known variables has the Boltzmann form, with a temperature that depends on the number of unknown variables. In particular, when the unknown part of the objective function decays faster than exponentially, the temperature decreases as the number of variables increases. We show, in the representative case of the Gaussian distribution, that models are predictable only when the number of relevant variables is less than a critical threshold. As a further consequence, we show that the information a sample contains on the behavior of the system is quantified by the entropy of the frequency with which different states occur. This allows us to characterize the properties of "maximally informative samples": in the undersampled regime, the most informative frequency size distributions have power-law behavior, and Zipf's law emerges at the crossover between the undersampled regime and the regime where the sample contains enough statistics to make inference on the behavior of the system. These ideas are illustrated in some applications, showing that they can be used to identify relevant variables or to select the most informative representations of data, e.g. in data clustering.
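As a rough illustration of the "entropy of the frequency with which different states occur", the sketch below computes two entropies from a sample of discrete states. It is not taken from the lecture. It assumes one possible formalization: if state s occurs k_s times among N observations and m_k states occur exactly k times, then H[s] = -sum_s (k_s/N) log(k_s/N) and H[K] = -sum_k (k m_k/N) log(k m_k/N). The function name frequency_entropy and the toy sample are hypothetical.

```python
from collections import Counter
from math import log


def frequency_entropy(sample):
    """Return (H_s, H_K) in nats for a sequence of hashable states.

    Assumed definitions (an illustration, not the lecture's own code):
      H_s = -sum_s (k_s / N) * log(k_s / N)          entropy of the states
      H_K = -sum_k (k * m_k / N) * log(k * m_k / N)  entropy of the frequencies
    where k_s counts occurrences of state s and m_k counts the states
    that occur exactly k times.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")

    k_of_state = Counter(sample)            # k_s for each observed state
    m_of_k = Counter(k_of_state.values())   # m_k for each observed frequency

    h_s = -sum((k / n) * log(k / n) for k in k_of_state.values())
    h_k = -sum((k * m / n) * log(k * m / n) for k, m in m_of_k.items())
    return h_s, h_k


if __name__ == "__main__":
    # Toy sample: one state seen twice, two states seen once each.
    h_s, h_k = frequency_entropy(["a", "a", "b", "c"])
    print(f"H[s] = {h_s:.4f} nats, H[K] = {h_k:.4f} nats")
```

Under these definitions, a sample in which every state appears exactly once gives H[K] = 0: all states share the same frequency, so the frequencies cannot tell them apart, which matches the deeply undersampled picture in the description.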
Keywords: modeling and sampling; sample space; Boltzmann form
Source: VideoLectures.NET
Data collected: 2022-02-21:zkj
Last reviewed: 2022-02-21:zkj
Views: 47