Differentiating Code from Data in x86 Binaries
Course URL: http://videolectures.net/ecmlpkdd2011_wartell_code/
Lecturer: Richard Wartell
Institution: University of Texas
Date: 2011-10-03
Language: English
Course description: Robust, static disassembly is an important part of achieving high coverage for many binary code analyses, such as reverse engineering, malware analysis, reference monitor in-lining, and software fault isolation. However, one of the major difficulties current disassemblers face is differentiating code from data when they are interleaved. This paper presents a machine learning-based disassembly algorithm that segments an x86 binary into subsequences of bytes and then classifies each subsequence as code or data. The algorithm builds a language model from a set of pre-tagged binaries using a statistical data compression technique. It sequentially scans a new binary executable and sets a breaking point at each potential code-to-code and code-to-data/data-to-code transition. The classification of each segment as code or data is based on the minimum cross-entropy. Experimental results are presented to demonstrate the effectiveness of the algorithm.
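The minimum cross-entropy classification step can be illustrated with a small sketch. It is not the authors' implementation: the description does not name the compression model, so a byte bigram model with add-one smoothing stands in for it, and the class `ByteBigramModel`, the function `classify`, and the toy training bytes are all hypothetical. Segmentation at candidate transition points is omitted; the sketch assumes the segments have already been cut.

```python
import math
from collections import Counter


class ByteBigramModel:
    """Order-1 byte model with add-one smoothing.

    Assumption: a simple stand-in for the statistical compression
    model mentioned in the description, which is not specified there.
    """

    def __init__(self):
        self.pair_counts = Counter()     # (previous byte, current byte) -> count
        self.context_counts = Counter()  # previous byte -> count

    def train(self, samples):
        """Count byte transitions in pre-tagged byte strings."""
        for sample in samples:
            for prev, cur in zip(sample, sample[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def cross_entropy(self, segment):
        """Average number of bits per byte transition for `segment`."""
        pairs = list(zip(segment, segment[1:]))
        if not pairs:
            return float("inf")
        total_bits = 0.0
        for prev, cur in pairs:
            # Add-one smoothing over the 256 possible byte values.
            p = (self.pair_counts[(prev, cur)] + 1) / (self.context_counts[prev] + 256)
            total_bits -= math.log2(p)
        return total_bits / len(pairs)


def classify(segment, code_model, data_model):
    """Label a segment with the model that encodes it in fewer bits."""
    if code_model.cross_entropy(segment) < data_model.cross_entropy(segment):
        return "code"
    return "data"


# Toy "pre-tagged" training data (hypothetical): two short function
# bodies as code, two NUL-terminated strings as data.
code_model = ByteBigramModel()
code_model.train([
    bytes.fromhex("558bec83ec108b45088be55dc3"),
    bytes.fromhex("558bec518b450c8be55dc3"),
])
data_model = ByteBigramModel()
data_model.train([b"Hello, world!\x00", b"error: file not found\x00"])

code_label = classify(bytes.fromhex("558bec83ec088be55dc3"), code_model, data_model)
data_label = classify(b"file error\x00", code_model, data_model)
print(code_label, data_label)
```

With realistic training sets, a higher-order context model would likely separate the two classes far better than bigrams. The sketch only shows the decision rule: each segment receives the label of the model under which its cross-entropy is lowest.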
Keywords: static disassembly; interleaved data; cross-entropy
Source: VideoLectures.NET
Last reviewed: 2020-07-17:yumf
Views: 55