
Differentiating Code from Data in x86 Binaries
Course URL: http://videolectures.net/ecmlpkdd2011_wartell_code/  
Lecturer: Richard Wartell
Institution: University of Texas
Date: 2011-10-03
Language: English
Course description: Robust static disassembly is an important part of achieving high coverage for many binary code analyses, such as reverse engineering, malware analysis, reference monitor in-lining, and software fault isolation. However, one of the major difficulties current disassemblers face is differentiating code from data when they are interleaved. This paper presents a machine learning-based disassembly algorithm that segments an x86 binary into subsequences of bytes and then classifies each subsequence as code or data. The algorithm builds a language model from a set of pre-tagged binaries using a statistical data compression technique. It sequentially scans a new binary executable and sets a breaking point at each potential code-to-code and code-to-data/data-to-code transition. The classification of each segment as code or data is based on the minimum cross-entropy. Experimental results are presented to demonstrate the effectiveness of the algorithm.
Keywords: static disassembly; interleaved data; cross-entropy
Course source: VideoLectures.NET
Last reviewed: 2020-07-17:yumf
Views: 55
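
The minimum cross-entropy rule mentioned in the course description can be illustrated with a small sketch. This is not the authors' implementation: a smoothed byte-bigram model stands in for the statistical compression model the paper trains, the training bytes are toy examples, and the names `ByteBigramModel`, `classify_segment`, `code_model`, and `data_model` are hypothetical. The segmentation step (choosing breaking points) is omitted; the sketch only shows how one already-cut segment might be labeled.

```python
import math
from collections import Counter


class ByteBigramModel:
    """Add-one-smoothed byte-bigram model (an assumed stand-in, not the paper's model)."""

    def __init__(self, samples):
        self.pair_counts = Counter()     # counts of (previous byte, current byte)
        self.context_counts = Counter()  # counts of previous byte as a context
        for sample in samples:
            for prev, cur in zip(sample, sample[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def cross_entropy(self, segment):
        """Average number of bits per byte transition needed to encode the segment."""
        if len(segment) < 2:
            return 8.0  # nothing to score; fall back to the uniform cost of one byte
        total_bits = 0.0
        for prev, cur in zip(segment, segment[1:]):
            # Laplace smoothing over the 256 possible byte values.
            p = (self.pair_counts[(prev, cur)] + 1) / (self.context_counts[prev] + 256)
            total_bits -= math.log2(p)
        return total_bits / (len(segment) - 1)


def classify_segment(segment, code_model, data_model):
    """Label a segment with whichever model encodes it in fewer bits."""
    code_bits = code_model.cross_entropy(segment)
    data_bits = data_model.cross_entropy(segment)
    return "code" if code_bits <= data_bits else "data"


# Toy "pre-tagged" training bytes (assumptions for illustration only).
# push ebp; mov ebp, esp; sub esp, 0x10; leave; ret
code_samples = [bytes.fromhex("5589e583ec10c9c3") * 4]
data_samples = [b"Hello, world!\x00error: %s\n\x00" * 4]

code_model = ByteBigramModel(code_samples)
data_model = ByteBigramModel(data_samples)

print(classify_segment(bytes.fromhex("5589e5c9c3"), code_model, data_model))
print(classify_segment(b"error: world", code_model, data_model))
```

A real system would train on far larger tagged corpora and use a higher-order model; the point here is only that each segment is assigned to the class whose model yields the lower cross-entropy.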