In this final video in the series we go over how to generate new inputs for a program once we detect a user-influenced conditional branch. At the end there is also an example of the type of condition/resulting formula that we get from working on a real file parser, in this case libwebp.
(You probably want to click the “Watch on YouTube” option on the bottom right of the video and set the quality to 720p)
This type of emulation, input generation and formula checking does not need to be limited to conditional jumps. As I discussed in a previous post you can use a similar approach to discover variable ranges, check for variable relationships and assist in figuring out complex algorithms. For example, one could generate a query to the solver every time an argument to malloc is found to be influenced by the user, or export a list of all functions that operate on user-influenced data to IDA for manual review. (In fact, a light-weight version of this approach in combination with fuzzing and an IDA importer is possibly more generally useful to an individual auditor than going down the route of full on whitebox fuzzing. More on that later =))
Anyway, I hope these videos provide some insight into how a whitebox fuzzer might work as well as one approach to building a symbolic emulator. To give an idea of the effort involved – the combined whitebox fuzzing, trace parsing and emulation code (along with supporting libraries) comes to around 10,000 lines of Python. Of this, the emulator itself is only 3000 lines or so. The PIN tracer is just under 1000 lines of C++.
Tracing is currently fairly unoptimised and parsing something like a video or image while tracing can result in a factor of 10-100 increase in running time. This usually means a wait of 30 seconds, which isn’t too bad for whitebox fuzzing as tracing is not performed too often but for other uses of a symbolic emulator (like tracing while fuzzing normally) this will require some work. The emulator itself is Python based and as such is not lightning fast. In the default run-mode it emulates ~5000 instructions per second. What this translates to is about 30-40 minutes per trace of an average file parser. This isn’t as bad as you might think however as the tests cases generated tend to be much more effective at hitting new code than what you would get from dumb fuzzing. Despite this we still need performance gains and I’m working on a few different solutions for that. Somewhere around 30,000+ instructions per second would be what I would consider approaching acceptable =)
To preempt the inevitable questions – for now JESTER is not publicly available but that may change in the future. It’s very much a research prototype at the moment where we’re testing out several approaches to improving performance and general usefulness. However, if you are interested in working on this type of research post a comment with a contact address (it won’t appear publicly) as I’m fairly sure we are currently hiring.