<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Sean Heelan&#039;s Blog</title>
	<atom:link href="http://seanhn.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://seanhn.wordpress.com</link>
	<description>Program analysis, verification and security</description>
	<lastBuildDate>Sat, 10 Dec 2011 21:34:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='seanhn.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Sean Heelan&#039;s Blog</title>
		<link>http://seanhn.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://seanhn.wordpress.com/osd.xml" title="Sean Heelan&#039;s Blog" />
	<atom:link rel='hub' href='http://seanhn.wordpress.com/?pushpress=hub'/>
		<item>
		<title>SAT/SMT Summer School 2011 Summary (Days 5 &amp; 6)</title>
		<link>http://seanhn.wordpress.com/2011/06/21/satsmt-summer-school-2011-summary-days-5-6/</link>
		<comments>http://seanhn.wordpress.com/2011/06/21/satsmt-summer-school-2011-summary-days-5-6/#comments</comments>
		<pubDate>Tue, 21 Jun 2011 04:04:27 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[SAT/SMT Summer School 2011]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=801</guid>
		<description><![CDATA[Day 5 Sketching: Program Synthesis using SAT Solvers (Armando Solar-Lezama) Armando started his talk by demonstrating the automatic synthesis of a program for swapping two integer variables without using a third. It&#8217;s a standard algorithm and quite small but was still cool to see. He then demonstrated a few more algorithms involving bit-level arithmetic. The [...]]]></description>
			<content:encoded><![CDATA[<p><b>Day 5</b></p>
<p><b>Sketching: Program Synthesis using SAT Solvers (Armando Solar-Lezama)</b></p>
<p>Armando started his talk by demonstrating the automatic synthesis of a program for swapping two integer variables without using a third. It&#8217;s a standard algorithm and quite small but was still cool to see. He then demonstrated a few more algorithms involving bit-level arithmetic. The implementation of this tool, called Sketch, can be found <a href="https://bitbucket.org/gatoatigrado/sketch-frontend/wiki/Home">here</a>. The demonstrations given were for a C-like language and apparently synthesis works quite well for algorithms based around bit-twiddling.</p>
<p>These programs were generated from program &#8216;sketches&#8217;, essentially algorithmic skeletons, and a test harness, similar to unit tests, that described the desired semantics of the program. The sketches express the high level structure of the program and then the details are synthesized using a SAT solver and a refinement loop driven by the tests. The idea of the sketches is to make the problem tractable. The intuition for this was given by the example of curve fitting. That can be a difficult problem if you have nothing to go on but data points whereas if you are told the curve is Gaussian, for example, the problem becomes much more feasible.</p>
<p>The synthesis algorithm first uses the sketched fragment to generate a candidate program and then a SAT solver is invoked to see if the conjunction of this program and the semantics described by the tests is valid. If not, a counter-example is generated and used to refine the next iteration of program generation. This is incredibly simplified and the full details can be found in Armando&#8217;s <a href="http://people.csail.mit.edu/asolar/papers/thesis.pdf">thesis</a>. </p>
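<p>To make that loop a little more concrete, here is a minimal sketch of counter-example guided synthesis for the swap example. It is not how Sketch is implemented: brute-force enumeration stands in for the SAT solver, the verifier only checks a small bounded domain, and the names (<tt>run</tt>, <tt>verify</tt>, <tt>synthesize</tt>, <tt>DOMAIN</tt>) and the choice of holes are assumptions made purely for illustration.</p>

```python
from itertools import product
from operator import add, sub, xor

# Hypothetical "holes": each step picks a target variable and an operator.
OPS = {"xor": xor, "add": add, "sub": sub}
TARGETS = ("x", "y")
DOMAIN = range(16)  # bounded inputs checked by the verifier (an assumption)


def run(program, x, y):
    """Execute a candidate: each step assigns `x op y` to one of the two variables."""
    for target, op_name in program:
        value = OPS[op_name](x, y)
        if target == "x":
            x = value
        else:
            y = value
    return x, y


def verify(program):
    """Return an input on which the candidate fails to swap, or None if there is none."""
    for x, y in product(DOMAIN, DOMAIN):
        if run(program, x, y) != (y, x):
            return (x, y)
    return None


def synthesize(steps=3):
    """Propose candidates consistent with the examples so far, then verify them."""
    examples = []  # counter-examples collected from failed candidates
    holes = list(product(TARGETS, OPS))
    for program in product(holes, repeat=steps):
        if any(run(program, x, y) != (y, x) for x, y in examples):
            continue  # already ruled out by an earlier counter-example
        counterexample = verify(program)
        if counterexample is None:
            return program, examples
        examples.append(counterexample)
    return None, examples
```

<p>Calling <tt>synthesize()</tt> returns a three-step program along with the counter-examples that were collected on the way. The point of the structure is that most bad candidates are discarded by replaying a handful of stored counter-examples rather than by a full verification pass.</p>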
<p>This was yet another talk where there was an emphasis on pre-processing formulae before they get to the solver. The phrase &#8216;aggressive simplification&#8217; was used over and over throughout the conference and for synthesis this involved dataflow analysis and expression reduction (e.g. <tt>y AND 1</tt> reduces to <tt>y</tt>) as well as more standard common sub-expression elimination. </p>
<p><b>Day 6</b></p>
<p><b>Harnessing SMT power using the verification engine Boogie (Rustan Leino)</b></p>
<p>This talk began with some coding demonstrations in a language called Dafny that has support for function pre-conditions, post-conditions and loop invariants. As these features are added to a code-base they are checked in real time. Dafny is translated into an intermediate verification language (IVL) called <a href="http://boogie.codeplex.com/">Boogie</a> (the verification system for it is open source under the MS public license) which can be converted into SMT form and then checked using the Z3 SMT solver. While this is fun to watch, most languages don&#8217;t have these inbuilt constructs for pre/post-conditions and invariants. Fortunately, Boogie is designed to be a generic IVL and translation tools exist for C, C++, C#, x86 and a variety of other languages (although from what I gather only some of these are publicly available and none are open source). As such, Boogie is designed to separate the verification of programs in a given language from the effort of converting them into a form that is amenable to checking. </p>
<p>The high level, take-away message from this talk was &#8220;Don&#8217;t go directly to the SMT solver&#8221;. It relates to the separation of concerns I just mentioned. This lets you share infrastructure and code for verification tasks that will be common between many languages and also means you have an intermediate form to perform simplification on before passing any formulae to a solver. </p>
<p><b>HAVOC: SMT solvers for precise and scalable reasoning of programs (Shuvendu Lahiri &amp; Shaz Qadeer)</b></p>
<p><a href="http://research.microsoft.com/en-us/projects/havoc/">HAVOC</a> is one such verification tool for C that makes use of Boogie. It adds support for user-defined contracts on C code that can then be checked. Based on the <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.26.956&amp;rep=rep1&amp;type=pdf">Houdini algorithm</a>, HAVOC can also perform contract inference with the aim of alleviating much of the burden on the user. </p>
<p>I really wish we had something similar to HAVOC for code auditing (this was actually one of the use cases mentioned during the talk). I&#8217;m not sure about others but essentially how I audit source code involves manually coming up with pre-conditions, post-conditions and invariants and then trying to verify these across the entire code-base by hand. This is fine, but with a tool-set of vim, ctags and cscope it&#8217;s also incredibly manual and seems like something that could at least be partially automated. It was mentioned that a more up-to-date version of HAVOC might be released soon so maybe this will be a possibility. </p>
<p><b>Non-DPLL Approaches to Boolean SAT Solving (Bart Selman &amp; Carla Gomes)</b></p>
<p>This talk was on probabilistic approaches to SAT solving. These techniques still lag far behind DPLL based algorithms on industrial benchmarks but are apparently quite good on random instances with large numbers of variables. </p>
<p><b>Symbolic Execution and Automated Exploit Generation (David Brumley)</b></p>
<p>While I previously ranted about the <a href="http://dl.packetstormsecurity.net/papers/attack/automatic-exploit.pdf">paper</a> this talk was based on, this talk was a far better portrayal of the research. Effectively, we&#8217;re in the very, very early stages of exploit generation research; while there have been some cool demos of how solvers might come into play we&#8217;re still targeting the most basic of vulnerabilities and in toy environments. All research has to start somewhere though, and my own <a href="http://www.cprover.org/dissertations/thesis-Heelan.pdf">thesis</a> was no more advanced, so it was good to see this presented with an honest reflection on how much work is left.</p>
<p>One interesting feature of the CMU work is preconditioned symbolic execution which adds preconditions to paths that must be satisfied for the path to be explored. This is a feature missing from KLEE but would be just as useful in symbolic execution for bug finding as well as exploit generation. Something that remains to be researched and discussed is efficient ways to come up with these pre-conditions. </p>
<p><b>Conclusion</b></p>
<p>The summer school was a great event and renewed my enthusiasm for formal methods as a feasible and cost effective basis for bug finding and exploit development. The best talks were those that presented an idea, gave extensive, concrete data to back it up and explained the core concepts <b>and</b> limitations with real world examples. I hope to see more papers and talks like this in the future.</p>
<p>A generic conclusion for the six days would be difficult, so instead the following are the recurring themes that stood out to me across the talks and that may be relevant to someone implementing these systems:</p>
<p>- <b>Focus on one thing and do it well.</b> For example, separate instrumentation from symbolic execution from solving formulae.<br />
- <b>Aggressively simplify before invoking a solver.</b> Simplification strategies varied from domain specific e.g. data-flow analysis, to generic logical reductions but all of them greatly reduced the complexity of the problems that solvers had to deal with and thus increased the problems the tools could handle.<br />
- <b>Abstract, refine and repeat.</b> The concept of a counter-example guided abstraction refinement loop seemed to be core to algorithms from hardware model checking, to program synthesis, to bug finding. In each, CEGAR was used to scale algorithms to more complex and more numerous problems by abstracting complexity and then reintroducing it as necessary.<br />
- <b>Nothing beats hard data for justifying conclusions and driving new research.</b> This point was made in the earliest talks on comparing SAT solver algorithms and reiterated through the SAGAN information collection/organisation system of SAGE. Designing up front to gather data lets you know where things are going wrong, keeps a record of improvements and makes for some pretty cool slides when you need to convince other people you&#8217;re not insane =)</p>
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2011/06/21/satsmt-summer-school-2011-summary-days-5-6/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>
	</item>
		<item>
		<title>SAT/SMT Summer School 2011 Summary (Days 3 &amp; 4)</title>
		<link>http://seanhn.wordpress.com/2011/06/16/satsmt-summer-school-2011-summary-days-3-4/</link>
		<comments>http://seanhn.wordpress.com/2011/06/16/satsmt-summer-school-2011-summary-days-3-4/#comments</comments>
		<pubDate>Thu, 16 Jun 2011 21:57:01 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[SAT/SMT Summer School 2011]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=795</guid>
		<description><![CDATA[The slides for the summer school have started to go online so for the remaining days I&#8217;ll just give a quick summary of parts I thought were particularly interesting or comments that were made but not in the slides. Day 3 BitBlaze &#38; WebBlaze: Tools for computer security using SMT Solvers (Dawn Song &#38; Prateek [...]]]></description>
			<content:encoded><![CDATA[<p>The slides for the summer school have started to go <a href="https://wikis.mit.edu/confluence/display/satsmtschool11/SATSMT+Summer+School+2011">online</a> so for the remaining days I&#8217;ll just give a quick summary of parts I thought were particularly interesting or comments that were made but not in the slides.</p>
<p><b>Day 3</b></p>
<p><b>BitBlaze &amp; WebBlaze: Tools for computer security using SMT Solvers (Dawn Song &amp; Prateek Saxena)</b></p>
<p>The first thing of note from this talk was a brief discussion on selecting paths for analysis during symbolic/concrete execution. Anyone who has used KLEE knows the lack of sane path selection mechanisms is a significant drawback so it was good to see one of the talks on this type of system discuss it. The methods used were dataflow and control flow distances to functions of interest. </p>
<p>An interesting problem tackled later in the talk was how to distinguish due from undue influence of tainted data over control flow. For example, it might be acceptable for user data to taint a value used in a switch statement that selects a function to run but it&#8217;s unlikely to be very good if it can taint a function pointer directly. Four different methods were presented for distinguishing these cases, the simplest being point by point exhaustion using a solver of the number of possible target addresses in the address space. More complex probabilistic and information theoretic approaches were also discussed and are elaborated on in their <a href="http://bitblaze.cs.berkeley.edu/papers/influence_plas09.pdf">paper</a>. It would be nice to see some more experimental data with these more advanced methods though as it is limited to 3 vulnerabilities and 3 benign cases. </p>
<p><b>SAT-based Model-Checking (Armin Biere)</b></p>
<p>Armin is the developer of one of the best SAT solvers, <a href="http://fmv.jku.at/lingeling/">Lingeling</a>, and his talk discussed advances in using SAT technology for model checking. During the talk he mentioned a paper by Aaron Bradley on <a href="http://ecee.colorado.edu/~bradleya/ic3/ic3_bradley.pdf">SAT based model checking without unrolling</a> which might be worth checking out but I haven&#8217;t had a chance to yet.</p>
<p><b>CryptoMiniSat &#8212; A Rough Guide (Mate Soos)</b></p>
<p>This was a great talk by <a href="http://www.msoos.org/">Mate Soos</a> on CryptoMiniSat, which won last year&#8217;s SAT Race and is <a href="http://www.msoos.org/cryptominisat2">open source</a>, and SAT solver design. Mate started with a discussion of the software design philosophy behind the project and put forward that it&#8217;s better to have less optimised and complex code if you can more easily implement better ideas. Given that his solver is faster than Lingeling, which is far more difficult to comprehend, it seems that he is correct. He had some other interesting things to say on SAT solver features, emphasising regular simplification of expressions and maintaining a cache of results from unit propagation even if they are not currently useful. </p>
<p><b>SAGE: Automated Whitebox Fuzzing using SMT solvers (Patrice Godefroid &amp; David Molnar)</b></p>
<p>In my opinion, this was the best talk of the summer school so far. Patrice and David discussed <a href="http://research.microsoft.com/en-us/projects/atg/">SAGE</a> and presented a lot of data to encourage the development of tools for this kind of testing. SAGE is built on top of previously developed MS tools detecting crashes (AppVerifier), recording traces (Nirvana), generating constraints (TruScan) and solving constraints (Z3).</p>
<p>Unlike KLEE and the Bitblaze tools, the symbolic execution part of SAGE only accounts for a small fraction of the time cost. Only 1/4 of the total time is spent on symbolic execution with the remainder of their 3 week fuzzing runs spent on tracing, generating constraints and running the application under test on the fuzz files. </p>
<p>One interesting thing mentioned was that while most queries to the solver only take 1/10th of a second, all queries are capped at 5 seconds and after that the solver is killed and the result is presumed UNSAT. This is based on the observation that they get more code coverage by this method than by waiting for hours for a single query to return. They backed this up with some statistics showing that longer run times only very rarely led to more bugs. </p>
<p>Some other points of note were:<br />
- From the start SAGE was engineered to provide enough information and statistics on every part of its system that determining what it is doing and where it is succeeding/failing is possible. This is facilitated through a system called SAGAN that allows them to focus on areas needing work.<br />
- SAGE is primarily deployed against file parsers. This is a use case where the majority of non-determinism is from the input. In other environments with different sources of non-determinism it might be more difficult to direct the application through constraint solving.<br />
- Most OOM conditions are a result of trying to store the constraints in memory while analysing the trace, not in the solver as I would have expected. As a result, simplification and expression elimination can be necessary even before that stage.<br />
- Most crashes seemed to be concentrated within the first 6 generations of constructed fuzz files but crashes were seen in all generations up to the mid to late teens. I don&#8217;t think they&#8217;ve run for any longer than that.<br />
- SAGE was responsible for 30% of bugs found in a certain class of file parsers on Windows 7. These were bugs missed by all other testing mechanisms. I wonder how long it will be before those of us interested in bug finding will have to start looking at tools like SAGE from the point of view of discovering where they are weak as a starting point for auditing. </p>
<p>All in all, this presentation had hard data to support some very exciting conclusions. </p>
<p><b>Day 4</b></p>
<p><b>Approaches to Parallel SAT Solving (Youssef Hamadi)</b><br />
I had recently been wondering what the state of the art in parallel solving is so this was good to see. Youssef first started by proposing that we are unlikely to see order of magnitude speed ups in SAT solving based on advances in the current CDCL architecture of SAT solvers. I guess there are two ways to deal with this, one is to look at different approaches to sequential SAT solving and the other is to look at parallelism.</p>
<p>From 1996 to 2008 most of the approaches to parallel SAT proceeded by splitting the problem space into chunks and solving these instances in parallel with clause sharing of clauses under a certain size. Another approach is the portfolio approach used by parallel Z3, PLingeling and ManySAT. In this approach the same problem is attacked by several different solver instances each using a different configuration. The solver configuration is parameterized on the restart policy, polarity selection, clause learning and branching heuristics. This led to super-linear speed-up with combinations of these solvers performing better than the sum of their parts. </p>
<p>One particular issue for this strategy though is how best to do clause sharing between solvers. Sharing of large clauses can be costly so one needs a strategy to figure out the upper limit on the size of clauses to share. This is complicated by the fact that the size of learned clauses progresses as the solver advances in the problem. Two algorithms were discussed in this context. The first was based on TCP bandwidth calculation algorithms that slowly increase the size of shared clauses and then quickly back off once a problem is detected. The second was a heuristic based on variable activity to set a quality threshold on clauses to accept from other solvers. </p>
<p>This portfolio mechanism works well for between 4 and 8 cores but after that the effect of added cores is greatly diminished. </p>
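<p>The TCP-inspired size limit for shared clauses is easy to picture as an additive-increase, multiplicative-decrease controller. The following is only a rough sketch of that idea, not the algorithm from the talk: what counts as "a problem" was not spelled out in my notes, so here it is assumed to mean that more clauses were imported in a period than some budget allows, and the function names, parameters and default values are all made up for illustration.</p>

```python
def update_size_limit(limit, clauses_imported, budget,
                      increase=1, backoff=0.5, minimum=1):
    """Return the clause-size limit to use for the next sharing period."""
    if clauses_imported > budget:
        # Assumed "problem" signal: too much was shared, so cut the limit sharply.
        return max(minimum, int(limit * backoff))
    # No problem seen: allow slightly larger clauses in the next period.
    return limit + increase


def simulate(imports_per_period, budget, start=8):
    """Replay per-period import counts and record the limit after each period."""
    limit = start
    history = [limit]
    for imported in imports_per_period:
        limit = update_size_limit(limit, imported, budget)
        history.append(limit)
    return history
```

<p>For example, <tt>simulate([10, 12, 30, 5], budget=20)</tt> gives the limits <tt>[8, 9, 10, 5, 6]</tt>: the limit creeps up while sharing stays within budget, is halved in the period where it is exceeded, and then starts creeping up again.</p>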
<p><b>SAT Solving and Complexity Theory (Ryan Williams)</b></p>
<p>This was a theoretical talk that discussed some of the difficulties that people have run into, not only in proving P != NP but in many problems on the relations between complexity classes. Much like the talk by Shai Ben David I&#8217;m pretty sure I&#8217;d butcher the mathematical details were I to summarise but the take away message was similar: the logics for reasoning and proof techniques that we have today for problems like this are quite insufficient; as a result people have struggled to even do far weaker reasoning like establishing lower bounds on NP-complete problems. It was a really interesting talk but I&#8217;ll definitely need to rewatch the video when it comes out =D</p>
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2011/06/16/satsmt-summer-school-2011-summary-days-3-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>
	</item>
		<item>
		<title>SAT/SMT Summer School 2011 Summary (Day 2)</title>
		<link>http://seanhn.wordpress.com/2011/06/15/satsmt-summer-school-2011-summary-day-2/</link>
		<comments>http://seanhn.wordpress.com/2011/06/15/satsmt-summer-school-2011-summary-day-2/#comments</comments>
		<pubDate>Wed, 15 Jun 2011 04:54:00 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[SAT/SMT Summer School 2011]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=783</guid>
		<description><![CDATA[Independence Results for the P vs. NP Question (Shai Ben David) This was a really fun talk that was very much on the theoretical side of logic and satisfiability but with potentially very important implications. Ever since learning about early 20th century work by Hilbert, Godel, Turing etc. on foundations of logical systems and proofs [...]]]></description>
			<content:encoded><![CDATA[<p><b>Independence Results for the P vs. NP Question (Shai Ben David)</b></p>
<p>This was a really fun talk that was very much on the theoretical side of logic and satisfiability but with potentially very important implications. Ever since learning about early 20th century work by Hilbert, Godel, Turing etc. on foundations of logical systems and proofs I&#8217;ve been fascinated by anything that discusses the universal limitations and capabilities of logical systems. This was the first talk I&#8217;ve seen where this kind of purely theoretical work was linked to an implication for solving technologies. The fundamental question approached in the talk was whether P != NP is an irresolvable question given the logics we have available. That is, can we prove that it is unprovable? </p>
<p>I would do it an injustice to try and summarise the talk (and I&#8217;d get it wrong!) but the main result was that if it were true that P is nearly equal to NP then we would not be able to prove P != NP using current lines of reasoning and tools. The interesting result for SAT solvers is that if this were the case then many of the problems we want to solve may be solvable in almost-polynomial time. The downside is that even if we could prove this the proof probably wouldn&#8217;t help at all in building a solver that can exploit this closeness.  </p>
<p>I&#8217;ve totally butchered the details of this talk but you can find an earlier/shorter version of it <a href="http://www.cs.uwaterloo.ca/~shai/P%20vs%20NP-2.ppt">here</a> and a paper <a href="http://www.cs.technion.ac.il/~shai/ph.ps.gz">here</a>. </p>
<p><b>HAMPI: A Solver for String Theories (Vijay Ganesh)</b></p>
<p>Vijay&#8217;s talk was on the <a href="http://people.csail.mit.edu/akiezun/hampi/">HAMPI</a> solver. HAMPI contains a theory for bounded (that&#8217;s important) character strings that allows it to reason about things like whether a regular expression matches against a particular string or not. From what I gathered, it operates by converting a regular expression into a context-free-grammar and then converting that context-free-grammar, along with any constraints we may wish to check, into a formula over bitvectors and checking the satisfiability of this with STP. The main target application was detecting oversights in regexs designed to catch SQL injection attempts but Vijay also mentioned they got a 2-5x speed-up when using this solver with KLEE on applications that do a lot of string manipulation. KLEE tends to perform quite poorly on things like XML parsers so I&#8217;d love to see if specialised solvers can help out here. </p>
<p><b>Modern SMT Solver Implementation (Leonardo De Moura &amp; Nikolaj Bjorner)</b></p>
<p>This was a good talk by some of the best guys building SMT solvers, <a href="http://research.microsoft.com/en-us/um/people/leonardo/">Leonardo De Moura</a> and <a href="http://research.microsoft.com/en-us/people/nbjorner/">Nikolaj Bjorner</a>. Both of their publication pages are worth checking out for details on building SMT solvers as well as the theoretical aspects. </p>
<p>They first highlighted some of the core problems in SMT solvers that affect performance: combining engines, unfairness between theory solvers and quantifiers. The most interesting part of the talk for me was on the use of abstraction/relaxing and then refinement when dealing with complex problems. For example, abstracting problems using uninterpreted functions and then checking satisfiability may reduce the complexity of the original problem. If it turns out that the abstraction is UNSAT then the original is UNSAT and if you get a SAT result you can then refine the abstraction if necessary and check again. This idea of abstraction/refinement (CEGAR I guess) loops came up a lot in many different talks.</p>
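<p>The reason an UNSAT answer on the abstraction carries over to the original can be shown with a toy example. This is only a sketch under heavy assumptions: brute force over a four-element domain stands in for an SMT solver, the "refinement" simply restores the real definition of the function in one step, and <tt>concrete_f</tt> and the example formulae are invented for illustration rather than taken from the talk.</p>

```python
from itertools import product

DOMAIN = range(4)  # tiny domain so brute force can stand in for a solver


def concrete_f(x):
    """The operation being abstracted away (chosen arbitrarily for illustration)."""
    return (x * x) % 4


def sat_with_uninterpreted_f(formula):
    """Is there ANY function f over DOMAIN, and inputs a, b, satisfying the formula?"""
    for table in product(DOMAIN, repeat=len(DOMAIN)):
        def f(x, table=table):
            return table[x]
        if any(formula(f, a, b) for a, b in product(DOMAIN, DOMAIN)):
            return True
    return False


def sat_with_concrete_f(formula):
    """Is the formula satisfiable once f has its real definition again?"""
    return any(formula(concrete_f, a, b) for a, b in product(DOMAIN, DOMAIN))


def check(formula):
    if not sat_with_uninterpreted_f(formula):
        return "UNSAT"  # no f works at all, so the real f cannot work either
    # An abstract SAT answer may be spurious, so refine and check again.
    return "SAT" if sat_with_concrete_f(formula) else "UNSAT"


def congruence(f, a, b):
    """Equal inputs with different outputs: impossible for every function f."""
    return a == b and f(a) != f(b)


def needs_two(f, a, b):
    """Satisfiable for some f, but concrete_f never returns 2."""
    return f(a) == 2
```

<p>Here <tt>check(congruence)</tt> is answered by the abstraction alone, whereas <tt>check(needs_two)</tt> gets a spurious SAT from the abstraction and only comes back UNSAT after refinement.</p>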
<p>Also interesting was the mention of their verifying compiler projects that do function level verification and use contracts for called functions in the analysis rather than analysing down into them. I know the idea of contracts is used in <a href="http://research.microsoft.com/en-us/projects/havoc/">HAVOC</a> and discussed extensively in Thomas Ball&#8217;s <a href="http://research.microsoft.com/en-us/people/tball/">publications</a> but I&#8217;m not sure if this was the project they were referring to.</p>
<p><b>Scalable Testing/Reverse Engineering/Performance Profiling with Parallel and Selective Symbolic Execution (George Candea &amp; Stefan Bucur)</b></p>
<p>The next talk was by the guys behind <a href="http://dslab.epfl.ch/proj/s2e">S2E</a> and <a href="http://dslab.epfl.ch/proj/cloud9">Cloud9</a>. Cloud9 is cool in that it&#8217;s a big cluster of nodes each exploring different parts of a tree in symbolic execution. They found that run times for reaching a particular code coverage percentile dropped dramatically when going from 1 to 8 nodes and then dropped even further as they went up to 48 nodes. The total effect was a drop from 6 hours to minutes for the particular example. </p>
<p>S2E caught my attention a few weeks ago after reading their paper as it is designed to be a platform for writing analysis tools that leverage symbolic execution. To my knowledge it is the first system of this kind that allows a callback/event based mechanism for analysis tools and can target an entire operating system stack (it&#8217;s built on QEMU). They have some good documentation as well which is crucial for getting users involved. When I grabbed the code a few weeks back I did notice some dramatic slowdown in execution times even when not doing symbolic execution so that&#8217;s an issue that will have to be addressed but this looks like it could be a great project. With the combination of docs and well thought out design I&#8217;m hoping for the PIN of symbolic execution tools.</p>
<p>In the later part of their talk they gave some feedback to the SMT developer community with suggestions for improvements. For example, 30% of the time spent within their solver (STP) was spent in memory allocation routines.  It&#8217;s something I haven&#8217;t seen a whole lot written on but the type of work that SMT engines do is probably specific enough to require carefully crafted memory allocation algorithms. It&#8217;ll be interesting to see what comes of this in the future. </p>
<p><b>CVC3 and Applications (Clark Barrett)</b></p>
<p>Clark Barrett has been involved in SMT solver development for probably as long as anyone else and as <a href="http://www.cs.nyu.edu/acsys/cvc3/">CVC3</a> is the solver used internally in Immunity Debugger this talk was of particular interest. Clark mentioned that CVC4 is in development and should be seeing a release sometime this year so that&#8217;s good news. We&#8217;ve had some issues with CVC3 dealing with large array constraints and as this is being redone it should hopefully fare a bit better. </p>
<p>Unrelated to CVC3 really but one of the comments at the end was kind of striking in that the person said they often found using the theory of linear integer arithmetic with constraints to represent the bounded nature of 32-bit values faster than the theory of bitvectors. I guess that has something to do with their application area and the kinds of constraints if they&#8217;re not heavy on bit-level operations but it was something I&#8217;ve never thought to do before.  </p>
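<p>To make that last idea concrete, here is a minimal sketch of one way a 32-bit addition could be expressed with bounded integers instead of bitvectors. The commenter did not describe an encoding, so this one (every variable bounded to [0, 2^32) plus a 0/1 carry that absorbs the wraparound) is an assumption, the function names are made up, and plain Python stands in for a solver:</p>

```python
MOD = 2 ** 32  # number of distinct 32-bit values


def add32_constraint(x, y, result, carry):
    """Bounded-integer encoding of result = x + y with 32-bit wraparound.

    All three values are constrained to [0, 2**32) and carry records
    whether the sum wrapped. Everything here is linear in the variables.
    """
    bounded = all(0 <= v < MOD for v in (x, y, result))
    return bounded and carry in (0, 1) and x + y == result + carry * MOD


def solve_add32(x, y):
    """Stand-in for a solver: try both carry values, return the pair that fits."""
    for carry in (0, 1):
        result = x + y - carry * MOD
        if add32_constraint(x, y, result, carry):
            return result, carry
    return None
```

<p>Everything in the constraint is linear, which is presumably why this can be cheap when a formula has few bit-level operations. Shifts and masks have no equally direct encoding.</p>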
<p><b>CEGAR+SMT: Formal Verification of Control Logic in the Reveal System (Karem Sakallah)</b></p>
<p>Karem Sakallah was one of the most entertaining speakers of the day and also presented some interesting ideas behind a verification system based on model checking and Counter Example Guided Abstraction Refinement (CEGAR) that is currently being used to verify real hardware. This was the second talk of the day in which abstraction and refinement using uninterpreted functions were discussed to make difficult problems more tractable (the first being the one by the MSR guys). In this talk Karem also mentioned that naive refinement was not sufficient. Typically, when a SAT result turns out to be a false positive, a constraint is generated that will block that result from being given again and this is added to the global state. Blocking a single result at a time means the same kind of false positive can keep reappearing with different values, so to alleviate this some post-processing is done on the generated formula. They weaken the conditions so that the constraint covers an entire family of states. For example, if the condition was (a == 5 AND b == 6) they weaken it to (a &lt; b). I have no idea how they prevent this weakening from excluding valid states so I&#039;ll need to follow up on that tomorrow =D</p>
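<p>A toy sketch of why blocking one result at a time scales badly. Everything here is hypothetical and has nothing to do with how Reveal is implemented: the abstract "solver" just returns the first candidate not yet excluded, <tt>is_real</tt> plays the role of the concrete check, and the weakened constraint (a &lt; b) is only sound because no genuine result in this toy has a &lt; b, which is exactly the open question raised above:</p>

```python
from itertools import product

DOMAIN = range(8)  # toy value range for a and b


def is_real(a, b):
    # Hypothetical concrete check: pretend (7, 0) is the only genuine result.
    return (a, b) == (7, 0)


def refine(generalise):
    """Return the genuine result and the number of refinement rounds it took."""
    blocked_points = set()
    family_blocked = False  # True once the whole family a < b is excluded
    rounds = 0
    while True:
        # Abstract "solver": first candidate not excluded by any constraint.
        candidate = next(
            (a, b) for a, b in product(DOMAIN, DOMAIN)
            if (a, b) not in blocked_points and not (family_blocked and a < b)
        )
        if is_real(*candidate):
            return candidate, rounds
        rounds += 1
        a, b = candidate
        if generalise and a < b:
            family_blocked = True          # weakened constraint: every a < b
        else:
            blocked_points.add(candidate)  # naive constraint: one assignment
```

<p>With these made-up numbers, <tt>refine(False)</tt> needs 56 rounds and <tt>refine(True)</tt> needs 29 before the genuine result is returned.</p>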
<p>A final point made was that throughout their development process they built a number of optimisations but discovering the best combination of these optimisations was a trial/error process. The graph shown for this varied from combinations of optimisations that had no effect at all to some that cut the execution time to minuscule fractions of the base case.  </p>
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2011/06/15/satsmt-summer-school-2011-summary-day-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>
	</item>
		<item>
		<title>SAT/SMT Summer School 2011 Summary (Day 1)</title>
		<link>http://seanhn.wordpress.com/2011/06/13/satsmt-summer-school-2011-summary/</link>
		<comments>http://seanhn.wordpress.com/2011/06/13/satsmt-summer-school-2011-summary/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 03:53:21 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[SAT/SMT Summer School 2011]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=768</guid>
		<description><![CDATA[This week I&#8217;m attending the first SAT/SMT Summer School, organised by Vijay Ganesh and hosted at MIT. There are plenty of interesting talks, organised into three categories, so I figured it might be useful to do a brief summary of each day with some links to relevant material. I&#8217;ll update this post as the week [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=seanhn.wordpress.com&amp;blog=7649502&amp;post=768&amp;subd=seanhn&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This week I&#8217;m attending the first <a>SAT/SMT Summer School</a>, organised by Vijay Ganesh and hosted at MIT. There are <a href="https://wikis.mit.edu/confluence/display/satsmtschool11/SATSMT+Summer+School+2011">plenty of interesting talks, organised into three categories</a>, so I figured it might be useful to do a brief summary of each day with some links to relevant material. I&#8217;ll update this post as the week progresses.</p>
<p><b>Introduction to Satisfiability Solving with Practical Applications (Niklas Een)</b></p>
<p>The first day of talks was a preliminary day, providing introductions to the foundations of the SAT and SMT problems and a quick history of how the research areas have progressed. Niklas Een, one of the <a href="http://minisat.se/">MiniSAT</a> developers, opened the technical part of the conference discussing the history of automated approaches to SAT. He then moved on to an overview of the algorithms that form the core of most SAT solvers. In Niklas&#8217; opinion the most important algorithms in terms of their impact on the ability of SAT solvers have been those for conflict clause analysis and variable activity tracking. Perhaps an obvious point, but one worth making, was that industrial SAT problems are often trivial once you know what part of the graph to explore, which is why algorithms for locating it are quite important. One idea that was reiterated multiple times was that the development of SAT solvers is largely an experimental science and that even though we can give intuitions as to why certain algorithms are useful, in many cases nobody has a clue what the true reason is, not even the algorithms&#8217; inventors.</p>
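<p>For readers who have not seen the core search loop before, here is a minimal DPLL sketch with unit propagation, using DIMACS-style integer literals. Note that it deliberately leaves out the two things Niklas singled out, conflict clause analysis and variable activity tracking, so it is a baseline to compare against rather than anything resembling MiniSAT:</p>

```python
def assign(clauses, literal):
    """Make `literal` true: drop satisfied clauses and shrink the others.

    Returns None if some clause becomes empty (a conflict).
    """
    remaining = []
    for clause in clauses:
        if literal in clause:
            continue
        reduced = [lit for lit in clause if lit != -literal]
        if not reduced:
            return None
        remaining.append(reduced)
    return remaining


def dpll(clauses, model=()):
    """Return a tuple of true literals satisfying `clauses`, or None if UNSAT."""
    if clauses is None:
        return None
    if not clauses:
        return model
    # Unit propagation: a one-literal clause forces that literal.
    for clause in clauses:
        if len(clause) == 1:
            return dpll(assign(clauses, clause[0]), model + (clause[0],))
    # Otherwise branch on the first literal of the first clause.
    literal = clauses[0][0]
    for choice in (literal, -literal):
        result = dpll(assign(clauses, choice), model + (choice,))
        if result is not None:
            return result
    return None
```

<p>The branching rule here (first literal of the first clause) is the naive choice. Activity-based heuristics replace exactly that line, which is one way to see why picking the right part of the search space matters so much.</p>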
<p><b>SMT Theory and DPLL(T) (Albert Oliveras)</b></p>
<p>The second talk was by Albert Oliveras and provided an introduction to SMT solving. After a quick discussion on the motivation for SMT solvers Albert gave an overview of both the lazy and eager approaches to SMT solving. The eager approach, embodied in solvers like <a href="http://sites.google.com/site/stpfastprover/">STP</a>, involves directly encoding the expressions over different theories into a SAT problem and then using a SAT solver on that problem. The lazy approach, found in <a href="http://research.microsoft.com/en-us/um/redmond/projects/z3/">Z3</a> among others, has two distinct systems &#8211; a SAT solver and one or more theory specific solvers, and proceeds by finding SAT solutions to a propositional skeleton and using theory specific solvers to check for consistency of the returned model in their domains. Albert also provided a good high level summary of how these theory specific solvers share information. The slides from this talk can be found <a href="http://www.lsi.upc.edu/~oliveras/TDV/intro-SMT.pdf">here</a> and are definitely worth a look for an introduction to the topic. </p>
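<p>The lazy loop can be sketched in a few lines. This is a toy, not how Z3 works internally: the SAT solver is brute-force enumeration, the "theory solver" just searches a small range of integers for a value of x, and the atoms and function names are invented for the example. The shape is the point: solve the propositional skeleton, check the model against the theory, and block it if the theory disagrees:</p>

```python
from itertools import product

# Theory atoms over one integer x, keyed by propositional variable number.
ATOMS = {
    1: lambda x: x < 5,
    2: lambda x: x > 10,
    3: lambda x: x == 7,
}


def sat_models(clauses, variables):
    """Toy SAT solver: yield each assignment (var -> bool) satisfying clauses."""
    for values in product([True, False], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            yield model


def theory_check(model, domain=range(-50, 51)):
    """Toy theory solver: find an x matching every atom's assigned truth value."""
    for x in domain:
        if all(ATOMS[v](x) == truth for v, truth in model.items()):
            return x
    return None


def lazy_smt(clauses):
    variables = sorted(ATOMS)
    clauses = list(clauses)
    while True:
        model = next(sat_models(clauses, variables), None)
        if model is None:
            return None              # skeleton exhausted: UNSAT
        witness = theory_check(model)
        if witness is not None:
            return model, witness    # theory-consistent model: SAT
        # Block this propositional model and ask the SAT solver again.
        clauses.append([-v if truth else v for v, truth in model.items()])
```

<p>For the skeleton <tt>[[1, 2], [3, 2]]</tt>, i.e. (x &lt; 5 or x &gt; 10) and (x == 7 or x &gt; 10), the loop rejects four propositional models before settling on x = 11. An eager solver would instead translate the whole formula to SAT up front.</p>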
<p><b>SAT Solvers for Formal Verification (Ed Clarke)</b></p>
<p>After lunch Ed Clarke continued the introductions with a basic summary of bounded model checking (BMC) and linear temporal logic (LTL). Model checking has progressed over the years from initially using basic data structures to explicitly represent states, to the use of binary decision diagrams, and on to the SAT encodings used in BMC. It&#8217;s an interesting topic and there are many parallels and crossovers between modern versions of these algorithms (and CEGAR) and symbolic execution based approaches to verification. The second part of Ed&#8217;s talk was on current research his students are doing into the use of bounded model checking on systems that have properties modelled by differential equations. I apparently didn&#8217;t pay enough attention in calculus classes because most of this went over my head =) The work is being led by Sicun Gao so if you&#8217;re interested his <a href="http://www.cs.cmu.edu/~sicung/">research page</a> is probably helpful.</p>
<p><b>SMT-LIB Initiative (Cesare Tinelli)</b></p>
<p>Following this, Cesare Tinelli talked about the <a href="http://www.smtlib.org/">SMT-LIB</a> and <a href="http://www.smtcomp.org">SMT-COMP</a> initiatives. It was interesting to hear the story behind how it started. One thing of note that Cesare said is that back when this was started it was incredibly hard to tell what algorithms and extensions to SAT solvers were truly useful. This was because of unstandardised reporting and also because of the tendency of developers to report on the benchmarks that their tools performed particularly well on. This is relevant I think because we are at a similar stage with symbolic execution and security focused tools. It&#8217;s hard to tell what really works as sometimes the tools aren&#8217;t released and when they are it can be quite difficult to recreate the authors&#8217; results. It might be useful to have some sort of standardised benchmark/testing suite with a focus on modern problems, especially as people move into other areas like automatic exploit generation.</p>
<p>An interesting discussion broke out near the end on the usefulness of the SMT-LIB standard as an input mechanism for tools given that it is not designed for efficient storage and so writing large problems to disk to read into a solver isn&#8217;t feasible for many cases in symbolic execution. The solution here seems to be to simply embed a solver and use its C/OCaml/etc API but that does somewhat nullify the usefulness of the SMT-LIB language as anything more than a standard and teaching tool. It might be interesting to see a version of the language developed specifically with the goal of efficient storage and parsing though. </p>
<p><b>Constraint Solving Challenges in Dynamic Symbolic Execution (Cristian Cadar)</b></p>
<p>The final talk of the day was by Cristian Cadar on the technologies behind EXE and <a href="http://klee.llvm.org/">KLEE</a>, with a focus on the problems of solving the types of constraints generated. Using the example of constraints over arrays, Cristian made the point that it can be useful to modify a solver&#8217;s algorithms for a particular theory for domain specific cases. The graph presented showed a reduction from something near exponential slowdown to linear slowdown by removing the array axioms typically added for every array based operation (and one other thing that escapes me right now!) and thus checking an under-approximation of the original formula.</p>
<p>These tools are essentially mixed symbolic/concrete execution tools with a focus on bug finding, equivalence checking and so forth. KLEE is open source, which is a definite plus and worth checking out. I&#8217;d love to hear some feedback from people on its performance as I&#8217;ve run it a few times and haven&#8217;t really gotten results that are as useful as fuzzing or manual auditing for bug finding. I think for tools such as this a large set of standardised benchmarks in modern applications could be very useful for gauging progress and focusing research efforts. Apparently Microsoft Research have found this approach very useful in their <a href="http://research.microsoft.com/en-us/um/people/pg/">SAGE</a> framework but it&#8217;s hard to tell if it can be used generally when you don&#8217;t have a data-center and a dedicated research group.</p>
<p>And that was it! Fun day and tomorrow looks awesome =)</p>
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2011/06/13/satsmt-summer-school-2011-summary/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>
	</item>
		<item>
		<title>Infiltrate 2011 Slides</title>
		<link>http://seanhn.wordpress.com/2011/05/10/infiltrate-2011-slides/</link>
		<comments>http://seanhn.wordpress.com/2011/05/10/infiltrate-2011-slides/#comments</comments>
		<pubDate>Tue, 10 May 2011 03:11:11 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[Infiltrate 2011]]></category>
		<category><![CDATA[Presentations]]></category>
		<category><![CDATA[WebKit]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=756</guid>
		<description><![CDATA[The slides for most of the talks from this years Infiltrate have gone online! Among them you can find the slide deck for Attacking the WebKit Heap, which Agustin and I gave. The talks were awesome and I&#8217;d recommend grabbing them all. Halvar&#8217;s, titled State Spaces and Exploitation, was one of my favourites. I generally [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=seanhn.wordpress.com&amp;blog=7649502&amp;post=756&amp;subd=seanhn&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The slides for most of the talks from this year&#8217;s Infiltrate have gone <a>online</a>! Among them you can find the slide deck for <a href="http://www.immunityinc.com/infiltrate/presentations/webkit_heap.pdf">Attacking the WebKit Heap</a>, which Agustin and I gave. </p>
<p>The talks were awesome and I&#8217;d recommend grabbing them all. Halvar&#8217;s, titled <a href="http://www.immunityinc.com/infiltrate/presentations/Fundamentals_of_exploitation_revisited.pdf">State Spaces and Exploitation</a>, was one of my favourites. I generally believe that the reason we in industry, as well as university based research groups, sometimes fail to build better tools is because we don&#8217;t spend enough time reflecting on the nature of exploitation and bug finding and as a result end up solving the wrong problems. Halvar spent about half of his talk addressing one way to think about exploits, which is programming a &#8216;weird machine&#8217; that lives inside a program and is unlocked by a bug trigger. It&#8217;s an interesting way to look at things and, as he mentioned, similar to how many of us think of exploit development when it comes down to it. </p>
<p>As far as I know, we&#8217;ll only be releasing audio for one of the talks, Nico&#8217;s keynote on Strategic Surprise, which can be found <a href="http://seclists.org/dailydave/2011/q2/50">here</a>. Also worth checking out, educational, funny and just a little bit troll-y &#8230; what more can you ask =D</p>
<p><i>&#8220;We don&#8217;t care about nulls because this ain&#8217;t no strcpy shit&#8221;</i> &#8211; Ryan Austin, by consensus the best quote of the conference.</p>
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2011/05/10/infiltrate-2011-slides/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>
	</item>
		<item>
		<title>Finding Optimal Solutions to Arithmetic Constraints</title>
		<link>http://seanhn.wordpress.com/2011/05/08/finding-optimal-solutions-to-arithmetic-constraints/</link>
		<comments>http://seanhn.wordpress.com/2011/05/08/finding-optimal-solutions-to-arithmetic-constraints/#comments</comments>
		<pubDate>Sun, 08 May 2011 20:33:15 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[Bug hunting]]></category>
		<category><![CDATA[SMT solving]]></category>
		<category><![CDATA[Static analysis]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=684</guid>
		<description><![CDATA[This post is a follow on to my previous ones on automatically determining variable ranges and on uses for solvers in code auditing sessions. In the first of those posts I showed how we can use the symbolic execution engine of ID to automatically model code and then add extra constraints to determine how it [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=seanhn.wordpress.com&amp;blog=7649502&amp;post=684&amp;subd=seanhn&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This post is a follow on to my previous ones on <a href="http://seanhn.wordpress.com/2010/10/15/determining-variable-ranges-part-i/">automatically determining variable ranges</a> and on <a href="http://seanhn.wordpress.com/2010/11/05/augment-your-auditing-with-a-theorem-prover/">uses for solvers in code auditing sessions</a>. In the first of those posts I showed how we can use the symbolic execution engine of ID to automatically model code and then add extra constraints to determine how it restricts the state space for certain variables. In the second I looked at one use case for manual modelling of code and proving properties about it as part of C++ auditing. </p>
<p>In this post I&#8217;m going to talk about a problem that lies between the previous two cases. That is, manually modelling code, but using Python classes provided by ID in a much more natural way than with the SMT-LIB language, and looking for optimal solutions to a problem rather than a single one or all possible solutions.</p>
<p>Consider the following code, produced by HexRays decompiler from an x86 binary. It was used frequently throughout the binary in question to limit the ranges allowed for particular variables. The first task is to verify that it does restrict the ranges of <tt>width</tt> and <tt>height</tt> as it is designed to. Its purpose is to ensure that <tt>v3 * height</tt> is less than 0x7300000, where <tt>v3</tt> is derived from <tt>width</tt>.</p>
<p><tt></p>
<pre>
int __usercall check_ovf(int width, int height,
    int res_struct)
{
  int v3; // ecx@1

  v3 = ((width + 31) &gt;&gt; 3) &amp; 0xFFFFFFFC;
  *(_DWORD *)(res_struct + 12) = width;
  *(_DWORD *)(res_struct + 16) = height;
  *(_DWORD *)(res_struct + 20) = v3;
  if ( width &lt;= 0 || height &lt;= 0 ) // <b>1</b>
  {
    *(_DWORD *)(res_struct + 24) = 0;
    *(_DWORD *)(res_struct + 28) = 0;
  }
  else
  {
    if ( height * v3 &lt;= 0 || 120586240 / v3 &lt;= height ) // <b>2</b>
      *(_DWORD *)(res_struct + 24) = 0;
    else
      *(_DWORD *)(res_struct + 24) = malloc_wrapper(res_struct,
                                       120586240 % v3,
                                       height * v3); // <b>3</b>
    *(_DWORD *)(res_struct + 28) = 1;
  }
  return res_struct;
}
</pre>
<p></tt></p>
<p>If the above code reaches the line marked as <b>3</b> a malloc call will occur with <tt>height * v3</tt> as the size argument. Can this overflow? Given the checks at <b>1</b> and <b>2</b> it&#8217;s relatively clear that this cannot occur but for the purposes of later tasks we will model and verify the code. </p>
<p>One of the things that becomes clear when using the SMT-LIB language (even version 2 which is considerably nicer than version 1) is that using it directly is still quite cumbersome. This is why in recent versions of Immunity Debugger we have added wrappers around the CVC3 solver that allow one to build a model of code using Python expressions (credit for this goes to Pablo who did an awesome job). This was one of the things we covered during the recent Master Class at Infiltrate and people found it far easier than using the SMT-LIB language directly. </p>
<p>Essentially, we have <tt>Expression</tt> objects that represent variables or concrete values and the operators on these expressions (+, -, %, &gt;&gt; etc) are overridden so that they build new expressions in the solver&#8217;s state. For example, if <tt>x</tt> and <tt>y</tt> are <tt>Expression</tt> objects then <tt>x + y</tt> is also an <tt>Expression</tt> object representing the addition of <tt>x</tt> and <tt>y</tt> in the current solver context. Using the <tt>assertIt()</tt> function of any <tt>Expression</tt> object then asserts that condition to hold.</p>
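<p>To illustrate the mechanism (this is a hypothetical sketch, not the code shipped with ID; the class <tt>Expr</tt> and the method <tt>assert_it</tt> are invented names, and strings stand in for solver terms), operator overloading lets ordinary Python arithmetic build up a formula:</p>

```python
class Expr(object):
    """Hypothetical stand-in for a solver-backed expression (not ID's class)."""

    asserted = []  # formulas asserted so far, shared by all expressions

    def __init__(self, text):
        self.text = text

    @staticmethod
    def _lift(value):
        # Wrap plain Python integers so they can appear as operands.
        return value if isinstance(value, Expr) else Expr(str(value))

    def _binary(self, op, other):
        return Expr("(%s %s %s)" % (self.text, op, Expr._lift(other).text))

    def __add__(self, other):
        return self._binary("+", other)

    def __mul__(self, other):
        return self._binary("*", other)

    def __rshift__(self, other):
        return self._binary(">>", other)

    def __and__(self, other):
        return self._binary("&", other)

    def __gt__(self, other):
        return self._binary(">", other)

    def assert_it(self):
        # A real wrapper would hand the term to the solver here.
        Expr.asserted.append(self.text)


width = Expr("width")
height = Expr("height")
v3 = ((width + 31) >> 3) & 0xfffffffc
(height * v3 > 0).assert_it()
```

<p>After this runs, <tt>Expr.asserted</tt> holds a single formula string mirroring the Python expression. Nothing is evaluated numerically, since the operators only record structure.</p>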
<p>With this in mind, we can model the decompiled code in Python as follows:</p>
<p><tt></p>
<pre>
import sys
import time

sys.path.append('C:\\Program Files\\Immunity Inc\\Immunity Debugger\\Libs\\x86smt')

from prettysolver import Expression
from smtlib2exporter import SmtLib2Exporter

def check_sat():
    img_width = Expression("img_width", signed=True)
    img_height = Expression("img_height", signed=True)
    tmp_var = Expression("tmp_var", signed=True)

    const = Expression("const_val")
    (const == 0x7300000).assertIt()

    (img_width &gt; 0).assertIt()
    (img_height &gt; 0).assertIt()

    tmp_var = ((img_width + 31) &gt;&gt; 3) &amp; 0xfffffffc
    (img_height * tmp_var &gt; 0).assertIt()
    (const / tmp_var &gt; img_height).assertIt()

    expr = (((tmp_var * img_height) &amp;
            0xffffffff00000000) != 0)  # <b>1</b>
    expr.assertIt()

    s = SmtLib2Exporter()
    s.dump_to_file(expr, 'test.smt2') # <b>2</b>

    # After this we can check with z3 /smt2 /m test.smt2
    # Alternatively we can use expr.isSAT which calls CVC3 but it
    # is a much slower solver

    start_time = time.time()
    if expr.isSAT():
        print 'SAT'
        print expr.getConcreteModel()
    else:
        print 'UNSAT'

    print 'Total run time: %d seconds' % (time.time() - start_time)

if __name__ == '__main__':
    check_sat()
</pre>
<p></tt></p>
<p>The above code (which can be run from the command-line completely independently of Immunity Debugger) models the parts of the decompiled version that we care about. The added condition, marked as <b>1</b> checks for integer overflow by performing a 64-bit multiplication and then checking if the upper 32 bits are 0 or not. The first thing to note about this code is that it models the decompiled version quite naturally and is far easier to write and understand than the SMT-LIB alternative. This makes this kind of approach to analysing code much more tractable and means that once you are familiar with the API you can model quite large functions in very little time. For example, asserting that the condition <tt>if ( height * v3 &lt;= 0 || 120586240 / v3 &lt;= height )</tt> must be false translates to the following, which is syntactically quite close to the C code:</p>
<p><tt></p>
<pre>
tmp_var = ((img_width + 31) &gt;&gt; 3) &amp; 0xfffffffc
(img_height * tmp_var &gt; 0).assertIt()
(const / tmp_var &gt; img_height).assertIt()
</pre>
<p></tt></p>
<p>Checking if the function does in fact prevent integer overflow is then simple.</p>
<div id="attachment_706" class="wp-caption aligncenter" style="width: 510px"><a href="http://seanhn.files.wordpress.com/2011/05/check_ovf.png"><img src="http://seanhn.files.wordpress.com/2011/05/check_ovf.png?w=500&#038;h=103" alt="" title="Check Overflow" width="500" height="103" class="size-full wp-image-706" /></a><p class="wp-caption-text">Using the solver to check if an overflow is possible on the argument to malloc</p></div>
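<p>A solver result like the one above can be sanity-checked against concrete values. The sketch below re-implements the two checks with 32-bit signed arithmetic in plain Python (the helper names are invented and the sampling ranges are arbitrary), then applies the same widening-multiply test to every input that would reach malloc. Sampling can only fail to find a counterexample; it proves nothing, which is the reason for using a solver in the first place:</p>

```python
import random

LIMIT = 0x7300000  # 120586240, the constant in the decompiled check


def to_signed32(value):
    """Interpret the low 32 bits of value as a signed C int."""
    value &= 0xffffffff
    return value - (1 << 32) if value >= (1 << 31) else value


def reaches_malloc(width, height):
    """Return v3 if the inputs reach the malloc call (marker 3), else None."""
    if width <= 0 or height <= 0:                                  # check 1
        return None
    v3 = to_signed32((to_signed32(width + 31) >> 3) & 0xfffffffc)
    if v3 <= 0:
        # v3 is never zero for a positive width, and a negative v3 makes
        # 120586240 / v3 non-positive, so check 2 would reject it anyway.
        return None
    if to_signed32(height * v3) <= 0 or LIMIT // v3 <= height:     # check 2
        return None
    return v3


def overflows_32(a, b):
    """Widening multiply: any bit set above bit 31 means a 32-bit overflow."""
    return ((a * b) & 0xffffffff00000000) != 0


random.seed(0)
samples = [(random.randint(1, 1 << 16), random.randint(1, 1 << 16))
           for _ in range(10000)]
# A few hand-picked boundary cases, including one right at the limit.
samples += [(32, 30146559), (0x7fffffff, 0x7fffffff), (1, 0x7fffffff)]

accepted = []
for w, h in samples:
    v3 = reaches_malloc(w, h)
    if v3 is not None:
        accepted.append((w, h, v3))
violations = [(w, h) for w, h, v3 in accepted if overflows_32(h, v3)]
```

<p>An empty <tt>violations</tt> list agrees with the UNSAT result, within the limits of what was sampled.</p>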
<p>So, modulo modelling errors on our behalf, the check is safe and prevents an overflow on the size argument to malloc*. So what now? Well, in the case of this particular code-base an interesting behaviour appeared later in the code if the product of <tt>width</tt> and <tt>height</tt> is sufficiently large and the above function succeeded in allocating memory. That is, the height and width were small enough that <tt>height * v3</tt> was less than 0x7300000 but, due to multiplication with other non-constants later in the code, a subsequent calculation may then overflow. The question we then want to answer is: what is the maximum value of <tt>width * height</tt> that can be achieved that also passes the above check?</p>
<p><b>Solving Optimisation Problems with Universal Quantification**</b></p>
<p>This problem is essentially one of optimisation. There are many assignments to the input variables that will pass the overflow check but we are interested in those that maximise the product of <tt>width</tt> and <tt>height</tt>. Naturally this problem can be solved on paper with relative ease for small code fragments but with longer, more complex code the solver-based approach quickly becomes more attractive.</p>
<p>The first thing to note is that at the line marked as <b>2</b> in the above Python code we used a useful new feature of ID, the <tt>SmtLib2Exporter</tt>***, to dump the model constructed in CVC3 out to a file in SMT-LIB v2 syntax. This is useful for two reasons: firstly, we can use a solver other than CVC3, e.g. Z3 which is much faster for most problems, and secondly, we can manually modify the formula to include things that our Python wrapper currently doesn&#8217;t have, such as universal quantification.</p>
<p>Universal quantification, normally denoted by the symbol ∀ and the dual to existential quantification, is used to apply a predicate to all members of a set, e.g. ∀x ∈ N.P(x) states that for all elements x of the natural numbers some predicate P holds. Assume that the conditions of the integer overflow check are embodied in a function called <tt>sat_inputs</tt> and M is the set of natural numbers modulo 2^32. The formula that we want to check is then <b>(sat_inputs(x, y) ∧ (∀ a, b ∈ M . sat_inputs(a, b) =&gt; x * y &gt;= a * b))</b>. That is, we consider <tt>x</tt> and <tt>y</tt> to be a solution if they satisfy the conditions of <tt>sat_inputs</tt> and their product <tt>x * y</tt> is greater than or equal to the product of any other two values <tt>a</tt> and <tt>b</tt> that also satisfy <tt>sat_inputs</tt>. The second property is encoded in the function <tt>is_largest</tt> in the following SMT-LIB v2 code. The rest of the code is dumped by the previous Python script so checking this extra condition was less than 5 lines of work for us. The details of <tt>sat_inputs</tt> have been excluded for brevity. It simply encodes the semantics of the integer overflow checking code.</p>
<p><tt></p>
<pre>
(declare-funs ((img_width BitVec[32])(img_height BitVec[32])))

(define-fun sat_inputs ((img_width BitVec[32])(img_height BitVec[32])) Bool
    (and
         ; Model of the code goes here
    )
)

(define-fun is_largest ((i BitVec[32])(j BitVec[32])) Bool
    (forall ((a BitVec[32]) (b BitVec[32]))
        (implies (sat_inputs a b)
            (bvsge (bvmul i j) (bvmul a b))
        )
    )
)

(assert (and
    (sat_inputs img_width img_height)
    (is_largest img_width img_height)
    )
)

(check-sat)
(get-info model)
</pre>
<p></tt><br />
<div id="attachment_724" class="wp-caption aligncenter" style="width: 510px"><a href="http://seanhn.files.wordpress.com/2011/05/max_val_sat.png"><img src="http://seanhn.files.wordpress.com/2011/05/max_val_sat.png?w=500&#038;h=112" alt="" title="Maximising the product" width="500" height="112" class="size-full wp-image-724" /></a><p class="wp-caption-text">Finding the maximum product of height and width</p></div></p>
<p>Running this through Z3 takes 270 seconds (using universal quantification results in a significant increase in the problem size) and we are provided with an assignment to the height and width variables that not only passes the checks in the code but is guaranteed to provide a maximal product. The end result is that with the above two inputs <tt>height * width</tt> is 0x397fffe0, which is guaranteed to be the maximal product, and <tt>height * (((width + 31) &gt;&gt; 3) &amp; 0xfffffffc)</tt> is 0x72ffffc, as you would expect, which is less than 0x7300000 and therefore satisfies the conditions imposed by the code. Maximising or minimising other variables or products is similarly trivial, although for such a small code snippet not particularly interesting (even maximising the product of height and width can be done in your head pretty easily without a solver, but instructive examples aren&#8217;t meant to be rocket science).</p>
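<p>The quoted figures can be cross-checked without a solver. The sketch below brute-forces a scaled-down version of the problem (the limit of 120 is an arbitrary stand-in for 0x7300000) and then spot-checks the full-size values. The pair width = 32, height = 30146559 is an assumption: it is one assignment consistent with the two products quoted above, not something read from the solver output:</p>

```python
def v3_of(width):
    """The value derived from width in the decompiled code."""
    return ((width + 31) >> 3) & 0xfffffffc


def passes(width, height, limit):
    """The checks from the decompiled code, for values too small to wrap."""
    v3 = v3_of(width)
    return (width > 0 and height > 0
            and height * v3 > 0 and limit // v3 > height)


def brute_force_max(limit):
    """Exhaustively maximise width * height subject to passes()."""
    best = (0, 0, 0)  # (product, width, height)
    for width in range(1, 8 * limit + 1):
        for height in range(1, limit + 1):
            if passes(width, height, limit):
                best = max(best, (width * height, width, height))
    return best


SMALL_LIMIT = 120  # scaled-down stand-in for 0x7300000
small_best = brute_force_max(SMALL_LIMIT)

# Full-size spot check of the figures quoted in the text.
WIDTH, HEIGHT, LIMIT = 32, 30146559, 0x7300000
full_product = WIDTH * HEIGHT
full_alloc = HEIGHT * v3_of(WIDTH)
```

<p>In the small instance the best pair is width = 32, height = 29, the same shape as the full-size answer: the smallest <tt>v3</tt> (4), the largest width that still yields it, and the largest height the division check allows.</p>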
<p>This capability becomes far more interesting on larger or more complex functions and code paths. In such cases the ability to use a solver as a vehicle for precisely exploring the state space of a program can mean the difference between spotting a subtle bug and missing out.</p>
<p><b>Conclusion</b><br />
By its nature, code auditing is about tracking state spaces. The task is to discover those states implied by the code but not considered by the developer. In the same way that one may look at a painting and discover meaning not intended by the artist, an exploit developer will look at a program and discover a shadow-program, not designed or purposefully created, but in existence nonetheless. In places this shadow-program is thick, it is easily discovered, has many entry points and can be easily leveraged to provide exploit primitives. In other places, this shadow-program clings to the intended program and is barely noticeable. An accidental decrement here, an off-by-one bound there. Easy to miss and perhaps adding few states to the true program. It is from these cases that some of the most entertaining exploits derive. From state spaces that are barren and lacking in easily leverageable primitives. Discovering such gateway states, those that move us from the intended program to its more enjoyable twin, is an exercise in precision. This is why it continues to surprise me that we have so little tool support for truly extending our capacity to deal with massive state spaces in a precise fashion.</p>
<p>Of course we have some very useful features for making a program easier to manually analyse, among them the Hex-Rays decompiler and IDA&#8217;s various features for annotating and shaping a disassembly, as well as plugin architectures for writing your own tools with Immunity Debugger, IDA and others. What we lack is real, machine-driven assistance in determining the state space of a program and, dually, providing reverse engineers with function and basic block level information on how a given chunk of code affects this state space. </p>
<p>While efforts still need to be made to develop and integrate automation technologies into our workflows I hope this post, and the others, have provided some motivation to build tools that not only let us analyse code but that help us deal with the large state spaces underneath. </p>
<p><i><br />
* As a side note, Z3 solves these constraints in about half a second. We hope to make it our solving backend pretty soon for obvious reasons.<br />
** True optimisation problems in the domain of satisfiability are different and usually fall under the heading of MaxSAT and OptSAT. The former deals with maximising the number of satisfied clauses while the latter assigns weights to clauses and looks for solutions that minimise or maximise the sum of these weights. We are instead dealing with optimisation within the variables of the problem domain. OptSAT might provide interesting solutions for automatic gadget chaining though and is a fun research area.<br />
*** This will be in the next release which should be out soon. If you want it now just drop me a mail.</p>
<p>Thanks to Rolf Rolles for initially pointing out the usefulness of Z3&#8217;s support for universal/existential quantification for similar problems.<br />
</i></p>
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2011/05/08/finding-optimal-solutions-to-arithmetic-constraints/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>

		<media:content url="http://seanhn.files.wordpress.com/2011/05/check_ovf.png" medium="image">
			<media:title type="html">Check Overflow</media:title>
		</media:content>

		<media:content url="http://seanhn.files.wordpress.com/2011/05/max_val_sat.png" medium="image">
			<media:title type="html">Maximising the product</media:title>
		</media:content>
	</item>
		<item>
		<title>Exploit Necromancy in TCMalloc &#8211; Reviving the 4-to-N Byte Overflow Primitive with Insert to FreeList[X]</title>
		<link>http://seanhn.wordpress.com/2011/04/14/exploit-necromancy-in-tcmalloc-reviving-the-4-to-n-byte-overflow-primitive-with-insert-to-freelistx/</link>
		<comments>http://seanhn.wordpress.com/2011/04/14/exploit-necromancy-in-tcmalloc-reviving-the-4-to-n-byte-overflow-primitive-with-insert-to-freelistx/#comments</comments>
		<pubDate>Thu, 14 Apr 2011 04:38:52 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[Chrome]]></category>
		<category><![CDATA[Heap Exploitation]]></category>
		<category><![CDATA[Infiltrate 2011]]></category>
		<category><![CDATA[TCMalloc]]></category>
		<category><![CDATA[WebKit]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=584</guid>
		<description><![CDATA[A couple of months back while looking into a heap overflow in Chrome* I found myself poking around in the internals of TCMalloc. What I found there was pretty interesting. Perhaps the implementers have been watching Twitter and decided to take a proactive approach to security researchers moaning about the difficulties of modern heap exploitation. [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=seanhn.wordpress.com&amp;blog=7649502&amp;post=584&amp;subd=seanhn&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>A couple of months back while looking into a heap overflow in Chrome* I found myself poking around in the internals of <a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html">TCMalloc</a>. What I found there was pretty interesting. Perhaps the implementers have been watching Twitter and decided to take a proactive approach to security researchers <a href="http://twitter.com/comex/status/45532201789558784">moaning</a> about the <a href="http://twitter.com/0xcharlie/status/44626990627688448">difficulties</a> of modern heap exploitation. Maybe they just got caught up in making a blazing fast allocator and couldn&#8217;t bring themselves to slow it down with nasty things like integrity checks. Either way, TCMalloc provides a cosy environment for heap exploits to thrive and is worth fully exploring to discover the possibilities it provides. </p>
<p>At <a href="https://www.immunityinc.com/infiltrate/speakers.html#heelan">Infiltrate</a> next weekend, Agustin Gianni and I will be discussing (among other things) TCMalloc from the point of view of exploitation. Essentially our research into TCMalloc wasn&#8217;t so much &#8216;research&#8217; as it was vulnerability necromancy. In the name of speed TCMalloc has forsaken almost every type of sanity check you might think of. There are annoyances of course but they are artefacts of the algorithms rather than coherent attempts to detect corruption and prevent exploitation. As a result we can revive a number of primitives from heap exploits past as well as some unique to TCMalloc. </p>
<p>Like most custom allocators, TCMalloc operates by requesting large chunks of memory from the operating system via mmap, sbrk or VirtualAlloc and then uses its own methods to manage these chunks on calls to malloc, free, new, delete etc. Agustin wrote a high level overview <a href="http://gruba.blogspot.com/2011/03/webkit-heap-exploitation-rise-of-undead.html">here</a> and Google&#8217;s <a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html">project page</a> is also useful for gaining a general idea of its workings. In this post I&#8217;m not going to go into the exact details of how the memory is managed (come see our presentation for that!) but instead briefly discuss how the thread-local free lists function and one useful exploit primitive we gain from them. </p>
<p><b>ThreadCache FreeList allocation</b><br />
The front end allocator for TCMalloc (with TC standing for ThreadCache) consists of <tt>kNumClasses</tt>** per-thread free lists for allocations of up to 32768 bytes. Each FreeList stores chunks of a single size. The bucket sizes are generated in <tt>SizeMap::Init</tt> and allocations that don&#8217;t map directly to a bucket size are simply rounded up to the next size. For allocations larger than 32768 bytes the front end allocator is the PageHeap, which stores Spans (runs of contiguous pages) and is not thread specific. The following code shows the interface to the allocator through a call to <tt>malloc</tt>***.</p>
<p><tt>
<pre>
3604 static ALWAYS_INLINE void* do_malloc(size_t size) {
3605   void* ret = NULL;
3606
...
3611   // The following call forces module initialization
3612   TCMalloc_ThreadCache* heap = TCMalloc_ThreadCache::GetCache();
...
3621   if (size &gt; kMaxSize) {
3622     // Use page-level allocator
3623     SpinLockHolder h(&amp;pageheap_lock);
3624     Span* span = pageheap-&gt;New(pages(size));
3625     if (span != NULL) {
3626       ret = SpanToMallocResult(span);
3627     }
3628   } else {
3629     // The common case, and also the simplest.  This just pops the
3630     // size-appropriate freelist, afer replenishing it if it's empty.
3631     ret = CheckedMallocResult(heap-&gt;Allocate(size));
3632   }
...
</pre>
<p></tt></p>
<p>At line 3612 the <tt>ThreadCache</tt> pointer for the current thread is retrieved. This object contains a number of thread specific details but the one we are interested in is the <tt>FreeList</tt> array. A FreeList object contains some metadata and a singly-linked list of free chunks that are managed by very simple primitives such as <tt>SLL_Pop</tt>, <tt>SLL_Push</tt> etc. If the allocation size is at most 32768 bytes then the following code is called to retrieve a chunk of the required size.</p>
<p><tt>
<pre>
2888 ALWAYS_INLINE void* TCMalloc_ThreadCache::Allocate(size_t size) {
2889   ASSERT(size &lt;= kMaxSize);
2890   const size_t cl = SizeClass(size);
2891   FreeList* list = &amp;list_[cl];
2892   size_t allocationSize = ByteSizeForClass(cl);
2893   if (list-&gt;empty()) {
...
2896   }
2897   size_ -= allocationSize;
2898   return list-&gt;Pop();
2899 }
</pre>
<p></tt></p>
<p>The correct FreeList is retrieved at line 2891 and presuming that it is not empty we retrieve a chunk by calling the <tt>Pop</tt> method. This results in a call to <tt>SLL_Pop</tt> with the address of the pointer to the head of the free list. </p>
<p><tt>
<pre>
 761 static inline void *SLL_Next(void *t) {
 762   return *(reinterpret_cast&lt;void**&gt;(t));
 763 }
...
 774 static inline void *SLL_Pop(void **list) {
 775   void *result = *list;
 776   *list = SLL_Next(*list);
 777   return result;
 778 }
</pre>
<p></tt></p>
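<p>As a rough model of these primitives, the Python below keeps memory in a dictionary that maps an address to the word stored there. The addresses and the body of <tt>sll_push</tt> are illustrative assumptions rather than WebKit code, and where the C++ updates the list head in place through a <tt>void **</tt>, the sketch returns the new head instead. It also replays the reordering trick from the fourth footnote.</p>

```python
NULL = 0


def sll_next(memory, chunk):
    """Read the pointer stored in the first word of a free chunk."""
    return memory[chunk]


def sll_push(memory, head, chunk):
    """Prepend chunk to the list and return the new head (assumed behaviour)."""
    memory[chunk] = head
    return chunk


def sll_pop(memory, head):
    """Return (chunk, new_head); nothing validates the pointer that is read."""
    return head, sll_next(memory, head)


# Hypothetical addresses for two adjacent 16-byte chunks.
A, B = 0x1000, 0x1010
memory = {}  # address -> word stored at that address

head = NULL
head = sll_push(memory, head, A)  # a fresh list is built by prepending...
head = sll_push(memory, head, B)  # ...so it starts out as B -> A

first, head = sll_pop(memory, head)    # allocation returns B
second, head = sll_pop(memory, head)   # allocation returns A
head = sll_push(memory, head, first)   # free(B)
head = sll_push(memory, head, second)  # free(A): the list is now A -> B
```

<p>The only state a chunk carries is its first word, which is why a linear overflow of a few bytes into a neighbouring free chunk is enough to redirect the list.</p>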
<p><b>Insert to FreeList[X] &#8211; Reviving the 4-to-N byte Overflow Primitive</b><br />
The effect of <tt>SLL_Pop</tt> is to follow the list head pointer, retrieve the DWORD stored there via <tt>SLL_Next</tt> and make that value the new list head. Notice that there are no checks of any kind on the value of the <tt>*list</tt> pointer. When I first saw a crash within TCMalloc it was at line 762 in the <tt>SLL_Next</tt> function. The value of <tt>t</tt> was 0x41414141, a value with which I had previously overflowed an allocated chunk. What this means is that on that call to <tt>SLL_Pop</tt> the list head was an address that I controlled. How could this happen? Consider the following FreeList layout:</p>
<div id="attachment_591" class="wp-caption aligncenter" style="width: 510px"><a href="http://seanhn.files.wordpress.com/2011/03/freelist.png"><img src="http://seanhn.files.wordpress.com/2011/03/freelist.png?w=500&#038;h=83" alt="" title="FreeListExample" width="500" height="83" class="size-full wp-image-591" /></a><p class="wp-caption-text">FreeList example</p></div>
<p>The above FreeList has 3 chunks. For simplicity assume that the addresses of <b>A</b> and <b>B</b> are contiguous****. On the first call to <tt>malloc</tt> chunk <b>A</b> is returned to the application. </p>
<div id="attachment_593" class="wp-caption aligncenter" style="width: 510px"><a href="http://seanhn.files.wordpress.com/2011/03/freelist_allocate2.png"><img src="http://seanhn.files.wordpress.com/2011/03/freelist_allocate2.png?w=500&#038;h=108" alt="" title="FreeList_allocate2" width="500" height="108" class="size-full wp-image-593" /></a><p class="wp-caption-text">FreeList allocation</p></div>
<p>Assume an overflow then occurs when the application writes data to <b>A</b> and <b>B</b> is partially corrupted. </p>
<p><div id="attachment_595" class="wp-caption aligncenter" style="width: 510px"><a href="http://seanhn.files.wordpress.com/2011/03/freelist_ovf1.png"><img src="http://seanhn.files.wordpress.com/2011/03/freelist_ovf1.png?w=500&#038;h=144" alt="" title="FreeList_ovf1" width="500" height="144" class="size-full wp-image-595" /></a><p class="wp-caption-text">Overflow of Chunk A into Chunk B</p></div><br />
The first DWORD of <b>B</b> is the pointer to the next chunk in the FreeList and so we have corrupted the singly-linked list. The first chunk is now <b>B</b> and, as we control the first DWORD of this chunk, we control what the next chunk in the FreeList will be. The chunk <b>C</b> is effectively no longer part of the FreeList. The next allocation of this size will then return the chunk <b>B</b> (presuming no free calls on chunks of this size, within this thread, have occurred, as freed chunks are prepended to the head of the list) and it will give us control of the list head pointer when <tt>*list = SLL_Next(*list)</tt> executes.</p>
<div id="attachment_597" class="wp-caption aligncenter" style="width: 510px"><a href="http://seanhn.files.wordpress.com/2011/03/freelist_ovf2.png"><img src="http://seanhn.files.wordpress.com/2011/03/freelist_ovf2.png?w=500&#038;h=155" alt="" title="FreeList_ovf2" width="500" height="155" class="size-full wp-image-597" /></a><p class="wp-caption-text">Allocation from a corrupted FreeList</p></div>
<p>As we control the DWORD at <tt>**list</tt> we control what the new list head is. In my initial TCMalloc crash this set the list head to 0x41414141. One more allocation of this size will then call <tt>SLL_Pop</tt> with a controlled list head pointer. The crash I encountered was when <tt>SLL_Next</tt> attempted to read the next pointer at 0x41414141.</p>
<p>The important part of all this is that by controlling the list head pointer we control the address of the chunk returned and can give back any usable memory region we want to the application. So after one more allocation the situation is as follows:</p>
<div id="attachment_606" class="wp-caption aligncenter" style="width: 510px"><a href="http://seanhn.files.wordpress.com/2011/03/freelist_ovf3.png"><img src="http://seanhn.files.wordpress.com/2011/03/freelist_ovf3.png?w=500&#038;h=144" alt="" title="FreeList_ovf3" width="500" height="144" class="size-full wp-image-606" /></a><p class="wp-caption-text">A memory region of our choosing is handed back to the application </p></div>
<p>The result of this allocation is a pointer which we control. It is important to note that the first DWORD at this address then becomes the new list head. For a clean exploit it is desirable for either this DWORD to be 0x0 or the FreeList&#8217;s <tt>length_</tt> attribute to be 0 after returning our controlled pointer. If we cannot force either of these conditions then future allocations of this size will follow this DWORD, which may not be under our control, and may result in instability in the program before we gain code execution. Another point to note is that if a free call occurs on a pointer that has not previously been noted by TCMalloc as part of a memory region under its control then it will raise an exception that will likely terminate the program. This is important to keep in mind if the memory location we are inserting into the free list is not controlled by TCMalloc. If this is the case then we have to prevent this address from being passed to <tt>free</tt> before we gain control of the program&#8217;s execution.</p>
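<p>The whole sequence can be replayed in a few lines of Python. This is a simulation under assumptions, not exploit code: memory is a dictionary from address to stored word, all addresses are made up, and the target region is assumed to be readable with a first DWORD of 0 so that the list ends cleanly.</p>

```python
def sll_pop(memory, head):
    """Pop like SLL_Pop: return the head and make its first word the new head."""
    return head, memory[head]


# Hypothetical addresses: A and B are adjacent chunks, TARGET is a region the
# attacker wants the allocator to hand back to the application.
A, B, C, TARGET = 0x1000, 0x1010, 0x1020, 0x41414141

memory = {A: B, B: C, C: 0, TARGET: 0}  # free list A -> B -> C
head = A

chunk_a, head = sll_pop(memory, head)  # malloc returns A, the head is now B
memory[B] = TARGET                     # the overflow from A rewrites B's next pointer
chunk_b, head = sll_pop(memory, head)  # malloc returns B, the head is now TARGET
chunk_t, head = sll_pop(memory, head)  # malloc returns TARGET itself
```

<p>If <tt>TARGET</tt> were missing from the dictionary the last pop would fail on the lookup, which is the toy equivalent of the crash in <tt>SLL_Next</tt> described earlier. Chunk <b>C</b> is never returned because nothing points to it any more.</p>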
<p>To gain code execution from this primitive it is necessary to find the address of some useful structure to hand back to the application and then have it overwrite that structure with data that we control. For modern systems this will require an information leak of some kind to find such an address reliably (although heap spraying of useful objects might be an interesting alternative if we can write over one of these objects and then trigger a call through a function pointer or something similar). Despite this requirement, we have a functioning exploit primitive from a heap overflow and one that has not required us to deal with any heap integrity checks or protections. This is not a bad place to be in given the effort that has gone into securing the default allocators for most operating systems. The primitive basically gives the same effect as the <i>Insert to Lookaside</i> technique that was popular on Windows XP but has since been killed on Windows Vista and 7. </p>
<p><b>Conclusion</b><br />
In this post I&#8217;ve given a quick overview of how we can convert a 4-byte overflow into one that overflows N bytes in an application using TCMalloc. This type of technique has been seen in various places in the past, most notably when overflowing chunks in the Windows XP Lookaside list. It has been largely killed elsewhere but due to the lack of integrity checks in TCMalloc it lives on in a relatively easy to use fashion. </p>
<p>At Infiltrate Agustin and I will discuss a variety of topics related to exploiting WebKit and TCMalloc in much greater detail. The <i>Insert to FreeList[X]</i> technique is but one example of a simple and old heap exploitation tactic that has been revived. Many other exploitation vectors exist and with a grasp of how the allocator works the possibilities are interesting and varied. In our presentation we will describe some of these techniques and also the details of how TCMalloc manages allocation and deallocation, insofar as they are relevant to manipulating the heap for exploitation. </p>
<p><i><br />
* TCMalloc is not the allocator used by the Chrome Javascript engine V8. That has its own allocator that can be found in the <tt>src/v8/src/</tt> directory of the Chrome source. Have a look at <tt>spaces.[cc|h]</tt> and <tt>platform_X.cc</tt> to get started.<br />
** For WebKit in Safari <tt>kNumClasses</tt> is 68 but for Chrome it is 61. There seem to be some other differences between the various uses of TCMalloc that are worth keeping in mind. For example, in Chrome the maximum free list length is 8192 whereas in Safari it is 256.<br />
*** We&#8217;re looking at the TCMalloc code embedded in WebKit revision 79746 <tt>Source/JavaScriptCore/wtf/FastMalloc.cpp</tt>. It&#8217;s effectively the same as that in the Chrome release and elsewhere, modulo the previous point.<br />
**** FreeLists are created from Spans which are contiguous pages in memory that get subdivided to give the FreeList chunks. The chunks are prepended to the list during creation though so they end up in reverse address order. This means that if A, B are two contiguous addresses then the FreeList will initially be ordered as B -&gt; A. In order to get the desired ordering we need to rearrange the fresh FreeList by allocating B, allocating A, free&#8217;ing B and then free&#8217;ing A. This will result in the ordering A -&gt; B and another allocation from this FreeList will give back the address A.<br />
</i></p>
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2011/04/14/exploit-necromancy-in-tcmalloc-reviving-the-4-to-n-byte-overflow-primitive-with-insert-to-freelistx/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>

		<media:content url="http://seanhn.files.wordpress.com/2011/03/freelist.png" medium="image">
			<media:title type="html">FreeListExample</media:title>
		</media:content>

		<media:content url="http://seanhn.files.wordpress.com/2011/03/freelist_allocate2.png" medium="image">
			<media:title type="html">FreeList_allocate2</media:title>
		</media:content>

		<media:content url="http://seanhn.files.wordpress.com/2011/03/freelist_ovf1.png" medium="image">
			<media:title type="html">FreeList_ovf1</media:title>
		</media:content>

		<media:content url="http://seanhn.files.wordpress.com/2011/03/freelist_ovf2.png" medium="image">
			<media:title type="html">FreeList_ovf2</media:title>
		</media:content>

		<media:content url="http://seanhn.files.wordpress.com/2011/03/freelist_ovf3.png" medium="image">
			<media:title type="html">FreeList_ovf3</media:title>
		</media:content>
	</item>
		<item>
		<title>Heap Scripts for TCMalloc with GDB&#8217;s Python API</title>
		<link>http://seanhn.wordpress.com/2011/03/30/heap-scripts-for-tcmalloc-with-gdbs-python-api/</link>
		<comments>http://seanhn.wordpress.com/2011/03/30/heap-scripts-for-tcmalloc-with-gdbs-python-api/#comments</comments>
		<pubDate>Wed, 30 Mar 2011 23:28:51 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[Heap Exploitation]]></category>
		<category><![CDATA[Infiltrate 2011]]></category>
		<category><![CDATA[TCMalloc]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=628</guid>
		<description><![CDATA[When writing heap exploits it&#8217;s necessary to be able to view the heap state during debugging. As part of our work on TCMalloc, Agustin and I have written up some heap scripts for Immunity Debugger and GDB that make the process of tracking what TCMalloc is up to quite easy. There are quite a few [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=seanhn.wordpress.com&amp;blog=7649502&amp;post=628&amp;subd=seanhn&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>When writing heap exploits it&#8217;s necessary to be able to view the heap state during debugging. As part of our work on TCMalloc, Agustin and I have written up some heap scripts for Immunity Debugger and GDB that make the process of tracking what TCMalloc is up to quite easy. There are quite a few scripts that do similar things for ID and different heap implementations so in this post I&#8217;m going to focus on the GDB side of things. Recent versions of GDB contain an <a href="http://sourceware.org/gdb/wiki/PythonGdb">embedded Python interpreter</a> that provides easy access to the internals of an application being debugged. Being able to write scripts in Python to automate debugging tasks is really useful and hopefully the GDB guys will maintain the project. </p>
<p>The scripts I will be discussing in this post can be found <a href="http://localhostr.com/file/wBNwUx1/tcmalloc_gdb.tar">here</a>. Copy the <i>gdbinit</i> file to your home directory, or to the directory from which you will be launching <tt>gdb</tt>, and rename it to <i>.gdbinit</i>. Modify the <tt>pyscripts_dir</tt> variable to point to the directory where you extracted the scripts. To load the commands run <tt>source /path/to/scripts/dump_free_list.py</tt> or <tt>source /path/to/scripts/search_free_lists.py</tt>. This will make the commands <tt>dump_free_list</tt> and <tt>search_free_lists</tt> available within <tt>gdb</tt>.</p>
<p>The data structures used by TCMalloc are relatively simple so the scripts have little to do besides reading memory and walking lists. Each thread in an application using TCMalloc has its own <tt>TCMalloc_ThreadCache</tt> object which contains information on the FreeLists for that specific thread. The FreeLists themselves are <tt>ThreadCache_FreeList</tt> objects, each of which contains a <tt>list_</tt> attribute that points to the head of a singly linked list of free chunks. We can access the TLS for a given thread via <tt>pthread_self()</tt> in GDB and find the <tt>TCMalloc_ThreadCache</tt> pointer at offset -4 from there. The file <i>tcmalloc.py</i> contains abstractions for the <tt>ThreadCache_FreeList</tt> and <tt>TCMalloc_ThreadCache</tt> structures. In <i>dump_free_lists.py</i> we can see the initialisation of the <tt>ThreadCache</tt> abstraction via the following code:</p>
<p><tt>
<pre>
# Excerpt from a method body; self.cur_proc is the process being debugged
tls = gdb.parse_and_eval("pthread_self()")
# threadlocal_heap is at TLS - 4
threadlocal_heap = buf_to_le(
    self.cur_proc.read_memory(tls - 4, DWORD_SIZE))

tc = ThreadCache(self.cur_proc, threadlocal_heap)
<p></tt></p>
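<p>The excerpt relies on a <tt>buf_to_le</tt> helper whose body is not shown here. A plausible sketch follows, assuming it simply decodes a little-endian integer from whatever buffer <tt>read_memory</tt> returns. The <tt>read_dword</tt> wrapper, the fake memory and both addresses are invented for illustration, so that the snippet runs outside <tt>gdb</tt>.</p>

```python
DWORD_SIZE = 4  # 32-bit target, matching the excerpt


def buf_to_le(buf):
    """Interpret a byte buffer as an unsigned little-endian integer."""
    return int.from_bytes(bytes(buf), "little")


def read_dword(read_memory, address):
    """Read one pointer-sized value through a read_memory(address, length) callable."""
    return buf_to_le(read_memory(address, DWORD_SIZE))


# Fake process memory: the word just below the TLS base holds the address of
# the thread cache. Both addresses are made up.
TLS_BASE = 0x2000
fake_memory = {TLS_BASE - 4: (0x0804A000).to_bytes(DWORD_SIZE, "little")}


def fake_read_memory(address, length):
    """Stand-in for the debugger's memory reader."""
    return fake_memory[address][:length]


threadlocal_heap = read_dword(fake_read_memory, TLS_BASE - 4)
```

<p>Inside <tt>gdb</tt> the callable would presumably be the <tt>read_memory</tt> method of the inferior being debugged, which takes an address and a length in the same order.</p>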
<p>The <tt>ThreadCache</tt> instance then provides access to the FreeLists for the current thread through <tt>getFreeLists</tt>. The size classes that TCMalloc uses to bucket chunks together for allocations less than 32768 in size are generated at run-time. To view them run <i>tcmalloc.py</i> outside of <tt>gdb</tt> with no arguments. </p>
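<p>A request that does not match a size class exactly is served from the next larger one. The lookup amounts to a search in a sorted table, sketched below with a made-up table. The real sizes are computed at start-up and differ between builds, so the numbers here are purely hypothetical.</p>

```python
from bisect import bisect_left

# Hypothetical bucket sizes in bytes; not the table TCMalloc actually generates.
SIZE_CLASSES = [8, 16, 32, 48, 64, 96, 128]


def size_class(size):
    """Return (class index, bucket size) for the smallest bucket that fits size."""
    index = bisect_left(SIZE_CLASSES, size)
    if index == len(SIZE_CLASSES):
        raise ValueError("request is too large for the thread cache")
    return index, SIZE_CLASSES[index]


rounded = size_class(20)  # a 20-byte request lands in the 32-byte bucket
```

<p>The class index is what selects the FreeList, so two requests of different sizes that round to the same bucket share a list.</p>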
<p>The number of size classes, and hence the number of FreeLists per thread, depends on a constant that may change between applications embedding TCMalloc. For Chrome it is 61 and hence we have 61 different FreeLists per thread. If we run <tt>dump_free_lists</tt> within <tt>gdb</tt> with no arguments it will dump all free lists, their lengths and the chunk size that each list is responsible for.<br />
<div id="attachment_635" class="wp-caption aligncenter" style="width: 492px"><a href="http://seanhn.files.wordpress.com/2011/03/freelist_dump1.png"><img src="http://seanhn.files.wordpress.com/2011/03/freelist_dump1.png?w=500" alt="" title="freelist_dump"   class="size-full wp-image-635" /></a><p class="wp-caption-text">TCMalloc FreeLists within Chrome on Linux</p></div></p>
<p>The <tt>getFreeLists</tt> function will return each of these FreeLists as a <tt>FreeList</tt> object. This object contains a <tt>list_ptr</tt> attribute that corresponds to the <tt>list_</tt> pointer found in the TCMalloc source for each FreeList. It points to a singly linked list of free chunks that we can iterate over using the <tt>getChunks</tt> function of a <tt>FreeList</tt> object. </p>
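<p>Walking such a list only requires reading the first word of each chunk until a NULL pointer is reached. The generator below is a guess at the general shape of that loop, not the code in <i>tcmalloc.py</i>. The name <tt>walk_free_list</tt>, the <tt>read_dword</tt> callable and the memory image are assumptions, and the guard against cycles is there because a corrupted list may never terminate. The default limit borrows the maximum list length quoted for Chrome in the previous post.</p>

```python
def walk_free_list(read_dword, list_ptr, max_chunks=8192):
    """Yield chunk addresses by following each chunk's first word until NULL."""
    seen = set()
    current = list_ptr
    while current != 0 and current not in seen and max_chunks > len(seen):
        seen.add(current)
        yield current
        current = read_dword(current)


# Made-up memory image: three free chunks linked 0x1000 -> 0x1040 -> 0x1020.
fake_words = {0x1000: 0x1040, 0x1040: 0x1020, 0x1020: 0}
chunks = list(walk_free_list(fake_words.__getitem__, 0x1000))

# A list whose only chunk points back at itself is reported once, then the walk stops.
looped = list(walk_free_list({0x1000: 0x1000}.__getitem__, 0x1000))
```

<p>When debugging an overflow the interesting case is the corrupted one, so a walker that stops quietly on a loop is more useful than one that hangs the debugger.</p>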
<p>If we run <tt>dump_free_lists</tt> with one or more space separated addresses it will treat them as pointers to <tt>ThreadCache_FreeList</tt> structures and dump them accordingly.</p>
<div id="attachment_637" class="wp-caption aligncenter" style="width: 481px"><a href="http://seanhn.files.wordpress.com/2011/03/freelist_chunks.png"><img src="http://seanhn.files.wordpress.com/2011/03/freelist_chunks.png?w=500" alt="" title="freelist_chunks"   class="size-full wp-image-637" /></a><p class="wp-caption-text">The chunks in a given FreeList</p></div>
<p>The scripts archive also contains the <tt>search_free_lists</tt> command which will search the FreeLists to see if an address lies within a chunk on any of the FreeLists. After <a href="https://www.immunityinc.com/infiltrate.shtml">Infiltrate</a> we will release the rest of the scripts for ID and GDB but these should be enough to get you started with TCMalloc and GDB&#8217;s Python API. </p>
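<p>The search itself is a containment test over every chunk of every list. A minimal sketch, assuming the lists have already been read into a dictionary from chunk size to chunk addresses, could look like this. The function body and the snapshot are illustrative and are not taken from the released scripts.</p>

```python
def search_free_lists(free_lists, address):
    """Return (chunk size, chunk address) pairs whose chunk contains address."""
    hits = []
    for chunk_size, chunks in free_lists.items():
        for chunk in chunks:
            # The chunk covers the addresses from chunk up to chunk + chunk_size - 1.
            if chunk + chunk_size > address >= chunk:
                hits.append((chunk_size, chunk))
    return hits


# Made-up snapshot: chunk size in bytes -> addresses of the free chunks of that size.
snapshot = {16: [0x1000, 0x1010], 64: [0x2000]}

inside = search_free_lists(snapshot, 0x1018)   # falls inside the chunk at 0x1010
outside = search_free_lists(snapshot, 0x2040)  # one byte past the 64-byte chunk
```

<p>This kind of query is handy when checking whether an address you are about to overwrite still belongs to a free chunk.</p>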
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2011/03/30/heap-scripts-for-tcmalloc-with-gdbs-python-api/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>

		<media:content url="http://seanhn.files.wordpress.com/2011/03/freelist_dump1.png" medium="image">
			<media:title type="html">freelist_dump</media:title>
		</media:content>

		<media:content url="http://seanhn.files.wordpress.com/2011/03/freelist_chunks.png" medium="image">
			<media:title type="html">freelist_chunks</media:title>
		</media:content>
	</item>
		<item>
		<title>Misleading the Public for Fun and Profit</title>
		<link>http://seanhn.wordpress.com/2010/12/07/misleading-the-public-for-fun-and-profit/</link>
		<comments>http://seanhn.wordpress.com/2010/12/07/misleading-the-public-for-fun-and-profit/#comments</comments>
		<pubDate>Tue, 07 Dec 2010 03:32:49 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[Exploit generation]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=529</guid>
		<description><![CDATA[Sometimes I read a research paper, usually in the area where computer science meets application, and it&#8217;s obvious that the authors are far overstating the practical impact of the work. This can be due to the researchers simply not having any exposure to the practical side of the field in which they are investigating and [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=seanhn.wordpress.com&amp;blog=7649502&amp;post=529&amp;subd=seanhn&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Sometimes I read a research paper, usually in the area where computer science meets application, and it&#8217;s obvious that the authors are far overstating the practical impact of the work. This can be due to the researchers simply not having any exposure to the practical side of the field they are investigating and thus accidentally (through ignorance) overstating their claims. Alternatively it can be a purposeful and deliberate attempt to mislead and posture in front of a readership that hopefully won&#8217;t know any better. </p>
<p>The first case is presumably simple ignorance but is still lamentable. The obvious solution here is to avoid making such claims at all. If the research cannot stand on its own then perhaps it is not worthwhile? Researchers (both academic and industrial) have a habit of jumping on problems they underestimate, throwing a variety of techniques at them, hoping one sticks and then calling the problem solved. This typically occurs when they are not actually required to solve the problem correctly and robustly but merely as a &#8216;prototype&#8217;. They then get pilloried by anyone who actually has to solve the problem properly and almost always because of a disparity between claims made and the real impact rather than issues with methodology, recording or technical aspects.</p>
<p>The second case is far more insidious and unfortunately I think not uncommon. In academic research it can be easy to impress by combining cutting edge, but not necessarily original, research with a practical problem, sort-of solving parts of it and, as before, declaring it solved. This is often followed quickly by phrases involving &#8216;game changing&#8217;, &#8216;paradigm shifting&#8217; and so forth. Personally, I think this is a serious problem in the research areas that are less theoretical and more practical. Often the investigators refuse to accept they aren&#8217;t actually aware of the true nature of the problem they are dealing with or how it occurs in the real world. This is difficult for the ego to accept as they are often lauded by their academic peers and therefore surely must grasp the trivialities of the practical world, no? At this point a mixture of ego, need to impress and lack of ethics combine to give us papers that are at best deluded and at worst downright wrong. </p>
<p>Regardless of whether a paper ends up making such claims mistakenly for the first or the second reason, the result is the same. It cheapens the actual value of the research, results in a general loss of respect for the capabilities of academia, deludes the researchers further and causes general confusion as to where research efforts should be focused. Worse still is when attempts to overstate the impact are believed by both the media and other researchers, resulting in a complete disparity between the actual practical and theoretical value of the research and its perceived impact. </p>
<p>Now, on to the paper that has reminded me of this most recently: the latest paper from David Brumley&#8217;s group (authored by Thanassis Avgerinos) at CMU titled <a href="http://www.ece.cmu.edu/~aavgerin/papers/aeg-ndss-2011.pdf">&#8220;<i>AEG: Automatic Exploit Generation</i>&#8221;</a>. I was looking forward to reading this paper as it was the area I worked on during my thesis but quite honestly it&#8217;s incredibly disappointing at best and has serious factual issues at worst. For now let&#8217;s focus on the topic at hand: &#8216;<i>overstating the impact of academic research cheapens it and spreads misinformation</i>&#8217;. With the original Patch-Based Exploit Generation paper we had all sorts of stories about how it would change the way in which patches had to be distributed, how attackers would be pushing buttons to generate their exploits in no time at all and in general how the world was about to end. Naturally none of this happened and people continued to use PatchDiff. Unfortunately this is more of the same.</p>
<p>Near the beginning of the most recent paper we have the following claim: &#8220;<i>Our automatic exploit generation techniques have several immediate security implications. First, practical AEG fundamentally changes the perceived capabilities of attackers</i>&#8221;. This statement is fundamentally flawed. It assumes that practical AEG is currently possible on bugs that people actually care about. This is patently false. I&#8217;ve written one of these <a href="http://seanhn.files.wordpress.com/2009/09/thesis1.pdf">systems</a>. Did it generate exploits? Yes it did. Is it going to pop any program running on a modern operating system with the kinds of vulnerabilities we typically see? Nope. That would require at a minimum another 2 years of development and at that point I would expect a system that is usable by a skilled exploit writer as an augmentation of his skillset rather than a replacement. The few times I did use the tool I built for real exploits it was in this context rather than full blown exploit generation. The system discussed in the mentioned paper has more bells and whistles in some areas and is more primitive in others and it is still an unfathomable distance from having any impact on a realistic threat model.</p>
<p>Moving on, &#8220;<i>For example, previously it has been believed that it is relatively difficult for untrained attackers to find novel vulnerabilities and create zero-day exploits. Our research shows this assumption is unfounded</i>&#8221;. It&#8217;s at this point that the distance between the authors of this paper and the realities of industrial/government/military vulnerability detection and exploit development can be seen. Who are the people we are to believe have this view? I would assume the authors themselves held it and then extrapolated it to the general exploit creating/consuming community. This is an egotistical flaw that has been displayed in many forays by academia into the vulnerability detection/exploit generation world. </p>
<p>Let&#8217;s discuss this in two parts. Firstly, in the context of the exploits discussed in this paper and secondly in the context of exploits seen in the real world.</p>
<p>In the case of the bug classes considered in the paper this view is entirely incorrect. Anyone who looks at Full Disclosure can regularly see low hanging bugs being fuzzed and exploited in a cookie cutter style. Fuzz the bug, overwrite the SEH chain, find your trampoline, jump to your shellcode bla bla bla rinse and repeat, start a leet h4x0r group and flood Exploit DB. All good fun, no useful research dollars wasted. The bugs found and exploited by the system described are of that quality. Low hanging, fuzzable fruit. The &#8216;training&#8217; involved here is no more than would be required to set up, install and debug whatever issues come up in the course of running the AEG tool. In our most basic class at Immunity I&#8217;ve seen people who&#8217;ve never seen a debugger before writing exploits of this quality in a couple of days. </p>
<p>For more complex vulnerabilities and exploits that require a skilled attacker this AEG system doesn&#8217;t change the threat model. It simply doesn&#8217;t apply. A fully functional AEG tool that I can point at Firefox before pressing the &#8216;hack&#8217; button would of course change it, as would any tool that had some sort of impact on real threats (I&#8217;d be happy with exploit assistance rather than exploit generation as long as it works), but we are a long, long way from that. This is not to say we won&#8217;t get there or that this paper isn&#8217;t a step in the right direction but making the claim now is simply laughable. To me it just reeks of a research group desperate to shout &#8216;FIRST!&#8217; and ignoring the real issues. </p>
<p>A few more choice phrases for your viewing pleasure:</p>
<p>&#8220;<i>Automated exploit generation can be fed into signature generation algorithms by defenders without requiring real-life attacks</i>&#8221; &#8211; Fiction again. This would be possible *if* one had a usable AEG system. The word I presume they are looking for is *could*, &#8220;could be fed into&#8221;. </p>
<p>&#8220;<i>In order to extend AEG to handle heap-based overflows we would need to also consider heap management structures, which is a straight-forward extension</i>&#8221; &#8211; Again, this displays a fundamental ignorance of what has been required to write a heap exploit for the past six or so years. I presume they heard about the unlink() technique and investigated no further. Automatic exploit generation of heap exploits requires one to be able to discover and trigger heap manipulation primitives as well as whatever else must be done. This is a difficult problem to solve automatically and one that is completely ignored. </p>
<p>In reference to overflows that smash local variables and arguments that are dereferenced before the function returns and therefore must be valid &#8211; &#8220;<i>If there is not enough space to place the payload before the return address, AEG can still generate an exploit by applying stack restoration, where the local variables and function arguments are overwritten, but we impose constraints that their values should remain unchanged. To do so, AEG again relies on our dynamic analysis component to retrieve the runtime values of the local variables and arguments</i>&#8221; &#8211; It&#8217;s at this point that I start to wonder if anyone even reviewed this thing. In any program with some amount of heap non-determinism, through normal behaviour or heap base randomisation, this statement makes no sense. Any pointers to heap allocated data passed as arguments or stored as local variables will be entirely different. You may be lucky and end up with that pointer being in an allocated heap region but the chances of it pointing to the same object are rather slim in general. Even in the context of local exploits where you have much more information on heap bases etc. this statement trivialises many problems that will be encountered. </p>
<p>As a side note, 12 out of the 28 references are to papers written by people from, or who have been produced by, the same Berkeley and Stanford research groups. As much fun as nepotism is and all, I would suggest that some external reading might do wonders. </p>
<p><b>Conclusion</b></p>
<p>With the above paper I have two main issues. One is with the correctness of some of the technical statements made and the other is with distortion between reality and the stated impact and generality of the work. For the technical issues I think the simple observation that they are there is enough to highlight the problem. The flawed statements on impact and generality are more problematic as they display a fundamental corruption of what a scientific paper should be. </p>
<p>I have a deep respect for scientific research and the ideals that I believe it should embody. Much of this research is done by university research groups and some of the papers produced in the last century are among humanity&#8217;s greatest intellectual achievements. Not all papers can be revolutionary of course but even those that aren&#8217;t should aim to uphold a level of scientific decorum so that they may contribute to the sum of our knowledge. In my opinion this single idea should be at the heart of any university researcher being funded to perform scientific investigation. A researcher is not a journalist nor a politician and their papers should not be opinion pieces or designed to promote themselves at the expense of facts. There is nothing wrong with discussing the perceived impact of a paper within the paper itself but these statements should be subjected to the same scientific rigour as the theoretical content of the paper. If the authors find themselves unqualified (as in the above paper) to make such statements then the statements should be excluded. Facts are all that matter in a scientific paper: distorting them through ignorance is incompetence; distorting them on purpose is unethical and corrupt.  </p>
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2010/12/07/misleading-the-public-for-fun-and-profit/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>
	</item>
		<item>
		<title>Augment your Auditing with a Theorem Prover</title>
		<link>http://seanhn.wordpress.com/2010/11/05/augment-your-auditing-with-a-theorem-prover/</link>
		<comments>http://seanhn.wordpress.com/2010/11/05/augment-your-auditing-with-a-theorem-prover/#comments</comments>
		<pubDate>Fri, 05 Nov 2010 03:17:04 +0000</pubDate>
		<dc:creator>seanhn</dc:creator>
				<category><![CDATA[Bug hunting]]></category>
		<category><![CDATA[SMT solving]]></category>

		<guid isPermaLink="false">http://seanhn.wordpress.com/?p=502</guid>
		<description><![CDATA[A better post title may have been &#8216;Outsourcing your thinking when lack of sleep makes basic arithmetic difficult&#8217;. Anyways, I was looking at some WebKit code yesterday, when I came across the ArrayBuffer::tryAllocate function found in WebCore/html/canvas/ArrayBuffer.cpp. 85 void* ArrayBuffer::tryAllocate(unsigned numElements, unsigned elementByteSize) 86 { 87 void* result; 88 // Do not allow 32-bit overflow [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=seanhn.wordpress.com&amp;blog=7649502&amp;post=502&amp;subd=seanhn&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>A better post title may have been &#8216;Outsourcing your thinking when lack of sleep makes basic arithmetic difficult&#8217;. Anyways, I was looking at some WebKit code yesterday, when I came across the <tt>ArrayBuffer::tryAllocate</tt> function found in <a href="http://trac.webkit.org/browser/trunk/WebCore/html/canvas/ArrayBuffer.cpp">WebCore/html/canvas/ArrayBuffer.cpp</a>.</p>
<p><code></p>
<pre>
85	void* ArrayBuffer::tryAllocate(unsigned numElements, unsigned elementByteSize)
86	{
87	    void* result;
88	    // Do not allow 32-bit overflow of the total size
89	    if (numElements) {
90	        unsigned totalSize = numElements * elementByteSize;
91	        if (totalSize / numElements != elementByteSize)
92	            return 0;
93	    }
94	    if (WTF::tryFastCalloc(numElements, elementByteSize).getValue(result))
95	        return result;
96	    return 0;
97	}
</pre>
<p></code></p>
<p>The check at line 91 for integer overflow isn&#8217;t one I&#8217;ve seen many times before (if at all) and I couldn&#8217;t quickly come up with a proof of correctness on paper (algebra fail). After some quick discussion with a friend it seemed that it may be possible to wrap <tt>totalSize</tt> and pass the check on line 91. After a bunch of tests I still couldn&#8217;t achieve this but neither did I have any proof that it wasn&#8217;t possible. Given I&#8217;d already failed to do this on paper my next approach was to model the code as an SMT formula and throw a theorem prover at it. </p>
<p><code></p>
<pre>
[sean@sean-laptop bin]$ cat test.smt
(benchmark uint_ovf
:status unknown
:logic QF_BV

:extrafuns ((totalSize BitVec[32])(numElements BitVec[32])(elSize BitVec[32]))
:extrafuns ((a BitVec[64])(b BitVec[64])(big32 BitVec[64]))

; if (numElements) {
:assumption (bvugt numElements bv0[32])

; unsigned totalSize = numElements * elementByteSize;
:assumption (= totalSize (bvmul numElements elSize))

; the check on line 91 passes: totalSize / numElements == elementByteSize
:assumption (= elSize (bvudiv totalSize numElements))

; Check if an overflow is possible in the presence of the
; above conditions
:assumption (= big32 bv4294967295[64])
:assumption (= a (zero_extend[32] numElements))
:assumption (= b (zero_extend[32] elSize))
:formula (bvugt (bvmul a b) big32)
)</pre>
<p></code></p>
<p>(<i>The above .smt file is in SMTLIB format. Further information can be found at [1], [2] and [3]</i>)<br />
The above models <tt>tryAllocate</tt> pretty much exactly.  The final three assumptions and the formula are used to check if the integer overflow can occur. Mixing bitvectors of different widths isn&#8217;t allowed for most operations so it is necessary first to extend <tt>numElements</tt> and <tt>elSize</tt> into 64-bit variables. We then check for overflow by multiplying these 64-bit extensions by each other and checking if the result can be greater than <tt>0xffffffff</tt> (<tt>big32</tt>) while also satisfying the conditions imposed in modelling the function.<br />
<code></p>
<pre>
[sean@sean-laptop bin]$ ./yices -V
Yices 2.0 prototype. Copyright SRI International, 2009
GMP 4.3.1. Copyright Free Software Foundation, Inc.
Build date: Fri Apr 23 11:15:16 PDT 2010
Platform: x86_64-unknown-linux-gnu (static)
[sean@sean-laptop bin]$ time ./yices -f &lt; test.smt
unsat

real	0m0.587s
user	0m0.576s
sys	0m0.008s
</pre>
<p></code></p>
<p>And there we have it, our proof of safety (modulo modelling errors on my part and implementation errors in <tt>yices</tt>). Given the assumptions specified it is not possible for the multiplication at line 90 to overflow and still pass the check at line 91. Total time from starting modelling to a proof: about 10 minutes. </p>
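<p>The same question can also be sanity-checked by brute force at a reduced width. The sketch below is not part of the original analysis: it re-implements the check from line 91 over 8-bit values (the names <tt>divisionCheckPasses</tt>, <tt>productOverflows</tt> and <tt>countCounterexamples</tt> are made up for illustration), computes the true product in a wider type in the same spirit as the <tt>zero_extend</tt> in the SMT model, and counts inputs where the check passes even though the product wrapped. An exhaustive run at 8 bits says nothing conclusive about 32 bits, so treat it as a cross-check on the modelling rather than a proof.</p>

```cpp
#include <cstdint>

// Hypothetical 8-bit analogue of the check in ArrayBuffer::tryAllocate.
// Returns true when the division check (line 91 in the listing) would pass.
bool divisionCheckPasses(std::uint8_t numElements, std::uint8_t elementByteSize)
{
    if (!numElements)
        return true; // the original code skips the check for zero elements
    // Truncating to 8 bits plays the role of the 32-bit wrap in the original.
    std::uint8_t totalSize =
        static_cast<std::uint8_t>(numElements * elementByteSize);
    return totalSize / numElements == elementByteSize;
}

// Ground truth: compute the product in a wider type and compare it with
// the largest value an 8-bit unsigned integer can hold.
bool productOverflows(std::uint8_t numElements, std::uint8_t elementByteSize)
{
    unsigned wide = static_cast<unsigned>(numElements) * elementByteSize;
    return wide > 0xffu;
}

// Counts input pairs where the check passes but the product overflowed.
unsigned countCounterexamples()
{
    unsigned count = 0;
    for (unsigned n = 0; n <= 0xffu; ++n) {
        for (unsigned e = 0; e <= 0xffu; ++e) {
            std::uint8_t n8 = static_cast<std::uint8_t>(n);
            std::uint8_t e8 = static_cast<std::uint8_t>(e);
            if (divisionCheckPasses(n8, e8) && productOverflows(n8, e8))
                ++count;
        }
    }
    return count;
}
```

<p>Calling <tt>countCounterexamples()</tt> should return 0, which agrees with the <tt>unsat</tt> result. The paper argument also seems to be short once written down: if the true product <tt>n * e</tt> exceeds the maximum value then the wrapped <tt>totalSize</tt> is strictly smaller than <tt>n * e</tt>, so <tt>totalSize / n</tt> is strictly smaller than <tt>e</tt> and the comparison on line 91 fails.</p>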
<p>[1] <a href="http://goedel.cs.uiowa.edu/smtlib/logics/QF_BV.smt2">Logic for quantifier free bit-vector logic</a><br />
[2] <a href="http://goedel.cs.uiowa.edu/smtlib/theories/Fixed_Size_BitVectors.smt2">Theory for fixed size bit-vectors</a><br />
[3] <a href="http://goedel.cs.uiowa.edu/smtlib/papers/smt-lib-reference-v2.0-r10.08.28.pdf">SMT-LIB v2 reference</a></p>
<p><i>Edit: I modified the above formula and some of the text after noticing an error. Hence any comments below may refer to an older version of the post. </i></p>
]]></content:encoded>
			<wfw:commentRss>http://seanhn.wordpress.com/2010/11/05/augment-your-auditing-with-a-theorem-prover/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/72a292ee63247e7ef61caf1c8c5e18b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">seanhn</media:title>
		</media:content>
	</item>
	</channel>
</rss>
