This week I’m attending the first SAT/SMT Summer School, organised by Vijay Ganesh and hosted at MIT. There are plenty of interesting talks, organised into three categories, so I figured it might be useful to do a brief summary of each day with some links to relevant material. I’ll update this post as the week progresses.

** Introduction to Satisfiability Solving with Practical Applications (Niklas Een)**

The first day of talks was a preliminary day, providing introductions to the foundations of the SAT and SMT problems and a quick history of how the research areas have progressed. Niklas Een, one of the MiniSAT developers, opened the technical part of the conference discussing the history of automated approaches to SAT. He then moved on to an overview of the algorithms that form the core of most SAT solvers. In Niklas’ opinion the most important algorithms in terms of their impact on the ability of SAT solvers have been those for conflict clause analysis and variable activity tracking. Perhaps an obvious point, but one worth making was that often industrial SAT problems are trivial once you know what part of the graph to explore and hence why algorithms for locating this are quite important. One idea that was reiterated multiple times was that the development of SAT solvers is largely an experimental science and that even though we can give intuitions as to why certain algorithms are useful, in many cases nobody has a clue what the true reason is, not even the algorithms inventors.

**SMT Theory and DPLL(T) (Albert Oliveras)**

The second talk was by Albert Oliveras and provided an introduction to SMT solving. After a quick discussion on the motivation for SMT solvers Albert gave an overview of both the lazy and eager approaches to SMT solving. The eager approach, embodied in solvers like STP, involves directly encoding the expressions over different theories into a SAT problem and then using a SAT solver on that problem. The lazy approach, found in Z3 among others, has two distinct systems – a SAT solver and one or more theory specific solvers, and proceeds by finding SAT solutions to a propositional skeleton and using theory specific solvers to check for consistency of the returned model in their domains. Albert also provided a good high level summary of how these theory specific solvers share information. The slides from this talk can be found here and are definitely worth a look for an introduction to the topic.

**SAT Solvers for Formal Verification (Ed Clarke)**

After lunch Ed Clarke continued the introductions with a basic summary of bounded model checking (BMC) and linear temporal logic (LTL). BMC has progressed over the years from initially using basic data structures to explicitly represent states to the use of binary decision diagrams and on to SAT encodings. Its an interesting topic and there are many parallels and cross overs between modern versions of these algorithms (and CEGAR) and symbolic execution based approaches to verification. The second part of Ed’s talk was on current research his students are doing into the use of bounded model on systems that have properties modelled by differential equations. I apparently didn’t pay enough attention in calculus classes because most of this went over my head =) The work is being led by Sicun Gao so if you’re interested his research page is probably helpful.

**SMT-LIB Initiative (Cesare Tinelli)**

Following this, Cesare Tinelli talked about the SMT-LIB and SMT-COMP initiatives. It was interesting to hear the story behind how it started. One thing of note that Cesare said is that back when this was started it was incredibly hard to tell what algorithms and extensions to SAT solvers were truly useful. This was because of unstandardised reporting and also because the tendency of developers to report on the benchmarks that their tools performed particularly well on. This is relevant I think because we are at a similar stage with symbolic execution and security focused tools. It’s hard to tell what really works as sometimes the tools aren’t released and when they are it can be quite difficult to recreate the authors results. It might be useful for some sort of standardised benchmark/testing suite with a focus on modern problems, especially as people move into other areas like automatic exploit generation.

An interesting discussion broke out near the end on the usefulness of the SMT-LIB standard as an input mechanism for tools given that it is not designed for efficient storage and so writing large problems to disk to read into a solver isn’t feasible for many cases in symbolic execution. The solution here seems to be to simply embed a solver and use its C/OCaml/etc API but that does somewhat nullify the usefulness of the SMT-LIB language as anything more than a standard and teaching tool. It might be interesting to see a version of the language developed specifically with the goal of efficient storage and parsing though.

**Constraint Solving Challenges in Dynamic Symbolic Execution (Cristian Cadar)**

The final talk of the day was by Christian Cadar on the technologies behind EXE and KLEE, with a focus on the problems of solving the types of constraints generated. Using the example of constraints over arrays, Christian made the point that it can be useful to modify a solvers algorithms for a particular theory for domain specific cases. The graph presented showed a reduction from something near exponential slowdown to linear slowdown by removing the array axioms typically added for every array based operation (and one other thing that escapes me right now!) and thus checking an under-approximation of the original formula.

These tools are essentially mixed symbolic/concrete execution tools with a focus on bug finding, equivalence checking and so forth. KLEE is open source, which is a definite plus and worth checking out. I’d love to hear some feedback from people on its performance as I’ve ran it a few times and haven’t really gotten results that are as useful as fuzzing or manual auditing for bug finding. I think for tools such as this a large set of standardised benchmarks in modern applications could be very useful for gauging progress and focusing research efforts. Apparently Microsoft Research have found this approach very useful in their SAGE framework but it’s hard to tell if it can be used generally when you don’t have a data-center and a dedicated research group.

And that was it! Fun day and tomorrow looks awesome =)

One other thing Cristian mentioned was caching the results of previous queries to the solver. The weird and beautiful thing pointed out during the questions was that the caching and the pruning of irrelevant constraints individually had similar effect, but together they worked remarkably well. Wonder what’s going on there.

Ah! Thanks David, thats the other thing I couldn’t remember. Yea, it was curious to see that their individual effects were nearly identical and together they performed so much better. It would be really interesting to see some investigation into what is the reason behind this and what the real cause is. There’s definitely some black magic in the whole solving game still and that seems to be a common enough comment from the different people working on optimisations and algorithms. Pretty cool stuff =) I guess there’s still a whole lot we don’t understand about the fundamentals of many of these problems.

Cheers Sean!

[…] Solvers. What the current state and how were other industries were applying them? Sean Heelan has a great series of blogposts that summarize all the days of presentations so we aren’t going to parrot any of that. Check […]

[…] current state of research on this stuff? and how were other industries applying it? Sean Heelan has a great series of blogposts that summarize all the days of presentations so we aren’t going to parrot any of that. Check […]