SVMwis is a piece of software for finding blocks of code in binary executables. It is based on SVMstruct by Thorsten Joachims.


SVMwis is available upon email request. The experiments in the NIPS paper were performed with a Python version built on top of SVMpython, which is also available via email request but is deprecated. The C version will eventually be released publicly, but it still requires some final polishing.

The program is free for scientific and educational use.


The program depends on the distorm library for disassembling byte sequences. My code has been tested with the version I include, and newer versions will most likely work.

Compiling is done in the usual way:

            tar -zxvf svmwis.tar.gz
            cd svmwis

Then copy the executables svmwis_learn and svmwis_classify to a directory in your PATH.


SVMwis consists of a learning program (svmwis_learn) and a classification program (svmwis_classify). The learning program takes as input a set of training examples and outputs the model it learned. The classification program takes as input a model and a set of examples and outputs the predictions of the model on the examples.

svmwis_learn is called this way:

            svmwis_learn [options] trainingset model
            Available options:
                -c <float>  : tradeoff between model complexity and upper bound on the loss
                -l <int>    : loss function to use. valid choices are 0, 1, 2, and 3 (default: 0)

The input file 'trainingset' lists the training examples. It should contain one filename per line, and each referenced file should be in the format in which OllyDbg writes its disassembly output.
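
As a small illustration of the list-file layout described above (one filename per line), the following Python sketch collects listing files from a directory and writes such a file. The helper name write_training_list, the directory layout, and the "*.txt" pattern are assumptions made for this example; they are not part of SVMwis.

```python
from pathlib import Path


def write_training_list(listing_dir, out_file, pattern="*.txt"):
    """Write one listing filename per line to out_file and return the names.

    listing_dir and pattern are hypothetical: adjust them to wherever the
    OllyDbg disassembly listings were saved.
    """
    # Sort so the resulting list file is reproducible across runs.
    paths = sorted(str(p) for p in Path(listing_dir).glob(pattern) if p.is_file())
    Path(out_file).write_text("".join(p + "\n" for p in paths))
    return paths
```

The resulting file can then be passed as the 'trainingset' argument.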

svmwis_classify is called this way:

            svmwis_classify data model predictions

The input file 'data' contains the test examples and should be in the same format as the training examples.

For each test example, the prediction of the model (stored in the 'model' file) is written to the 'predictions' file.
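
For scripted experiments, the two command lines above can be assembled programmatically. The sketch below only builds the argument lists from the usage shown in this document; the function names and the example file names are hypothetical, and nothing is executed.

```python
def learn_command(trainingset, model, c=None, loss=None):
    """Build the argument list for svmwis_learn [options] trainingset model."""
    cmd = ["svmwis_learn"]
    if c is not None:
        cmd += ["-c", str(c)]        # complexity / loss tradeoff
    if loss is not None:
        if loss not in (0, 1, 2, 3):  # valid choices listed in the usage text
            raise ValueError("loss must be 0, 1, 2, or 3")
        cmd += ["-l", str(loss)]
    return cmd + [trainingset, model]


def classify_command(data, model, predictions):
    """Build the argument list for svmwis_classify data model predictions."""
    return ["svmwis_classify", data, model, predictions]


# Example with hypothetical file names and an arbitrary value for -c.
example = learn_command("train_files.lst", "wis.model", c=1.0, loss=0)
```

Either list can be handed to subprocess.run(cmd, check=True) once the executables are on your PATH.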


Q: Can SVMwis work on any more general problems than detecting blocks of code in executables?

A: In principle, it can handle other similar problems that reduce to weighted interval scheduling (actually a trivial generalization where the cost of switching between two intervals can also be modeled). However, the code is currently not written in a way that makes porting to similar problems easy.
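
For readers unfamiliar with weighted interval scheduling, here is a minimal Python sketch of the textbook dynamic program. It is not the SVMwis implementation: it omits the switching cost mentioned above, and the (start, end, weight) representation with half-open intervals is an assumption made for the example.

```python
from bisect import bisect_right


def weighted_interval_scheduling(intervals):
    """Pick non-overlapping intervals with maximum total weight.

    intervals: list of (start, end, weight) tuples, half-open [start, end).
    Returns (best_total_weight, chosen_intervals).
    """
    ivs = sorted(intervals, key=lambda iv: iv[1])  # sort by end point
    ends = [iv[1] for iv in ivs]
    n = len(ivs)
    best = [0] * (n + 1)   # best[i]: optimum using only the first i intervals
    take = [False] * n     # take[i]: interval i is used in best[i + 1]
    prev = [0] * n         # prev[i]: count of earlier intervals ending <= start of i
    for i, (start, _end, weight) in enumerate(ivs):
        prev[i] = bisect_right(ends, start, 0, i)
        with_i = best[prev[i]] + weight
        if with_i > best[i]:
            best[i + 1] = with_i
            take[i] = True
        else:
            best[i + 1] = best[i]
    # Walk back through the table to recover the chosen intervals.
    chosen = []
    i = n
    while i > 0:
        if take[i - 1]:
            chosen.append(ivs[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen
```

In the code-detection setting, one could think of each interval as a candidate block of bytes and each weight as a learned score, though that mapping is an illustration rather than a description of the actual code.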