Saturday, December 2, 2017

So I spent some more time working on the decompiler. The part I was working on was subgraph recognition. Each flow control structure has a unique sructure, and by recognizing these structures, we can identify code blocks and the control operations.

Here are several example graphs to help explain this better. Each letter represents a block of non-branching code. The arrows show the possible execution paths between these blocks.

Example 1: Do while loop

A->B->C
^--'

Execution starts at block A, proceeds to block B and then either loops back to A or proceeds to C. This graph is equivalent to the following pseudocode:

do {
  A
} while(B);
C

Example 2: While loop

.-----v
A->B->C
^--'

This is similar to the do while loop, but the loop determination is in block A. Here's the equivalent pseudocode:

while(A) {
  B
}
C

Example 3: If

A->B->C
'-----^

This is a subgraph of the while loop, so the detection logic needs to distinguish between them. Again the decision whether to execute block B is made in block A. In either case, execution proceeds to block C. Here's the equivalent pseudocode:

if(A) {
  B
}
C

Example 4: If else

   .-----v
A->B  C->D
'-----^

I think you get the idea by now. Here's the equivalent pseudocode:

if(A) {
  B
} else {
  C
}
D

There's still tons to do for the decompiler, but getting this right is one of the central problems that needed to be solved. It's pretty neat to see this working.

No comments:

Post a Comment