Automagical ASM Control Flow Arrow-Annotation

11 Jun 2013
Angelina Jolie doing 'sexy movie hacking' in 'Hackers'

UPDATE: Since my initial post I’ve updated the code a couple of times to better support 64-bit Linux binaries and also objdumps of Windows binaries. Let me know if you run into some sort of disassembly output from objdump that the script returns unaltered, and I’ll see if I can make it work.

UPDATE 2: I’ve made a variant of this script for disassemblies from llvm-objdump. Read all about it.

UPDATE 3: Fixed some missing instructions and instruction synonyms. I’ve also added a description of each jump instruction at the end of its line, having realised that trying to keep line lengths within 80 columns is a lost cause with this script ;)

I’m currently following a course on “Proactive Computer Security”, which, if you accept the fact that computers are the closest equivalent we’ve got to magic, equates to a “Defence Against The Dark Arts” class. It is based on the same credo: that one should familiarize oneself with how to attack in order to better defend.

Hacking of computer programs typically involves feeding that program an input that it wasn’t designed to handle properly, resulting in the attacker getting control or gaining information that she wasn’t supposed to.

So in this particular course we’ve been doing a lot of reverse engineering on disassembled programs in order to find weaknesses in their input handling.

A function handling an input of unknown length n bytes will typically have a loop that keeps running until a certain condition is met, as in the following example, where the instruction at 0x08048725 keeps jumping back to the instruction at 0x080486db as long as the latest character read from the input is not a newline. Notice also the jump at 0x08048711, which skips past the newline test once the end of the input has been reached.

080486cb <get_line>:
     80486cb:	55                   	push   ebp
     80486cc:	89 e5                	mov    ebp,esp
     80486ce:	53                   	push   ebx
     80486cf:	83 ec 10             	sub    esp,0x10
     80486d2:	c7 45 f8 00 00 00 00 	mov    [ebp-0x8],0x0
 ,---80486d9:	eb 29                	jmp    8048704 <get_line+0x39>
 |,->80486db:	8b 45 f8             	mov    eax,[ebp-0x8]
 ||  80486de:	8b 55 08             	mov    edx,[ebp+0x8]
 ||  80486e1:	8d 0c 02             	lea    ecx,[edx+eax*1]
 ||  80486e4:	8b 1d 4c a0 04 08    	mov    ebx,ds:0x804a04c
 ||  80486ea:	a1 54 a0 04 08       	mov    eax,ds:0x804a054
 ||  80486ef:	89 c2                	mov    edx,eax
 ||  80486f1:	01 da                	add    edx,ebx
 ||  80486f3:	0f b6 12             	movzx  edx,BYTE PTR [edx]
 ||  80486f6:	88 11                	mov    BYTE PTR [ecx],dl
 ||  80486f8:	83 45 f8 01          	add    [ebp-0x8],0x1
 ||  80486fc:	83 c0 01             	add    eax,0x1
 ||  80486ff:	a3 54 a0 04 08       	mov    ds:0x804a054,eax
 '|->8048704:	8b 15 54 a0 04 08    	mov    edx,ds:0x804a054
  |  804870a:	a1 50 a0 04 08       	mov    eax,ds:0x804a050
  |  804870f:	39 c2                	cmp    edx,eax
,-|--8048711:	7d 14                	jge    8048727 <get_line+0x5c>
| |  8048713:	8b 15 4c a0 04 08    	mov    edx,ds:0x804a04c
| |  8048719:	a1 54 a0 04 08       	mov    eax,ds:0x804a054
| |  804871e:	01 d0                	add    eax,edx
| |  8048720:	0f b6 00             	movzx  eax,BYTE PTR [eax]
| |  8048723:	3c 0a                	cmp    al,0xa
| '--8048725:	75 b4                	jne    80486db <get_line+0x10>
'--->8048727:	a1 54 a0 04 08       	mov    eax,ds:0x804a054
     804872c:	83 c0 01             	add    eax,0x1
     804872f:	a3 54 a0 04 08       	mov    ds:0x804a054,eax
     8048734:	8b 45 f8             	mov    eax,[ebp-0x8]
     8048737:	83 c4 10             	add    esp,0x10
     804873a:	5b                   	pop    ebx
     804873b:	5d                   	pop    ebp
     804873c:	c3                   	ret    
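
Read as higher-level code, the listing above boils down to roughly the following. This is a hand-written sketch and not decompiler output: the class `Input` and the names `data`, `length`, `pos` and `buf` are made-up labels for the globals at 0x804a04c, 0x804a050 and 0x804a054 and for the pointer argument at [ebp+0x8].

```python
class Input:
    """Hypothetical stand-in for the three globals the function reads."""

    def __init__(self, data):
        self.data = data         # ds:0x804a04c, base of the input
        self.length = len(data)  # ds:0x804a050, number of bytes available
        self.pos = 0             # ds:0x804a054, current read position


def get_line(inp, buf):
    """Copy characters into buf until a newline or the end of the input."""
    count = 0                              # local at [ebp-0x8]
    while inp.pos < inp.length:            # jge at 8048711 leaves the loop
        if inp.data[inp.pos] == "\n":      # jne at 8048725 repeats the loop
            break
        buf.append(inp.data[inp.pos])      # mov BYTE PTR [ecx],dl
        count += 1
        inp.pos += 1
    inp.pos += 1                           # step past the newline
    return count
```

Note that nothing in the loop compares `count` against the size of the destination, which is exactly the kind of detail that is easier to spot once the shape of the loop is clear.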

Now imagine following the above control flow without the arrows, which objdump, my disassembler of choice, does not provide for some reason.

Ida Pro Control Flow View

Having these arrows in the assembly makes it both easier to identify sections of the code with a lot of control flow and perhaps more importantly faster to deduce the logic in said section.

Both the free reverse engineering framework radare and the rather costly IDA Pro suite do this already with a more graph-like layout, but I believe my approach has qualities that they lack: it is unobtrusive, simple, and has no requirements beyond a Python installation and the objdump program.

My arrow-annotating Python code follows below, but people who expect to run it should probably download the file directly, as the syntax highlighter tends to mangle the code enough to confuse the Python interpreter.

#!/usr/bin/env python
# encoding: utf-8
"""
asm_jmps.py
Created by Daniel Fairchild on 2013-06-10.
Usage (in shell):
  objdump -M intel -d  target.bin | ./asm_jmps.py
License:
  I'd appreciate a comment at: http://blog.fairchild.dk/?p=633
  if you find the following useful. That'll be all.
"""
import sys
import re

JMPS = { #define jumps, synonyms on same line
'ja':'if above', 'jnbe':'if not below or equal',
'jae':'if above or equal','jnb':'if not below','jnc':'if not carry',
'jb':'if below', 'jnae':'if not above or equal', 'jc':'if carry',
'jbe':'if below or equal', 'jna':'if not above',
'jcxz':'if cx register is 0', 'jecxz':'if ecx register is 0',
'je':'if equal', 'jz':'if zero',
'jg':'if greater', 'jnle':'if not less or equal',
'jge':'if greater or equal', 'jnl':'if not less',
'jl':'if less', 'jnge':'if not greater or equal',
'jle':'if less or equal', 'jng':'if not greater',
'jmp':'unconditional',
'jne':'if not equal', 'jnz':'if not zero',
'jno':'if not overflow',
'jnp':'if not parity', 'jpo':'if parity odd',
'jns':'if not sign',
'jo':'if overflow',
'jp':'if parity', 'jpe':'if parity even',
'js':'if sign'}

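#fcl matches the address column of any disassembled line
fcl = re.compile(r" +([\da-f]+):")
#fjre captures address, jump mnemonic and target address of a jump line
fjre = re.compile("".join([
            r" +([\da-f]+):\t.*(",
            "|".join(JMPS),
            r")\s+\*?0?x?([\da-f]+)"]))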

def j_line(ln, ljmps):
  #build the arrow prefix and the jump description for line index ln
  jl = len(ljmps)
  outl = [" "]*(jl+2)
  jdesc=""
  for i in range(jl):
    if ljmps[i][0] == ln: #jmp from
      outl[-(jl-i+2):] = ["-"]*(jl-i+2)
      outl[i] = "," if ljmps[i][0] < ljmps[i][1] else "\'"
      jdesc = "; jump %s" % ljmps[i][2]
    if ljmps[i][1] == ln: #jmp to
      outl[-(jl-i+1):] = ["-"]*(jl-i+1)
      outl[-1] = ">"
      outl[i] = "," if ljmps[i][0] > ljmps[i][1] else "\'"
    if ljmps[i][0] < ln and ljmps[i][1] > ln:
      outl[i] = "|"
    elif ljmps[i][0] > ln and ljmps[i][1] < ln:
      outl[i] = "|"
  return ("".join(outl),jdesc)

def drw_jmps(all_lines, fun_lines):
  #fun_lines maps address (hex string) to index into all_lines for one function
  ljmps = []
  for cl in fun_lines:
    m = fjre.match(all_lines[fun_lines[cl]])
    if m is not None:
      if m.group(3) in fun_lines:
        ljmps.append((fun_lines[cl], fun_lines[m.group(3)],JMPS[m.group(2)]))
  #the following sorting bands same endpoints together
  ljmps = sorted(ljmps, key=lambda x: -x[1])
  for cl in sorted(fun_lines, key=lambda x: int(x,16)):
    jlr = j_line(fun_lines[cl],ljmps)
    all_lines[fun_lines[cl]]="".join([
                    jlr[0],
                    all_lines[fun_lines[cl]][:-1].lstrip(),jlr[1],
                    "\n"])

if __name__ == "__main__":
  #read lines from stdin
  nasml = sys.stdin.readlines()
  #make a dictionary of asm lines
  asm_lines = {}
  for i in range(len(nasml)):
    m = fcl.match(nasml[i])
    if m is not None:
      asm_lines[m.group(1)] = i
    if nasml[i] == "\n" or "\tret " in nasml[i]:
      drw_jmps(nasml, asm_lines)
      asm_lines = {}
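  #annotate whatever is left if the input did not end on a blank line or ret
  drw_jmps(nasml, asm_lines)
  #sys.stdout.write works under both python 2 and python 3
  sys.stdout.write("".join(nasml))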
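
To see what the jump regex captures, here is a small standalone sketch that runs a cut-down version of it over a few lines from the get_line listing. The names `MNEMONICS`, `JUMP_RE`, `SAMPLE` and `find_jumps` are made up for this illustration, and only three mnemonics are included; the pattern itself has the same shape as `fjre` in the script.

```python
import re

# A reduced set of mnemonics, enough for the sample below.
MNEMONICS = ["jmp", "jge", "jne"]

# Same shape as fjre: source address, jump mnemonic, target address.
JUMP_RE = re.compile(
    r" +([\da-f]+):\t.*(" + "|".join(MNEMONICS) + r")\s+\*?0?x?([\da-f]+)"
)

# Four lines in objdump's tab-separated layout, copied from the listing.
SAMPLE = [
    " 80486d9:\teb 29                \tjmp    8048704 <get_line+0x39>\n",
    " 80486db:\t8b 45 f8             \tmov    eax,[ebp-0x8]\n",
    " 8048711:\t7d 14                \tjge    8048727 <get_line+0x5c>\n",
    " 8048725:\t75 b4                \tjne    80486db <get_line+0x10>\n",
]


def find_jumps(lines):
    """Return (source, mnemonic, target) for every line holding a jump."""
    jumps = []
    for line in lines:
        m = JUMP_RE.match(line)
        if m is not None:
            jumps.append(m.groups())
    return jumps


for src, op, dst in find_jumps(SAMPLE):
    direction = "forward" if int(src, 16) < int(dst, 16) else "backward"
    print("%s %s -> %s (%s)" % (src, op, dst, direction))
```

The mov line is skipped, and each of the three jumps yields a source and a target address. Those address pairs are all that `drw_jmps` needs in order to decide where an arrow starts, where it ends and which way it points.
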

14 Responses to Automagical ASM Control Flow Arrow-Annotation

Julian

June 11th, 2013 at 16:58

That’s great, but unfortunately it doesn’t work for me. It just pipes everything through without annotating arrows.

fairchild

June 11th, 2013 at 17:17

Hmmm, that’s weird.

Maybe my regexes are too restrictive? If you provide me with a sample of the disassembly that you’re feeding into the script, I’ll see if I can fix it.

Michael

June 11th, 2013 at 17:24

I am having the same problem as Julian. I just ran this command on Linux.

objdump -M intel -d /usr/bin/yes | ./asm_jmps.py

Annotating decompiled assembly with arrows to show control flow » Wisconsin Web Works

June 11th, 2013 at 17:41

[...] by Sebbe [link] [comment] …read [...]

fairchild

June 11th, 2013 at 18:04

I’m guessing that you’re running 64bit linux? We only do 32bit in the course I’m following (and I didn’t expect one of my friends to post it to reddit either ;))

Anyway, I’ve updated the code in the file download and in the post to handle 64bit code on my machine.

I also fixed the code to work better with stripped binaries (i.e. it now considers a ret instruction to be the end of a function).

Please have a try and let me know if it works better for you now.

Michael

June 11th, 2013 at 19:12

Yes it works now and I was using 64bit Linux. I might try this on ARM

Michael

June 11th, 2013 at 19:13

Yes it works now and I was using 64bit Linux. I might try modifying this to work on ARM

runejuhl

June 12th, 2013 at 08:12

Great tool. It seems to work pretty well using input from my small tool that merges objdump output with information from readelf and strings: https://gist.github.com/runejuhl/5593610

fairchild

June 12th, 2013 at 09:21

Well it better, considering that I used output from your script in the initial development and not direct output from objdump ;)
In combination the two scripts provide me with all the annotation I was previously relying on IDA Pro for.

And btw. you providing that script to the rest of the participants on the PCS course was a big part of the inspiration for me to put a little more effort into my tools and share them, both this and the ROPfinder.

llvm-objdump ASM Control Flow Arrow Annotation | Avoiding Perfection

June 12th, 2013 at 14:13

[...] I wrote about a small script that visualizes control flow in ASM dumps with arrows and when a friend of mine posted it [...]

Julian

June 12th, 2013 at 14:59

Works now. Very useful!

dkf

June 15th, 2013 at 22:58

Interesting (I’ve been applying it to other disassembly output for a rather different system, with some changed REs for detecting jumps). The code runs into problems when faced with code with very large numbers of jumps in, such as from a large ‘switch’ statement or just where there’s rather a lot of conditions being checked…

Daniel Fairchild

June 18th, 2013 at 13:17

@dkf: I’ve changed the script to reuse available columns instead of reserving one for each jump instruction within a function. Maybe it solves the problem you mentioned above?
Have a look here: http://blog.fairchild.dk/2013/06/compact-asm-ctrl-flow-visualiser/

Compact ASM ctrl-flow Visualiser | Avoiding Perfection

June 18th, 2013 at 13:21

[...] got a comment to my initial post about ASM ctrl-flow visualisation which ends with the following [...]
