llvm-objdump ASM Control Flow Arrow Annotation

12 Jun 2013
Me getting a lot of hits from reddit.com/r/programming

Yesterday I wrote about a small script that visualizes control flow in ASM dumps with arrows. When a friend of mine posted it to reddit.com/r/programming, I suddenly got a lot more traffic than usual, along with some useful feedback that helped me considerably improve the robustness and applicability of the script.

This morning I noticed that I’d also been getting a few referrer hits from an unlikely source: http://llvm.org/bugs/show_bug.cgi?id=16297.

Reading that “request for feature” got me thinking that it ought to be dead easy to modify the existing script to also handle disassemblies from llvm-objdump.

An example of the result can be seen below.

     80486cb:	55                    	push	EBP
     80486cc:	89 e5                 	mov	EBP, ESP
     80486ce:	53                    	push	EBX
     80486cf:	83 ec 10              	sub	ESP, 16
     80486d2:	c7 45 f8 00 00 00 00  	mov	[EBP - 8], 0
 ,---80486d9:	eb 29                 	jmp	41
 |,->80486db:	8b 45 f8              	mov	EAX, [EBP - 8]
 ||  80486de:	8b 55 08              	mov	EDX, [EBP + 8]
 ||  80486e1:	8d 0c 02              	lea	ECX, [EDX + EAX]
 ||  80486e4:	8b 1d 4c a0 04 08     	mov	EBX, [134520908]
 ||  80486ea:	a1 54 a0 04 08        	mov	EAX, 134520916
 ||  80486ef:	89 c2                 	mov	EDX, EAX
 ||  80486f1:	01 da                 	add	EDX, EBX
 ||  80486f3:	0f b6 12              	movzx	EDX, BYTE PTR [EDX]
 ||  80486f6:	88 11                 	mov	BYTE PTR [ECX], DL
 ||  80486f8:	83 45 f8 01           	add	[EBP - 8], 1
 ||  80486fc:	83 c0 01              	add	EAX, 1
 ||  80486ff:	a3 54 a0 04 08        	mov	134520916, EAX
 '|->8048704:	8b 15 54 a0 04 08     	mov	EDX, [134520916]
  |  804870a:	a1 50 a0 04 08        	mov	EAX, 134520912
  |  804870f:	39 c2                 	cmp	EDX, EAX
,-|--8048711:	7d 14                 	jge	20
| |  8048713:	8b 15 4c a0 04 08     	mov	EDX, [134520908]
| |  8048719:	a1 54 a0 04 08        	mov	EAX, 134520916
| |  804871e:	01 d0                 	add	EAX, EDX
| |  8048720:	0f b6 00              	movzx	EAX, BYTE PTR [EAX]
| |  8048723:	3c 0a                 	cmp	AL, 10
| '--8048725:	75 b4                 	jne	-76
'--->8048727:	a1 54 a0 04 08        	mov	EAX, 134520916
     804872c:	83 c0 01              	add	EAX, 1
     804872f:	a3 54 a0 04 08        	mov	134520916, EAX
     8048734:	8b 45 f8              	mov	EAX, [EBP - 8]
     8048737:	83 c4 10              	add	ESP, 16
     804873a:	5b                    	pop	EBX
     804873b:	5d                    	pop	EBP
     804873c:	c3                    	ret

What made this version of the script a bit trickier to write is that, where classic objdump is kind enough to translate the target of a jump instruction into one of the addresses seen in the first column (e.g. 80486db), llvm-objdump only does a literal translation: the instruction name followed by the relative offset in bytes as a signed integer.

And with the x86 architecture’s notoriously variable instruction length, finding the target address isn’t just a matter of dividing the offset by the instruction size to get a number of lines, as it would be in a fixed-length instruction set. Every byte on the way there has to be counted. A slight added complication is that the offset is counted from the start of the instruction following the jump, not from the jump itself. For example, the jmp 41 at 80486d9 is two bytes long, so its target is 80486d9 + 2 + 41 = 8048704.
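To make that calculation concrete, here is a minimal Python sketch of how a jump target could be worked out from a single line of llvm-objdump output. This is not the code from the actual script; the function names `parse_line` and `jump_target` are made up for illustration, and it assumes each line looks like the dump above: a hex address followed by a colon, the instruction bytes as two-digit hex pairs, then the mnemonic and operands. It also assumes every mnemonic starting with "j" is a jump and that all bytes of an instruction sit on one line.

```python
import re

HEX_BYTE = re.compile(r"[0-9a-f]{2}")


def parse_line(line):
    """Split one disassembly line into (address, byte_count, mnemonic, operands).

    Returns None for lines that do not look like instructions.
    """
    tokens = line.split()
    if len(tokens) < 3 or not tokens[0].endswith(":"):
        return None
    try:
        address = int(tokens[0][:-1], 16)
    except ValueError:
        return None
    # Count the two-digit hex tokens that follow the address.
    i = 1
    while i < len(tokens) and HEX_BYTE.fullmatch(tokens[i]):
        i += 1
    byte_count = i - 1
    if byte_count == 0 or i >= len(tokens):
        return None
    return address, byte_count, tokens[i], " ".join(tokens[i + 1:])


def jump_target(line):
    """Return the absolute target of a relative jump, or None."""
    parsed = parse_line(line)
    if parsed is None:
        return None
    address, byte_count, mnemonic, operands = parsed
    if not mnemonic.startswith("j"):
        return None
    try:
        offset = int(operands)  # signed decimal, e.g. "41" or "-76"
    except ValueError:
        return None  # not a plain relative offset
    # The offset is relative to the instruction after the jump.
    return address + byte_count + offset


if __name__ == "__main__":
    for sample in (
        "80486d9:\teb 29 \tjmp\t41",
        "8048711:\t7d 14 \tjge\t20",
        "8048725:\t75 b4 \tjne\t-76",
    ):
        print(sample.split(":")[0], "->", format(jump_target(sample), "x"))
```

Run on the three jumps from the example, this gives 8048704, 8048727 and 80486db, which are the lines the arrows point at. Once the targets are known as addresses, the rest of the arrow drawing can proceed just as it does for classic objdump output.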

If I had to work with assembly output like that, I would be begging for something like the sort of annotation this script provides ;)

The llvm version of the script is available for download here, and I assume that a diff between it and the vanilla objdump version would make an excellent starting point for anyone wanting to adapt the script to some other disassembly dialect.
