How is a relative JMP (x86) implemented in an Assembler?
- by Pindatjuh
While building my assembler for the x86 platform I encountered some problems with encoding the JMP instruction:
enc     inst        size in bytes
EB cb   JMP rel8    2
E9 cw   JMP rel16   4 (because of 0x66 16-bit prefix)
E9 cd   JMP rel32   5
...
(from my favourite x86 instruction website, http://siyobik.info/index.php?module=x86&id=147)
All are relative jumps, where the size of each encoding (operation + operand) is in the third column.
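To make the table concrete, here is a minimal sketch of the three encodings in Python, assuming 32-bit mode (so the 16-bit form carries the 0x66 prefix). The function name `encode_jmp` and the `form` strings are made up for illustration; the displacement is measured from the end of the JMP instruction.

```python
import struct

def encode_jmp(disp: int, form: str) -> bytes:
    """Encode a relative JMP (hypothetical helper, 32-bit mode assumed).

    disp is the signed distance from the END of the JMP to the target.
    struct.pack raises struct.error if disp does not fit the chosen form.
    """
    if form == "rel8":
        return b"\xEB" + struct.pack("<b", disp)        # EB cb, 2 bytes
    if form == "rel16":
        return b"\x66\xE9" + struct.pack("<h", disp)    # 66 E9 cw, 4 bytes
    if form == "rel32":
        return b"\xE9" + struct.pack("<i", disp)        # E9 cd, 5 bytes
    raise ValueError(f"unknown form: {form}")

# A rel8 jump with displacement -2 targets its own first byte.
print(encode_jmp(-2, "rel8").hex())
```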
My original (and, because of this, flawed) design reserved the maximum space (5 bytes) for each JMP instruction. The operand is not known at that point, because the jump targets a location that has not been assembled yet. So I implemented a "rewrite" mechanism that writes the operand into the correct location in memory once the jump target is known, and fills the rest of the reserved space with NOPs. This padding is a somewhat serious concern in tight loops.
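Roughly, that rewrite step looks like the sketch below. It is a simplification, not my actual code: `patch_slot`, `SLOT` and the byte-offset arguments are hypothetical names, and only the rel8 and rel32 forms are considered.

```python
SLOT = 5  # bytes reserved for every JMP (assumption: always the rel32 size)

def patch_slot(code: bytearray, slot: int, target: int) -> None:
    """Fill the reserved 5-byte slot at offset `slot` once `target` is known."""
    disp8 = target - (slot + 2)          # relative to the end of a 2-byte JMP
    if -128 <= disp8 <= 127:
        # Short form, then three NOPs (0x90) to use up the reserved space.
        code[slot:slot + SLOT] = bytes([0xEB, disp8 & 0xFF]) + b"\x90" * 3
    else:
        disp32 = target - (slot + SLOT)  # relative to the end of a 5-byte JMP
        code[slot:slot + SLOT] = b"\xE9" + disp32.to_bytes(4, "little", signed=True)

code = bytearray(16)
patch_slot(code, 0, 10)      # target is close: EB 08 followed by NOP padding
print(code[:SLOT].hex())
```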
Now my problem is with the following situation:
b: XXX
c: JMP a
e: XXX
   ...
   XXX
d: JMP b
a: XXX      (where XXX is any instruction, depending
             on the to-be assembled program)
The problem is that I want the smallest possible encoding for each JMP instruction (and no NOP filling).
I have to know the size of the instruction at c before I can calculate the relative distance between a and b for the operand at d. The same applies for the JMP at c: it needs to know the size of d before it can calculate the relative distance between e and a.
How do existing assemblers implement this, or how would you implement this?
This is what I am thinking which solves the problem:
  First encode all the instructions between the JMP and its target. If this region contains a variable-sized instruction, use its maximum size, i.e. 5 bytes for a JMP. Under some conditions the JMP is then oversized (it might fit in a smaller encoding), so another pass searches for oversized JMPs, shrinks them, and moves all the following instructions up. Absolute branching instructions (i.e. external CALLs) are set after this pass is completed.
I wonder whether this is an over-engineered solution, which is why I am asking this question.
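The idea can be sketched as a loop that repeats the shrinking pass until nothing changes. This is only an illustration under simplifying assumptions: the item tuples and the name `relax` are invented here, every non-jump instruction has a fixed known size, and only the 2-byte (rel8) and 5-byte (rel32) forms are considered.

```python
def relax(items):
    """Shrink oversized JMPs until no further change (hypothetical sketch).

    items is a list of ("label", name), ("insn", size) or ("jmp", target_label).
    Returns (sizes, labels): the final size of every item and label offsets.
    """
    # Start pessimistic: every JMP gets the maximum size of 5 bytes.
    sizes = [5 if kind == "jmp" else (arg if kind == "insn" else 0)
             for kind, arg in items]
    while True:
        # Lay out the program with the current sizes.
        offsets, labels, pos = [], {}, 0
        for (kind, arg), size in zip(items, sizes):
            offsets.append(pos)
            if kind == "label":
                labels[arg] = pos
            pos += size
        changed = False
        for i, (kind, arg) in enumerate(items):
            if kind != "jmp" or sizes[i] != 5:
                continue
            target = labels[arg]
            if target > offsets[i]:
                # Forward: target moves together with the end of this JMP.
                disp = target - (offsets[i] + 5)
            else:
                # Backward: measure from the end of the would-be 2-byte form.
                disp = target - (offsets[i] + 2)
            if -128 <= disp <= 127:
                sizes[i] = 2
                changed = True
        if not changed:
            return sizes, labels

# The layout from the question, assuming every XXX is 3 bytes long.
program = ([("label", "b"), ("insn", 3), ("label", "c"), ("jmp", "a"),
            ("label", "e")] + [("insn", 3)] * 10 +
           [("label", "d"), ("jmp", "b"), ("label", "a"), ("insn", 3)])
sizes, labels = relax(program)
print(sizes[3], sizes[-3], labels)
```

Because sizes only ever decrease, a JMP that fits the short form in one pass still fits in every later pass, so the loop terminates. One caveat: starting from the maximum size can leave two jumps long when each would fit only if the other were short, so the result is valid but not necessarily the smallest possible.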