Overview 4: How is Control Flow Translated?

2014-04-03 | Dagger Team

Calls are translated to IR call instructions. Because generated functions take a register set structure pointer parameter, some code is inserted before/after the generated call to save/restore all registers, from local variables to the register set structure and the other way around. All this code is initially inserted in a new basic block, created for the exact call instruction. This is done for simplicity, and is quickly eliminated by later optimizations.

Here is an example X86 call:

  ...
  br label %bb_1000008D8_call

bb_1000008D8_call: ; preds = %bb_1000008D8
  %65 = load i32* %EFLAGS
  store i32 %65, i32* %EFLAGS_ptr
  %66 = load i64* %RAX
  store i64 %66, i64* %RAX_ptr
  %67 = load i64* %RBP
  store i64 %67, i64* %RBP_ptr
  ...
  call void @fn_100000C64(%regset* %0)
  %EFLAGS_17 = load i32* %EFLAGS_ptr
  %RAX_17 = load i64* %RAX_ptr
  %EAX_10 = trunc i64 %RAX_17 to i32
  %AX_6 = trunc i64 %RAX_17 to i16
  %AL_7 = trunc i64 %RAX_17 to i8
  %74 = lshr i64 %RAX_17, 8
  %AH_6 = trunc i64 %74 to i8
  ...
  store i8 %AH_6, i8* %AH
  store i8 %AL_7, i8* %AL
  store i16 %AX_6, i16* %AX
  ....
  br label %bb_c1000008E6

bb_c1000008E6: ; preds = %bb_1000008D8_call
  ...

Calls to external functions (for instance, defined in dynamically linked shared libraries) introduce an additional complication: usually, the target function's definition isn't readily available for translation, and it is not always desirable to translate it anyway. By default, dynamic translation stops at the binary object boundary; static translation is forced to do so, because the binary object is all that is available. Anything not defined in these boundaries is considered an external function. External function calls are wrapped with a register context switch, which restores/saves registers from the register set IR structure to the real architectural registers. Obviously, this only works when translating to IR and back to the original target; shared libraries can't be used across targets anyway.

Indirect calls are treated similarly to direct calls, except that the function target isn't an already translated function, but a function pointer. This pointer is obtained using a runtime, that provides the generated function that corresponds to a specific address. For the general case, the external functions are translated just-in-time when possible, using the same DC code as static translation. However, in the common case, indirect call targets are defined in the binary object, and are thus very likely to have already been translated by the static translator. Thus, the goal is also to generate a special function, that maps indirect call target addresses to already translated functions, avoiding resort to the full runtime.

Indirect branches are treated in the same way as indirect calls, except that after the call instruction, the generated IR branches to the exit block of the function. In this situation, the goal is also to optimize the common case, which is jump-table based indirect branches for switch statements. These follow an easy to match pattern: an immediate address, pointing to a pointer array in read-only (text) memory (usually inserted as data-in-code, very close to the indirect branch, partly so smaller instruction encodings can be used). The jump table is then accessed through an indirect offset. Value-range analysis is very helpful in constraining the offset, making it possible to translate the jump table to an IR switch, and resume the static translation process without any generated indirection.

A similar technique might be used for virtual methods/dispatch tables, though that is a much more involved problem, though less useful (indirect calls are less of a problem than indirect branches) and less predictable (the virtual pointer is still dynamic, so the table base is too).