Overview 3: How are Registers Translated?

2014-04-03 | Dagger Team

The IR initially generated by the translation process has to accurately reproduce register accesses. At the assembly level, on most architectures, functions have no explicit parameters: everything goes through registers or memory (usually the stack, itself accessed through registers). Thus, every function we generate takes a single argument: a pointer to a structure representing the function's context, which consists of the set of all registers defined by the architecture. Similarly, return values go through registers at the assembly level, which is why our generated functions don't return anything (void):

declare void @fn_100000880(%regset* noalias nocapture)

An interesting complication is that architecturally defined registers may overlap. It is common for registers to form a hierarchy: subsets of SIMD lanes may be accessed through dedicated architectural registers, and the same goes for parts of general-purpose registers. For instance, on X86, the lower 16 bits of a 32-bit register (such as EAX) are accessible through a sub-register (here, AX), as are the lowest (AL) and second-lowest (AH) 8 bits.

Here's the register set for X86. All the general-purpose registers are i64s; the i16s and i32s are mostly control, debug, or segment registers. The i512s are ZMM vectors, and the i80s are FPU registers.

%regset = type { i16, i16, i32, i16, i16, i16, i16, i64, i64, i64, i64, i64,
i64, i64, i64, i64, i16, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64,
i64, i64, i64, i64, i64, i64, i32, i32, i32, i32, i32, i32, i32, i32, i80,
i80, i80, i80, i80, i80, i80, i16, i16, i16, i16, i16, i16, i16, i16, i64,
i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64, i64,
i80, i80, i80, i80, i80, i80, i80, i80, i512, i512, i512, i512, i512, i512,
i512, i512, i512, i512, i512, i512, i512, i512, i512, i512, i512, i512, i512,
i512, i512, i512, i512, i512, i512, i512, i512, i512, i512, i512, i512, i512 }

Only the largest super-registers are represented in the register set structure; sub-registers are not. A modification to a register initially generates updates to all affected registers in the sub-/super-register hierarchy. The dead updates are easily optimized out later on.

...
;; Easy case: truncating the largest super-register
%RDI_9 = load i64* %55            ;     def RDI @1000008cd: movq 8(%rsi), %rdi
%EDI_14 = trunc i64 %RDI_9 to i32
%DI_13 = trunc i64 %RDI_9 to i16
%DIL_13 = trunc i64 %RDI_9 to i8
...
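
The truncations above cover the easy direction, from a super-register down to its sub-registers. The opposite direction, where a sub-register write has to be merged back into the super-register, can be modeled with a small Python sketch. This is only an illustration of the X86 register semantics, not Dagger's code: the names `RAX_VIEWS`, `read_view`, `write_view`, and `all_views` are hypothetical, and the rule that a 32-bit write zeroes the upper half is the standard x86-64 behavior rather than something stated in this post.

```python
MASK64 = (1 << 64) - 1

# (bit offset, width) of each architectural view of the 64-bit RAX.
RAX_VIEWS = {
    "RAX": (0, 64),
    "EAX": (0, 32),
    "AX": (0, 16),
    "AL": (0, 8),
    "AH": (8, 8),
}


def read_view(rax, name):
    """Extract one sub-register: a shift followed by a truncation."""
    offset, width = RAX_VIEWS[name]
    return (rax >> offset) & ((1 << width) - 1)


def write_view(rax, name, value):
    """Return the new RAX after writing `value` to the view `name`."""
    offset, width = RAX_VIEWS[name]
    field = ((1 << width) - 1) << offset
    value = (value << offset) & field
    if name == "EAX":
        # x86-64 rule: a 32-bit write zero-extends into the full register.
        return value
    # 8-bit, 16-bit, and 64-bit writes leave bits outside the field untouched.
    return (rax & ~field & MASK64) | value


def all_views(rax):
    """Recompute every register in the hierarchy from the super-register."""
    return {name: read_view(rax, name) for name in RAX_VIEWS}


rax = write_view(0x1122334455667788, "AH", 0xAB)
views = all_views(rax)  # one updated value per affected register
```

After a single write, every view is recomputed, which mirrors the "update everything, let the optimizer delete the dead values" approach described above.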

When translating a function to IR, an alloca'd local variable is created for each (used) register. Inside basic blocks, the register set structure isn't accessed; only the local variables are. At the beginning of each basic block, register values are loaded from the local variables; inside the basic block, SSA values are used and generated as needed; at the end, the last value generated for each register is stored to the corresponding local variable. The mem2reg alloca promotion pass is then used to transform this into proper SSA.

Here's an example entry block, with allocation and initialization of local variables:

%RAX_ptr = getelementptr inbounds %regset* %0, i32 0, i32 7
%RAX_init = load i64* %RAX_ptr            ; Load the largest super-reg from the regset
%RAX = alloca i64                         ; Allocate local variable for register
store i64 %RAX_init, i64* %RAX            ; Initialize it
%EAX_init = trunc i64 %RAX_init to i32    ; Ditto for sub-registers
%EAX = alloca i32
store i32 %EAX_init, i32* %EAX
%AX_init = trunc i64 %RAX_init to i16
%AX = alloca i16
store i16 %AX_init, i16* %AX
%AL_init = trunc i64 %RAX_init to i8
%AL = alloca i8
store i8 %AL_init, i8* %AL
%1 = lshr i64 %RAX_init, 8
%AH_init = trunc i64 %1 to i8
%AH = alloca i8
store i8 %AH_init, i8* %AH

And here is an annotated basic block, consisting only of a load instruction:

%RSI_21 = load i64* %RSI
%54 = add i64 %RSI_21, 8
%55 = inttoptr i64 %54 to i64*       ;  op-use 8(%rsi) @1000008cd: movq 8(%rsi), %rdi
%RDI_9 = load i64* %55               ;     def RDI     @1000008cd: movq 8(%rsi), %rdi
%EDI_14 = trunc i64 %RDI_9 to i32
%DI_13 = trunc i64 %RDI_9 to i16
%DIL_13 = trunc i64 %RDI_9 to i8
%RIP_99 = add i64 %RIP_98, 5
%EIP_88 = trunc i64 %RIP_99 to i32
%IP_88 = trunc i64 %RIP_99 to i16
store i16 %DI_13, i16* %DI
store i8 %DIL_13, i8* %DIL
store i32 %EDI_14, i32* %EDI
store i64 %RDI_9, i64* %RDI
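
The load/compute/store discipline of this block can be mimicked in a few lines of Python. This is a toy model under stated assumptions, not the translator itself: `run_block` and `movq_8_rsi_to_rdi` are hypothetical names, a dictionary stands in for the alloca'd local variables, another dictionary stands in for memory, and the RIP update is left out.

```python
MASK64 = (1 << 64) - 1


def movq_8_rsi_to_rdi(values, memory):
    """Model of `movq 8(%rsi), %rdi`; returns the registers it defines."""
    address = (values["RSI"] + 8) & MASK64  # op-use 8(%rsi)
    rdi = memory[address]                   # def RDI
    return {
        "RDI": rdi,
        "EDI": rdi & 0xFFFFFFFF,  # sub-register values derived by truncation
        "DI": rdi & 0xFFFF,
        "DIL": rdi & 0xFF,
    }


def run_block(local_vars, used, body, memory):
    """Run one basic block against the per-register local variables."""
    # Block entry: load each used register from its local variable.
    values = {reg: local_vars[reg] for reg in used}
    # Block body: works only on the loaded values and produces new ones.
    defs = body(values, memory)
    # Block exit: store the last value of each defined register.
    local_vars.update(defs)
    return local_vars


local_vars = {"RSI": 0x1000, "RDI": 0, "EDI": 0, "DI": 0, "DIL": 0}
memory = {0x1008: 0x1122334455667788}
run_block(local_vars, ["RSI"], movq_8_rsi_to_rdi, memory)
```

Because the body never touches `local_vars` directly, every register has exactly one load at entry and at most one store at exit, which is the shape that alloca promotion handles well.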

Individual flags in status registers (such as EFLAGS on X86) are treated somewhat differently from other sub-registers: new SSA values are generated when flags are updated, but they aren't individually stored to local variables; instead, the full status register is regenerated every time. SSA values are also generated for the condition codes defined by the architecture (which usually provide a light abstraction over status flags). Status registers are one of the biggest areas for potential improvement. Here is the annotated "naive" IR for an X86 compare instruction:

%CC_A_0 = icmp ugt i32 %EDI_2, 2
%CC_AE_0 = icmp uge i32 %EDI_2, 2
%CC_B_0 = icmp ult i32 %EDI_2, 2
%CC_BE_0 = icmp ule i32 %EDI_2, 2
%CC_L_0 = icmp slt i32 %EDI_2, 2
%CC_LE_0 = icmp sle i32 %EDI_2, 2
%CC_G_0 = icmp sgt i32 %EDI_2, 2
%CC_GE_0 = icmp sge i32 %EDI_2, 2
%CC_E_0 = icmp eq i32 %EDI_2, 2
%CC_NE_0 = icmp ne i32 %EDI_2, 2
%24 = sub i32 %EDI_2, 2
%ZF_0 = icmp eq i32 %24, 0
%SF_0 = icmp slt i32 %24, 0
%25 = call { i32, i1 } @llvm.ssub.with.overflow.i32(i32 %EDI_2, i32 2)
%OF_0 = extractvalue { i32, i1 } %25, 1
%26 = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 %EDI_2, i32 2)
%CF_0 = extractvalue { i32, i1 } %26, 1
%27 = trunc i32 %24 to i8
%28 = call i8 @llvm.ctpop.i8(i8 %27)
%29 = trunc i8 %28 to i1
%PF_0 = icmp eq i1 %29, false
%30 = and i32 %EFLAGS_11, -2262
%31 = zext i1 %CF_0 to i32
%32 = shl i32 %31, 0
%33 = or i32 %32, %30
%34 = zext i1 %PF_0 to i32
%35 = shl i32 %34, 2
%36 = or i32 %35, %33
%37 = zext i1 false to i32
%38 = shl i32 %37, 4
%39 = or i32 %38, %36
%40 = zext i1 %ZF_0 to i32
%41 = shl i32 %40, 6
%42 = or i32 %41, %39
%43 = zext i1 %SF_0 to i32
%44 = shl i32 %43, 7
%45 = or i32 %44, %42
%46 = zext i1 %OF_0 to i32
%47 = shl i32 %46, 11
%EFLAGS_12 = or i32 %47, %45   ; imp-def EFLAGS @1000008c3: cmpl $2, %edi
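
To make the bit manipulation in the tail of this listing easier to follow, here is a rough Python model of the same computation. It is a sketch, not a description of how Dagger is implemented: `cmp32_flags` and `update_eflags` are hypothetical helpers, and the auxiliary carry flag (bit 4) is hard-wired to false because that is what the listing does. The bit positions are the architectural EFLAGS positions, and the constant -2262 in the IR is the complement of the mask built from them.

```python
CF, PF, AF, ZF, SF, OF = 0, 2, 4, 6, 7, 11  # bit positions in EFLAGS
STATUS_MASK = sum(1 << bit for bit in (CF, PF, AF, ZF, SF, OF))  # 2261


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def cmp32_flags(lhs, rhs):
    """Status flags of a 32-bit compare, i.e. of lhs - rhs."""
    lhs &= 0xFFFFFFFF
    rhs &= 0xFFFFFFFF
    result = (lhs - rhs) & 0xFFFFFFFF
    signed_diff = to_signed32(lhs) - to_signed32(rhs)
    return {
        CF: lhs < rhs,                                    # unsigned borrow
        PF: bin(result & 0xFF).count("1") % 2 == 0,       # even parity of low byte
        AF: False,                                        # not modeled, as in the listing
        ZF: result == 0,
        SF: bool(result & 0x80000000),
        OF: not (-(1 << 31) <= signed_diff < (1 << 31)),  # signed overflow
    }


def update_eflags(old_eflags, flags):
    """Clear the six status bits, then OR each new flag into its position."""
    new_eflags = old_eflags & ~STATUS_MASK & 0xFFFFFFFF
    for bit, is_set in flags.items():
        new_eflags |= int(is_set) << bit
    return new_eflags


eflags = update_eflags(0x202, cmp32_flags(2, 2))  # models cmpl $2, %edi with EDI == 2
```

Starting from an EFLAGS value of 0x202, comparing 2 against 2 sets only ZF and PF, so `eflags` ends up as 0x246.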