rtl_to_x86:
* recognise alub(X,X,sub,1,lt,L1,L2,P) and turn it into 'dec',
  this might improve the reduction test code slightly (X is
  the pseudo for FCALLS)
* recognise alu(Z,X,add,Y) and turn it into 'lea'.
* rewrite tailcalls as parallel assignments before regalloc

x86:
* Use separate constructors for real regs (x86_reg) and pseudos (x86_temp).

Frame:
* drop tailcall rewrite

Registers:
* make the 2 regs now reserved for frame's tailcall rewrite available
  for arg passing

Optimizations:
* replace jcc cc,L1; jmp L0; L1: with jcc <not cc> L0; L1: (length:len/2)
* Kill move X,X insns, either in frame or finalise
* Instruction scheduling module
* We can now choose to not have HP in %esi. However, this currently loses
  performance due to (a) repeated moves to/from P_HP(P), and (b) spills of
  the temp that contains a copy of P_HP(P). Both of these problems should be
  fixed, and then, if we don't have any noticeable performance degradation,
  we should permanently change to a non-reserved HP strategy.

Loader:

Assembler:

Encode:
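
Sketch for the first two Optimizations items (the jcc/jmp rewrite and killing move X,X). This is only an illustration in Python, not backend code: the tuple encoding of instructions, the NEGATE table and the peephole function are all hypothetical names invented for the example, and the real pass would work on the x86 instruction records in frame or finalise.

```python
# Hypothetical encoding (an assumption, not the backend's records):
#   ("jcc", cc, label)  ("jmp", label)  ("label", name)  ("move", src, dst)

# Negation of some x86 condition codes (e/ne, l/ge, g/le, b/ae, a/be).
NEGATE = {
    "e": "ne", "ne": "e",
    "l": "ge", "ge": "l",
    "g": "le", "le": "g",
    "b": "ae", "ae": "b",
    "a": "be", "be": "a",
}


def peephole(insns):
    """Return a new instruction list with the two rewrites applied."""
    out = []
    i = 0
    while i < len(insns):
        insn = insns[i]

        # Kill move X,X: source and destination are the same operand.
        if insn[0] == "move" and insn[1] == insn[2]:
            i += 1
            continue

        # jcc cc,L1; jmp L0; L1:  ==>  jcc <not cc>,L0; L1:
        if (insn[0] == "jcc"
                and insn[1] in NEGATE
                and i + 2 < len(insns)
                and insns[i + 1][0] == "jmp"
                and insns[i + 2] == ("label", insn[2])):
            out.append(("jcc", NEGATE[insn[1]], insns[i + 1][1]))
            i += 2  # drop the jcc and the jmp; the label is copied next
            continue

        out.append(insn)
        i += 1
    return out


example = [
    ("move", "%eax", "%eax"),
    ("jcc", "l", "L1"),
    ("jmp", "L0"),
    ("label", "L1"),
    ("move", "%eax", "%ebx"),
]
rewritten = peephole(example)
```

The label L1 is kept because other jumps may still target it. The sketch only matches when the label follows the jmp immediately; it does not look through intervening no-op moves.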