Inline Assembly

danieltan1517 · October 26, 2021, 9:48am

Inline assembly lets you specify exactly which machine instructions are executed, either to get the most optimized code or to use SIMD instructions that parallelize data transformations. What follows is basic starter code for inline assembly blocks. Currently, the x64 platform is supported, but there are still a lot of holes in the x64 instruction support.

Places where you can find inline assembly examples: modules/Atomics, modules/Bit_Operations, modules/Runtime_Support.

Here is an excerpt of atomic swap from the Atomics module that uses assembly language:

atomic_swap :: (dest: *$T, new_value: T) -> (old_value: T) {
  SIZE :: size_of(T);
  // The Intel documentation says that the lock prefix is ignored
  // for xchg, but we'll put it here just in case I guess?
  v := new_value;
  #if SIZE == 1 {
    #asm { lock_xchg.b v, [dest]; }
  } else #if SIZE == 2 {
    #asm { lock_xchg.w v, [dest]; }
  } else #if SIZE == 4 {
    #asm { lock_xchg.d v, [dest]; }
  } else #if SIZE == 8 {
    #asm { lock_xchg.q v, [dest]; }
  } else {
    #assert false, "Invalid size passed to atomic_swap; argument must be 1, 2, 4, or 8 bytes.";
  }
  return v;
}
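As a rough usage sketch (the `flag` variable and the `main` wrapper are made up for illustration, and it assumes `atomic_swap` is reachable by importing the Atomics module named above), calling the procedure could look like this:

```jai
#import "Basic";
#import "Atomics";

main :: () {
    flag: s32 = 0;

    // Store 1 into 'flag' and get back whatever value was there before.
    // T is inferred as s32 from the pointer, so the SIZE == 4 branch is used.
    previous := atomic_swap(*flag, 1);

    print("previous = %, flag = %\n", previous, flag);
}
```

Going by the excerpt above, `previous` should come back as 0 and `flag` should end up as 1.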

lock_xchg is the atomic swap (exchange) instruction. The .q, .d, .w, and .b suffixes specify the operand size. Here is the list of size suffixes:

.q is quad-word (64-bit integer).

.d is double-word (32-bit integer).

.w is a regular word (16-bit integer).

.b is a byte (8-bit integer).

.x is an xmmword (128-bit), available when SSE is in the feature set.

.y is a ymmword (256-bit), available when AVX is in the feature set.

.z is a zmmword (512-bit), available when AVX512F is in the feature set.
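To illustrate how the suffix follows the width of the operand, here is a minimal sketch of a 32-bit byte swap. The name `byte_swap32` is hypothetical, and the sketch assumes `bswap` accepts the `.d` suffix the same way the other instructions on this page do.

```jai
#import "Basic";

// Hypothetical helper: reverse the byte order of a 32-bit value.
byte_swap32 :: (input: u32) -> u32 {
    result := input;           // copy, because bswap modifies its operand
    #asm { bswap.d result; }   // .d because 'result' is 32 bits wide
    return result;
}

main :: () {
    assert(byte_swap32(0x11223344) == 0x44332211);
}
```

For a 64-bit value, the same block would use `.q` instead.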

List of All the Assembly Instructions

Instructions are named based on the mnemonic and operands provided. Instruction mnemonics are identical to the official mnemonics provided by Intel and AMD, so you can refer to the official manuals when programming instead of going indirectly through the intrinsics guide. Here is a list of all supported assembly instructions: x86 and amd64 instruction reference
Note: This list is based on my best knowledge. It could be incorrect, but as far as I know it is correct.

Current Assembly Limitations

Inline assembly currently has no goto and no jump instructions. There are no call instructions either, so you cannot call a function in the middle of an assembly block.
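One way to work within these limits is to keep the control flow in regular Jai code and keep each assembly block straight-line. The following is only a sketch: `sum_up_to` is a hypothetical helper, and it assumes `add` accepts ordinary integer variables as operands in the same way the other examples on this page do.

```jai
#import "Basic";

// Hypothetical helper: sum the integers 1..count.
// The loop lives in Jai; the #asm block contains no branches.
sum_up_to :: (count: s64) -> s64 {
    sum: s64 = 0;
    for 1..count {
        n := it;               // copy the loop index into a plain variable
        #asm { add sum, n; }   // sum += n
    }
    return sum;
}

main :: () {
    assert(sum_up_to(4) == 10);
}
```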

Assembly Language Data Types

The data types usable within inline assembly are gpr, str, vec, or omr.

gpr stands for general purpose register.

gpr.a means that the gpr must be pinned to the register a (e.g. EAX === a)

mem means the operand must be a memory operand (e.g. lea.q rax, [rdx])

str stands for stack register. It is used by the FPU and MMX instructions.

vec stands for a vector register. It is used by SIMD instructions.

omr stands for op-mask register.

Here are some valid declaration examples:

#asm {
  var: gpr; // declared a general purpose register named 'var'
  mov var, 1; // assign var = 1
}

#asm {
  // declared a general purpose register named 'var', and mov 1 into it
  mov var: gpr, 1; // assign var = 1
}

#asm {
  // implicitly declare 'var' without specifying the type
  mov var:, 1;
}

Register Allocation

In inline assembly, the compiler performs register allocation to replace variables with registers, allowing you to use variable names to convey data flow the same way as in high-level code. The register allocator takes lifetimes into account, so register management becomes a working-set-size problem rather than an annoying bookkeeping one. There is no automatic spilling of registers: if you ever exceed the maximum number of live registers, you will get an error from the compiler.
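As a small sketch of what this looks like in practice, the block below declares a scratch register by name and lets the allocator decide which physical register backs it. The procedure name `swapped` is hypothetical, and the sketch assumes `mov` between two register operands works as in the declaration examples above.

```jai
#import "Basic";

// Hypothetical helper: return the two inputs in swapped order.
swapped :: (first: s64, second: s64) -> s64, s64 {
    x := first;
    y := second;
    #asm {
        tmp: gpr;      // scratch register; the allocator picks the actual one
        mov tmp, x;    // tmp = x
        mov x, y;      // x = y
        mov y, tmp;    // y = old x
    }
    return x, y;
}

main :: () {
    p, q := swapped(1, 2);
    assert(p == 2 && q == 1);
}
```

Only three values are live at once here, which is far below the point where the allocator would run out of registers.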

Pinning a variable to a register

The === operator is used to pin variables to general purpose registers. In this simplified byte swap example, result is pinned to register a. The === operator can map a variable to registers a, b, c, d, sp, bp, si, di, or to an integer between 0 and 15.

byte_swap :: (input: s64) -> s64 {
  result := input;
  #asm { 
     result === a;   // result is represented as register a
     bswap.q result;
  }
  return result;
}

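In the following example, the mul instruction implicitly uses the a and d registers: it multiplies register a by the source operand and leaves the low half of the product in a and the high half in d. To multiply x by y, pin x to register a and z to register d. Afterwards, x holds the low 64 bits of the product and z holds the high 64 bits.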

x: u64 = 197589578578;
y: u64 = 895173299817;
z: u64 = ---;
#asm {
   x === a; // We pin the high level var 'x' to gpr 'a' as required by mul.
   z === d; // We pin the high level var 'z' to gpr 'd' as required by mul.
   mul z, x, y;
}

Assembly Memory Operands

In x86, memory operands have the form base + index * scale + displacement. Just like in a traditional assembler, you indicate a memory operand by wrapping it in brackets []. The ordering of the expression is rigid and must be base + index * scale + displacement. You cannot place the displacement first, or the base second, etc. This avoids confusion when fields are ambiguous identifiers.

The scale is limited to the number literals 1, 2, 4, or 8.

In this example, we demonstrate loading memory into registers.

array: [32] u8;
pointer := array.data;
#asm {
  mov a:, [pointer];       // declare a, load the value stored at 'pointer'
  mov i:, 10;              // declare i := 10
  mov a,  [pointer + 8];   // load from base + displacement
  mov a,  [pointer + i*1]; // load from base + index * scale
}
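The base + index * scale form maps directly onto array indexing. Here is a minimal sketch: `load_element` is a hypothetical helper, and the scale of 8 is chosen to match size_of(s64).

```jai
#import "Basic";

// Hypothetical helper: read values[index] with a single memory operand.
load_element :: (values: [] s64, index: s64) -> s64 {
    base := values.data;
    result: s64 = ---;
    #asm {
        mov result, [base + index*8];   // address = base + index * size_of(s64)
    }
    return result;
}

main :: () {
    values := s64.[10, 20, 30, 40];
    assert(load_element(values, 2) == 30);
}
```

There is no bounds check here, so the caller has to guarantee that index is in range.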

Load Effective Address (LEA) Instruction Example

Here is a basic example of load effective address (LEA), which computes the address of a memory operand without reading from memory. Note that in rax*4, the constant must go after the register. You can look up what LEA does here

#asm {lea.q rax, [rdx];}
#asm {lea.q rax, [rdx + rax*4];}

// NOTE: This does not work, 4*rax is wrong, must be rax*4
// #asm {lea.q rax, [rdx + 4*rax];}
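Because LEA only computes an address and never touches memory, it is often used for cheap arithmetic. The sketch below multiplies by five with one instruction. The name `times_five` is hypothetical, and the sketch assumes the same variable can serve as both base and index.

```jai
#import "Basic";

// Hypothetical helper: compute x * 5 as x + x*4 using lea.
times_five :: (x: s64) -> s64 {
    result: s64 = ---;
    #asm { lea.q result, [x + x*4]; }   // no memory access, just the address math
    return result;
}

main :: () {
    assert(times_five(7) == 35);
}
```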

Cross Block `#asm` Referencing

Cross-block #asm referencing keeps your registers alive across the procedure. LLVM optimizations could potentially spill them if required.

block_1 :: #asm { pxor x:, x; }
block_2 :: #asm { movdqu y:, block_1.x; }

Assembly Feature Flag Tagging

Feature flag tagging was introduced in beta 0.0.084. When you tag a block with assembly feature flags, the compiler reports an error if you use an instruction from a feature set you haven’t tagged the block with, unless the feature has been enabled globally in a build script. TODO: need a better description of this!

#asm AVX, AVX2 {

}
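For a filled-in version, here is a sketch of a block tagged with AVX that uses 256-bit instructions. The procedure name `double_eight_floats` is hypothetical, and it is an assumption that the AVX tag alone is sufficient for these particular instructions. It needs a CPU with AVX to run.

```jai
#import "Basic";

// Hypothetical helper: double eight consecutive floats in place.
double_eight_floats :: (ptr: *float32) {
    #asm AVX {
        v: vec;
        movups.y v, [ptr];    // load 8 floats (256 bits)
        addps.y  v, v, v;     // v = v + v
        movups.y [ptr], v;    // store the 8 results back
    }
}

main :: () {
    values := float32.[1, 2, 3, 4, 5, 6, 7, 8];
    double_eight_floats(values.data);
    assert(values[0] == 2 && values[7] == 16);
}
```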

Passing Registers through Macro Arguments

As of beta 0.0.090, registers can be passed through macro arguments, giving you the power of macros while using inline assembly.

add_regs :: (c: __reg, d: __reg) #expand {
  #asm {
     add c, d;
  }
}

main :: () {
  #asm {
     mov a:, 10;
     mov b:, 7;
  }

  add_regs(b, a);
}

SIMD Vector Inline Assembly Example

Here is some basic SIMD vector code that operates on several 32-bit floats at the same time. The .x suffix means operating on 4 floats at once, while .y means operating on 8 floats at once.

This example uses addps.x to add 4 32-bit floats together at the same time.

array := float32.[1, 2, 3, 4];
ptr := array.data;
print("array before: %\n",array); // outputs 1, 2, 3, 4
#asm {
  v: vec;
  movups.x v, [ptr];
  addps.x v, v;
  movups.x [ptr], v;
}
print("array after: %\n", array); // outputs 2, 4, 6, 8

This example uses addps.y to add 8 32-bit floats together at the same time.

array := float32.[1, 2, 3, 4, 5, 6, 7, 8];
ptr := array.data;
print("array before: %\n",array); // outputs 1, 2, 3, 4, 5, 6, 7, 8
#asm {
    v: vec;
    movups.y v, [ptr];
    addps.y v, v, v;
    movups.y [ptr], v;
}
print("array after: %\n", array); // outputs 2, 4, 6, 8, 10, 12, 14, 16
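Both examples above add a vector to itself. A slightly more typical case adds two different arrays element by element. This is a minimal sketch along the same lines: `add_four_floats` is a hypothetical helper that writes the sum back into its first argument.

```jai
#import "Basic";

// Hypothetical helper: dest[i] += src[i] for four consecutive floats.
add_four_floats :: (dest: *float32, src: *float32) {
    #asm {
        a: vec;
        b: vec;
        movups.x a, [dest];   // load 4 floats from dest
        movups.x b, [src];    // load 4 floats from src
        addps.x  a, b;        // a = a + b, lane by lane
        movups.x [dest], a;   // store the 4 sums back into dest
    }
}

main :: () {
    left  := float32.[1, 2, 3, 4];
    right := float32.[10, 20, 30, 40];
    add_four_floats(left.data, right.data);
    assert(left[0] == 11 && left[3] == 44);
}
```

movups is the unaligned load/store, so neither array needs any special alignment.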

Also read: Inline assembly Official Howto

More Assembly Examples

Reversing 64-bits using Inline Assembly
Miscellaneous Assembly Code Examples