Inline assembly lets you specify exactly which machine instructions are executed, either to get the most optimized code or to use SIMD instructions that parallelize data transformations. This page covers the basics of inline assembly blocks. Currently, the x64 platform is supported, but there are still many gaps in the x64 instruction support.
Places where you can find inline assembly examples: modules/Atomics, modules/Bit_Operations, modules/Runtime_Support.
Here is an excerpt of atomic swap from the Atomics module that uses assembly language:
atomic_swap :: (dest: *$T, new_value: T) -> (old_value: T) {
    SIZE :: size_of(T);
    // The Intel documentation says that the lock prefix is ignored
    // for xchg, but we'll put it here just in case I guess?
    v := new_value;
    #if SIZE == 1 {
        #asm { lock_xchg.b v, [dest]; }
    } else #if SIZE == 2 {
        #asm { lock_xchg.w v, [dest]; }
    } else #if SIZE == 4 {
        #asm { lock_xchg.d v, [dest]; }
    } else #if SIZE == 8 {
        #asm { lock_xchg.q v, [dest]; }
    } else {
        #assert false, "Invalid size passed to atomic_swap; argument must be 1, 2, 4, or 8 bytes.";
    }
    return v;
}
lock_xchg is the atomic swap assembly instruction. The .q, .d, .w, and .b suffixes specify the operand size. Here is the list of size suffixes:
.q is quad-word (64-bit integer).
.d is double-word (32-bit integer).
.w is a regular word (16-bit integer).
.b is a byte (8-bit integer).
.x is xmmword (128-bit), available when SSE is in the feature set.
.y is ymmword (256-bit), available when AVX is in the feature set.
.z is zmmword (512-bit), available when AVX512F is in the feature set.
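For readers more familiar with C, the atomic_swap excerpt above behaves like a C11 atomic exchange: the new value is stored and the previous value is returned as one indivisible step. The following is a minimal sketch of that behavior only, not the Atomics module's implementation; swap_u64 is a hypothetical name chosen for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

// Hypothetical helper (not part of the Atomics module): atomically store
// new_value into *dest and return the value that was there before.
static uint64_t swap_u64(_Atomic uint64_t *dest, uint64_t new_value) {
    return atomic_exchange(dest, new_value);
}
```

On x64, compilers typically lower this 8-byte exchange to a single xchg instruction, which corresponds to the .q case in the excerpt.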
List of All the Assembly Instructions
Instructions are named based on the mnemonic and operands provided. Instruction mnemonics are identical to the official mnemonics provided by Intel and AMD, so you can refer to the official manuals when programming instead of going indirectly through the intrinsics guide. Here is a list of all possible assembly instructions supported: x86 and amd64 instruction reference
Note: This list is based on my best knowledge. It could be incorrect, but as far as I know, it is correct.
Current Assembly Limitations
There are no goto and no jump instructions in the current assembly. There are no call instructions, and you cannot call a function in the middle of an assembly block.
Assembly Language Data Types
The data types usable within inline assembly are gpr, str, vec, or omr.
gpr stands for general purpose register.
gpr.a means that the gpr must be pinned to register a (e.g. EAX === a).
mem means the operand must be a memory operand (e.g. lea.q [EAX], rax).
str stands for stack register; this is used by the FPU and MMX instructions.
vec stands for a vector type. This is used by SIMD instructions.
omr stands for op-mask register.
Here are some valid assembly language syntax declaration examples:
#asm {
    var: gpr;    // declared a general purpose register named 'var'
    mov var, 1;  // assign var = 1
}
#asm {
    // declared a general purpose register named 'var', and mov 1 into it
    mov var: gpr, 1; // assign var = 1
}
#asm {
    // implicitly declare 'var' without specifying the type
    mov var:, 1;
}
Register Allocation
In inline assembly, the compiler performs register allocation to replace variables with registers, allowing you to use variable names to convey data flow the same way as in high-level code. The register allocator takes lifetimes into account, so register management becomes a working-set-size problem rather than an annoying bookkeeping one. There is no automatic spilling of registers: if you ever exceed the maximum number of live registers, you will get an error from the compiler.
Pinning a variable to a register
The === operator is used to pin variables to general purpose registers. In this simplified byte swap example, result is pinned to register a. The === operator can map a variable to registers a, b, c, d, sp, bp, si, di, or an integer between 0 and 15.
byte_swap :: (input: s64) -> s64 {
    result := input;
    #asm {
        result === a; // result is represented as register a
        bswap.q result;
    }
    return result;
}
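To make the effect of bswap.q concrete, here is a portable C sketch that reverses the eight bytes of a 64-bit value one at a time. It only illustrates the result the instruction produces; byte_swap_u64 is a hypothetical name, and the loop is an assumption made for clarity rather than how any compiler implements it.

```c
#include <stdint.h>

// Hypothetical helper: reverse the byte order of a 64-bit value,
// which is the result bswap.q produces in a single instruction.
static uint64_t byte_swap_u64(uint64_t x) {
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xFF); // append the current lowest byte of x to r
        x >>= 8;                   // expose the next byte of x
    }
    return r;
}
```

For example, byte_swap_u64(0x0102030405060708) yields 0x0807060504030201, and applying it twice returns the original value.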
In the following example, the mul instruction requires both the a register and the d register: on x64, mul multiplies the value in a by its source operand and writes the low 64 bits of the product to a and the high 64 bits to d. To do z = x * y, pin x to register a, then pin z to register d.
x: u64 = 197589578578;
y: u64 = 895173299817;
z: u64 = ---;
#asm {
    x === a; // We pin the high level var 'x' to gpr 'a' as required by mul.
    z === d; // We pin the high level var 'z' to gpr 'd' as required by mul.
    mul z, x, y;
}
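The reason mul needs two pinned registers is that a 64-bit by 64-bit product can be up to 128 bits wide. The C sketch below shows that split using the unsigned __int128 extension available in GCC and Clang. It assumes the usual x64 behavior, where the low half ends up in register a and the high half in register d; mul_wide is a hypothetical name used only for illustration.

```c
#include <stdint.h>

// Hypothetical helper: full 64x64 -> 128-bit unsigned multiply.
// The return value is the low half (what x64 mul leaves in register a);
// *hi receives the high half (what x64 mul leaves in register d).
static uint64_t mul_wide(uint64_t x, uint64_t y, uint64_t *hi) {
    unsigned __int128 product = (unsigned __int128)x * y; // GCC/Clang extension
    *hi = (uint64_t)(product >> 64);
    return (uint64_t)product;
}
```

With the values from the example above (197589578578 and 895173299817), the product does not fit in 64 bits, so the high half is nonzero.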
Assembly Memory Operands
In x86, memory operands have the format base + index * scale + displacement. Just like in traditional assembly, you indicate a memory operand by wrapping it in brackets []. The ordering of the expression is rigid and must be base + index * scale + displacement. You cannot place the displacement first, or the base second, etc. This reduces ambiguity and confusion when fields can be ambiguous identifiers.
The scale is limited to the number literals 8, 4, 2, and 1.
In this example, we demonstrate loading memory into registers.
array: [32] u8;
pointer := array.data;
#asm {
    mov a:, [pointer];       // declare a and load the value stored at pointer
    mov i:, 10;              // declare i := 10
    mov a, [pointer + 8];    // load from pointer + 8
    mov a, [pointer + i*1];  // load from pointer + i
}
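The address arithmetic behind a memory operand can be written out in C as a quick sanity check. This is a minimal sketch of the base + index * scale + displacement formula only; effective_address is a hypothetical helper, not something provided by the compiler or its modules.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: the address that a
// [base + index*scale + displacement] memory operand refers to.
static const uint8_t *effective_address(const uint8_t *base, size_t index,
                                        size_t scale, ptrdiff_t displacement) {
    return base + index * scale + displacement;
}
```

For a 32-byte array, effective_address(array, 10, 1, 0) points at element 10, which matches the [pointer + i*1] load above with i set to 10.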
Load Effective Address (LEA) Load and Read Instruction Example
Here is a basic example of load effective address. Note that in rax*4, the constant must go after the register. You can look up what LEA does here
#asm {lea.q rax, [rdx];}
#asm {lea.q rax, [rdx + rax*4];}
// NOTE: This does not work, 4*rax is wrong, must be rax*4
// #asm {lea.q rax, [rdx + 4*rax];}
Cross Block #asm Referencing
Cross block #asm referencing keeps your registers alive across the procedure. LLVM optimizations could potentially spill them if required.
block_1 :: #asm { pxor x:, x; }
block_2 :: #asm { movdqu y:, block_1.x; }
Assembly Feature Flag Tagging
Feature flag tagging was introduced in beta 0.0.084. When you tag an assembly block with feature flags, the compiler reports an error if you use a feature from a feature set the block is not tagged with, unless that feature has been enabled globally in a build script. TODO: need a better description of this!
#asm AVX, AVX2 {
}
Passing Registers through Macro Arguments
As of beta 0.0.090, registers can be passed through macro arguments, giving you the power of macros while using inline assembly.
add_regs :: (c: __reg, d: __reg) #expand {
    #asm {
        add c, d;
    }
}
main :: () {
    #asm {
        mov a:, 10;
        mov b:, 7;
    }
    add_regs(b, a);
}
SIMD Vector Inline Assembly Example
These are some basic SIMD vector examples that operate on several 32-bit floats at the same time. The .x suffix operates on 4 floats at a time, while .y operates on 8 floats at a time.
This example uses addps.x to add 4 32-bit floats at the same time.
array := float32.[1, 2, 3, 4];
ptr := array.data;
print("array before: %\n", array); // outputs 1, 2, 3, 4
#asm {
    v: vec;
    movups.x v, [ptr];
    addps.x v, v;
    movups.x [ptr], v;
}
print("array after: %\n", array); // outputs 2, 4, 6, 8
This example uses addps.y to add 8 32-bit floats at the same time.
array := float32.[1, 2, 3, 4, 5, 6, 7, 8];
ptr := array.data;
print("array before: %\n", array); // outputs 1, 2, 3, 4, 5, 6, 7, 8
#asm {
    v: vec;
    movups.y v, [ptr];
    addps.y v, v, v;
    movups.y [ptr], v;
}
print("array after: %\n", array); // outputs 2, 4, 6, 8, 10, 12, 14, 16
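For comparison, the 4-float case can be expressed with SSE intrinsics in C, where each intrinsic maps onto one of the instructions used above. This is a sketch that assumes an x86-64 target with SSE available; double4 is a hypothetical name, and the instruction each intrinsic corresponds to is noted in the comments.

```c
#include <xmmintrin.h>

// Hypothetical helper: double four consecutive floats in place,
// mirroring the movups / addps / movups sequence of the .x example.
static void double4(float *ptr) {
    __m128 v = _mm_loadu_ps(ptr); // unaligned load of 4 floats (movups)
    v = _mm_add_ps(v, v);         // add each lane to itself (addps)
    _mm_storeu_ps(ptr, v);        // unaligned store of 4 floats (movups)
}
```

Calling double4 on an array holding 1, 2, 3, 4 leaves 2, 4, 6, 8 in the array, the same result as the inline assembly version.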
Also read: Inline assembly Official Howto
More Assembly Examples
Reversing 64-bits using Inline Assembly
Miscellaneous Assembly Code Examples