docs: Add reference for Thumb2 inline assembler.
Thanks to Peter Hinch for contributing this.
parent
aef3846c13
commit
2110dc5a6d
|
@ -1,3 +1,5 @@
|
||||||
|
.. _pyboard_tutorial_assembler:
|
||||||
|
|
||||||
Inline assembler
|
||||||
================
|
||||||
|
|
||||||
|
|
|
@ -0,0 +1,50 @@
|
||||||
|
Arithmetic instructions
|
||||||
|
=======================
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Notation: ``Rd, Rm, Rn`` denote ARM registers R0-R7. ``immN`` denotes an immediate
|
||||||
|
value having a width of N bits e.g. ``imm8``, ``imm3``. ``carry`` denotes
|
||||||
|
the carry condition flag, ``not(carry)`` denotes its complement. In the case of instructions
|
||||||
|
with more than one register argument, it is permissible for some to be identical. For example
|
||||||
|
the following will add the contents of R0 to itself, placing the result in R0:
|
||||||
|
|
||||||
|
* add(r0, r0, r0)
|
||||||
|
|
||||||
|
Arithmetic instructions affect the condition flags except where stated.
|
||||||
|
|
||||||
|
Addition
|
||||||
|
--------
|
||||||
|
|
||||||
|
* add(Rdn, imm8) ``Rdn = Rdn + imm8``
|
||||||
|
* add(Rd, Rn, imm3) ``Rd = Rn + imm3``
|
||||||
|
* add(Rd, Rn, Rm) ``Rd = Rn + Rm``
|
||||||
|
* adc(Rd, Rn) ``Rd = Rd + Rn + carry``
|
||||||
|
|
||||||
|
Subtraction
|
||||||
|
-----------
|
||||||
|
|
||||||
|
* sub(Rdn, imm8) ``Rdn = Rdn - imm8``
|
||||||
|
* sub(Rd, Rn, imm3) ``Rd = Rn - imm3``
|
||||||
|
* sub(Rd, Rn, Rm) ``Rd = Rn - Rm``
|
||||||
|
* sbc(Rd, Rn) ``Rd = Rd - Rn - not(carry)``
|
||||||
|
|
||||||
|
Negation
|
||||||
|
--------
|
||||||
|
|
||||||
|
* neg(Rd, Rn) ``Rd = -Rn``
|
||||||
|
|
||||||
|
Multiplication and division
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
* mul(Rd, Rn) ``Rd = Rd * Rn``
|
||||||
|
|
||||||
|
This produces a 32 bit result with overflow lost. The result may be treated as
|
||||||
|
signed or unsigned according to the definition of the operands.
|
||||||
|
|
||||||
|
* sdiv(Rd, Rn, Rm) ``Rd = Rn / Rm``
|
||||||
|
* udiv(Rd, Rn, Rm) ``Rd = Rn / Rm``
|
||||||
|
|
||||||
|
These functions perform signed and unsigned division respectively. Condition flags
|
||||||
|
are not affected.
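
The behaviour of ``mul``, ``sdiv`` and ``udiv`` can be modelled in ordinary Python. The following is a minimal sketch for experimenting on a desktop interpreter, not assembler code: the helper names are invented for illustration and a non-zero divisor is assumed. It relies on ARM signed division rounding the quotient towards zero, whereas Python's ``//`` operator rounds down.

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

def mul(rd, rn):
    """Model mul(Rd, Rn): only the low 32 bits of the product are kept."""
    return (rd * rn) & MASK

def udiv(rn, rm):
    """Model udiv(Rd, Rn, Rm) on unsigned 32-bit operands."""
    return (rn & MASK) // (rm & MASK)

def sdiv(rn, rm):
    """Model sdiv(Rd, Rn, Rm): the quotient is rounded towards zero."""
    a, b = to_signed(rn), to_signed(rm)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK

product = mul(0xFFFFFFFF, 2)
print(hex(product), to_signed(product))   # same bits, signed reading
print(to_signed(sdiv(-7 & MASK, 2)))      # signed quotient
print(hex(udiv(-7 & MASK, 2)))            # unsigned quotient of the same bits
```

The bit pattern returned by ``mul`` is the same whichever way the operands are viewed; only the interpretation of the result changes. Division differs, which is why separate signed and unsigned instructions exist.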
|
|
@ -0,0 +1,90 @@
|
||||||
|
Comparison instructions
|
||||||
|
=======================
|
||||||
|
|
||||||
|
These perform an arithmetic or logical instruction on two arguments, discarding the result
|
||||||
|
but setting the condition flags. Typically these are used to test data values without changing
|
||||||
|
them prior to executing a conditional branch.
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Notation: ``Rd, Rm, Rn`` denote ARM registers R0-R7. ``imm8`` denotes an immediate
|
||||||
|
value having a width of 8 bits.
|
||||||
|
|
||||||
|
The Application Program Status Register (APSR)
|
||||||
|
----------------------------------------------
|
||||||
|
|
||||||
|
This contains four bits which are tested by the conditional branch instructions. Typically a
|
||||||
|
conditional branch will test multiple bits, for example ``bge(LABEL)``. The meaning of
|
||||||
|
condition codes can depend on whether the operands of an arithmetic instruction are viewed as
|
||||||
|
signed or unsigned integers. Thus ``bhi(LABEL)`` assumes unsigned numbers were processed while
|
||||||
|
``bgt(LABEL)`` assumes signed operands.
|
||||||
|
|
||||||
|
APSR Bits
|
||||||
|
---------
|
||||||
|
|
||||||
|
* Z (zero)
|
||||||
|
|
||||||
|
This is set if the result of an operation is zero or the operands of a comparison are equal.
|
||||||
|
|
||||||
|
* N (negative)
|
||||||
|
|
||||||
|
Set if the result is negative.
|
||||||
|
|
||||||
|
* C (carry)
|
||||||
|
|
||||||
|
An addition sets the carry flag when the result overflows out of the MSB, for example adding
|
||||||
|
0x80000000 and 0x80000000. By the nature of two's complement arithmetic this behaviour is reversed
|
||||||
|
on subtraction, with a borrow indicated by the carry bit being clear. Thus 0x10 - 0x01 is executed
|
||||||
|
as 0x10 + 0xffffffff which will set the carry bit.
|
||||||
|
|
||||||
|
* V (overflow)
|
||||||
|
|
||||||
|
The overflow flag is set if the result, viewed as a two's complement number, has the "wrong" sign
in relation to the operands. For example adding 1 to 0x7fffffff will set the overflow bit because
the result (0x80000000), viewed as a two's complement integer, is negative. Note that in this instance
the carry bit is not set.
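
The flag rules described above can be checked with a small model in plain Python. This is a sketch for a desktop interpreter rather than assembler code, and the function names are hypothetical. Subtraction is modelled as adding the complement of the second operand plus one, which is equivalent to the description of the carry flag given above.

```python
MASK = 0xFFFFFFFF

def add_with_flags(a, b, carry_in=0):
    """Model a 32-bit addition and return (result, n, z, c, v)."""
    a &= MASK
    b &= MASK
    total = a + b + carry_in
    result = total & MASK
    n = result >> 31                      # sign bit of the result
    z = int(result == 0)
    c = int(total > MASK)                 # carry out of the MSB
    # Overflow: both operands differ in sign from the result.
    v = ((a ^ result) & (b ^ result)) >> 31
    return result, n, z, c, v

def sub_with_flags(a, b):
    """Model a - b as a + not(b) + 1."""
    return add_with_flags(a, ~b & MASK, 1)

print(add_with_flags(0x80000000, 0x80000000))  # carry example from the text
print(add_with_flags(0x7FFFFFFF, 1))           # overflow example from the text
print(sub_with_flags(0x10, 0x01))              # subtraction without borrow
```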
|
||||||
|
|
||||||
|
Comparison instructions
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
These set the APSR (Application Program Status Register) N (negative), Z (zero), C (carry) and V
|
||||||
|
(overflow) flags.
|
||||||
|
|
||||||
|
* cmp(Rn, imm8) ``Rn - imm8``
|
||||||
|
* cmp(Rn, Rm) ``Rn - Rm``
|
||||||
|
* cmn(Rn, Rm) ``Rn + Rm``
|
||||||
|
* tst(Rn, Rm) ``Rn & Rm``
|
||||||
|
|
||||||
|
Conditional execution
|
||||||
|
---------------------
|
||||||
|
|
||||||
|
The ``it`` and ``ite`` instructions provide a means of conditionally executing from one to four subsequent
|
||||||
|
instructions without the need for a label.
|
||||||
|
|
||||||
|
* it(<condition>) If then
|
||||||
|
|
||||||
|
Execute the next instruction if <condition> is true:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
cmp(r0, r1)
|
||||||
|
it(eq)
|
||||||
|
mov(r0, 100) # runs if r0 == r1
|
||||||
|
# execution continues here
|
||||||
|
|
||||||
|
* ite(<condition>) If then else
|
||||||
|
|
||||||
|
If <condition> is true, execute the next instruction, otherwise execute the
|
||||||
|
subsequent one. Thus:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
cmp(r0, r1)
|
||||||
|
ite(eq)
|
||||||
|
mov(r0, 100) # runs if r0 == r1
|
||||||
|
mov(r0, 200) # runs if r0 != r1
|
||||||
|
# execution continues here
|
||||||
|
|
||||||
|
This may be extended to control the execution of up to four subsequent instructions: it[x[y[z]]]
|
||||||
|
where x,y,z=t/e; e.g. itt, itee, itete, ittte, itttt, iteee, etc.
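
The naming scheme can be explored in plain Python. This sketch is an illustrative model rather than assembler code, and the function names are made up here. It lists every mnemonic that the pattern ``it[x[y[z]]]`` generates and reports which of the following instructions run for a given outcome of the condition. Whether a particular MicroPython build accepts all of these mnemonics is not confirmed here.

```python
from itertools import product

def it_mnemonics():
    """Return every mnemonic of the form it[x[y[z]]] with x, y, z in 't'/'e'."""
    names = []
    for extra in range(4):                       # zero to three extra letters
        for suffix in product("te", repeat=extra):
            names.append("it" + "".join(suffix))
    return names

def executed(mnemonic, condition_true):
    """For each controlled instruction, report whether it would run.

    Every letter after the leading 'i' governs one instruction:
    't' runs when the condition holds, 'e' when it does not.
    """
    return [(slot == "t") == condition_true for slot in mnemonic[1:]]

print(len(it_mnemonics()))
print(executed("ite", True))
print(executed("itete", False))
```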
|
|
@ -0,0 +1,36 @@
|
||||||
|
Assembler Directives
|
||||||
|
====================
|
||||||
|
|
||||||
|
Labels
|
||||||
|
------
|
||||||
|
|
||||||
|
* label(INNER1)
|
||||||
|
|
||||||
|
This defines a label for use in a branch instruction. Thus elsewhere in the code a ``b(INNER1)``
|
||||||
|
will cause execution to continue with the instruction after the label directive.
|
||||||
|
|
||||||
|
Defining inline data
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
The following assembler directives facilitate embedding data in an assembler code block.
|
||||||
|
|
||||||
|
* data(size, d0, d1 .. dn)
|
||||||
|
|
||||||
|
The data directive creates an array of data values in memory. The first argument specifies the
|
||||||
|
size in bytes of the subsequent arguments. Hence the first statement below will cause the
|
||||||
|
assembler to put three bytes (with values 2, 3 and 4) into consecutive memory locations
|
||||||
|
while the second will cause it to emit two four byte words.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
data(1, 2, 3, 4)
|
||||||
|
data(4, 2, 100000)
|
||||||
|
|
||||||
|
Data values longer than a single byte are stored in memory in little-endian format.
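
The byte layout produced by the two ``data`` statements above can be previewed on a desktop interpreter with the standard ``struct`` module. This is only a model of the layout described in the text: the Python function ``data`` below is a stand-in written for illustration, it handles unsigned values only, and it is not the assembler directive itself.

```python
import struct

def data(size, *values):
    """Pack values as little-endian unsigned fields of `size` bytes each."""
    code = {1: "B", 2: "H", 4: "I"}[size]
    return struct.pack("<" + code * len(values), *values)

print(data(1, 2, 3, 4).hex())      # three single bytes
print(data(4, 2, 100000).hex())    # two four byte words, low byte first
```

Since 100000 is ``0x000186a0``, the second word is stored as the bytes ``a0 86 01 00``.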
|
||||||
|
|
||||||
|
* align(nBytes)
|
||||||
|
|
||||||
|
Align the following instruction to an nBytes value. ARM Thumb-2 instructions must be two
|
||||||
|
byte aligned, hence it's advisable to issue ``align(2)`` after ``data`` directives and
|
||||||
|
prior to any subsequent code. This ensures that the code will run irrespective of the
|
||||||
|
size of the data array.
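
The amount of padding implied by ``align`` is simple to compute. A small sketch in plain Python, assuming the offset is measured from an address that is itself aligned; ``align_padding`` is a hypothetical helper, and the value of any padding bytes the assembler emits is not specified here.

```python
def align_padding(offset, n_bytes):
    """Number of bytes to skip so that offset becomes a multiple of n_bytes."""
    return (-offset) % n_bytes

# Three single-byte data values leave the next instruction at an odd offset.
print(align_padding(3, 2))
# An even-sized data array needs no padding before the next instruction.
print(align_padding(4, 2))
```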
|
|
@ -0,0 +1,77 @@
|
||||||
|
Floating Point instructions
|
||||||
|
==============================
|
||||||
|
|
||||||
|
These instructions support the use of the ARM floating point coprocessor
|
||||||
|
(on platforms such as the Pyboard which are equipped with one). The FPU
|
||||||
|
has 32 registers known as ``s0-s31`` each of which can hold a single
|
||||||
|
precision float. Data can be passed between the FPU registers and the
|
||||||
|
ARM core registers with the ``vmov`` instruction.
|
||||||
|
|
||||||
|
Note that MicroPython doesn't support passing floats to
|
||||||
|
assembler functions, nor can you put a float into ``r0`` and expect a
|
||||||
|
reasonable result. There are two ways to overcome this. The first is to
|
||||||
|
use arrays, and the second is to pass and/or return integers and convert
|
||||||
|
to and from floats in code.
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Notation: ``Sd, Sm, Sn`` denote FPU registers, ``Rd, Rm, Rn`` denote ARM core
|
||||||
|
registers. The latter can be any ARM core register although registers
|
||||||
|
``R13-R15`` are unlikely to be appropriate in this context.
|
||||||
|
|
||||||
|
Arithmetic
|
||||||
|
----------
|
||||||
|
|
||||||
|
* vadd(Sd, Sn, Sm) ``Sd = Sn + Sm``
|
||||||
|
* vsub(Sd, Sn, Sm) ``Sd = Sn - Sm``
|
||||||
|
* vneg(Sd, Sm) ``Sd = -Sm``
|
||||||
|
* vmul(Sd, Sn, Sm) ``Sd = Sn * Sm``
|
||||||
|
* vdiv(Sd, Sn, Sm) ``Sd = Sn / Sm``
|
||||||
|
* vsqrt(Sd, Sm) ``Sd = sqrt(Sm)``
|
||||||
|
|
||||||
|
Registers may be identical: ``vmul(S0, S0, S0)`` will execute ``S0 = S0*S0``
|
||||||
|
|
||||||
|
Move between ARM core and FPU registers
|
||||||
|
---------------------------------------
|
||||||
|
|
||||||
|
* vmov(Sd, Rm) ``Sd = Rm``
|
||||||
|
* vmov(Rd, Sm) ``Rd = Sm``
|
||||||
|
|
||||||
|
The FPU has a register known as FPSCR, similar to the ARM core's APSR, which stores condition
|
||||||
|
codes plus other data. The following instructions provide access to this.
|
||||||
|
|
||||||
|
* vmrs(APSR\_nzcv, FPSCR)
|
||||||
|
|
||||||
|
Move the floating-point N, Z, C, and V flags to the APSR N, Z, C, and V flags.
|
||||||
|
|
||||||
|
This is done after an instruction such as an FPU
|
||||||
|
comparison to enable the condition codes to be tested by the assembler
|
||||||
|
code. The following is a more general form of the instruction.
|
||||||
|
|
||||||
|
* vmrs(Rd, FPSCR) ``Rd = FPSCR``
|
||||||
|
|
||||||
|
Move between FPU register and memory
|
||||||
|
------------------------------------
|
||||||
|
|
||||||
|
* vldr(Sd, [Rn, offset]) ``Sd = [Rn + offset]``
|
||||||
|
* vstr(Sd, [Rn, offset]) ``[Rn + offset] = Sd``
|
||||||
|
|
||||||
|
Where ``[Rn + offset]`` denotes the memory address obtained by adding Rn to the offset. The offset
is specified in bytes. Since each float value occupies a 32 bit word, when accessing arrays of
floats the offset must always be a multiple of four bytes.
|
||||||
|
|
||||||
|
Data Comparison
|
||||||
|
---------------
|
||||||
|
|
||||||
|
* vcmp(Sd, Sm)
|
||||||
|
|
||||||
|
Compare the values in Sd and Sm and set the FPU N, Z,
|
||||||
|
C, and V flags. This would normally be followed by ``vmrs(APSR_nzcv, FPSCR)``
|
||||||
|
to enable the results to be tested.
|
||||||
|
|
||||||
|
Convert between integer and float
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
* vcvt\_f32\_s32(Sd, Sm) ``Sd = float(Sm)``
|
||||||
|
* vcvt\_s32\_f32(Sd, Sm) ``Sd = int(Sm)``
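
The distinction between moving a bit pattern and converting a value is easy to get wrong, so here is a plain Python model (not assembler code; the helper names are illustrative). ``vmov`` copies 32 bits unchanged, while the ``vcvt`` instructions change the representation. The sketch assumes finite, in-range values and that the float to integer conversion truncates towards zero, as Python's ``int()`` does.

```python
import struct

def float_to_bits(value):
    """Return the 32-bit pattern of a single precision float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single precision float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def vcvt_f32_s32(bits):
    """Read the register as a signed integer; return the bits of that float."""
    signed = bits - (1 << 32) if bits & 0x80000000 else bits
    return float_to_bits(float(signed))

def vcvt_s32_f32(bits):
    """Read the register as a float; return the bits of the truncated integer."""
    return int(bits_to_float(bits)) & 0xFFFFFFFF

print(hex(float_to_bits(1.0)))        # what an integer register sees after vmov
print(bits_to_float(3))               # the integer 3 is not the float 3.0
print(hex(vcvt_f32_s32(3)))           # a conversion is needed to obtain 3.0
print(hex(vcvt_s32_f32(float_to_bits(-2.75))))
```

This is why an integer moved into an FPU register with ``vmov`` must be passed through ``vcvt_f32_s32`` before it is used in floating point arithmetic.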
|
|
@ -0,0 +1,232 @@
|
||||||
|
Hints and tips
|
||||||
|
==============
|
||||||
|
|
||||||
|
The following are some examples of the use of the inline assembler and some
|
||||||
|
information on how to work around its limitations. In this document the term
|
||||||
|
"assembler function" refers to a function declared in Python with the
|
||||||
|
``@micropython.asm_thumb`` decorator, whereas "subroutine" refers to assembler
|
||||||
|
code called from within an assembler function.
|
||||||
|
|
||||||
|
Code branches and subroutines
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
|
It is important to appreciate that labels are local to an assembler function.
|
||||||
|
There is currently no way for a subroutine defined in one function to be called
|
||||||
|
from another.
|
||||||
|
|
||||||
|
To call a subroutine the instruction ``bl(LABEL)`` is issued. This transfers
|
||||||
|
control to the instruction following the ``label(LABEL)`` directive and stores
|
||||||
|
the return address in the link register (``lr`` or ``r14``). To return the
|
||||||
|
instruction ``bx(lr)`` is issued which causes execution to continue with
|
||||||
|
the instruction following the subroutine call. This mechanism implies that, if
|
||||||
|
a subroutine is to call another, it must save the link register prior to
|
||||||
|
the call and restore it before terminating.
|
||||||
|
|
||||||
|
The following rather contrived example illustrates a subroutine call. Note that
it's necessary at the start to branch around the subroutine code: subroutines
end execution with ``bx(lr)`` while the outer function simply "drops off the end"
in the style of Python functions.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
    @micropython.asm_thumb
    def quad(r0):
        b(START)
        label(DOUBLE)
        add(r0, r0, r0)
        bx(lr)
        label(START)
        bl(DOUBLE)
        bl(DOUBLE)

    print(quad(10))
|
||||||
|
|
||||||
|
The following code example demonstrates a nested (recursive) call: the classic
|
||||||
|
Fibonacci sequence. Here, prior to a recursive call, the link register is saved
|
||||||
|
along with other registers which the program logic requires to be preserved.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
    @micropython.asm_thumb
    def fib(r0):
        b(START)
        label(DOFIB)
        push({r1, r2, lr})
        cmp(r0, 1)
        ble(FIBDONE)
        sub(r0, 1)
        mov(r2, r0)       # r2 = n - 1
        bl(DOFIB)
        mov(r1, r0)       # r1 = fib(n - 1)
        sub(r0, r2, 1)
        bl(DOFIB)         # r0 = fib(n - 2)
        add(r0, r0, r1)
        label(FIBDONE)
        pop({r1, r2, lr})
        bx(lr)
        label(START)
        bl(DOFIB)

    for n in range(10):
        print(fib(n))
|
||||||
|
|
||||||
|
Argument passing and return
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
The tutorial details the fact that assembler functions can support from zero to
|
||||||
|
three arguments, which must (if used) be named ``r0``, ``r1`` and ``r2``. When
|
||||||
|
the code executes the registers will be initialised to those values.
|
||||||
|
|
||||||
|
The data types which can be passed in this way are integers and memory
|
||||||
|
addresses. Further, integers are restricted in that the top two bits
|
||||||
|
must be identical, limiting the range to -2**30 to 2**30 - 1. Return
|
||||||
|
values are similarly limited. These limitations can be overcome by means
|
||||||
|
of the ``array`` module to allow any number of values of any type to
|
||||||
|
be accessed.
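
The range restriction can be expressed as a short check in plain Python. The sketch below is a model for experimentation, with a hypothetical helper name; it tests whether bits 30 and 31 of the 32 bit pattern agree, which is the condition stated above.

```python
def fits_small_int(value):
    """True if the top two bits of the 32-bit pattern are identical."""
    bits = value & 0xFFFFFFFF
    return ((bits >> 30) & 1) == ((bits >> 31) & 1)

for candidate in (2**30 - 1, 2**30, -2**30, -2**30 - 1):
    print(candidate, fits_small_int(candidate))
```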
|
||||||
|
|
||||||
|
Multiple arguments
|
||||||
|
~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
If a Python array of integers is passed as an argument to an assembler
|
||||||
|
function, the function will receive the address of a contiguous set of integers.
|
||||||
|
Thus multiple arguments can be passed as elements of a single array. Similarly a
|
||||||
|
function can return multiple values by assigning them to array elements.
|
||||||
|
Assembler functions have no means of determining the length of an array:
|
||||||
|
this will need to be passed to the function.
|
||||||
|
|
||||||
|
This use of arrays can be extended to enable more than three arrays to be used.
|
||||||
|
This is done using indirection: the ``uctypes`` module supports ``addressof()``
|
||||||
|
which will return the address of an array passed as its argument. Thus you can
|
||||||
|
populate an integer array with the addresses of other arrays:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
    import array
    from uctypes import addressof

    @micropython.asm_thumb
    def getindirect(r0):
        ldr(r0, [r0, 0])  # Address of array loaded from passed array
        ldr(r0, [r0, 4])  # Return element 1 of indirect array (24)

    def testindirect():
        a = array.array('i', [23, 24])
        b = array.array('i', [0, 0])
        b[0] = addressof(a)
        print(getindirect(b))
|
||||||
|
|
||||||
|
Non-integer data types
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
These may be handled by means of arrays of the appropriate data type. For
|
||||||
|
example, single precision floating point data may be processed as follows.
|
||||||
|
This code example takes an array of floats and replaces its contents with
|
||||||
|
their squares.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
    from array import array

    @micropython.asm_thumb
    def square(r0, r1):
        label(LOOP)
        vldr(s0, [r0, 0])
        vmul(s0, s0, s0)
        vstr(s0, [r0, 0])
        add(r0, 4)
        sub(r1, 1)
        bgt(LOOP)

    a = array('f', (x for x in range(10)))
    square(a, len(a))
    print(a)
|
||||||
|
|
||||||
|
The uctypes module supports the use of data structures beyond simple
|
||||||
|
arrays. It enables a Python data structure to be mapped onto a bytearray
|
||||||
|
instance which may then be passed to the assembler function.
|
||||||
|
|
||||||
|
Named constants
|
||||||
|
---------------
|
||||||
|
|
||||||
|
Assembler code may be made more readable and maintainable by using named
|
||||||
|
constants rather than littering code with numbers. This may be achieved
|
||||||
|
thus:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
    MYDATA = const(33)

    @micropython.asm_thumb
    def foo():
        mov(r0, MYDATA)
|
||||||
|
|
||||||
|
The const() construct causes MicroPython to replace the variable name
|
||||||
|
with its value at compile time. If constants are declared in an outer
|
||||||
|
Python scope they can be shared between multiple assembler functions and
|
||||||
|
with Python code.
|
||||||
|
|
||||||
|
Assembler code as class methods
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
MicroPython passes the address of the object instance as the first argument
to bound methods. This is normally of little use to an assembler function.
|
||||||
|
It can be avoided by declaring the function as a static method thus:
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
    class foo:
        @staticmethod
        @micropython.asm_thumb
        def bar(r0):
            add(r0, r0, r0)
|
||||||
|
|
||||||
|
Use of unsupported instructions
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
These can be coded using the data statement as shown below. While
|
||||||
|
``push()`` and ``pop()`` are supported the example below illustrates the
|
||||||
|
principle. The necessary machine code may be found in the ARM v7-M
|
||||||
|
Architecture Reference Manual. Note that the first argument of data
|
||||||
|
calls such as
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
data(2, 0xe92d, 0x0f00) # push r8,r9,r10,r11
|
||||||
|
|
||||||
|
indicates that each subsequent argument is a two byte quantity.
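
The second halfword in the example is a register bit mask, which can be checked in plain Python. This is a sketch based on the encoding used in the example above (first halfword ``0xe92d``, one bit per register in the second); ``push_w_halfwords`` is a hypothetical helper, and the Architecture Reference Manual remains the authority on which registers are permitted.

```python
def push_w_halfwords(*registers):
    """Return the two halfwords for a 32-bit push of the given register numbers."""
    mask = 0
    for reg in registers:
        mask |= 1 << reg          # one bit per register number
    return 0xE92D, mask

first, second = push_w_halfwords(8, 9, 10, 11)
print(hex(first), hex(second))    # arguments for data(2, ...)
```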
|
||||||
|
|
||||||
|
Overcoming MicroPython's integer restriction
|
||||||
|
--------------------------------------------
|
||||||
|
|
||||||
|
The Pyboard chip includes a CRC generator. Its use presents a problem in
|
||||||
|
MicroPython because the returned values cover the full gamut of 32 bit
|
||||||
|
quantities whereas small integers in MicroPython cannot have differing values
|
||||||
|
in bits 30 and 31. This limitation is overcome with the following code, which
|
||||||
|
uses assembler to put the result into an array and Python code to
|
||||||
|
coerce the result into an arbitrary precision unsigned integer.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
    from array import array
    import stm

    def enable_crc():
        stm.mem32[stm.RCC + stm.RCC_AHB1ENR] |= 0x1000

    def reset_crc():
        stm.mem32[stm.CRC + stm.CRC_CR] = 1

    @micropython.asm_thumb
    def getval(r0, r1):
        movwt(r3, stm.CRC + stm.CRC_DR)
        str(r1, [r3, 0])
        ldr(r2, [r3, 0])
        str(r2, [r0, 0])

    def getcrc(value):
        a = array('i', [0])
        getval(a, value)
        return a[0] & 0xffffffff  # coerce to arbitrary precision

    enable_crc()
    reset_crc()
    for x in range(20):
        print(hex(getcrc(0)))
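
The masking step in ``getcrc`` can be tried without any hardware. The sketch below runs on a desktop interpreter and simply stores signed values in an ``'i'`` array to stand in for what the assembler function would write; ``to_unsigned`` is a name invented for this illustration and a four byte ``'i'`` item size is assumed.

```python
from array import array

def to_unsigned(signed_word):
    """Reinterpret a signed 32-bit value as an unsigned integer."""
    return signed_word & 0xFFFFFFFF

a = array("i", [-1, -2**31, 5])   # signed views of three 32-bit patterns
for word in a:
    print(hex(to_unsigned(word)))
```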
|
|
@ -0,0 +1,73 @@
|
||||||
|
.. _asm_thumb2_index:
|
||||||
|
|
||||||
|
Inline Assembler for Thumb2 architectures
|
||||||
|
=========================================
|
||||||
|
|
||||||
|
This document assumes some familiarity with assembly language programming and should be read after studying
|
||||||
|
the :ref:`tutorial <pyboard_tutorial_assembler>`. For a detailed description of the instruction set consult the
|
||||||
|
Architecture Reference Manual detailed below.
|
||||||
|
The inline assembler supports a subset of the ARM Thumb-2 instruction set described here. The syntax tries
|
||||||
|
to be as close as possible to that defined in the above ARM manual, converted to Python function calls.
|
||||||
|
|
||||||
|
Instructions operate on 32 bit signed integer data except where stated otherwise. Most supported instructions
|
||||||
|
operate on registers ``R0-R7`` only: where ``R8-R15`` are supported this is stated. Registers ``R8-R12`` must be
|
||||||
|
restored to their initial value before return from a function. Registers ``R13-R15`` constitute the Stack Pointer,
Link Register and Program Counter respectively.
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Where possible the behaviour of each instruction is described in Python, for example
|
||||||
|
|
||||||
|
* add(Rd, Rn, Rm) ``Rd = Rn + Rm``
|
||||||
|
|
||||||
|
This enables the effect of instructions to be demonstrated in Python. In certain cases this is impossible
|
||||||
|
because Python doesn't support concepts such as indirection. The pseudocode employed in such cases is
|
||||||
|
described on the relevant page.
|
||||||
|
|
||||||
|
Instruction Categories
|
||||||
|
----------------------
|
||||||
|
|
||||||
|
The following sections detail the subset of the ARM Thumb-2 instruction set supported by MicroPython.
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
:numbered:
|
||||||
|
|
||||||
|
asm_thumb2_mov.rst
|
||||||
|
asm_thumb2_ldr.rst
|
||||||
|
asm_thumb2_str.rst
|
||||||
|
asm_thumb2_logical_bit.rst
|
||||||
|
asm_thumb2_arith.rst
|
||||||
|
asm_thumb2_compare.rst
|
||||||
|
asm_thumb2_label_branch.rst
|
||||||
|
asm_thumb2_stack.rst
|
||||||
|
asm_thumb2_misc.rst
|
||||||
|
asm_thumb2_float.rst
|
||||||
|
asm_thumb2_directives.rst
|
||||||
|
|
||||||
|
Usage examples
|
||||||
|
--------------
|
||||||
|
|
||||||
|
These sections provide further code examples and hints on the use of the assembler.
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
:numbered:
|
||||||
|
|
||||||
|
asm_thumb2_hints_tips.rst
|
||||||
|
|
||||||
|
References
|
||||||
|
----------
|
||||||
|
|
||||||
|
- :ref:`Assembler Tutorial <pyboard_tutorial_assembler>`
|
||||||
|
- `Wiki hints and tips
|
||||||
|
<http://wiki.micropython.org/platforms/boards/pyboard/assembler>`__
|
||||||
|
- `uPy Inline Assembler source-code,
|
||||||
|
emitinlinethumb.c <https://github.com/micropython/micropython/blob/master/py/emitinlinethumb.c>`__
|
||||||
|
- `ARM Thumb2 Instruction Set Quick Reference
|
||||||
|
Card <http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001l/QRC0001_UAL.pdf>`__
|
||||||
|
- `RM0090 Reference
|
||||||
|
Manual <http://www.st.com/st-web-ui/static/active/en/resource/technical/document/reference_manual/DM00031020.pdf>`__
|
||||||
|
- ARM v7-M Architecture Reference Manual (Available on the
|
||||||
|
ARM site after a simple registration procedure. Also available on academic sites but beware of out of date versions.)
|
|
@ -0,0 +1,85 @@
|
||||||
|
Branch instructions
|
||||||
|
===================
|
||||||
|
|
||||||
|
These cause execution to jump to a target location usually specified by a label (see the ``label``
|
||||||
|
assembler directive). Conditional branches and the ``it`` and ``ite`` instructions test
|
||||||
|
the Application Program Status Register (APSR) N (negative), Z (zero), C (carry) and V
|
||||||
|
(overflow) flags to determine whether the branch should be executed.
|
||||||
|
|
||||||
|
Most of the exposed assembler instructions (including move operations) set the flags but
|
||||||
|
there are explicit comparison instructions to enable values to be tested.
|
||||||
|
|
||||||
|
Further detail on the meaning of the condition flags is provided in the section
|
||||||
|
describing comparison functions.
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Notation: ``Rm`` denotes ARM registers R0-R15. ``LABEL`` denotes a label defined with the
|
||||||
|
``label()`` assembler directive. ``<condition>`` indicates one of the following condition
|
||||||
|
specifiers:
|
||||||
|
|
||||||
|
* eq Equal to (result was zero)
|
||||||
|
* ne Not equal
|
||||||
|
* cs Carry set
|
||||||
|
* cc Carry clear
|
||||||
|
* mi Minus (negative)
|
||||||
|
* pl Plus (positive or zero)
|
||||||
|
* vs Overflow set
|
||||||
|
* vc Overflow clear
|
||||||
|
* hi > (unsigned comparison)
|
||||||
|
* ls <= (unsigned comparison)
|
||||||
|
* ge >= (signed comparison)
|
||||||
|
* lt < (signed comparison)
|
||||||
|
* gt > (signed comparison)
|
||||||
|
* le <= (signed comparison)
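
The way each condition specifier reads the flags can be modelled in plain Python. The sketch below is not assembler code and its names are hypothetical; it computes the flags that a comparison of two registers would produce and evaluates a specifier against them, using the standard ARM definitions of the conditions. It shows why the same bit patterns can satisfy an unsigned condition yet fail the corresponding signed one.

```python
MASK = 0xFFFFFFFF

def cmp_flags(rn, rm):
    """Model cmp(Rn, Rm): compute Rn - Rm and return the flags (n, z, c, v)."""
    a, b = rn & MASK, rm & MASK
    total = a + (~b & MASK) + 1            # subtraction as a + not(b) + 1
    result = total & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(total > MASK)                  # set when no borrow occurred
    v = ((a ^ b) & (a ^ result)) >> 31     # signed overflow
    return n, z, c, v

CONDITIONS = {
    "eq": lambda n, z, c, v: z == 1,
    "ne": lambda n, z, c, v: z == 0,
    "cs": lambda n, z, c, v: c == 1,
    "cc": lambda n, z, c, v: c == 0,
    "mi": lambda n, z, c, v: n == 1,
    "pl": lambda n, z, c, v: n == 0,
    "vs": lambda n, z, c, v: v == 1,
    "vc": lambda n, z, c, v: v == 0,
    "hi": lambda n, z, c, v: c == 1 and z == 0,
    "ls": lambda n, z, c, v: c == 0 or z == 1,
    "ge": lambda n, z, c, v: n == v,
    "lt": lambda n, z, c, v: n != v,
    "gt": lambda n, z, c, v: z == 0 and n == v,
    "le": lambda n, z, c, v: z == 1 or n != v,
}

def condition_holds(cond, rn, rm):
    """Would a branch on `cond` be taken after cmp(rn, rm)?"""
    return CONDITIONS[cond](*cmp_flags(rn, rm))

# 0xffffffff is a large unsigned number but -1 when viewed as signed.
print(condition_holds("hi", 0xFFFFFFFF, 1))
print(condition_holds("gt", 0xFFFFFFFF, 1))
```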
|
||||||
|
|
||||||
|
Branch to label
|
||||||
|
---------------
|
||||||
|
|
||||||
|
* b(LABEL) Unconditional branch
|
||||||
|
* beq(LABEL) branch if equal
|
||||||
|
* bne(LABEL) branch if not equal
|
||||||
|
* bge(LABEL) branch if greater than or equal (signed)
* bgt(LABEL) branch if greater than (signed)
|
||||||
|
* blt(LABEL) branch if less than (<) (signed)
|
||||||
|
* ble(LABEL) branch if less than or equal to (<=) (signed)
|
||||||
|
* bcs(LABEL) branch if carry flag is set
|
||||||
|
* bcc(LABEL) branch if carry flag is clear
|
||||||
|
* bmi(LABEL) branch if negative
|
||||||
|
* bpl(LABEL) branch if positive or zero
|
||||||
|
* bvs(LABEL) branch if overflow flag set
|
||||||
|
* bvc(LABEL) branch if overflow flag is clear
|
||||||
|
* bhi(LABEL) branch if higher (unsigned)
|
||||||
|
* bls(LABEL) branch if lower or equal (unsigned)
|
||||||
|
|
||||||
|
Long branches
|
||||||
|
-------------
|
||||||
|
|
||||||
|
The code produced by the branch instructions listed above uses a fixed bit width to specify the
|
||||||
|
branch destination, which is PC relative. Consequently in long programs where the
|
||||||
|
branch instruction is remote from its destination the assembler will produce a "branch not in
|
||||||
|
range" error. This can be overcome with the "wide" variants such as
|
||||||
|
|
||||||
|
* beq\_w(LABEL) long branch if equal
|
||||||
|
|
||||||
|
Wide branches use 4 bytes to encode the instruction (compared with 2 bytes for standard branch instructions).
|
||||||
|
|
||||||
|
Subroutines (functions)
|
||||||
|
-----------------------
|
||||||
|
|
||||||
|
When entering a subroutine the processor stores the return address in register r14, also
|
||||||
|
known as the link register (lr). Return to the instruction after the subroutine call is
|
||||||
|
performed by updating the program counter (r15 or pc) from the link register. This
|
||||||
|
process is handled by the following instructions.
|
||||||
|
|
||||||
|
* bl(LABEL)
|
||||||
|
|
||||||
|
Transfer execution to the instruction after ``LABEL`` storing the return address in
|
||||||
|
the link register (r14).
|
||||||
|
|
||||||
|
* bx(Rm) Branch to address specified by Rm.
|
||||||
|
|
||||||
|
Typically ``bx(lr)`` is issued to return from a subroutine. For nested subroutines the
|
||||||
|
link register of outer scopes must be saved (usually on the stack) before performing
|
||||||
|
inner subroutine calls.
|
|
@ -0,0 +1,23 @@
|
||||||
|
Load register from memory
|
||||||
|
=========================
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Notation: ``Rt, Rn`` denote ARM registers R0-R7 except where stated. ``immN`` represents an immediate
|
||||||
|
value having a width of N bits hence ``imm5`` is constrained to the range 0-31. ``[Rn + immN]`` is the contents
|
||||||
|
of the memory address obtained by adding Rn and the offset ``immN``. Offsets are measured in
|
||||||
|
bytes. These instructions do not affect the condition flags.
|
||||||
|
|
||||||
|
Register Load
|
||||||
|
-------------
|
||||||
|
|
||||||
|
* ldr(Rt, [Rn, imm7]) ``Rt = [Rn + imm7]`` Load a 32 bit word
|
||||||
|
* ldrb(Rt, [Rn, imm5]) ``Rt = [Rn + imm5]`` Load a byte
|
||||||
|
* ldrh(Rt, [Rn, imm6]) ``Rt = [Rn + imm6]`` Load a 16 bit half word
|
||||||
|
|
||||||
|
Where a byte or half word is loaded, it is zero-extended to 32 bits.
|
||||||
|
|
||||||
|
The specified immediate offsets are measured in bytes. Hence in the case of ``ldr`` the 7 bit value
|
||||||
|
enables 32 bit word aligned values to be accessed with a maximum offset of 31 words. In the case of ``ldrh`` the
|
||||||
|
6 bit value enables 16 bit half-word aligned values to be accessed with a maximum offset of 31 half-words.
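
These limits can be captured in a few lines of plain Python. The sketch below is a model, not assembler code; ``valid_offset`` is a hypothetical helper and it assumes, consistent with the 31 word and 31 half-word limits above, that the offset must be a multiple of the access size and at most 31 such units.

```python
ACCESS_BYTES = {"ldrb": 1, "ldrh": 2, "ldr": 4}

def valid_offset(instruction, offset):
    """True if a byte offset is aligned and within 31 units of the access size."""
    size = ACCESS_BYTES[instruction]
    return offset % size == 0 and 0 <= offset <= 31 * size

print(valid_offset("ldr", 124))    # 31 words
print(valid_offset("ldr", 128))    # one word too far
print(valid_offset("ldrh", 62))    # 31 half-words
print(valid_offset("ldr", 6))      # not word aligned
```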
|
|
@ -0,0 +1,53 @@
|
||||||
|
Logical & Bitwise instructions
|
||||||
|
==============================
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Notation: ``Rd, Rn`` denote ARM registers R0-R7 except in the case of the
|
||||||
|
special instructions where R0-R15 may be used. ``Rn<a-b>`` denotes an ARM register
|
||||||
|
whose contents must lie in range ``a <= contents <= b``. In the case of instructions
|
||||||
|
with two register arguments, it is permissible for them to be identical. For example
|
||||||
|
the following will zero R0 (Python ``R0 ^= R0``) regardless of its initial contents.
|
||||||
|
|
||||||
|
* eor(r0, r0)
|
||||||
|
|
||||||
|
These instructions affect the condition flags except where stated.
|
||||||
|
|
||||||
|
Logical instructions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
* and\_(Rd, Rn) ``Rd &= Rn``
|
||||||
|
* orr(Rd, Rn) ``Rd |= Rn``
|
||||||
|
* eor(Rd, Rn) ``Rd ^= Rn``
|
||||||
|
* mvn(Rd, Rn) ``Rd = Rn ^ 0xffffffff`` i.e. Rd = 1's complement of Rn
|
||||||
|
* bic(Rd, Rn) ``Rd &= ~Rn`` bit clear Rd using mask in Rn
|
||||||
|
|
||||||
|
Note the use of "and\_" instead of "and", because "and" is a reserved keyword in Python.
|
||||||
|
|
||||||
|
Shift and rotation instructions
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
|
* lsl(Rd, Rn<0-31>) ``Rd <<= Rn``
|
||||||
|
* lsr(Rd, Rn<1-32>) ``Rd = (Rd & 0xffffffff) >> Rn`` Logical shift right
|
||||||
|
* asr(Rd, Rn<1-32>) ``Rd >>= Rn`` arithmetic shift right
|
||||||
|
* ror(Rd, Rn<1-31>) ``Rd = rotate_right(Rd, Rn)`` Rd is rotated right Rn bits.
|
||||||
|
|
||||||
|
A rotation by (for example) three bits works as follows. If Rd initially
|
||||||
|
contains bits ``b31 b30..b0`` after rotation it will contain ``b2 b1 b0 b31 b30..b3``
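
The ``rotate_right`` pseudocode used above has no direct Python operator, so a possible plain Python definition is sketched here for reference. It is a model for a desktop interpreter and assumes 32 bit registers.

```python
MASK = 0xFFFFFFFF

def rotate_right(value, count):
    """Rotate a 32-bit value right: bits leaving the bottom re-enter at the top."""
    value &= MASK
    count %= 32
    return ((value >> count) | (value << (32 - count))) & MASK

print(hex(rotate_right(0b101, 3)))       # b2 b1 b0 move to the top three bits
print(hex(rotate_right(0x12345678, 4)))
```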
|
||||||
|
|
||||||
|
Special instructions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Condition codes are unaffected by these instructions.
|
||||||
|
|
||||||
|
* clz(Rd, Rn) ``Rd = count_leading_zeros(Rn)``
|
||||||
|
|
||||||
|
count_leading_zeros(Rn) returns the number of binary zero bits before the first binary one bit in Rn.
|
||||||
|
|
||||||
|
* rbit(Rd, Rn) ``Rd = bit_reverse(Rn)``
|
||||||
|
|
||||||
|
bit_reverse(Rn) returns the bit-reversed contents of Rn. If Rn contains bits ``b31 b30..b0``, Rd will be set
|
||||||
|
to ``b0 b1 b2..b31``.
|
||||||
|
|
||||||
|
Trailing zeros may be counted by performing a bit reverse prior to executing clz.
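
The trailing zero technique can be modelled in plain Python as follows. This is a sketch only: ``clz``, ``rbit`` and ``ctz`` here are ordinary Python helpers named after the instructions, operating on unsigned 32 bit values.

```python
MASK32 = 0xFFFFFFFF  # assumed helper: registers are 32 bits wide


def clz(rn):
    """Number of zero bits above the most significant one bit (32 if rn is 0)."""
    return 32 - (rn & MASK32).bit_length()


def rbit(rn):
    """Reverse the order of the 32 bits in rn."""
    return int('{:032b}'.format(rn & MASK32)[::-1], 2)


def ctz(rn):
    """Count trailing zeros by bit reversing and then counting leading zeros."""
    return clz(rbit(rn))


print(clz(1))         # 31
print(hex(rbit(1)))   # 0x80000000
print(ctz(8))         # 3
```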
|
|
@ -0,0 +1,10 @@
|
||||||
|
Miscellaneous instructions
|
||||||
|
==========================
|
||||||
|
|
||||||
|
* nop() ``pass`` no operation.
|
||||||
|
* wfi() Suspend execution in a low power state until an interrupt occurs.
|
||||||
|
* cpsid(flags) set the Priority Mask Register - disable interrupts.
|
||||||
|
* cpsie(flags) clear the Priority Mask Register - enable interrupts.
|
||||||
|
|
||||||
|
Currently the ``cpsie()`` and ``cpsid()`` functions are partially implemented.
|
||||||
|
They require but ignore the flags argument and serve as a means of enabling and disabling interrupts.
|
|
@ -0,0 +1,28 @@
|
||||||
|
Register move instructions
|
||||||
|
==========================
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Notation: ``Rd, Rn`` denote ARM registers R0-R15. ``immN`` denotes an immediate
|
||||||
|
value having a width of N bits. These instructions affect the condition flags.
|
||||||
|
|
||||||
|
Register moves
|
||||||
|
--------------
|
||||||
|
|
||||||
|
Where immediate values are used, these are zero-extended to 32 bits. Thus
|
||||||
|
``mov(R0, 0xff)`` will set R0 to 255.
|
||||||
|
|
||||||
|
* mov(Rd, imm8) ``Rd = imm8``
|
||||||
|
* mov(Rd, Rn) ``Rd = Rn``
|
||||||
|
* movw(Rd, imm16) ``Rd = imm16``
|
||||||
|
* movt(Rd, imm16) ``Rd = (Rd & 0xffff) | (imm16 << 16)``
|
||||||
|
|
||||||
|
movt writes an immediate value to the top halfword of the destination register.
|
||||||
|
It does not affect the contents of the bottom halfword.
|
||||||
|
|
||||||
|
* movwt(Rd, imm30) ``Rd = imm30``
|
||||||
|
|
||||||
|
movwt is a pseudo-instruction: the MicroPython assembler emits a ``movw`` and a ``movt``
|
||||||
|
to move a zero extended 30 bit value into Rd. Where the full 32 bits are required a
|
||||||
|
workaround is to use the movw and movt operations.
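
How a ``movw`` followed by a ``movt`` builds a full 32 bit value can be checked with a plain Python model of the two expressions given above. This is a sketch only: the helpers below are ordinary Python functions named after the instructions, and ``value`` is an arbitrary example constant.

```python
def movw(imm16):
    """Model of movw(Rd, imm16): Rd = imm16, zero-extended to 32 bits."""
    return imm16 & 0xFFFF


def movt(rd, imm16):
    """Model of movt(Rd, imm16): replace the top halfword, keep the bottom."""
    return (rd & 0xFFFF) | ((imm16 & 0xFFFF) << 16)


value = 0xDEADBEEF               # example 32 bit constant
rd = movw(value & 0xFFFF)        # load the bottom halfword first
rd = movt(rd, value >> 16)       # then write the top halfword
print(hex(rd))                   # 0xdeadbeef
```

The order matters in this model: ``movw`` overwrites the whole register, so it must come before ``movt``.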
|
|
@ -0,0 +1,20 @@
|
||||||
|
Stack push and pop
|
||||||
|
==================
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
The ``push()`` and ``pop()`` instructions accept as their argument a register set containing
|
||||||
|
a subset, or possibly all, of the general-purpose registers R0-R12 and the link register (lr or R14).
|
||||||
|
As with any Python set, the order in which the registers are specified is immaterial. Thus
|
||||||
|
in the following example the pop() instruction would restore R1, R7 and R8 to their contents prior
|
||||||
|
to the push():
|
||||||
|
|
||||||
|
* push({r1, r8, r7}) Save three registers on the stack.
|
||||||
|
* pop({r7, r1, r8}) Restore them
|
||||||
|
|
||||||
|
Stack operations
|
||||||
|
----------------
|
||||||
|
|
||||||
|
* push({regset}) Push a set of registers onto the stack
|
||||||
|
* pop({regset}) Restore a set of registers from the stack
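
Why the order of the set is immaterial can be seen in a plain Python model. This is a sketch under stated assumptions: registers are held in a dict, the stack is a Python list, and the storage order (lowest-numbered register at the lowest address) follows the usual ARM convention rather than anything stated on this page. ``reg_number`` is a hypothetical helper.

```python
def reg_number(name):
    """Map a register name such as 'r7' or 'lr' to its number (hypothetical helper)."""
    return 14 if name == 'lr' else int(name[1:])


def push(regs, stack, regset):
    """Store registers by register number, whatever order the set lists them in."""
    for name in sorted(regset, key=reg_number, reverse=True):
        stack.append(regs[name])


def pop(regs, stack, regset):
    """Reload registers in the matching order, lowest-numbered first."""
    for name in sorted(regset, key=reg_number):
        regs[name] = stack.pop()


regs = {'r1': 11, 'r7': 77, 'r8': 88}
stack = []
push(regs, stack, {'r1', 'r8', 'r7'})
regs.update(r1=0, r7=0, r8=0)          # clobber the registers
pop(regs, stack, {'r7', 'r1', 'r8'})   # same set, different textual order
print(regs)
```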
|
|
@ -0,0 +1,21 @@
|
||||||
|
Store register to memory
|
||||||
|
========================
|
||||||
|
|
||||||
|
Document conventions
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Notation: ``Rt, Rn`` denote ARM registers R0-R7 except where stated. ``immN`` represents an immediate
|
||||||
|
value having a width of N bits; hence ``imm5`` is constrained to the range 0-31. ``[Rn + imm5]`` is the
|
||||||
|
contents of the memory address obtained by adding Rn and the offset ``imm5``. Offsets are measured in
|
||||||
|
bytes. These instructions do not affect the condition flags.
|
||||||
|
|
||||||
|
Register Store
|
||||||
|
--------------
|
||||||
|
|
||||||
|
* str(Rt, [Rn, imm7]) ``[Rn + imm7] = Rt`` Store a 32 bit word
|
||||||
|
* strb(Rt, [Rn, imm5]) ``[Rn + imm5] = Rt`` Store a byte (b0-b7)
|
||||||
|
* strh(Rt, [Rn, imm6]) ``[Rn + imm6] = Rt`` Store a 16 bit half word (b0-b15)
|
||||||
|
|
||||||
|
The specified immediate offsets are measured in bytes. Hence in the case of ``str`` the 7 bit value
|
||||||
|
enables 32 bit word aligned values to be accessed with a maximum offset of 31 words. In the case of ``strh`` the
|
||||||
|
6 bit value enables 16 bit half-word aligned values to be accessed with a maximum offset of 31 half-words.
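
The offset limits described above can be summarised in a small Python check. This is a sketch only: ``valid_offset`` and ``ACCESS_SIZE`` are hypothetical names, and the rule it encodes (a byte offset that is a multiple of the access size, at most 31 units) is taken from the description in this section rather than from the assembler source.

```python
# Access size in bytes for each store instruction.
ACCESS_SIZE = {'strb': 1, 'strh': 2, 'str': 4}


def valid_offset(instr, offset):
    """Return True if a byte offset is aligned and within 31 units of Rn."""
    size = ACCESS_SIZE[instr]
    return offset % size == 0 and 0 <= offset // size <= 31


print(valid_offset('str', 124))   # True: 31 words
print(valid_offset('str', 128))   # False: 32 words is out of range
print(valid_offset('str', 6))     # False: not word aligned
print(valid_offset('strh', 62))   # True: 31 half-words
print(valid_offset('strb', 31))   # True: 31 bytes
```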
|