For architectures where size_t is less than 32 bits (eg 16 bits) the args
must be casted to uint32_t so the left shift will work. For architectures
where size_t is greater than 32 bits (eg 64 bits) this new casting will not
lose any bits because the end result must anyway fit in a uint32_t.
When obj.h is compiled as C++ code, the cl compiler emits a warning about
possibly unsafe mixing of size_t and bool types in the or operation in
MP_OBJ_FUN_MAKE_SIG. Similarly there's an implicit narrowing integer
conversion in runtime.h. This commit fixes this by being explicit.
With 5 arguments to mp_arg_check_num(), some architectures need to pass
values on the stack. So compressing n_args_min, n_args_max, takes_kw into
a single word and passing only 3 arguments makes the call more efficient,
because almost all calls to this function pass in constant values. Code
size is also reduced by a decent amount:
bare-arm: -116
minimal x86: -64
unix x64: -256
unix nanbox: -112
stm32: -324
cc3200: -192
esp8266: -192
esp32: -144
Prior to this commit a function compiled with the native decorator
@micropython.native would not work correctly when accessing global
variables, because the globals dict was not being set upon function entry.
This commit fixes this problem by, upon function entry, setting as the
current globals dict the globals dict context the function was defined
within, as per normal Python semantics, and as bytecode does. Upon
function exit the original globals dict is restored.
In order to restore the globals dict when an exception is raised the native
function must guard its internals with an nlr_push/nlr_pop pair. Because
this push/pop is relatively expensive, in both C stack usage for the
nlr_buf_t and CPU execution time, the implementation here optimises things
as much as possible. First, the compiler keeps track of whether a function
even needs to access global variables. Using this information the native
emitter then generates three different kinds of code:
1. no globals used, no exception handlers: no nlr handling code and no
setting of the globals dict.
2. globals used, no exception handlers: an nlr_buf_t is allocated on the
C stack but it is not used if the globals dict is unchanged, saving
execution time because nlr_push/nlr_pop don't need to run.
3. function has exception handlers, may use globals: an nlr_buf_t is
allocated and nlr_push/nlr_pop are always called.
In the end, native functions that don't access globals and don't have
exception handlers will run more efficiently than those that do.
Fixes issue #1573.
This new helper function acts like mp_load_method_maybe but is wrapped in
an NLR handler so it can catch exceptions. It prevents AttributeError from
propagating out, and optionally all other exceptions. This helper can be
used to fully implement hasattr (see follow-up commit), and also for cases
where mp_load_method_maybe is used but it must now raise an exception.
This patch introduces the MICROPY_ENABLE_PYSTACK option (disabled by
default) which enables a "Python stack" that allows to allocate and free
memory in a scoped, or Last-In-First-Out (LIFO) way, similar to alloca().
A new memory allocation API is introduced along with this Py-stack. It
includes both "local" and "nonlocal" LIFO allocation. Local allocation is
intended to be equivalent to using alloca(), whereby the same function must
free the memory. Nonlocal allocation is where another function may free
the memory, so long as it's still LIFO.
Follow-up patches will convert all uses of alloca() and VLA to the new
scoped allocation API. The old behaviour (using alloca()) will still be
available, but when MICROPY_ENABLE_PYSTACK is enabled then alloca() is no
longer required or used.
The benefits of enabling this option are (or will be once subsequent
patches are made to convert alloca()/VLA):
- Toolchains without alloca() can use this feature to obtain correct and
efficient scoped memory allocation (compared to using the heap instead
of alloca(), which is slower).
- Even if alloca() is available, enabling the Py-stack gives slightly more
efficient use of stack space when calling nested Python functions, due to
the way that compilers implement alloca().
- Enabling the Py-stack with the stackless mode allows for even more
efficient stack usage, as well as retaining high performance (because the
heap is no longer used to build and destroy stackless code states).
- With Py-stack and stackless enabled, Python-calling-Python is no longer
recursive in the C mp_execute_bytecode function.
The micropython.pystack_use() function is included to measure usage of the
Python stack.
Return the result of called function. If exception happened, return
MP_OBJ_NULL. Allows to use mp_call_function_*_protected() with callbacks
returning values, etc.
Update makeqstrdata.py to sort strings starting with "__" to the beginning
of qstr list, so they get low qstr id's, guaranteedly fitting in 8 bits.
Then use this property to further compact op_id => qstr mapping arrays.
Header files that are considered internal to the py core and should not
normally be included directly are:
py/nlr.h - internal nlr configuration and declarations
py/bc0.h - contains bytecode macro definitions
py/runtime0.h - contains basic runtime enums
Instead, the top-level header files to include are one of:
py/obj.h - includes runtime0.h and defines everything to use the
mp_obj_t type
py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h,
and defines everything to use the general runtime support functions
Additional, specific headers (eg py/objlist.h) can be included if needed.
Qstr values fit in 16-bits (and this fact is used elsewhere in the code) so
no need to use more than that for the large lookup tables. The compiler
will anyway give a warning if the qstr values don't fit in 16 bits. Saves
around 80 bytes of code space for Thumb2 archs.
The unary-op/binary-op enums are already defined, and there are no
arithmetic tricks used with these types, so it makes sense to use the
correct enum type for arguments that take these values. It also reduces
code size quite a bit for nan-boxing builds.
- Changed: ValueError, TypeError, NotImplementedError
- OSError invocations unchanged, because the corresponding utility
function takes ints, not strings like the long form invocation.
- OverflowError, IndexError and RuntimeError etc. not changed for now
until we decide whether to add new utility functions.
The code conventions suggest using header guards, but do not define how
those should look like and instead point to existing files. However, not
all existing files follow the same scheme, sometimes omitting header guards
altogether, sometimes using non-standard names, making it easy to
accidentally pick a "wrong" example.
This commit ensures that all header files of the MicroPython project (that
were not simply copied from somewhere else) follow the same pattern, that
was already present in the majority of files, especially in the py folder.
The rules are as follows.
Naming convention:
* start with the words MICROPY_INCLUDED
* contain the full path to the file
* replace special characters with _
In addition, there are no empty lines before #ifndef, between #ifndef and
one empty line before #endif. #endif is followed by a comment containing
the name of the guard macro.
py/grammar.h cannot use header guards by design, since it has to be
included multiple times in a single C file. Several other files also do not
need header guards as they are only used internally and guaranteed to be
included only once:
* MICROPY_MPHALPORT_H
* mpconfigboard.h
* mpconfigport.h
* mpthreadport.h
* pin_defs_*.h
* qstrdefs*.h
This patch allows the following code to run without allocating on the heap:
super().foo(...)
Before this patch such a call would allocate a super object on the heap and
then load the foo method and call it right away. The super object is only
needed to perform the lookup of the method and not needed after that. This
patch makes an optimisation to allocate the super object on the C stack and
discard it right after use.
Changes in code size due to this patch are:
bare-arm: +128
minimal: +232
unix x64: +416
unix nanbox: +364
stmhal: +184
esp8266: +340
cc3200: +128
Allows to iterate over the following without allocating on the heap:
- tuple
- list
- string, bytes
- bytearray, array
- dict (not dict.keys, dict.values, dict.items)
- set, frozenset
Allows to call the following without heap memory:
- all, any, min, max, sum
TODO: still need to allocate stack memory in bytecode for iter_buf.
qstrs ids are restricted to fit within 2 bytes already (eg in persistent
bytecode) so it's safe to use a uint16_t to store them in mp_arg_t. And
the flags member only needs a maximum of 2 bytes so can also use uint16_t.
Savings in code size can be significant when many mp_arg_t structs are
used for argument parsing. Eg, this patch reduces stmhal by 480 bytes.
Indended to replace raw asserts in bunch of files. Expands to empty
if MICROPY_BUILTIN_METHOD_CHECK_SELF_ARG is defined, otehrwise by
default still to assert, though a particular port may define it to
something else.
Introduce mp_raise_msg(), mp_raise_ValueError(), mp_raise_TypeError()
instead of previous pattern nlr_raise(mp_obj_new_exception_msg(...)).
Save few bytes on each call, which are many.
This allows the mp_obj_t type to be configured to something other than a
pointer-sized primitive type.
This patch also includes additional changes to allow the code to compile
when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of
mp_uint_t, and various casts.
Previous to this patch each time a bytes object was referenced a new
instance (with the same data) was created. With this patch a single
bytes object is created in the compiler and is loaded directly at execute
time as a true constant (similar to loading bignum and float objects).
This saves on allocating RAM and means that bytes objects can now be
used when the memory manager is locked (eg in interrupts).
The MP_BC_LOAD_CONST_BYTES bytecode was removed as part of this.
Generated bytecode is slightly larger due to storing a pointer to the
bytes object instead of the qstr identifier.
Code size is reduced by about 60 bytes on Thumb2 architectures.
Previous to this patch the printing mechanism was a bit of a tangled
mess. This patch attempts to consolidate printing into one interface.
All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf. All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.
Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined pert port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how string are printed
on the platform. The former is only used when MICROPY_PY_IO is defined.
With this new scheme printing is generally more efficient (less layers
to go through, less arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...). Code size is also reduced by around 200 bytes on
Thumb2 archs.
I.e. in this mode, C stack will never be used to call a Python function,
but if there's no free heap for a call, it will be reported as
RuntimeError (as expected), not MemoryError.
Despite initial guess, this code factoring does not hamper performance.
In fact it seems to improve speed by a little: running pystone(1.2) on
pyboard (which gives a very stable result) this patch takes pystones
from 1729.51 up to 1742.16. Also, pystones on x64 increase by around
the same proportion (but it's much noisier).
Taking a look at the generated machine code, stack usage with this patch
is unchanged, and call is tail-optimised with all arguments in
registers. Code size decreases by about 50 bytes on Thumb2 archs.
Eg, "() + 1" now tells you that __add__ is not supported for tuple and
int types (before it just said the generic "binary operator"). We reuse
the table of names for slot lookup because it would be a waste of code
space to store the pretty name for each operator.
This patch consolidates all global variables in py/ core into one place,
in a global structure. Root pointers are all located together to make
GC tracing easier and more efficient.
This allows to implement KeyboardInterrupt on unix, and a much safer
ctrl-C in stmhal port. First ctrl-C is a soft one, with hope that VM
will notice it; second ctrl-C is a hard one that kills anything (for
both unix and stmhal).
One needs to check for a pending exception in the VM only for jump
opcodes. Others can't produce an infinite loop (infinite recursion is
caught by stack check).
Because (for Thumb) a function pointer has the LSB set, pointers to
dynamic functions in RAM (eg native, viper or asm functions) were not
being traced by the GC. This patch is a comprehensive fix for this.
Addresses issue #820.