With this patch parse nodes are allocated sequentially in chunks. This
reduces fragmentation of the heap and prevents waste at the end of
individually allocated parse nodes.
Saves roughly 20% of RAM during parse stage.
Previous to this patch each time a bytes object was referenced a new
instance (with the same data) was created. With this patch a single
bytes object is created in the compiler and is loaded directly at execute
time as a true constant (similar to loading bignum and float objects).
This saves on allocating RAM and means that bytes objects can now be
used when the memory manager is locked (eg in interrupts).
The MP_BC_LOAD_CONST_BYTES bytecode was removed as part of this.
Generated bytecode is slightly larger due to storing a pointer to the
bytes object instead of the qstr identifier.
Code size is reduced by about 60 bytes on Thumb2 architectures.
Previous to this patch a call such as list.append(1, 2) would lead to a
seg fault. This is because list.append is a builtin method and the first
argument to such methods is always assumed to have the correct type.
Now, when a builtin method is extracted like this it is wrapped in a
checker object which checks the the type of the first argument before
calling the builtin function.
This feature is contrelled by MICROPY_BUILTIN_METHOD_CHECK_SELF_ARG and
is enabled by default.
See issue #1216.
Hashing is now done using mp_unary_op function with MP_UNARY_OP_HASH as
the operator argument. Hashing for int, str and bytes still go via
fast-path in mp_unary_op since they are the most common objects which
need to be hashed.
This lead to quite a bit of code cleanup, and should be more efficient
if anything. It saves 176 bytes code space on Thumb2, and 360 bytes on
x86.
The only loss is that the error message "unhashable type" is now the
more generic "unsupported type for __hash__".
Exceptions in .close() should be ignored (dumped to sys.stderr, not
propagated), but in uPy, they are propagated. Fix would require
nlr-wrapping .close() call, which is expensive. Bu on the other hand,
.close() is not called often, so maybe that's not too bad (depends,
if it's finally called and that causes stack overflow, there's nothing
good in that). And yet on another hand, .close() can be implemented to
catch exceptions on its side, and that should be the right choice.
This simplifies the API for objects and reduces code size (by around 400
bytes on Thumb2, and around 2k on x86). Performance impact was measured
with Pystone score, but change was barely noticeable.
Despite initial guess, this code factoring does not hamper performance.
In fact it seems to improve speed by a little: running pystone(1.2) on
pyboard (which gives a very stable result) this patch takes pystones
from 1729.51 up to 1742.16. Also, pystones on x64 increase by around
the same proportion (but it's much noisier).
Taking a look at the generated machine code, stack usage with this patch
is unchanged, and call is tail-optimised with all arguments in
registers. Code size decreases by about 50 bytes on Thumb2 archs.
"Base" should rather refer to "base type"."Base object for attribute
lookup" should rather be just "object".
Also, a case of common subexpression elimination.
Previous to this patch, a big-int, float or imag constant was interned
(made into a qstr) and then parsed at runtime to create an object each
time it was needed. This is wasteful in RAM and not efficient. Now,
these constants are parsed straight away in the parser and turned into
objects. This allows constants with large numbers of digits (so
addresses issue #1103) and takes us a step closer to #722.
To enable parsing constants more efficiently, mp_parse should be allowed
to raise an exception, and mp_compile can already raise a MemoryError.
So these functions need to be protected by an nlr push/pop block.
This patch adds that feature in all places. This allows to simplify how
mp_parse and mp_compile are called: they now raise an exception if they
have an error and so explicit checking is not needed anymore.
Eg, "() + 1" now tells you that __add__ is not supported for tuple and
int types (before it just said the generic "binary operator"). We reuse
the table of names for slot lookup because it would be a waste of code
space to store the pretty name for each operator.
This patch consolidates all global variables in py/ core into one place,
in a global structure. Root pointers are all located together to make
GC tracing easier and more efficient.
This patch adds a configuration option (MICROPY_CAN_OVERRIDE_BUILTINS)
which, when enabled, allows to override all names within the builtins
module. A builtins override dict is created the first time the user
assigns to a name in the builtins model, and then that dict is searched
first on subsequent lookups. Note that this implementation doesn't
allow deleting of names.
This patch also does some refactoring of builtins code, creating the
modbuiltins.c file.
Addresses issue #959.
mp_lexer_t type is exposed, mp_token_t type is removed, and simple lexer
functions (like checking current token kind) are now inlined.
This saves 784 bytes ROM on 32-bit unix, 348 bytes on stmhal, and 460
bytes on bare-arm. It also saves a tiny bit of RAM since mp_lexer_t
is a bit smaller. Also will run a bit more efficiently.
Going from MICROPY_ERROR_REPORTING_NORMAL to
MICROPY_ERROR_REPORTING_TERSE now saves 2020 bytes ROM for ARM Thumb2,
and 2200 bytes ROM for 32-bit x86.
This is about a 2.5% code size reduction for bare-arm.
This allows to implement KeyboardInterrupt on unix, and a much safer
ctrl-C in stmhal port. First ctrl-C is a soft one, with hope that VM
will notice it; second ctrl-C is a hard one that kills anything (for
both unix and stmhal).
One needs to check for a pending exception in the VM only for jump
opcodes. Others can't produce an infinite loop (infinite recursion is
caught by stack check).
This has benefits all round: code factoring for parse/compile/execute,
proper context save/restore for exec, allow to sepcify globals/locals
for eval, and reduced ROM usage by >100 bytes on stmhal and unix.
Also, the call to mp_parse_compile_execute is tail call optimised for
the import code, so it doesn't increase stack memory usage.
It seems most sensible to use size_t for measuring "number of bytes" in
malloc and vstr functions (since that's what size_t is for). We don't
use mp_uint_t because malloc and vstr are not Micro Python specific.
Stack is full descending and must be 8-byte aligned. It must start off
pointing to just above the last byte of RAM.
Previously, stack started pointed to last byte of RAM (eg 0x2001ffff)
and so was not 8-byte aligned. This caused a bug in combination with
alloca.
This patch also updates some debug printing code.
Addresses issue #872 (among many other undiscovered issues).
This way, the native glue code is only compiled if native code is
enabled (which makes complete sense; thanks to Paul Sokolovsky for
the idea).
Should fix issue #834.
qstr_init is always called exactly before mp_init, so makes sense to
just have mp_init call it. Similarly with
mp_init_emergency_exception_buf. Doing this makes the ports simpler and
less error prone (ie they can no longer forget to call these).
As stack checking is enabled by default, ports which don't call
stack_ctrl_init() are broken now (report RuntimeError on startup). Save
them trouble and just init stack control framework in interpreter init.
Benefits: won't crash baremetal targets, will provide Python source location
when not implemented feature used (it will no longer provide C source
location, but just grep for error message).
__debug__ now resolves to True or False. Its value needs to be set by
mp_set_debug().
TODO: call mp_set_debug in unix/ port.
TODO: optimise away "if False:" statements in compiler.
This was hit when trying to make urlparse.py from stdlib run. Took
quite some time to debug.
TODO: Reconsile bound method creation process better, maybe callable is
to generic type to bind at all?
Blanket wide to all .c and .h files. Some files originating from ST are
difficult to deal with (license wise) so it was left out of those.
Also merged modpyb.h, modos.h, modstm.h and modtime.h in stmhal/.
By default mingw outputs 3 digits instead of the standard 2 so all float
tests using printf fail. Using setenv at the start of the program fixes this.
To accomodate calling platform specific initialization a
MICROPY_MAIN_INIT_FUNC macro is used which is called in mp_init()
Attempt to address issue #386. unique_code_id's have been removed and
replaced with a pointer to the "raw code" information. This pointer is
stored in the actual byte code (aligned, so the GC can trace it), so
that raw code (ie byte code, native code and inline assembler) is kept
only for as long as it is needed. In memory it's now like a tree: the
outer module's byte code points directly to its children's raw code. So
when the outer code gets freed, if there are no remaining functions that
need the raw code, then the children's code gets freed as well.
This is pretty much like CPython does it, except that CPython stores
indexes in the byte code rather than machine pointers. These indices
index the per-function constant table in order to find the relevant
code.
Based on the discussion in #433. mp_load_attr() is critical-path function,
so any extra check will slowdown any script. As supporting default val
required only for getattr() builtin, move correspending implementation
there (still as a separate function due to concerns of maintainability
of such almost-duplicated code instances).
Finishes addressing issue #424.
In the end this was a very neat refactor that now makes things a lot
more consistent across the py code base. It allowed some
simplifications in certain places, now that everything is a dict object.
Also converted builtins tables to dictionaries. This will be useful
when we need to turn builtins into a proper module.
It's not completely satisfactory, because a failed call to __getattr__
should not raise an exception.
__setattr__ could be implemented, but it would slow down all stores to a
user created object. Need to implement some caching system.
There was thinkos that either send_value or throw_value is specified, but
there were cases with both. Note that send_value is pushed onto generator's
stack - but that's probably only good, because if we throw exception into
gen, it should not ever use send_value, and that will be just extra "assert".
In this case, the exception is just re-thrown - the ideas is that object
doesn't handle this exception specially, so it will propagated per Python
semantics.
Adding this bytecode allows to remove 4 others related to
function/method calls with * and ** support. Will also help with
bytecodes that make functions/closures with default positional and
keyword args.
Pretty much everyone needs to include map.h, since it's such an integral
part of the Micro Python object implementation. Thus, the definitions
are now in obj.h instead. map.h is removed.
Mostly just a global search and replace. Except rt_is_true which
becomes mp_obj_is_true.
Still would like to tidy up some of the names, but this will do for now.
Rationale: setting up the stack (state for locals and exceptions) is
really part of the "code", it's the prelude of the function. For
example, native code adjusts the stack pointer on entry to the function.
Native code doesn't need to know n_state for any other reason. So
putting the state size in the bytecode prelude is sensible.
It reduced ROM usage on STM by about 30 bytes :) And makes it easier to
pass information about the bytecode between functions.
Originally, .methods was used for methods in a ROM class, and
locals_dict for methods in a user-created class. That distinction is
unnecessary, and we can use locals_dict for ROM classes now that we have
ROMable maps.
This removes an entry in the bloated mp_obj_type_t struct, saving a word
for each ROM object and each RAM object. ROM objects that have a
methods table (now a locals_dict) need an extra word in total (removed
the methods pointer (1 word), no longer need the sentinel (2 words), but
now need an mp_obj_dict_t wrapper (4 words)). But RAM objects save a
word because they never used the methods entry.
Overall the ROM usage is down by a few hundred bytes, and RAM usage is
down 1 word per user-defined type/class.
There is less code (no need to check 2 tables), and now consistent with
the way ROM modules have their tables initialised.
Efficiency is very close to equivaluent.
For this, needed to implement DELETE_NAME bytecode (because var bound
in except clause is automatically deleted at its end).
http://docs.python.org/3/reference/compound_stmts.html#except :
"When an exception has been assigned using as target, it is cleared at
the end of the except clause."
mp_module_obj_t can now be put in ROM.
Configuration of float type is now similar to longint: can now choose
none, float or double as the implementation.
math module has basic math functions. For STM port, these are not yet
implemented (they are just stub functions).
Each built-in exception is now a type, with base type BaseException.
C exceptions are created by passing a pointer to the exception type to
make an instance of. When raising an exception from the VM, an
instance is created automatically if an exception type is raised (as
opposed to an exception instance).
Exception matching (RT_BINARY_OP_EXCEPTION_MATCH) is now proper.
Handling of parse error changed to match new exceptions.
mp_const_type renamed to mp_type_type for consistency.
Linear table at the moment, to eventually be replaced with a hash table
generated by a preprocessor.
Dynamic table is retained so that builtins can be overridden.
sys.path is not initialized by rt_init(), that's left for platform-specific
startup code. (For example, bare metal port may have some hardcoded defaults,
and let user change sys.path directly; while port for OS with environment
feature can take path from environment). If it's not explicitly initialized,
modules will be imported only from a current directory.
TODO: Decide if we really need separate bytecode for creating functions
with default arguments - we would need same for closures, then there're
keywords arguments too. Having all combinations is a small exponential
explosion, likely we need just 2 cases - simplest (no defaults, no kw),
and full - defaults & kw.
__bool__() and __len__() are just the same as __neg__() or __invert__(),
and require efficient dispatching implementation (not requiring search/lookup).
type->unary_op() is just the right choice for this short of adding
standalone virtual method(s) to already big mp_obj_type_t structure.
We still have FAST_[0,1,2] byte codes, but they now just access the
fastn array (before they had special local variables). It's now
simpler, a bit faster, and uses a bit less stack space (on STM at least,
which is most important).
The only reason now to keep FAST_[0,1,2] byte codes is for compressed
byte code size.
LOAD_METHOD bug was: emitbc did not correctly calculate the amount of
stack usage for a LOAD_METHOD operation.
small int bug was: int was being used to pass small ints, when it should
have been machine_int_t.