Commit Graph

3903 Commits

Author SHA1 Message Date
Damien George 3c7cab4e98 py/parse: Put const bytes objects in parse tree as const object.
Instead of as an intermediate qstr, which may unnecessarily intern the data
of the bytes object.

Signed-off-by: Damien George <damien@micropython.org>
2022-03-16 00:41:10 +11:00
Damien George 65851ebb51 py/parse: Simplify handling of const int parse nodes.
Signed-off-by: Damien George <damien@micropython.org>
2022-03-16 00:00:25 +11:00
Damien George ac2293161e py/modsys: Add optional mutable attributes sys.ps1/ps2 and use them.
This allows customising the REPL prompt strings.

Signed-off-by: Damien George <damien@micropython.org>
2022-03-10 10:58:33 +11:00
Damien George cac939ddc3 py/modsys: Add optional sys.tracebacklimit attribute.
With behaviour as per CPython.

Signed-off-by: Damien George <damien@micropython.org>
2022-03-10 10:43:21 +11:00
Damien George bc181550a4 py/modsys: Add optional attribute delegation.
To be enabled when needed by specific sys attributes.

Signed-off-by: Damien George <damien@micropython.org>
2022-03-10 10:43:21 +11:00
Damien George 3356b5ef8d py/objmodule: Support delegating failed attr lookups.
This commit adds generic support for mutable module attributes on built in
modules, by adding support for an optional hook function for module
attribute lookup.  If a module wants to support additional attribute load/
store/delete (beyond what is in the constant, globals dict) then it should
add at the very end of its globals dict MP_MODULE_ATTR_DELEGATION_ENTRY().
This should point to a custom function which will handle any additional
attributes.

The mp_module_generic_attr() function is provided as a helper function for
additional attributes: it requires an array of qstrs (terminated in
MP_QSTRnull) and a corresponding array of objects (with a 1-1 mapping
between qstrs and objects).  If the qstr is found in the array then the
corresponding object is loaded/stored/deleted.

Signed-off-by: Damien George <damien@micropython.org>
2022-03-10 10:35:44 +11:00
Damien George 0149cd6b8b windows: Switch to VFS subsystem and use VfsPosix.
Following the unix port.

Signed-off-by: Damien George <damien@micropython.org>
2022-03-10 00:26:36 +11:00
Damien George 94077c6402 samd/moduos: Convert module to use extmod version.
Signed-off-by: Damien George <damien@micropython.org>
2022-03-09 10:03:23 +11:00
Damien George 926b554daf extmod/moduos: Create general uos module to be used by all ports.
Based on the rp2 port version, with the rp2 port converted to use this
module.

Signed-off-by: Damien George <damien@micropython.org>
2022-03-09 10:03:23 +11:00
stijn 795370ca23 py/bc.h: Fix C++ compilation of public API.
Casts between unrelated types must be explicit.  Regression in
f2040bfc7e
2022-03-01 16:17:30 +11:00
Damien George f2040bfc7e py: Rework bytecode and .mpy file format to be mostly static data.
Background: .mpy files are precompiled .py files, built using mpy-cross,
that contain compiled bytecode functions (and can also contain machine
code). The benefit of using an .mpy file over a .py file is that they are
faster to import and take less memory when importing.  They are also
smaller on disk.

But the real benefit of .mpy files comes when they are frozen into the
firmware.  This is done by loading the .mpy file during compilation of the
firmware and turning it into a set of big C data structures (the job of
mpy-tool.py), which are then compiled and downloaded into the ROM of a
device.  These C data structures can be executed in-place, ie directly from
ROM.  This makes importing even faster because there is very little to do,
and also means such frozen modules take up much less RAM (because their
bytecode stays in ROM).

The downside of frozen code is that it requires recompiling and reflashing
the entire firmware.  This can be a big barrier to entry, slows down
development time, and makes it harder to do OTA updates of frozen code
(because the whole firmware must be updated).

This commit attempts to solve this problem by providing a solution that
sits between loading .mpy files into RAM and freezing them into the
firmware.  The .mpy file format has been reworked so that it consists of
data and bytecode which is mostly static and ready to run in-place.  If
these new .mpy files are located in flash/ROM which is memory addressable,
the .mpy file can be executed (mostly) in-place.

With this approach there is still a small amount of unpacking and linking
of the .mpy file that needs to be done when it's imported, but it's still
much better than loading an .mpy from disk into RAM (although not as good
as freezing .mpy files into the firmware).

The main trick to make static .mpy files is to adjust the bytecode so any
qstrs that it references now go through a lookup table to convert from
local qstr number in the module to global qstr number in the firmware.
That means the bytecode does not need linking/rewriting of qstrs when it's
loaded.  Instead only a small qstr table needs to be built (and put in RAM)
at import time.  This means the bytecode itself is static/constant and can
be used directly if it's in addressable memory.  Also the qstr string data
in the .mpy file, and some constant object data, can be used directly.
Note that the qstr table is global to the module (ie not per function).

In more detail, in the VM what used to be (schematically):

    qst = DECODE_QSTR_VALUE;

is now (schematically):

    idx = DECODE_QSTR_INDEX;
    qst = qstr_table[idx];

That allows the bytecode to be fixed at compile time and not need
relinking/rewriting of the qstr values.  Only qstr_table needs to be linked
when the .mpy is loaded.

Incidentally, this helps to reduce the size of bytecode because what used
to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices.
If the module uses the same qstr more than two times then the bytecode is
smaller than before.

The following changes are measured for this commit compared to the
previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before

The qstr indirection in the bytecode has only a small impact on VM
performance.  For stm32 on PYBv1.0 the performance change of this commit
is:

diff of scores (higher is better)
N=100 M=100             baseline -> this-commit  diff      diff% (error%)
bm_chaos.py               371.07 ->  357.39 :  -13.68 =  -3.687% (+/-0.02%)
bm_fannkuch.py             78.72 ->   77.49 :   -1.23 =  -1.563% (+/-0.01%)
bm_fft.py                2591.73 -> 2539.28 :  -52.45 =  -2.024% (+/-0.00%)
bm_float.py              6034.93 -> 5908.30 : -126.63 =  -2.098% (+/-0.01%)
bm_hexiom.py               48.96 ->   47.93 :   -1.03 =  -2.104% (+/-0.00%)
bm_nqueens.py            4510.63 -> 4459.94 :  -50.69 =  -1.124% (+/-0.00%)
bm_pidigits.py            650.28 ->  644.96 :   -5.32 =  -0.818% (+/-0.23%)
core_import_mpy_multi.py  564.77 ->  581.49 :  +16.72 =  +2.960% (+/-0.01%)
core_import_mpy_single.py  68.67 ->   67.16 :   -1.51 =  -2.199% (+/-0.01%)
core_qstr.py               64.16 ->   64.12 :   -0.04 =  -0.062% (+/-0.00%)
core_yield_from.py        362.58 ->  354.50 :   -8.08 =  -2.228% (+/-0.00%)
misc_aes.py               429.69 ->  405.59 :  -24.10 =  -5.609% (+/-0.01%)
misc_mandel.py           3485.13 -> 3416.51 :  -68.62 =  -1.969% (+/-0.00%)
misc_pystone.py          2496.53 -> 2405.56 :  -90.97 =  -3.644% (+/-0.01%)
misc_raytrace.py          381.47 ->  374.01 :   -7.46 =  -1.956% (+/-0.01%)
viper_call0.py            576.73 ->  572.49 :   -4.24 =  -0.735% (+/-0.04%)
viper_call1a.py           550.37 ->  546.21 :   -4.16 =  -0.756% (+/-0.09%)
viper_call1b.py           438.23 ->  435.68 :   -2.55 =  -0.582% (+/-0.06%)
viper_call1c.py           442.84 ->  440.04 :   -2.80 =  -0.632% (+/-0.08%)
viper_call2a.py           536.31 ->  532.35 :   -3.96 =  -0.738% (+/-0.06%)
viper_call2b.py           382.34 ->  377.07 :   -5.27 =  -1.378% (+/-0.03%)

And for unix on x64:

diff of scores (higher is better)
N=2000 M=2000        baseline -> this-commit     diff      diff% (error%)
bm_chaos.py          13594.20 ->  13073.84 :  -520.36 =  -3.828% (+/-5.44%)
bm_fannkuch.py          60.63 ->     59.58 :    -1.05 =  -1.732% (+/-3.01%)
bm_fft.py           112009.15 -> 111603.32 :  -405.83 =  -0.362% (+/-4.03%)
bm_float.py         246202.55 -> 247923.81 : +1721.26 =  +0.699% (+/-2.79%)
bm_hexiom.py           615.65 ->    617.21 :    +1.56 =  +0.253% (+/-1.64%)
bm_nqueens.py       215807.95 -> 215600.96 :  -206.99 =  -0.096% (+/-3.52%)
bm_pidigits.py        8246.74 ->   8422.82 :  +176.08 =  +2.135% (+/-3.64%)
misc_aes.py          16133.00 ->  16452.74 :  +319.74 =  +1.982% (+/-1.50%)
misc_mandel.py      128146.69 -> 130796.43 : +2649.74 =  +2.068% (+/-3.18%)
misc_pystone.py      83811.49 ->  83124.85 :  -686.64 =  -0.819% (+/-1.03%)
misc_raytrace.py     21688.02 ->  21385.10 :  -302.92 =  -1.397% (+/-3.20%)

The code size change is (firmware with a lot of frozen code benefits the
most):

       bare-arm:  +396 +0.697%
    minimal x86: +1595 +0.979% [incl +32(data)]
       unix x64: +2408 +0.470% [incl +800(data)]
    unix nanbox: +1396 +0.309% [incl -96(data)]
          stm32: -1256 -0.318% PYBV10
         cc3200:  +288 +0.157%
        esp8266:  -260 -0.037% GENERIC
          esp32:  -216 -0.014% GENERIC[incl -1072(data)]
            nrf:  +116 +0.067% pca10040
            rp2:  -664 -0.135% PICO
           samd:  +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS

As part of this change the .mpy file format version is bumped to version 6.
And mpy-tool.py has been improved to provide a good visualisation of the
contents of .mpy files.

In summary: this commit changes the bytecode to use qstr indirection, and
reworks the .mpy file format to be simpler and allow .mpy files to be
executed in-place.  Performance is not impacted too much.  Eventually it
will be possible to store such .mpy files in a linear, read-only, memory-
mappable filesystem so they can be executed from flash/ROM.  This will
essentially be able to replace frozen code for most applications.

Signed-off-by: Damien George <damien@micropython.org>
2022-02-24 18:08:43 +11:00
Damien George 18acd0318f py/gc: Update debug code to compile with changes to qstr pool types.
Following on from 18b1ba086c and
f46a7140f5.

Signed-off-by: Damien George <damien@micropython.org>
2022-02-17 11:17:21 +11:00
Artyom Skrobov f46a7140f5 py/qstr: Use `const` consistently to avoid a cast.
Originally at adafruit#4707

Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
2022-02-11 22:55:02 +11:00
Artyom Skrobov 18b1ba086c py/qstr: Separate hash and len from string data.
This allows the compiler to merge strings: e.g. "update",
"difference_update" and "symmetric_difference_update" will all point to the
same memory.

No functional change.

The size reduction depends on the number of qstrs in the build.  The change
this commit brings is:

   bare-arm:    -4 -0.007%
minimal x86:  +150 +0.092% [incl +48(data)]
   unix x64:  -608 -0.118%
unix nanbox:  -572 -0.126% [incl +32(data)]
      stm32: -1392 -0.352% PYBV10
     cc3200:  -448 -0.244%
    esp8266: -1208 -0.173% GENERIC
      esp32: -1028 -0.068% GENERIC[incl -1020(data)]
        nrf:  -440 -0.252% pca10040
        rp2: -1072 -0.217% PICO
       samd:  -368 -0.264% ADAFRUIT_ITSYBITSY_M4_EXPRESS

Performance is also improved (on bare metal at least) for the
core_import_mpy_multi.py, core_import_mpy_single.py and core_qstr.py
performance benchmarks.

Originally at adafruit#4583

Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
2022-02-11 22:52:32 +11:00
Damien George fbd47fc46c ports: Consolidate inclusion of umachine module in built-ins.
The inclusion of `umachine` in the list of built-in modules is now done
centrally in py/objmodule.c.  Enabling MICROPY_PY_MACHINE will include this
module.

As part of this, all ports now have `umachine` as the core module name
(previously some had only `machine` as the name).

Signed-off-by: Damien George <damien@micropython.org>
2022-02-03 10:08:54 +11:00
stijn dd6967202a py/modmath: Add math.tau, math.nan and math.inf constants.
Configurable by the new MICROPY_PY_MATH_CONSTANTS option.
2022-01-23 09:28:33 +11:00
stijn a9448c0a86 all: Fix MICROPY_OBJ_REPR_D compilation with msvc. 2022-01-23 09:25:01 +11:00
Jeff Epler 037b2c72a1 py/objstr: Support '{:08}'.format("Jan") like Python 3.10.
The new test has an .exp file, because it is not compatible with Python 3.9
and lower.

See CPython version of the issue at https://bugs.python.org/issue27772

Signed-off-by: Jeff Epler <jepler@gmail.com>
2022-01-19 15:34:32 +11:00
Damien George da4b38e756 all: Bump version to 1.18.
Signed-off-by: Damien George <damien@micropython.org>
2022-01-17 09:50:31 +11:00
Damien George 772058a6bd py/mpconfig.h: Define MICROPY_PY_USSL_FINALISER only if not defined.
So a port can define it even if MICROPY_PY_USSL is not defined.

Signed-off-by: Damien George <damien@micropython.org>
2022-01-07 23:59:17 +11:00
stijn 22cf0940e1 py/modbuiltins: Add additional macro for extending builtins.
Mainly useful for defining additional globals in boards and variants.
2022-01-07 11:36:52 +11:00
Emilie Feral 58b56b91c4 py/qstr: Reset mpstate.qstr_last_chunk before raising an error.
The qstr_last_chunk is not collected by the garbage collector.  This relies
on the assertion that qstr_pool_t also references the qstr_last_chunk.  If
an exception is raised while allocating the qstr_pool_t, qstr_last_chunk
has to be invalidated not to become a dangling reference at the next
garbage collection.

Signed-off-by: Emilie Feral <emilie.feral@numworks.com>
2022-01-06 13:00:43 +11:00
Damien George aac5a97d08 ports: Move '.frozen' to second entry in sys.path.
In commit 86ce442607 the '.frozen' entry was
added at the start of sys.path, to allow control over when frozen modules
are searched during import, and retain existing behaviour whereby frozen
was searched before the filesystem.

But Python semantics of sys.path require sys.path[0] to be the directory of
the currently executing script, or ''.

This commit moves the '.frozen' entry to second place in sys.path, so
sys.path[0] retains its correct value (described above).

Signed-off-by: Damien George <damien@micropython.org>
2021-12-29 23:55:36 +11:00
Damien George 2c139bbf4e py/mpz: Fix bugs with bitwise of -0 by ensuring all 0's are positive.
This commit makes sure that the value zero is always encoded in an mpz_t as
neg=0 and len=0 (previously it was just len=0).

This invariant is needed for some of the bitwise operations that operate on
negative numbers, because they cannot handle -0.  For example
(-((1<<100)-(1<<100)))|1 was being computed as -65535, instead of 1.

Fixes issue #8042.

Signed-off-by: Damien George <damien@micropython.org>
2021-12-21 18:00:05 +11:00
Damien George fe9ffff9c0 py/mpstate.h: Only include sys.path/argv objects in state when enabled.
The mp_sys_path_obj and mp_sys_argv_obj objects are only used by the
runtime and accessible from Python if MICROPY_PY_SYS is enabled.  So
exclude them from the runtime state if this option is disabled.

Signed-off-by: Damien George <damien@micropython.org>
2021-12-19 08:55:40 +11:00
Damien George de43b500bd py/runtime: Allow initialising sys.path/argv with defaults.
If MICROPY_PY_SYS_PATH_ARGV_DEFAULTS is enabled (which it is by default)
then sys.path and sys.argv will be initialised and populated with default
values.  This keeps all bare-metal ports aligned.

Signed-off-by: Damien George <damien@micropython.org>
2021-12-18 00:08:07 +11:00
Jim Mussared d6d4a5819b py/mkrules.cmake: Set frozen preprocessor defs early.
This ensures MICROPY_QSTR_EXTRA_POOL and MICROPY_MODULE_FROZEN_MPY are set
if necessary before the CFLAGS are extracted for QSTR generation.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-12-18 00:05:18 +11:00
Jim Mussared e0bf4611c3 py: Only search frozen modules when '.frozen' is found in sys.path.
This changes makemanifest.py & mpy-tool.py to merge string and mpy names
into the same list (now mp_frozen_names).

The various paths for loading a frozen module (mp_find_frozen_module) and
checking existence of a frozen module (mp_frozen_stat) use a common
function that searches this list.

In addition, the frozen lookup will now only take place if the path starts
with ".frozen", which needs to be added to sys.path.

This fixes issues #1804, #2322, #3509, #6419.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-12-18 00:01:59 +11:00
Jim Mussared 92353c2911 all: Remove support for FROZEN_DIR and FROZEN_MPY_DIR.
These have been deprecated for over two years in favour of FROZEN_MANIFEST
and manifest.py.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-12-17 23:54:05 +11:00
Jim Mussared cc23e99f32 py/modio: Remove io.resource_stream function.
This feature is not enabled on any port, it's not in CPython's io module,
and functionality is better suited to the micropython-lib implementation of
pkg_resources.
2021-12-17 23:53:44 +11:00
Damien George d6dc4cb65a py/showbc: Fix printing of raw bytecode header on nanbox builds.
Signed-off-by: Damien George <damien@micropython.org>
2021-12-15 16:54:47 +11:00
Jim Mussared a7fa18c203 py/builtinimport: Refactor module importing.
Simplify and document/comment the handling of builtin import for:
- already-loaded modules
- built-in modules
- built-in umodules (formerly weak links)
- filesystem modules

Retains existing functionality with smaller code size but should also
facilitate potential new features (built-in packages, controlling the
frozen path).

Also makes the (unix-only) -m behavior a bit more obvious and configurable.

Code size change with this commit:

   bare-arm:    +0 +0.000%
minimal x86:   -64 -0.039%
   unix x64:   -32 -0.006%
unix nanbox:    -4 -0.001%
      stm32:  -184 -0.047% PYBV10
     cc3200:  -120 -0.065%
    esp8266:  -228 -0.033% GENERIC
      esp32:  -268 -0.018% GENERIC[incl +16(data)]
        nrf:  -152 -0.087% pca10040
        rp2:  -256 -0.052% PICO
       samd:   -80 -0.057% ADAFRUIT_ITSYBITSY_M4_EXPRESS
2021-12-01 13:23:34 +11:00
Damien George a0890983ea py/objfun.h: Remove obsolete comments about entries in extra_args.
These two entries were removed in 049a7a8153

Signed-off-by: Damien George <damien@micropython.org>
2021-11-25 23:24:40 +11:00
Damien George 11ed94797d py/lexer: Support nested [] and {} characters within f-string params.
Signed-off-by: Damien George <damien@micropython.org>
2021-11-25 21:50:58 +11:00
Laurens Valk e2ca8ab8fc py/runtime: Allow types to use both .attr and .locals_dict.
Make it possible to proceed to a regular lookup in locals_dict if the
custom type->attr fails.  This allows type->attr to extend rather than
completely replace the lookup in locals_dict.

This is useful for custom builtin classes that have mostly regular methods
but just a few special attributes/properties.  This way, type->attr needs
to deal with the special cases only and the default lookup will be used for
generic methods.

Signed-off-by: Laurens Valk <laurens@pybricks.com>
2021-11-22 12:10:35 +11:00
Damien George 123dcdb8e5 py/modsys: Replace non-ASCII quote char with ASCII char.
The source code should stay 7-bit ASCII clean.

Signed-off-by: Damien George <damien@micropython.org>
2021-11-19 17:26:04 +11:00
Damien George 78ab2eeda3 py/showbc: Print unary-op string when dumping bytecode.
Signed-off-by: Damien George <damien@micropython.org>
2021-11-19 17:05:40 +11:00
Laurens Valk fe120484b6 py/gc: Add hook to run code during time consuming GC operations.
This makes it possible for cooperative multitasking systems to keep running
event loops during garbage collector operations.

For example, this can be used to ensure that a motor control loop runs
approximately each 5 ms.  Without this hook, the loop time can jump to
about 15 ms.

Addresses #3475.

Signed-off-by: Laurens Valk <laurens@pybricks.com>
2021-11-01 15:39:37 +11:00
Damien George c62351fbd6 py/mpconfig.h: Revert MICROPY_REPL_INFO to disabled at all levels.
This is an stm32-specific feature that's accessed via the pyb module, so
not something that will be widely enabled.

Signed-off-by: Damien George <damien@micropython.org>
2021-11-01 15:18:22 +11:00
Jim Mussared 0e236eef08 py/mpconfig.h: Define the "extra" feature level.
Some of these will later be moved to CORE or BASIC, but EXTRA is a good
starting point based on what stm32 uses.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-11-01 14:57:28 +11:00
Mike Wadsten fe2bc92b4d py/runtime: Fix crash when exc __new__ doesn't return an exc instance.
See CPython bug https://bugs.python.org/issue39091 for more details.
2021-10-21 12:32:16 +11:00
Damien George 8412568e7b py: Add wrapper macros so hot VM functions can go in fast code location.
For example, on esp32 they can go in iRAM to improve performance.

Signed-off-by: Damien George <damien@micropython.org>
2021-10-15 23:31:19 +11:00
stijn ea880d5674 py/builtinimport: Forward all debug printing to MICROPY_DEBUG_PRINTER. 2021-09-24 13:17:19 +10:00
iabdalkader 2c5e9bbdfa extmod: Add platform module.
It contains the compiler version, and underlying system HAL/SDK version.
2021-09-19 23:35:10 +10:00
Jim Mussared b326edf68c all: Remove MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE.
This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.

This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.

The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).

It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.

For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:

diff of scores (higher is better)
N=2000 M=2000       bccache -> attrmapcache      diff      diff% (error%)
bm_chaos.py        13742.56 ->   13905.67 :   +163.11 =  +1.187% (+/-3.75%)
bm_fannkuch.py        60.13 ->      61.34 :     +1.21 =  +2.012% (+/-2.11%)
bm_fft.py         113083.20 ->  114793.68 :  +1710.48 =  +1.513% (+/-1.57%)
bm_float.py       256552.80 ->  243908.29 : -12644.51 =  -4.929% (+/-1.90%)
bm_hexiom.py         521.93 ->     625.41 :   +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py     197544.25 ->  217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py      8072.98 ->    8198.75 :   +125.77 =  +1.558% (+/-3.22%)
misc_aes.py        17283.45 ->   16480.52 :   -802.93 =  -4.646% (+/-0.82%)
misc_mandel.py     99083.99 ->  128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py    83860.10 ->   82592.56 :  -1267.54 =  -1.511% (+/-2.27%)
misc_raytrace.py   21490.40 ->   22227.23 :   +736.83 =  +3.429% (+/-1.88%)

This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).

The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):

diff of scores (higher is better)
N=2000 M=2000        native -> nat-attrmapcache  diff      diff% (error%)
bm_chaos.py        14130.62 ->   15464.68 :  +1334.06 =  +9.441% (+/-7.11%)
bm_fannkuch.py        74.96 ->      76.16 :     +1.20 =  +1.601% (+/-1.80%)
bm_fft.py         166682.99 ->  168221.86 :  +1538.87 =  +0.923% (+/-4.20%)
bm_float.py       233415.23 ->  265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py         628.59 ->     734.17 :   +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py     225418.44 ->  232926.45 :  +7508.01 =  +3.331% (+/-3.10%)
bm_pidigits.py      6322.00 ->    6379.52 :    +57.52 =  +0.910% (+/-5.62%)
misc_aes.py        20670.10 ->   27223.18 :  +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py    138221.11 ->  152014.01 : +13792.90 =  +9.979% (+/-2.46%)
misc_pystone.py    85032.14 ->  105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py   19800.01 ->   23350.73 :  +3550.72 = +17.933% (+/-2.79%)

In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.

See #7680 for further discussion.  And see also #7653 for a discussion
about simplifying mpy-cross options.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:04:03 +10:00
Jim Mussared 11ef8f22fe py/map: Add an optional cache of (map+index) to speed up map lookups.
The existing inline bytecode caching optimisation, selected by
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, reserves an extra byte in the
bytecode after certain opcodes, which at runtime stores a map index of the
likely location of this field when looking up the qstr.  This scheme is
incompatible with bytecode-in-ROM, and doesn't work with native generated
code.  It also stores bytecode in .mpy files which is of a different format
to when the feature is disabled, making generation of .mpy files more
complex.

This commit provides an alternative optimisation via an approach that adds
a global cache for map offsets, then all mp_map_lookup operations use it.
It's less precise than bytecode caching, but allows the cache to be
independent and external to the bytecode that is executing.  It also works
for the native emitter and adds a similar performance boost on top of the
gain already provided by the native emitter.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:02:19 +10:00
Jim Mussared 7b89ad8dbf py/vm: Add a fast path for LOAD_ATTR on instance types.
When the LOAD_ATTR opcode is executed there are quite a few different cases
that have to be handled, but the common case is accessing a member on an
instance type.  Typically, built-in types provide methods which is why this
is common.

Fortunately, for this specific case, if the member is found in the member
map then there's no further processing.

This optimisation does a relatively cheap check (type is instance) and then
forwards directly to the member map lookup, falling back to the regular
path if necessary.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 16:02:15 +10:00
Jim Mussared 01374d941f py/mpconfig.h: Define initial templates for "feature levels".
This is the beginning of a set of changes to simplify enabling/disabling
features.  The goals are:
- Remove redundancy from mpconfigport.h (never set a value to the default
  -- make it clear exactly what's being enabled).
- Improve consistency between ports.  All "similar" ports (i.e. approx same
  flash size) should get the same features.
- Simplify mpconfigport.h -- just get default/sensible options for the size
  of the port.
- Make it easy for defining constrained boards (e.g. STM32F0/L0), they can
  just set a lower level.

This commit makes a step towards this and defines the "core" level as the
current default feature set, and a "minimal" level to turn off everything.
And a few placeholder levels are added for where the other ports will
roughly land.

This is a no-op change for all ports.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2021-09-16 13:19:11 +10:00
Damien George 426785a19e py/emitnative: Ensure load_subscr does not clobber existing REG_RET.
Fixes issue #7782, and part of issue #6314.

Signed-off-by: Damien George <damien@micropython.org>
2021-09-13 22:30:24 +10:00
Damien George e6850838cd py/parse: Simplify parse nodes representing a list.
This commit simplifies and optimises the parse tree in-memory
representation of lists of expressions, for tuples and lists, and when
tuples are used on the left-hand-side of assignments and within del
statements.  This reduces memory usage of the parse tree when such code is
compiled, and also reduces the size of the compiler.

For example, (1,) was previously the following parse tree:

    expr_stmt(5) (n=2)
      atom_paren(45) (n=1)
        testlist_comp(146) (n=2)
          int(1)
          testlist_comp_3b(149) (n=1)
            NULL
      NULL

and with this commit is now:

    expr_stmt(5) (n=2)
      atom_paren(45) (n=1)
        testlist_comp(146) (n=1)
          int(1)
      NULL

Similarly, (1, 2, 3) was previously:

    expr_stmt(5) (n=2)
      atom_paren(45) (n=1)
        testlist_comp(146) (n=2)
          int(1)
          testlist_comp_3c(150) (n=2)
            int(2)
            int(3)
      NULL

and is now:

    expr_stmt(5) (n=2)
      atom_paren(45) (n=1)
        testlist_comp(146) (n=3)
          int(1)
          int(2)
          int(3)
      NULL

Signed-off-by: Damien George <damien@micropython.org>
2021-09-10 14:09:44 +10:00