micropython/py/compile.c

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

3707 lines
147 KiB
C
Raw Normal View History

/*
* This file is part of the MicroPython project, http://micropython.org/
*
* The MIT License (MIT)
*
* Copyright (c) 2013-2020 Damien P. George
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*/
#include <stdbool.h>
2013-10-04 19:53:11 +01:00
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include "py/scope.h"
#include "py/emit.h"
#include "py/compile.h"
#include "py/runtime.h"
#include "py/asmbase.h"
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
#include "py/nativeglue.h"
#include "py/persistentcode.h"
2013-10-04 19:53:11 +01:00
#if MICROPY_ENABLE_COMPILER
2013-10-04 19:53:11 +01:00
// TODO need to mangle __attr names
#define INVALID_LABEL (0xffff)
2013-10-04 19:53:11 +01:00
typedef enum {
// define rules with a compile function
#define DEF_RULE(rule, comp, kind, ...) PN_##rule,
#define DEF_RULE_NC(rule, kind, ...)
#include "py/grammar.h"
2013-10-04 19:53:11 +01:00
#undef DEF_RULE
#undef DEF_RULE_NC
PN_const_object, // special node for a constant, generic Python object
// define rules without a compile function
#define DEF_RULE(rule, comp, kind, ...)
#define DEF_RULE_NC(rule, kind, ...) PN_##rule,
#include "py/grammar.h"
#undef DEF_RULE
#undef DEF_RULE_NC
2013-10-04 19:53:11 +01:00
} pn_kind_t;
// Whether a mp_parse_node_struct_t that has pns->kind == PN_testlist_comp
// corresponds to a list comprehension or generator.
#define MP_PARSE_NODE_TESTLIST_COMP_HAS_COMP_FOR(pns) \
(MP_PARSE_NODE_STRUCT_NUM_NODES(pns) == 2 && \
MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[1], PN_comp_for))
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
#define NEED_METHOD_TABLE MICROPY_EMIT_NATIVE
#if NEED_METHOD_TABLE
// we need a method table to do the lookup for the emitter functions
#define EMIT(fun) (comp->emit_method_table->fun(comp->emit))
#define EMIT_ARG(fun, ...) (comp->emit_method_table->fun(comp->emit, __VA_ARGS__))
#define EMIT_LOAD_FAST(qst, local_num) (comp->emit_method_table->load_id.local(comp->emit, qst, local_num, MP_EMIT_IDOP_LOCAL_FAST))
#define EMIT_LOAD_GLOBAL(qst) (comp->emit_method_table->load_id.global(comp->emit, qst, MP_EMIT_IDOP_GLOBAL_GLOBAL))
2013-10-04 19:53:11 +01:00
#else
// if we only have the bytecode emitter enabled then we can do a direct call to the functions
#define EMIT(fun) (mp_emit_bc_##fun(comp->emit))
#define EMIT_ARG(fun, ...) (mp_emit_bc_##fun(comp->emit, __VA_ARGS__))
#define EMIT_LOAD_FAST(qst, local_num) (mp_emit_bc_load_local(comp->emit, qst, local_num, MP_EMIT_IDOP_LOCAL_FAST))
#define EMIT_LOAD_GLOBAL(qst) (mp_emit_bc_load_global(comp->emit, qst, MP_EMIT_IDOP_GLOBAL_GLOBAL))
#endif
#if MICROPY_EMIT_NATIVE && MICROPY_DYNAMIC_COMPILER
#define NATIVE_EMITTER(f) emit_native_table[mp_dynamic_compiler.native_arch]->emit_##f
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
#define NATIVE_EMITTER_TABLE (emit_native_table[mp_dynamic_compiler.native_arch])
STATIC const emit_method_table_t *emit_native_table[] = {
NULL,
&emit_native_x86_method_table,
&emit_native_x64_method_table,
&emit_native_arm_method_table,
&emit_native_thumb_method_table,
&emit_native_thumb_method_table,
&emit_native_thumb_method_table,
&emit_native_thumb_method_table,
&emit_native_thumb_method_table,
&emit_native_xtensa_method_table,
&emit_native_xtensawin_method_table,
};
#elif MICROPY_EMIT_NATIVE
// define a macro to access external native emitter
#if MICROPY_EMIT_X64
#define NATIVE_EMITTER(f) emit_native_x64_##f
#elif MICROPY_EMIT_X86
#define NATIVE_EMITTER(f) emit_native_x86_##f
#elif MICROPY_EMIT_THUMB
#define NATIVE_EMITTER(f) emit_native_thumb_##f
#elif MICROPY_EMIT_ARM
#define NATIVE_EMITTER(f) emit_native_arm_##f
#elif MICROPY_EMIT_XTENSA
#define NATIVE_EMITTER(f) emit_native_xtensa_##f
#elif MICROPY_EMIT_XTENSAWIN
#define NATIVE_EMITTER(f) emit_native_xtensawin_##f
#else
#error "unknown native emitter"
#endif
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
#define NATIVE_EMITTER_TABLE (&NATIVE_EMITTER(method_table))
#endif
#if MICROPY_EMIT_INLINE_ASM && MICROPY_DYNAMIC_COMPILER
#define ASM_EMITTER(f) emit_asm_table[mp_dynamic_compiler.native_arch]->asm_##f
#define ASM_EMITTER_TABLE emit_asm_table[mp_dynamic_compiler.native_arch]
STATIC const emit_inline_asm_method_table_t *emit_asm_table[] = {
NULL,
NULL,
NULL,
&emit_inline_thumb_method_table,
&emit_inline_thumb_method_table,
&emit_inline_thumb_method_table,
&emit_inline_thumb_method_table,
&emit_inline_thumb_method_table,
&emit_inline_thumb_method_table,
&emit_inline_xtensa_method_table,
NULL,
};
#elif MICROPY_EMIT_INLINE_ASM
// define macros for inline assembler
#if MICROPY_EMIT_INLINE_THUMB
#define ASM_DECORATOR_QSTR MP_QSTR_asm_thumb
#define ASM_EMITTER(f) emit_inline_thumb_##f
#elif MICROPY_EMIT_INLINE_XTENSA
#define ASM_DECORATOR_QSTR MP_QSTR_asm_xtensa
#define ASM_EMITTER(f) emit_inline_xtensa_##f
#else
#error "unknown asm emitter"
#endif
#define ASM_EMITTER_TABLE &ASM_EMITTER(method_table)
#endif
#define EMIT_INLINE_ASM(fun) (comp->emit_inline_asm_method_table->fun(comp->emit_inline_asm))
#define EMIT_INLINE_ASM_ARG(fun, ...) (comp->emit_inline_asm_method_table->fun(comp->emit_inline_asm, __VA_ARGS__))
// elements in this struct are ordered to make it compact
2013-10-04 19:53:11 +01:00
typedef struct _compiler_t {
uint8_t is_repl;
uint8_t pass; // holds enum type pass_kind_t
uint8_t have_star;
// try to keep compiler clean from nlr
mp_obj_t compile_error; // set to an exception object if there's an error
size_t compile_error_line; // set to best guess of line of error
2013-10-04 19:53:11 +01:00
uint next_label;
uint16_t num_dict_params;
uint16_t num_default_params;
uint16_t break_label; // highest bit set indicates we are breaking out of a for loop
uint16_t continue_label;
uint16_t cur_except_level; // increased for SETUP_EXCEPT, SETUP_FINALLY; decreased for POP_BLOCK, POP_EXCEPT
uint16_t break_continue_except_level;
2013-10-04 19:53:11 +01:00
scope_t *scope_head;
scope_t *scope_cur;
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_emit_common_t emit_common;
emit_t *emit; // current emitter
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
emit_t *emit_bc;
#if NEED_METHOD_TABLE
const emit_method_table_t *emit_method_table; // current emit method table
#endif
2013-10-05 23:17:28 +01:00
#if MICROPY_EMIT_INLINE_ASM
2013-10-05 23:17:28 +01:00
emit_inline_asm_t *emit_inline_asm; // current emitter for inline asm
const emit_inline_asm_method_table_t *emit_inline_asm_method_table; // current emit method table for inline asm
#endif
2013-10-04 19:53:11 +01:00
} compiler_t;
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
/******************************************************************************/
// mp_emit_common_t helper functions
// These are defined here so they can be inlined, to reduce code size.
STATIC void mp_emit_common_init(mp_emit_common_t *emit, qstr source_file) {
#if MICROPY_EMIT_BYTECODE_USES_QSTR_TABLE
mp_map_init(&emit->qstr_map, 1);
// add the source file as the first entry in the qstr table
mp_map_elem_t *elem = mp_map_lookup(&emit->qstr_map, MP_OBJ_NEW_QSTR(source_file), MP_MAP_LOOKUP_ADD_IF_NOT_FOUND);
elem->value = MP_OBJ_NEW_SMALL_INT(0);
#endif
}
STATIC void mp_emit_common_start_pass(mp_emit_common_t *emit, pass_kind_t pass) {
emit->pass = pass;
if (pass == MP_PASS_STACK_SIZE) {
emit->ct_cur_obj_base = emit->ct_cur_obj;
} else if (pass > MP_PASS_STACK_SIZE) {
emit->ct_cur_obj = emit->ct_cur_obj_base;
}
if (pass == MP_PASS_EMIT) {
if (emit->ct_cur_child == 0) {
emit->children = NULL;
} else {
emit->children = m_new0(mp_raw_code_t *, emit->ct_cur_child);
}
}
emit->ct_cur_child = 0;
}
STATIC void mp_emit_common_finalise(mp_emit_common_t *emit, bool has_native_code) {
emit->ct_cur_obj += has_native_code; // allocate an additional slot for &mp_fun_table
emit->const_table = m_new0(mp_uint_t, emit->ct_cur_obj);
emit->ct_cur_obj = has_native_code; // reserve slot 0 for &mp_fun_table
#if MICROPY_EMIT_NATIVE
if (has_native_code) {
// store mp_fun_table pointer at the start of the constant table
emit->const_table[0] = (mp_uint_t)(uintptr_t)&mp_fun_table;
}
#endif
}
STATIC void mp_emit_common_populate_module_context(mp_emit_common_t *emit, qstr source_file, mp_module_context_t *context) {
#if MICROPY_EMIT_BYTECODE_USES_QSTR_TABLE
size_t qstr_map_used = emit->qstr_map.used;
mp_module_context_alloc_tables(context, qstr_map_used, emit->ct_cur_obj);
for (size_t i = 0; i < emit->qstr_map.alloc; ++i) {
if (mp_map_slot_is_filled(&emit->qstr_map, i)) {
size_t idx = MP_OBJ_SMALL_INT_VALUE(emit->qstr_map.table[i].value);
qstr qst = MP_OBJ_QSTR_VALUE(emit->qstr_map.table[i].key);
context->constants.qstr_table[idx] = qst;
}
}
#else
mp_module_context_alloc_tables(context, 0, emit->ct_cur_obj);
context->constants.source_file = source_file;
#endif
if (emit->ct_cur_obj > 0) {
memcpy(context->constants.obj_table, emit->const_table, emit->ct_cur_obj * sizeof(mp_uint_t));
}
}
/******************************************************************************/
STATIC void compile_error_set_line(compiler_t *comp, mp_parse_node_t pn) {
// if the line of the error is unknown then try to update it from the pn
if (comp->compile_error_line == 0 && MP_PARSE_NODE_IS_STRUCT(pn)) {
comp->compile_error_line = ((mp_parse_node_struct_t *)pn)->source_line;
}
}
STATIC void compile_syntax_error(compiler_t *comp, mp_parse_node_t pn, mp_rom_error_text_t msg) {
// only register the error if there has been no other error
if (comp->compile_error == MP_OBJ_NULL) {
comp->compile_error = mp_obj_new_exception_msg(&mp_type_SyntaxError, msg);
compile_error_set_line(comp, pn);
}
}
STATIC void compile_trailer_paren_helper(compiler_t *comp, mp_parse_node_t pn_arglist, bool is_method_call, int n_positional_extra);
STATIC void compile_comprehension(compiler_t *comp, mp_parse_node_struct_t *pns, scope_kind_t kind);
STATIC void compile_atom_brace_helper(compiler_t *comp, mp_parse_node_struct_t *pns, bool create_map);
STATIC void compile_node(compiler_t *comp, mp_parse_node_t pn);
2013-10-04 19:53:11 +01:00
STATIC uint comp_next_label(compiler_t *comp) {
return comp->next_label++;
}
py/emitnative: Optimise and improve exception handling in native code. Prior to this patch, native code would use a full nlr_buf_t for each exception handler (try-except, try-finally, with). For nested exception handlers this would use a lot of C stack and be rather inefficient. This patch changes how exceptions are handled in native code by setting up only a single nlr_buf_t context for the entire function, and then manages a state machine (using the PC) to work out which exception handler to run when an exception is raised by an nlr_jump. This keeps the C stack usage at a constant level regardless of the depth of Python exception blocks. The patch also fixes an existing bug when local variables are written to within an exception handler, then their value was incorrectly restored if an exception was raised (since the nlr_jump would restore register values, back to the point of the nlr_push). And it also gets nested try-finally+with working with the viper emitter. Broadly speaking, efficiency of executing native code that doesn't use any exception blocks is unchanged, and emitted code size is only slightly increased for such function. C stack usage of all native functions is either equal or less than before. Emitted code size for native functions that use exception blocks is increased by roughly 10% (due in part to fixing of above-mentioned bugs). But, most importantly, this patch allows to implement more Python features in native code, like unwind jumps and yielding from within nested exception blocks.
2018-08-16 04:56:36 +01:00
#if MICROPY_EMIT_NATIVE
STATIC void reserve_labels_for_native(compiler_t *comp, int n) {
if (comp->scope_cur->emit_options != MP_EMIT_OPT_BYTECODE) {
comp->next_label += n;
}
}
#else
#define reserve_labels_for_native(comp, n)
#endif
STATIC void compile_increase_except_level(compiler_t *comp, uint label, int kind) {
EMIT_ARG(setup_block, label, kind);
comp->cur_except_level += 1;
if (comp->cur_except_level > comp->scope_cur->exc_stack_size) {
comp->scope_cur->exc_stack_size = comp->cur_except_level;
}
}
STATIC void compile_decrease_except_level(compiler_t *comp) {
assert(comp->cur_except_level > 0);
comp->cur_except_level -= 1;
EMIT(end_finally);
reserve_labels_for_native(comp, 1);
}
STATIC scope_t *scope_new_and_link(compiler_t *comp, scope_kind_t kind, mp_parse_node_t pn, uint emit_options) {
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
scope_t *scope = scope_new(kind, pn, emit_options);
2013-10-04 19:53:11 +01:00
scope->parent = comp->scope_cur;
scope->next = NULL;
if (comp->scope_head == NULL) {
comp->scope_head = scope;
} else {
scope_t *s = comp->scope_head;
while (s->next != NULL) {
s = s->next;
}
s->next = scope;
}
return scope;
}
typedef void (*apply_list_fun_t)(compiler_t *comp, mp_parse_node_t pn);
STATIC void apply_to_single_or_list(compiler_t *comp, mp_parse_node_t pn, pn_kind_t pn_list_kind, apply_list_fun_t f) {
if (MP_PARSE_NODE_IS_STRUCT_KIND(pn, pn_list_kind)) {
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
int num_nodes = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
2013-10-04 19:53:11 +01:00
for (int i = 0; i < num_nodes; i++) {
f(comp, pns->nodes[i]);
}
} else if (!MP_PARSE_NODE_IS_NULL(pn)) {
2013-10-04 19:53:11 +01:00
f(comp, pn);
}
}
STATIC void compile_generic_all_nodes(compiler_t *comp, mp_parse_node_struct_t *pns) {
int num_nodes = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
2013-10-04 19:53:11 +01:00
for (int i = 0; i < num_nodes; i++) {
compile_node(comp, pns->nodes[i]);
if (comp->compile_error != MP_OBJ_NULL) {
// add line info for the error in case it didn't have a line number
compile_error_set_line(comp, pns->nodes[i]);
return;
}
2013-10-04 19:53:11 +01:00
}
}
STATIC void compile_load_id(compiler_t *comp, qstr qst) {
if (comp->pass == MP_PASS_SCOPE) {
mp_emit_common_get_id_for_load(comp->scope_cur, qst);
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
}
{
#if NEED_METHOD_TABLE
mp_emit_common_id_op(comp->emit, &comp->emit_method_table->load_id, comp->scope_cur, qst);
#else
mp_emit_common_id_op(comp->emit, &mp_emit_bc_method_table_load_id_ops, comp->scope_cur, qst);
#endif
}
}
STATIC void compile_store_id(compiler_t *comp, qstr qst) {
if (comp->pass == MP_PASS_SCOPE) {
mp_emit_common_get_id_for_modification(comp->scope_cur, qst);
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
}
{
#if NEED_METHOD_TABLE
mp_emit_common_id_op(comp->emit, &comp->emit_method_table->store_id, comp->scope_cur, qst);
#else
mp_emit_common_id_op(comp->emit, &mp_emit_bc_method_table_store_id_ops, comp->scope_cur, qst);
#endif
}
}
STATIC void compile_delete_id(compiler_t *comp, qstr qst) {
if (comp->pass == MP_PASS_SCOPE) {
mp_emit_common_get_id_for_modification(comp->scope_cur, qst);
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
}
{
#if NEED_METHOD_TABLE
mp_emit_common_id_op(comp->emit, &comp->emit_method_table->delete_id, comp->scope_cur, qst);
#else
mp_emit_common_id_op(comp->emit, &mp_emit_bc_method_table_delete_id_ops, comp->scope_cur, qst);
#endif
}
}
STATIC void compile_generic_tuple(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
// a simple tuple expression
size_t num_nodes = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
for (size_t i = 0; i < num_nodes; i++) {
compile_node(comp, pns->nodes[i]);
}
EMIT_ARG(build, num_nodes, MP_EMIT_BUILD_TUPLE);
2013-10-04 19:53:11 +01:00
}
STATIC void c_if_cond(compiler_t *comp, mp_parse_node_t pn, bool jump_if, int label) {
if (mp_parse_node_is_const_false(pn)) {
if (jump_if == false) {
EMIT_ARG(jump, label);
}
return;
} else if (mp_parse_node_is_const_true(pn)) {
if (jump_if == true) {
EMIT_ARG(jump, label);
}
return;
} else if (MP_PARSE_NODE_IS_STRUCT(pn)) {
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
int n = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_or_test) {
if (jump_if == false) {
and_or_logic1:;
uint label2 = comp_next_label(comp);
for (int i = 0; i < n - 1; i++) {
c_if_cond(comp, pns->nodes[i], !jump_if, label2);
}
c_if_cond(comp, pns->nodes[n - 1], jump_if, label);
EMIT_ARG(label_assign, label2);
} else {
and_or_logic2:
for (int i = 0; i < n; i++) {
c_if_cond(comp, pns->nodes[i], jump_if, label);
}
}
return;
} else if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_and_test) {
if (jump_if == false) {
goto and_or_logic2;
} else {
goto and_or_logic1;
}
} else if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_not_test_2) {
c_if_cond(comp, pns->nodes[0], !jump_if, label);
return;
} else if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_atom_paren) {
// cond is something in parenthesis
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
// empty tuple, acts as false for the condition
if (jump_if == false) {
EMIT_ARG(jump, label);
}
} else {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_testlist_comp));
// non-empty tuple, acts as true for the condition
if (jump_if == true) {
EMIT_ARG(jump, label);
}
}
return;
}
}
// nothing special, fall back to default compiling for node and jump
compile_node(comp, pn);
EMIT_ARG(pop_jump_if, jump_if, label);
2013-10-04 19:53:11 +01:00
}
typedef enum { ASSIGN_STORE, ASSIGN_AUG_LOAD, ASSIGN_AUG_STORE } assign_kind_t;
STATIC void c_assign(compiler_t *comp, mp_parse_node_t pn, assign_kind_t kind);
2013-10-04 19:53:11 +01:00
STATIC void c_assign_atom_expr(compiler_t *comp, mp_parse_node_struct_t *pns, assign_kind_t assign_kind) {
2013-10-04 19:53:11 +01:00
if (assign_kind != ASSIGN_AUG_STORE) {
compile_node(comp, pns->nodes[0]);
}
if (MP_PARSE_NODE_IS_STRUCT(pns->nodes[1])) {
mp_parse_node_struct_t *pns1 = (mp_parse_node_struct_t *)pns->nodes[1];
if (MP_PARSE_NODE_STRUCT_KIND(pns1) == PN_atom_expr_trailers) {
int n = MP_PARSE_NODE_STRUCT_NUM_NODES(pns1);
2013-10-04 19:53:11 +01:00
if (assign_kind != ASSIGN_AUG_STORE) {
for (int i = 0; i < n - 1; i++) {
compile_node(comp, pns1->nodes[i]);
}
}
assert(MP_PARSE_NODE_IS_STRUCT(pns1->nodes[n - 1]));
pns1 = (mp_parse_node_struct_t *)pns1->nodes[n - 1];
2013-10-04 19:53:11 +01:00
}
if (MP_PARSE_NODE_STRUCT_KIND(pns1) == PN_trailer_bracket) {
2013-10-04 19:53:11 +01:00
if (assign_kind == ASSIGN_AUG_STORE) {
EMIT(rot_three);
EMIT_ARG(subscr, MP_EMIT_SUBSCR_STORE);
2013-10-04 19:53:11 +01:00
} else {
compile_node(comp, pns1->nodes[0]);
if (assign_kind == ASSIGN_AUG_LOAD) {
EMIT(dup_top_two);
EMIT_ARG(subscr, MP_EMIT_SUBSCR_LOAD);
2013-10-04 19:53:11 +01:00
} else {
EMIT_ARG(subscr, MP_EMIT_SUBSCR_STORE);
2013-10-04 19:53:11 +01:00
}
}
return;
} else if (MP_PARSE_NODE_STRUCT_KIND(pns1) == PN_trailer_period) {
assert(MP_PARSE_NODE_IS_ID(pns1->nodes[0]));
2013-10-04 19:53:11 +01:00
if (assign_kind == ASSIGN_AUG_LOAD) {
EMIT(dup_top);
EMIT_ARG(attr, MP_PARSE_NODE_LEAF_ARG(pns1->nodes[0]), MP_EMIT_ATTR_LOAD);
2013-10-04 19:53:11 +01:00
} else {
if (assign_kind == ASSIGN_AUG_STORE) {
EMIT(rot_two);
}
EMIT_ARG(attr, MP_PARSE_NODE_LEAF_ARG(pns1->nodes[0]), MP_EMIT_ATTR_STORE);
2013-10-04 19:53:11 +01:00
}
return;
2013-10-04 19:53:11 +01:00
}
}
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("can't assign to expression"));
2013-10-04 19:53:11 +01:00
}
STATIC void c_assign_tuple(compiler_t *comp, uint num_tail, mp_parse_node_t *nodes_tail) {
// look for star expression
uint have_star_index = -1;
for (uint i = 0; i < num_tail; i++) {
if (MP_PARSE_NODE_IS_STRUCT_KIND(nodes_tail[i], PN_star_expr)) {
if (have_star_index == (uint)-1) {
EMIT_ARG(unpack_ex, i, num_tail - i - 1);
have_star_index = i;
2013-10-04 19:53:11 +01:00
} else {
compile_syntax_error(comp, nodes_tail[i], MP_ERROR_TEXT("multiple *x in assignment"));
2013-10-04 19:53:11 +01:00
return;
}
}
}
if (have_star_index == (uint)-1) {
EMIT_ARG(unpack_sequence, num_tail);
}
for (uint i = 0; i < num_tail; i++) {
if (i == have_star_index) {
c_assign(comp, ((mp_parse_node_struct_t *)nodes_tail[i])->nodes[0], ASSIGN_STORE);
} else {
c_assign(comp, nodes_tail[i], ASSIGN_STORE);
2013-10-04 19:53:11 +01:00
}
}
}
// assigns top of stack to pn
STATIC void c_assign(compiler_t *comp, mp_parse_node_t pn, assign_kind_t assign_kind) {
assert(!MP_PARSE_NODE_IS_NULL(pn));
if (MP_PARSE_NODE_IS_LEAF(pn)) {
if (MP_PARSE_NODE_IS_ID(pn)) {
qstr arg = MP_PARSE_NODE_LEAF_ARG(pn);
2013-10-04 19:53:11 +01:00
switch (assign_kind) {
case ASSIGN_STORE:
case ASSIGN_AUG_STORE:
compile_store_id(comp, arg);
2013-10-04 19:53:11 +01:00
break;
case ASSIGN_AUG_LOAD:
default:
compile_load_id(comp, arg);
2013-10-04 19:53:11 +01:00
break;
}
} else {
goto cannot_assign;
2013-10-04 19:53:11 +01:00
}
} else {
// pn must be a struct
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
switch (MP_PARSE_NODE_STRUCT_KIND(pns)) {
case PN_atom_expr_normal:
2013-10-04 19:53:11 +01:00
// lhs is an index or attribute
c_assign_atom_expr(comp, pns, assign_kind);
2013-10-04 19:53:11 +01:00
break;
case PN_testlist_star_expr:
case PN_exprlist:
// lhs is a tuple
if (assign_kind != ASSIGN_STORE) {
goto cannot_assign;
2013-10-04 19:53:11 +01:00
}
c_assign_tuple(comp, MP_PARSE_NODE_STRUCT_NUM_NODES(pns), pns->nodes);
2013-10-04 19:53:11 +01:00
break;
case PN_atom_paren:
// lhs is something in parenthesis
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
2013-10-04 19:53:11 +01:00
// empty tuple
goto cannot_assign;
} else {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_testlist_comp));
if (assign_kind != ASSIGN_STORE) {
goto cannot_assign;
}
pns = (mp_parse_node_struct_t *)pns->nodes[0];
2013-10-04 19:53:11 +01:00
goto testlist_comp;
}
break;
case PN_atom_bracket:
// lhs is something in brackets
if (assign_kind != ASSIGN_STORE) {
goto cannot_assign;
2013-10-04 19:53:11 +01:00
}
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
2013-10-04 19:53:11 +01:00
// empty list, assignment allowed
c_assign_tuple(comp, 0, NULL);
} else if (MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_testlist_comp)) {
pns = (mp_parse_node_struct_t *)pns->nodes[0];
2013-10-04 19:53:11 +01:00
goto testlist_comp;
} else {
// brackets around 1 item
c_assign_tuple(comp, 1, pns->nodes);
2013-10-04 19:53:11 +01:00
}
break;
default:
goto cannot_assign;
2013-10-04 19:53:11 +01:00
}
return;
testlist_comp:
// lhs is a sequence
if (MP_PARSE_NODE_TESTLIST_COMP_HAS_COMP_FOR(pns)) {
goto cannot_assign;
2013-10-04 19:53:11 +01:00
}
c_assign_tuple(comp, MP_PARSE_NODE_STRUCT_NUM_NODES(pns), pns->nodes);
2013-10-04 19:53:11 +01:00
return;
}
return;
cannot_assign:
compile_syntax_error(comp, pn, MP_ERROR_TEXT("can't assign to expression"));
2013-10-04 19:53:11 +01:00
}
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
// stuff for lambda and comprehensions and generators:
// if n_pos_defaults > 0 then there is a tuple on the stack with the positional defaults
// if n_kw_defaults > 0 then there is a dictionary on the stack with the keyword defaults
// if both exist, the tuple is above the dictionary (ie the first pop gets the tuple)
STATIC void close_over_variables_etc(compiler_t *comp, scope_t *this_scope, int n_pos_defaults, int n_kw_defaults) {
assert(n_pos_defaults >= 0);
assert(n_kw_defaults >= 0);
// set flags
if (n_kw_defaults > 0) {
this_scope->scope_flags |= MP_SCOPE_FLAG_DEFKWARGS;
}
this_scope->num_def_pos_args = n_pos_defaults;
py: Fix native functions so they run with their correct globals context. Prior to this commit a function compiled with the native decorator @micropython.native would not work correctly when accessing global variables, because the globals dict was not being set upon function entry. This commit fixes this problem by, upon function entry, setting as the current globals dict the globals dict context the function was defined within, as per normal Python semantics, and as bytecode does. Upon function exit the original globals dict is restored. In order to restore the globals dict when an exception is raised the native function must guard its internals with an nlr_push/nlr_pop pair. Because this push/pop is relatively expensive, in both C stack usage for the nlr_buf_t and CPU execution time, the implementation here optimises things as much as possible. First, the compiler keeps track of whether a function even needs to access global variables. Using this information the native emitter then generates three different kinds of code: 1. no globals used, no exception handlers: no nlr handling code and no setting of the globals dict. 2. globals used, no exception handlers: an nlr_buf_t is allocated on the C stack but it is not used if the globals dict is unchanged, saving execution time because nlr_push/nlr_pop don't need to run. 3. function has exception handlers, may use globals: an nlr_buf_t is allocated and nlr_push/nlr_pop are always called. In the end, native functions that don't access globals and don't have exception handlers will run more efficiently than those that do. Fixes issue #1573.
2018-09-13 13:03:48 +01:00
#if MICROPY_EMIT_NATIVE
// When creating a function/closure it will take a reference to the current globals
comp->scope_cur->scope_flags |= MP_SCOPE_FLAG_REFGLOBALS | MP_SCOPE_FLAG_HASCONSTS;
py: Fix native functions so they run with their correct globals context. Prior to this commit a function compiled with the native decorator @micropython.native would not work correctly when accessing global variables, because the globals dict was not being set upon function entry. This commit fixes this problem by, upon function entry, setting as the current globals dict the globals dict context the function was defined within, as per normal Python semantics, and as bytecode does. Upon function exit the original globals dict is restored. In order to restore the globals dict when an exception is raised the native function must guard its internals with an nlr_push/nlr_pop pair. Because this push/pop is relatively expensive, in both C stack usage for the nlr_buf_t and CPU execution time, the implementation here optimises things as much as possible. First, the compiler keeps track of whether a function even needs to access global variables. Using this information the native emitter then generates three different kinds of code: 1. no globals used, no exception handlers: no nlr handling code and no setting of the globals dict. 2. globals used, no exception handlers: an nlr_buf_t is allocated on the C stack but it is not used if the globals dict is unchanged, saving execution time because nlr_push/nlr_pop don't need to run. 3. function has exception handlers, may use globals: an nlr_buf_t is allocated and nlr_push/nlr_pop are always called. In the end, native functions that don't access globals and don't have exception handlers will run more efficiently than those that do. Fixes issue #1573.
2018-09-13 13:03:48 +01:00
#endif
2013-10-04 19:53:11 +01:00
// make closed over variables, if any
// ensure they are closed over in the order defined in the outer scope (mainly to agree with CPython)
2013-10-04 19:53:11 +01:00
int nfree = 0;
if (comp->scope_cur->kind != SCOPE_MODULE) {
for (int i = 0; i < comp->scope_cur->id_info_len; i++) {
id_info_t *id = &comp->scope_cur->id_info[i];
if (id->kind == ID_INFO_KIND_CELL || id->kind == ID_INFO_KIND_FREE) {
for (int j = 0; j < this_scope->id_info_len; j++) {
id_info_t *id2 = &this_scope->id_info[j];
if (id2->kind == ID_INFO_KIND_FREE && id->qst == id2->qst) {
// in MicroPython we load closures using LOAD_FAST
EMIT_LOAD_FAST(id->qst, id->local_num);
nfree += 1;
}
}
2013-10-04 19:53:11 +01:00
}
}
}
// make the function/closure
if (nfree == 0) {
EMIT_ARG(make_function, this_scope, n_pos_defaults, n_kw_defaults);
2013-10-04 19:53:11 +01:00
} else {
EMIT_ARG(make_closure, this_scope, nfree, n_pos_defaults, n_kw_defaults);
2013-10-04 19:53:11 +01:00
}
}
STATIC void compile_funcdef_lambdef_param(compiler_t *comp, mp_parse_node_t pn) {
// For efficiency of the code below we extract the parse-node kind here
int pn_kind;
if (MP_PARSE_NODE_IS_ID(pn)) {
pn_kind = -1;
} else {
assert(MP_PARSE_NODE_IS_STRUCT(pn));
pn_kind = MP_PARSE_NODE_STRUCT_KIND((mp_parse_node_struct_t *)pn);
}
if (pn_kind == PN_typedargslist_star || pn_kind == PN_varargslist_star) {
comp->have_star = true;
/* don't need to distinguish bare from named star
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t*)pn;
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
// bare star
} else {
// named star
}
*/
} else if (pn_kind == PN_typedargslist_dbl_star || pn_kind == PN_varargslist_dbl_star) {
// named double star
// TODO do we need to do anything with this?
} else {
mp_parse_node_t pn_id;
mp_parse_node_t pn_equal;
if (pn_kind == -1) {
// this parameter is just an id
pn_id = pn;
pn_equal = MP_PARSE_NODE_NULL;
} else if (pn_kind == PN_typedargslist_name) {
// this parameter has a colon and/or equal specifier
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
pn_id = pns->nodes[0];
// pn_colon = pns->nodes[1]; // unused
pn_equal = pns->nodes[2];
} else {
assert(pn_kind == PN_varargslist_name); // should be
// this parameter has an equal specifier
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
pn_id = pns->nodes[0];
pn_equal = pns->nodes[1];
}
if (MP_PARSE_NODE_IS_NULL(pn_equal)) {
// this parameter does not have a default value
// check for non-default parameters given after default parameters (allowed by parser, but not syntactically valid)
if (!comp->have_star && comp->num_default_params != 0) {
compile_syntax_error(comp, pn, MP_ERROR_TEXT("non-default argument follows default argument"));
return;
}
} else {
2013-10-04 19:53:11 +01:00
// this parameter has a default value
// in CPython, None (and True, False?) as default parameters are loaded with LOAD_NAME; don't understandy why
if (comp->have_star) {
comp->num_dict_params += 1;
// in MicroPython we put the default dict parameters into a dictionary using the bytecode
if (comp->num_dict_params == 1) {
// in MicroPython we put the default positional parameters into a tuple using the bytecode
// we need to do this here before we start building the map for the default keywords
if (comp->num_default_params > 0) {
EMIT_ARG(build, comp->num_default_params, MP_EMIT_BUILD_TUPLE);
} else {
EMIT(load_null); // sentinel indicating empty default positional args
}
// first default dict param, so make the map
EMIT_ARG(build, 0, MP_EMIT_BUILD_MAP);
}
// compile value then key, then store it to the dict
compile_node(comp, pn_equal);
EMIT_ARG(load_const_str, MP_PARSE_NODE_LEAF_ARG(pn_id));
EMIT(store_map);
2013-10-04 19:53:11 +01:00
} else {
comp->num_default_params += 1;
compile_node(comp, pn_equal);
2013-10-04 19:53:11 +01:00
}
}
}
}
STATIC void compile_funcdef_lambdef(compiler_t *comp, scope_t *scope, mp_parse_node_t pn_params, pn_kind_t pn_list_kind) {
// When we call compile_funcdef_lambdef_param below it can compile an arbitrary
// expression for default arguments, which may contain a lambda. The lambda will
// call here in a nested way, so we must save and restore the relevant state.
bool orig_have_star = comp->have_star;
uint16_t orig_num_dict_params = comp->num_dict_params;
uint16_t orig_num_default_params = comp->num_default_params;
2013-10-04 19:53:11 +01:00
// compile default parameters
comp->have_star = false;
comp->num_dict_params = 0;
comp->num_default_params = 0;
apply_to_single_or_list(comp, pn_params, pn_list_kind, compile_funcdef_lambdef_param);
if (comp->compile_error != MP_OBJ_NULL) {
return;
}
// in MicroPython we put the default positional parameters into a tuple using the bytecode
// the default keywords args may have already made the tuple; if not, do it now
if (comp->num_default_params > 0 && comp->num_dict_params == 0) {
EMIT_ARG(build, comp->num_default_params, MP_EMIT_BUILD_TUPLE);
EMIT(load_null); // sentinel indicating empty default keyword args
}
// make the function
close_over_variables_etc(comp, scope, comp->num_default_params, comp->num_dict_params);
// restore state
comp->have_star = orig_have_star;
comp->num_dict_params = orig_num_dict_params;
comp->num_default_params = orig_num_default_params;
}
// leaves function object on stack
// returns function name
STATIC qstr compile_funcdef_helper(compiler_t *comp, mp_parse_node_struct_t *pns, uint emit_options) {
if (comp->pass == MP_PASS_SCOPE) {
// create a new scope for this function
scope_t *s = scope_new_and_link(comp, SCOPE_FUNCTION, (mp_parse_node_t)pns, emit_options);
// store the function scope so the compiling function can use it at each pass
pns->nodes[4] = (mp_parse_node_t)s;
}
2013-10-04 19:53:11 +01:00
// get the scope for this function
scope_t *fscope = (scope_t *)pns->nodes[4];
// compile the function definition
compile_funcdef_lambdef(comp, fscope, pns->nodes[1], PN_typedargslist);
2013-10-04 19:53:11 +01:00
// return its name (the 'f' in "def f(...):")
return fscope->simple_name;
}
// leaves class object on stack
// returns class name
STATIC qstr compile_classdef_helper(compiler_t *comp, mp_parse_node_struct_t *pns, uint emit_options) {
if (comp->pass == MP_PASS_SCOPE) {
2013-10-04 19:53:11 +01:00
// create a new scope for this class
scope_t *s = scope_new_and_link(comp, SCOPE_CLASS, (mp_parse_node_t)pns, emit_options);
2013-10-04 19:53:11 +01:00
// store the class scope so the compiling function can use it at each pass
pns->nodes[3] = (mp_parse_node_t)s;
2013-10-04 19:53:11 +01:00
}
EMIT(load_build_class);
// scope for this class
scope_t *cscope = (scope_t *)pns->nodes[3];
// compile the class
close_over_variables_etc(comp, cscope, 0, 0);
// get its name
EMIT_ARG(load_const_str, cscope->simple_name);
2013-10-04 19:53:11 +01:00
// nodes[1] has parent classes, if any
// empty parenthesis (eg class C():) gets here as an empty PN_classdef_2 and needs special handling
mp_parse_node_t parents = pns->nodes[1];
if (MP_PARSE_NODE_IS_STRUCT_KIND(parents, PN_classdef_2)) {
parents = MP_PARSE_NODE_NULL;
}
compile_trailer_paren_helper(comp, parents, false, 2);
2013-10-04 19:53:11 +01:00
// return its name (the 'C' in class C(...):")
return cscope->simple_name;
}
// returns true if it was a built-in decorator (even if the built-in had an error)
STATIC bool compile_built_in_decorator(compiler_t *comp, size_t name_len, mp_parse_node_t *name_nodes, uint *emit_options) {
if (MP_PARSE_NODE_LEAF_ARG(name_nodes[0]) != MP_QSTR_micropython) {
return false;
}
if (name_len != 2) {
compile_syntax_error(comp, name_nodes[0], MP_ERROR_TEXT("invalid micropython decorator"));
return true;
}
qstr attr = MP_PARSE_NODE_LEAF_ARG(name_nodes[1]);
if (attr == MP_QSTR_bytecode) {
*emit_options = MP_EMIT_OPT_BYTECODE;
#if MICROPY_EMIT_NATIVE
} else if (attr == MP_QSTR_native) {
*emit_options = MP_EMIT_OPT_NATIVE_PYTHON;
} else if (attr == MP_QSTR_viper) {
*emit_options = MP_EMIT_OPT_VIPER;
#endif
#if MICROPY_EMIT_INLINE_ASM
#if MICROPY_DYNAMIC_COMPILER
} else if (attr == MP_QSTR_asm_thumb) {
*emit_options = MP_EMIT_OPT_ASM;
} else if (attr == MP_QSTR_asm_xtensa) {
*emit_options = MP_EMIT_OPT_ASM;
#else
} else if (attr == ASM_DECORATOR_QSTR) {
*emit_options = MP_EMIT_OPT_ASM;
#endif
#endif
} else {
compile_syntax_error(comp, name_nodes[1], MP_ERROR_TEXT("invalid micropython decorator"));
}
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
#if MICROPY_EMIT_NATIVE && MICROPY_DYNAMIC_COMPILER
if (*emit_options == MP_EMIT_OPT_NATIVE_PYTHON || *emit_options == MP_EMIT_OPT_VIPER) {
if (emit_native_table[mp_dynamic_compiler.native_arch] == NULL) {
compile_syntax_error(comp, name_nodes[1], MP_ERROR_TEXT("invalid arch"));
}
} else if (*emit_options == MP_EMIT_OPT_ASM) {
if (emit_asm_table[mp_dynamic_compiler.native_arch] == NULL) {
compile_syntax_error(comp, name_nodes[1], MP_ERROR_TEXT("invalid arch"));
}
}
#endif
return true;
}
STATIC void compile_decorated(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
// get the list of decorators
mp_parse_node_t *nodes;
size_t n = mp_parse_node_extract_list(&pns->nodes[0], PN_decorators, &nodes);
2013-10-04 19:53:11 +01:00
// inherit emit options for this function/class definition
uint emit_options = comp->scope_cur->emit_options;
// compile each decorator
size_t num_built_in_decorators = 0;
for (size_t i = 0; i < n; i++) {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(nodes[i], PN_decorator)); // should be
mp_parse_node_struct_t *pns_decorator = (mp_parse_node_struct_t *)nodes[i];
// nodes[0] contains the decorator function, which is a dotted name
mp_parse_node_t *name_nodes;
size_t name_len = mp_parse_node_extract_list(&pns_decorator->nodes[0], PN_dotted_name, &name_nodes);
// check for built-in decorators
if (compile_built_in_decorator(comp, name_len, name_nodes, &emit_options)) {
// this was a built-in
num_built_in_decorators += 1;
} else {
// not a built-in, compile normally
// compile the decorator function
compile_node(comp, name_nodes[0]);
for (size_t j = 1; j < name_len; j++) {
assert(MP_PARSE_NODE_IS_ID(name_nodes[j])); // should be
EMIT_ARG(attr, MP_PARSE_NODE_LEAF_ARG(name_nodes[j]), MP_EMIT_ATTR_LOAD);
}
// nodes[1] contains arguments to the decorator function, if any
if (!MP_PARSE_NODE_IS_NULL(pns_decorator->nodes[1])) {
// call the decorator function with the arguments in nodes[1]
compile_node(comp, pns_decorator->nodes[1]);
}
2013-10-04 19:53:11 +01:00
}
}
// compile the body (funcdef, async funcdef or classdef) and get its name
mp_parse_node_struct_t *pns_body = (mp_parse_node_struct_t *)pns->nodes[1];
2013-10-04 19:53:11 +01:00
qstr body_name = 0;
if (MP_PARSE_NODE_STRUCT_KIND(pns_body) == PN_funcdef) {
body_name = compile_funcdef_helper(comp, pns_body, emit_options);
#if MICROPY_PY_ASYNC_AWAIT
} else if (MP_PARSE_NODE_STRUCT_KIND(pns_body) == PN_async_funcdef) {
assert(MP_PARSE_NODE_IS_STRUCT(pns_body->nodes[0]));
mp_parse_node_struct_t *pns0 = (mp_parse_node_struct_t *)pns_body->nodes[0];
body_name = compile_funcdef_helper(comp, pns0, emit_options);
scope_t *fscope = (scope_t *)pns0->nodes[4];
fscope->scope_flags |= MP_SCOPE_FLAG_GENERATOR;
#endif
2013-10-04 19:53:11 +01:00
} else {
assert(MP_PARSE_NODE_STRUCT_KIND(pns_body) == PN_classdef); // should be
body_name = compile_classdef_helper(comp, pns_body, emit_options);
2013-10-04 19:53:11 +01:00
}
// call each decorator
for (size_t i = 0; i < n - num_built_in_decorators; i++) {
EMIT_ARG(call_function, 1, 0, 0);
2013-10-04 19:53:11 +01:00
}
// store func/class object into name
compile_store_id(comp, body_name);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_funcdef(compiler_t *comp, mp_parse_node_struct_t *pns) {
qstr fname = compile_funcdef_helper(comp, pns, comp->scope_cur->emit_options);
2013-10-04 19:53:11 +01:00
// store function object into function name
compile_store_id(comp, fname);
2013-10-04 19:53:11 +01:00
}
STATIC void c_del_stmt(compiler_t *comp, mp_parse_node_t pn) {
if (MP_PARSE_NODE_IS_ID(pn)) {
compile_delete_id(comp, MP_PARSE_NODE_LEAF_ARG(pn));
} else if (MP_PARSE_NODE_IS_STRUCT_KIND(pn, PN_atom_expr_normal)) {
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]); // base of the atom_expr_normal node
2013-10-04 19:53:11 +01:00
if (MP_PARSE_NODE_IS_STRUCT(pns->nodes[1])) {
mp_parse_node_struct_t *pns1 = (mp_parse_node_struct_t *)pns->nodes[1];
if (MP_PARSE_NODE_STRUCT_KIND(pns1) == PN_atom_expr_trailers) {
int n = MP_PARSE_NODE_STRUCT_NUM_NODES(pns1);
2013-10-04 19:53:11 +01:00
for (int i = 0; i < n - 1; i++) {
compile_node(comp, pns1->nodes[i]);
}
assert(MP_PARSE_NODE_IS_STRUCT(pns1->nodes[n - 1]));
pns1 = (mp_parse_node_struct_t *)pns1->nodes[n - 1];
2013-10-04 19:53:11 +01:00
}
if (MP_PARSE_NODE_STRUCT_KIND(pns1) == PN_trailer_bracket) {
2013-10-04 19:53:11 +01:00
compile_node(comp, pns1->nodes[0]);
EMIT_ARG(subscr, MP_EMIT_SUBSCR_DELETE);
} else if (MP_PARSE_NODE_STRUCT_KIND(pns1) == PN_trailer_period) {
assert(MP_PARSE_NODE_IS_ID(pns1->nodes[0]));
EMIT_ARG(attr, MP_PARSE_NODE_LEAF_ARG(pns1->nodes[0]), MP_EMIT_ATTR_DELETE);
2013-10-04 19:53:11 +01:00
} else {
goto cannot_delete;
2013-10-04 19:53:11 +01:00
}
} else {
goto cannot_delete;
2013-10-04 19:53:11 +01:00
}
} else if (MP_PARSE_NODE_IS_STRUCT_KIND(pn, PN_atom_paren)) {
pn = ((mp_parse_node_struct_t *)pn)->nodes[0];
if (MP_PARSE_NODE_IS_NULL(pn)) {
goto cannot_delete;
} else {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pn, PN_testlist_comp));
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
if (MP_PARSE_NODE_TESTLIST_COMP_HAS_COMP_FOR(pns)) {
goto cannot_delete;
}
for (size_t i = 0; i < MP_PARSE_NODE_STRUCT_NUM_NODES(pns); ++i) {
c_del_stmt(comp, pns->nodes[i]);
2013-10-04 19:53:11 +01:00
}
}
} else {
2017-05-29 08:08:14 +01:00
// some arbitrary statement that we can't delete (eg del 1)
goto cannot_delete;
2013-10-04 19:53:11 +01:00
}
return;
cannot_delete:
compile_syntax_error(comp, (mp_parse_node_t)pn, MP_ERROR_TEXT("can't delete expression"));
2013-10-04 19:53:11 +01:00
}
STATIC void compile_del_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
apply_to_single_or_list(comp, pns->nodes[0], PN_exprlist, c_del_stmt);
}
STATIC void compile_break_cont_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
uint16_t label;
if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_break_stmt) {
label = comp->break_label;
} else {
label = comp->continue_label;
2013-10-04 19:53:11 +01:00
}
if (label == INVALID_LABEL) {
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("'break'/'continue' outside loop"));
2013-10-04 19:53:11 +01:00
}
assert(comp->cur_except_level >= comp->break_continue_except_level);
EMIT_ARG(unwind_jump, label, comp->cur_except_level - comp->break_continue_except_level);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_return_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
#if MICROPY_CPYTHON_COMPAT
2013-10-18 19:58:12 +01:00
if (comp->scope_cur->kind != SCOPE_FUNCTION) {
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("'return' outside function"));
2013-10-18 19:58:12 +01:00
return;
}
#endif
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
2013-10-18 19:58:12 +01:00
// no argument to 'return', so return None
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
} else if (MICROPY_COMP_RETURN_IF_EXPR
&& MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_test_if_expr)) {
2013-10-04 19:53:11 +01:00
// special case when returning an if-expression; to match CPython optimisation
mp_parse_node_struct_t *pns_test_if_expr = (mp_parse_node_struct_t *)pns->nodes[0];
mp_parse_node_struct_t *pns_test_if_else = (mp_parse_node_struct_t *)pns_test_if_expr->nodes[1];
2013-10-04 19:53:11 +01:00
uint l_fail = comp_next_label(comp);
2013-10-04 19:53:11 +01:00
c_if_cond(comp, pns_test_if_else->nodes[0], false, l_fail); // condition
compile_node(comp, pns_test_if_expr->nodes[0]); // success value
EMIT(return_value);
EMIT_ARG(label_assign, l_fail);
2013-10-04 19:53:11 +01:00
compile_node(comp, pns_test_if_else->nodes[1]); // failure value
} else {
compile_node(comp, pns->nodes[0]);
}
EMIT(return_value);
}
STATIC void compile_yield_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]);
EMIT(pop_top);
}
STATIC void compile_raise_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
2013-10-04 19:53:11 +01:00
// raise
EMIT_ARG(raise_varargs, 0);
} else if (MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_raise_stmt_arg)) {
2013-10-04 19:53:11 +01:00
// raise x from y
pns = (mp_parse_node_struct_t *)pns->nodes[0];
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]);
compile_node(comp, pns->nodes[1]);
EMIT_ARG(raise_varargs, 2);
2013-10-04 19:53:11 +01:00
} else {
// raise x
compile_node(comp, pns->nodes[0]);
EMIT_ARG(raise_varargs, 1);
2013-10-04 19:53:11 +01:00
}
}
// q_base holds the base of the name
// eg a -> q_base=a
// a.b.c -> q_base=a
STATIC void do_import_name(compiler_t *comp, mp_parse_node_t pn, qstr *q_base) {
2013-10-04 19:53:11 +01:00
bool is_as = false;
if (MP_PARSE_NODE_IS_STRUCT_KIND(pn, PN_dotted_as_name)) {
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
2013-10-04 19:53:11 +01:00
// a name of the form x as y; unwrap it
*q_base = MP_PARSE_NODE_LEAF_ARG(pns->nodes[1]);
2013-10-04 19:53:11 +01:00
pn = pns->nodes[0];
is_as = true;
}
if (MP_PARSE_NODE_IS_NULL(pn)) {
// empty name (eg, from . import x)
*q_base = MP_QSTR_;
EMIT_ARG(import, MP_QSTR_, MP_EMIT_IMPORT_NAME); // import the empty string
} else if (MP_PARSE_NODE_IS_ID(pn)) {
2013-10-04 19:53:11 +01:00
// just a simple name
qstr q_full = MP_PARSE_NODE_LEAF_ARG(pn);
2013-10-04 19:53:11 +01:00
if (!is_as) {
*q_base = q_full;
2013-10-04 19:53:11 +01:00
}
EMIT_ARG(import, q_full, MP_EMIT_IMPORT_NAME);
} else {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pn, PN_dotted_name)); // should be
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
{
2013-10-04 19:53:11 +01:00
// a name of the form a.b.c
if (!is_as) {
*q_base = MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]);
2013-10-04 19:53:11 +01:00
}
int n = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
2013-10-04 19:53:11 +01:00
int len = n - 1;
for (int i = 0; i < n; i++) {
len += qstr_len(MP_PARSE_NODE_LEAF_ARG(pns->nodes[i]));
2013-10-04 19:53:11 +01:00
}
char *q_ptr = mp_local_alloc(len);
char *str_dest = q_ptr;
2013-10-04 19:53:11 +01:00
for (int i = 0; i < n; i++) {
if (i > 0) {
*str_dest++ = '.';
2013-10-04 19:53:11 +01:00
}
size_t str_src_len;
const byte *str_src = qstr_data(MP_PARSE_NODE_LEAF_ARG(pns->nodes[i]), &str_src_len);
memcpy(str_dest, str_src, str_src_len);
str_dest += str_src_len;
2013-10-04 19:53:11 +01:00
}
qstr q_full = qstr_from_strn(q_ptr, len);
mp_local_free(q_ptr);
EMIT_ARG(import, q_full, MP_EMIT_IMPORT_NAME);
2013-10-04 19:53:11 +01:00
if (is_as) {
for (int i = 1; i < n; i++) {
EMIT_ARG(attr, MP_PARSE_NODE_LEAF_ARG(pns->nodes[i]), MP_EMIT_ATTR_LOAD);
2013-10-04 19:53:11 +01:00
}
}
}
}
}
STATIC void compile_dotted_as_name(compiler_t *comp, mp_parse_node_t pn) {
EMIT_ARG(load_const_small_int, 0); // level 0 import
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE); // not importing from anything
qstr q_base;
do_import_name(comp, pn, &q_base);
compile_store_id(comp, q_base);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_import_name(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
apply_to_single_or_list(comp, pns->nodes[0], PN_dotted_as_names, compile_dotted_as_name);
}
STATIC void compile_import_from(compiler_t *comp, mp_parse_node_struct_t *pns) {
mp_parse_node_t pn_import_source = pns->nodes[0];
2017-05-29 08:08:14 +01:00
// extract the preceding .'s (if any) for a relative import, to compute the import level
uint import_level = 0;
do {
mp_parse_node_t pn_rel;
if (MP_PARSE_NODE_IS_TOKEN(pn_import_source) || MP_PARSE_NODE_IS_STRUCT_KIND(pn_import_source, PN_one_or_more_period_or_ellipsis)) {
// This covers relative imports with dots only like "from .. import"
pn_rel = pn_import_source;
pn_import_source = MP_PARSE_NODE_NULL;
} else if (MP_PARSE_NODE_IS_STRUCT_KIND(pn_import_source, PN_import_from_2b)) {
// This covers relative imports starting with dot(s) like "from .foo import"
mp_parse_node_struct_t *pns_2b = (mp_parse_node_struct_t *)pn_import_source;
pn_rel = pns_2b->nodes[0];
pn_import_source = pns_2b->nodes[1];
assert(!MP_PARSE_NODE_IS_NULL(pn_import_source)); // should not be
} else {
// Not a relative import
break;
}
// get the list of . and/or ...'s
mp_parse_node_t *nodes;
size_t n = mp_parse_node_extract_list(&pn_rel, PN_one_or_more_period_or_ellipsis, &nodes);
// count the total number of .'s
for (size_t i = 0; i < n; i++) {
if (MP_PARSE_NODE_IS_TOKEN_KIND(nodes[i], MP_TOKEN_DEL_PERIOD)) {
import_level++;
} else {
// should be an MP_TOKEN_ELLIPSIS
import_level += 3;
}
}
} while (0);
if (MP_PARSE_NODE_IS_TOKEN_KIND(pns->nodes[1], MP_TOKEN_OP_STAR)) {
#if MICROPY_CPYTHON_COMPAT
if (comp->scope_cur->kind != SCOPE_MODULE) {
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("import * not at module level"));
return;
}
#endif
EMIT_ARG(load_const_small_int, import_level);
2013-12-10 17:27:24 +00:00
// build the "fromlist" tuple
EMIT_ARG(load_const_str, MP_QSTR__star_);
EMIT_ARG(build, 1, MP_EMIT_BUILD_TUPLE);
2013-12-10 17:27:24 +00:00
// do the import
qstr dummy_q;
do_import_name(comp, pn_import_source, &dummy_q);
EMIT_ARG(import, MP_QSTRnull, MP_EMIT_IMPORT_STAR);
2013-12-10 17:27:24 +00:00
2013-10-04 19:53:11 +01:00
} else {
EMIT_ARG(load_const_small_int, import_level);
2013-12-10 17:27:24 +00:00
// build the "fromlist" tuple
mp_parse_node_t *pn_nodes;
size_t n = mp_parse_node_extract_list(&pns->nodes[1], PN_import_as_names, &pn_nodes);
for (size_t i = 0; i < n; i++) {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pn_nodes[i], PN_import_as_name));
mp_parse_node_struct_t *pns3 = (mp_parse_node_struct_t *)pn_nodes[i];
qstr id2 = MP_PARSE_NODE_LEAF_ARG(pns3->nodes[0]); // should be id
EMIT_ARG(load_const_str, id2);
2013-12-10 17:27:24 +00:00
}
EMIT_ARG(build, n, MP_EMIT_BUILD_TUPLE);
2013-12-10 17:27:24 +00:00
// do the import
qstr dummy_q;
do_import_name(comp, pn_import_source, &dummy_q);
for (size_t i = 0; i < n; i++) {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pn_nodes[i], PN_import_as_name));
mp_parse_node_struct_t *pns3 = (mp_parse_node_struct_t *)pn_nodes[i];
qstr id2 = MP_PARSE_NODE_LEAF_ARG(pns3->nodes[0]); // should be id
EMIT_ARG(import, id2, MP_EMIT_IMPORT_FROM);
if (MP_PARSE_NODE_IS_NULL(pns3->nodes[1])) {
compile_store_id(comp, id2);
2013-10-04 19:53:11 +01:00
} else {
compile_store_id(comp, MP_PARSE_NODE_LEAF_ARG(pns3->nodes[1]));
2013-10-04 19:53:11 +01:00
}
}
EMIT(pop_top);
}
}
STATIC void compile_declare_global(compiler_t *comp, mp_parse_node_t pn, id_info_t *id_info) {
if (id_info->kind != ID_INFO_KIND_UNDECIDED && id_info->kind != ID_INFO_KIND_GLOBAL_EXPLICIT) {
compile_syntax_error(comp, pn, MP_ERROR_TEXT("identifier redefined as global"));
return;
}
id_info->kind = ID_INFO_KIND_GLOBAL_EXPLICIT;
// if the id exists in the global scope, set its kind to EXPLICIT_GLOBAL
id_info = scope_find_global(comp->scope_cur, id_info->qst);
if (id_info != NULL) {
id_info->kind = ID_INFO_KIND_GLOBAL_EXPLICIT;
}
}
STATIC void compile_declare_nonlocal(compiler_t *comp, mp_parse_node_t pn, id_info_t *id_info) {
if (id_info->kind == ID_INFO_KIND_UNDECIDED) {
id_info->kind = ID_INFO_KIND_GLOBAL_IMPLICIT;
scope_check_to_close_over(comp->scope_cur, id_info);
if (id_info->kind == ID_INFO_KIND_GLOBAL_IMPLICIT) {
compile_syntax_error(comp, pn, MP_ERROR_TEXT("no binding for nonlocal found"));
}
} else if (id_info->kind != ID_INFO_KIND_FREE) {
compile_syntax_error(comp, pn, MP_ERROR_TEXT("identifier redefined as nonlocal"));
}
}
STATIC void compile_global_nonlocal_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
if (comp->pass == MP_PASS_SCOPE) {
bool is_global = MP_PARSE_NODE_STRUCT_KIND(pns) == PN_global_stmt;
if (!is_global && comp->scope_cur->kind == SCOPE_MODULE) {
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("can't declare nonlocal in outer code"));
return;
}
mp_parse_node_t *nodes;
size_t n = mp_parse_node_extract_list(&pns->nodes[0], PN_name_list, &nodes);
for (size_t i = 0; i < n; i++) {
qstr qst = MP_PARSE_NODE_LEAF_ARG(nodes[i]);
id_info_t *id_info = scope_find_or_add_id(comp->scope_cur, qst, ID_INFO_KIND_UNDECIDED);
if (is_global) {
compile_declare_global(comp, (mp_parse_node_t)pns, id_info);
} else {
compile_declare_nonlocal(comp, (mp_parse_node_t)pns, id_info);
}
2013-10-04 19:53:11 +01:00
}
}
}
STATIC void compile_assert_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
// with optimisations enabled we don't compile assertions
if (MP_STATE_VM(mp_optimise_value) != 0) {
return;
}
uint l_end = comp_next_label(comp);
2013-10-04 19:53:11 +01:00
c_if_cond(comp, pns->nodes[0], true, l_end);
EMIT_LOAD_GLOBAL(MP_QSTR_AssertionError); // we load_global instead of load_id, to be consistent with CPython
if (!MP_PARSE_NODE_IS_NULL(pns->nodes[1])) {
2013-10-04 19:53:11 +01:00
// assertion message
compile_node(comp, pns->nodes[1]);
EMIT_ARG(call_function, 1, 0, 0);
2013-10-04 19:53:11 +01:00
}
EMIT_ARG(raise_varargs, 1);
EMIT_ARG(label_assign, l_end);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_if_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
uint l_end = comp_next_label(comp);
2013-10-04 19:53:11 +01:00
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
// optimisation: don't emit anything when "if False"
if (!mp_parse_node_is_const_false(pns->nodes[0])) {
uint l_fail = comp_next_label(comp);
c_if_cond(comp, pns->nodes[0], false, l_fail); // if condition
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[1]); // if block
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
// optimisation: skip everything else when "if True"
if (mp_parse_node_is_const_true(pns->nodes[0])) {
goto done;
}
if (
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
// optimisation: don't jump over non-existent elif/else blocks
!(MP_PARSE_NODE_IS_NULL(pns->nodes[2]) && MP_PARSE_NODE_IS_NULL(pns->nodes[3]))
// optimisation: don't jump if last instruction was return
&& !EMIT(last_emit_was_return_value)
) {
// jump over elif/else blocks
EMIT_ARG(jump, l_end);
}
EMIT_ARG(label_assign, l_fail);
}
2013-10-04 19:53:11 +01:00
// compile elif blocks (if any)
mp_parse_node_t *pn_elif;
size_t n_elif = mp_parse_node_extract_list(&pns->nodes[2], PN_if_stmt_elif_list, &pn_elif);
for (size_t i = 0; i < n_elif; i++) {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pn_elif[i], PN_if_stmt_elif)); // should be
mp_parse_node_struct_t *pns_elif = (mp_parse_node_struct_t *)pn_elif[i];
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
// optimisation: don't emit anything when "if False"
if (!mp_parse_node_is_const_false(pns_elif->nodes[0])) {
uint l_fail = comp_next_label(comp);
c_if_cond(comp, pns_elif->nodes[0], false, l_fail); // elif condition
compile_node(comp, pns_elif->nodes[1]); // elif block
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
// optimisation: skip everything else when "elif True"
if (mp_parse_node_is_const_true(pns_elif->nodes[0])) {
goto done;
}
// optimisation: don't jump if last instruction was return
if (!EMIT(last_emit_was_return_value)) {
EMIT_ARG(jump, l_end);
}
EMIT_ARG(label_assign, l_fail);
2013-10-04 19:53:11 +01:00
}
}
// compile else block
compile_node(comp, pns->nodes[3]); // can be null
done:
EMIT_ARG(label_assign, l_end);
2013-10-04 19:53:11 +01:00
}
#define START_BREAK_CONTINUE_BLOCK \
uint16_t old_break_label = comp->break_label; \
uint16_t old_continue_label = comp->continue_label; \
uint16_t old_break_continue_except_level = comp->break_continue_except_level; \
uint break_label = comp_next_label(comp); \
uint continue_label = comp_next_label(comp); \
comp->break_label = break_label; \
comp->continue_label = continue_label; \
comp->break_continue_except_level = comp->cur_except_level;
2013-10-04 19:53:11 +01:00
#define END_BREAK_CONTINUE_BLOCK \
comp->break_label = old_break_label; \
comp->continue_label = old_continue_label; \
comp->break_continue_except_level = old_break_continue_except_level;
2013-10-04 19:53:11 +01:00
STATIC void compile_while_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
START_BREAK_CONTINUE_BLOCK
2013-10-04 19:53:11 +01:00
if (!mp_parse_node_is_const_false(pns->nodes[0])) { // optimisation: don't emit anything for "while False"
uint top_label = comp_next_label(comp);
if (!mp_parse_node_is_const_true(pns->nodes[0])) { // optimisation: don't jump to cond for "while True"
EMIT_ARG(jump, continue_label);
}
EMIT_ARG(label_assign, top_label);
compile_node(comp, pns->nodes[1]); // body
EMIT_ARG(label_assign, continue_label);
c_if_cond(comp, pns->nodes[0], true, top_label); // condition
}
// break/continue apply to outer loop (if any) in the else block
END_BREAK_CONTINUE_BLOCK
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[2]); // else
EMIT_ARG(label_assign, break_label);
2013-10-04 19:53:11 +01:00
}
// This function compiles an optimised for-loop of the form:
// for <var> in range(<start>, <end>, <step>):
// <body>
// else:
// <else>
// <var> must be an identifier and <step> must be a small-int.
//
// Semantics of for-loop require:
// - final failing value should not be stored in the loop variable
// - if the loop never runs, the loop variable should never be assigned
// - assignments to <var>, <end> or <step> in the body do not alter the loop
// (<step> is a constant for us, so no need to worry about it changing)
//
// If <end> is a small-int, then the stack during the for-loop contains just
// the current value of <var>. Otherwise, the stack contains <end> then the
// current value of <var>.
STATIC void compile_for_stmt_optimised_range(compiler_t *comp, mp_parse_node_t pn_var, mp_parse_node_t pn_start, mp_parse_node_t pn_end, mp_parse_node_t pn_step, mp_parse_node_t pn_body, mp_parse_node_t pn_else) {
START_BREAK_CONTINUE_BLOCK
2013-11-06 20:20:49 +00:00
uint top_label = comp_next_label(comp);
uint entry_label = comp_next_label(comp);
2013-11-06 20:20:49 +00:00
// put the end value on the stack if it's not a small-int constant
bool end_on_stack = !MP_PARSE_NODE_IS_SMALL_INT(pn_end);
if (end_on_stack) {
compile_node(comp, pn_end);
}
// compile: start
2013-11-06 20:20:49 +00:00
compile_node(comp, pn_start);
EMIT_ARG(jump, entry_label);
EMIT_ARG(label_assign, top_label);
2013-11-06 20:20:49 +00:00
// duplicate next value and store it to var
EMIT(dup_top);
c_assign(comp, pn_var, ASSIGN_STORE);
// compile body
compile_node(comp, pn_body);
EMIT_ARG(label_assign, continue_label);
// compile: var + step
2013-11-06 20:20:49 +00:00
compile_node(comp, pn_step);
EMIT_ARG(binary_op, MP_BINARY_OP_INPLACE_ADD);
2013-11-06 20:20:49 +00:00
EMIT_ARG(label_assign, entry_label);
2013-11-06 20:20:49 +00:00
// compile: if var <cond> end: goto top
if (end_on_stack) {
EMIT(dup_top_two);
EMIT(rot_two);
} else {
EMIT(dup_top);
compile_node(comp, pn_end);
}
assert(MP_PARSE_NODE_IS_SMALL_INT(pn_step));
if (MP_PARSE_NODE_LEAF_SMALL_INT(pn_step) >= 0) {
EMIT_ARG(binary_op, MP_BINARY_OP_LESS);
} else {
EMIT_ARG(binary_op, MP_BINARY_OP_MORE);
}
EMIT_ARG(pop_jump_if, true, top_label);
2013-11-06 20:20:49 +00:00
// break/continue apply to outer loop (if any) in the else block
END_BREAK_CONTINUE_BLOCK
2013-11-06 20:20:49 +00:00
// Compile the else block. We must pop the iterator variables before
// executing the else code because it may contain break/continue statements.
uint end_label = 0;
if (!MP_PARSE_NODE_IS_NULL(pn_else)) {
// discard final value of "var", and possible "end" value
EMIT(pop_top);
if (end_on_stack) {
EMIT(pop_top);
}
compile_node(comp, pn_else);
end_label = comp_next_label(comp);
EMIT_ARG(jump, end_label);
EMIT_ARG(adjust_stack_size, 1 + end_on_stack);
}
2013-11-06 20:20:49 +00:00
EMIT_ARG(label_assign, break_label);
// discard final value of var that failed the loop condition
EMIT(pop_top);
// discard <end> value if it's on the stack
if (end_on_stack) {
EMIT(pop_top);
}
if (!MP_PARSE_NODE_IS_NULL(pn_else)) {
EMIT_ARG(label_assign, end_label);
}
2013-11-06 20:20:49 +00:00
}
STATIC void compile_for_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-11-06 20:20:49 +00:00
// this bit optimises: for <x> in range(...), turning it into an explicitly incremented variable
// this is actually slower, but uses no heap memory
// for viper it will be much, much faster
if (/*comp->scope_cur->emit_options == MP_EMIT_OPT_VIPER &&*/ MP_PARSE_NODE_IS_ID(pns->nodes[0]) && MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[1], PN_atom_expr_normal)) {
mp_parse_node_struct_t *pns_it = (mp_parse_node_struct_t *)pns->nodes[1];
if (MP_PARSE_NODE_IS_ID(pns_it->nodes[0])
&& MP_PARSE_NODE_LEAF_ARG(pns_it->nodes[0]) == MP_QSTR_range
&& MP_PARSE_NODE_STRUCT_KIND((mp_parse_node_struct_t *)pns_it->nodes[1]) == PN_trailer_paren) {
mp_parse_node_t pn_range_args = ((mp_parse_node_struct_t *)pns_it->nodes[1])->nodes[0];
mp_parse_node_t *args;
size_t n_args = mp_parse_node_extract_list(&pn_range_args, PN_arglist, &args);
mp_parse_node_t pn_range_start;
mp_parse_node_t pn_range_end;
mp_parse_node_t pn_range_step;
bool optimize = false;
2013-11-06 20:20:49 +00:00
if (1 <= n_args && n_args <= 3) {
optimize = true;
2013-11-06 20:20:49 +00:00
if (n_args == 1) {
pn_range_start = mp_parse_node_new_small_int(0);
2013-11-06 20:20:49 +00:00
pn_range_end = args[0];
pn_range_step = mp_parse_node_new_small_int(1);
2013-11-06 20:20:49 +00:00
} else if (n_args == 2) {
pn_range_start = args[0];
pn_range_end = args[1];
pn_range_step = mp_parse_node_new_small_int(1);
2013-11-06 20:20:49 +00:00
} else {
pn_range_start = args[0];
pn_range_end = args[1];
pn_range_step = args[2];
// the step must be a non-zero constant integer to do the optimisation
if (!MP_PARSE_NODE_IS_SMALL_INT(pn_range_step)
|| MP_PARSE_NODE_LEAF_SMALL_INT(pn_range_step) == 0) {
optimize = false;
}
2013-11-06 20:20:49 +00:00
}
// arguments must be able to be compiled as standard expressions
if (optimize && MP_PARSE_NODE_IS_STRUCT(pn_range_start)) {
int k = MP_PARSE_NODE_STRUCT_KIND((mp_parse_node_struct_t *)pn_range_start);
if (k == PN_arglist_star || k == PN_arglist_dbl_star || k == PN_argument) {
optimize = false;
}
}
if (optimize && MP_PARSE_NODE_IS_STRUCT(pn_range_end)) {
int k = MP_PARSE_NODE_STRUCT_KIND((mp_parse_node_struct_t *)pn_range_end);
if (k == PN_arglist_star || k == PN_arglist_dbl_star || k == PN_argument) {
optimize = false;
}
}
}
if (optimize) {
2013-11-06 20:20:49 +00:00
compile_for_stmt_optimised_range(comp, pns->nodes[0], pn_range_start, pn_range_end, pn_range_step, pns->nodes[2], pns->nodes[3]);
return;
}
}
}
START_BREAK_CONTINUE_BLOCK
comp->break_label |= MP_EMIT_BREAK_FROM_FOR;
2013-10-04 19:53:11 +01:00
uint pop_label = comp_next_label(comp);
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[1]); // iterator
EMIT_ARG(get_iter, true);
EMIT_ARG(label_assign, continue_label);
EMIT_ARG(for_iter, pop_label);
2013-10-04 19:53:11 +01:00
c_assign(comp, pns->nodes[0], ASSIGN_STORE); // variable
compile_node(comp, pns->nodes[2]); // body
if (!EMIT(last_emit_was_return_value)) {
EMIT_ARG(jump, continue_label);
2013-10-04 19:53:11 +01:00
}
EMIT_ARG(label_assign, pop_label);
EMIT(for_iter_end);
2013-10-04 19:53:11 +01:00
// break/continue apply to outer loop (if any) in the else block
END_BREAK_CONTINUE_BLOCK
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[3]); // else (may be empty)
2013-10-04 19:53:11 +01:00
EMIT_ARG(label_assign, break_label);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_try_except(compiler_t *comp, mp_parse_node_t pn_body, int n_except, mp_parse_node_t *pn_excepts, mp_parse_node_t pn_else) {
2013-10-04 19:53:11 +01:00
// setup code
uint l1 = comp_next_label(comp);
uint success_label = comp_next_label(comp);
compile_increase_except_level(comp, l1, MP_EMIT_SETUP_BLOCK_EXCEPT);
2013-10-04 19:53:11 +01:00
compile_node(comp, pn_body); // body
EMIT_ARG(pop_except_jump, success_label, false); // jump over exception handler
EMIT_ARG(label_assign, l1); // start of exception handler
EMIT(start_except_handler);
// at this point the top of the stack contains the exception instance that was raised
uint l2 = comp_next_label(comp);
2013-10-04 19:53:11 +01:00
for (int i = 0; i < n_except; i++) {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pn_excepts[i], PN_try_stmt_except)); // should be
mp_parse_node_struct_t *pns_except = (mp_parse_node_struct_t *)pn_excepts[i];
2013-10-04 19:53:11 +01:00
qstr qstr_exception_local = 0;
uint end_finally_label = comp_next_label(comp);
#if MICROPY_PY_SYS_SETTRACE
EMIT_ARG(set_source_line, pns_except->source_line);
#endif
2013-10-04 19:53:11 +01:00
if (MP_PARSE_NODE_IS_NULL(pns_except->nodes[0])) {
2013-10-04 19:53:11 +01:00
// this is a catch all exception handler
if (i + 1 != n_except) {
compile_syntax_error(comp, pn_excepts[i], MP_ERROR_TEXT("default 'except' must be last"));
compile_decrease_except_level(comp);
2013-10-04 19:53:11 +01:00
return;
}
} else {
// this exception handler requires a match to a certain type of exception
mp_parse_node_t pns_exception_expr = pns_except->nodes[0];
if (MP_PARSE_NODE_IS_STRUCT(pns_exception_expr)) {
mp_parse_node_struct_t *pns3 = (mp_parse_node_struct_t *)pns_exception_expr;
if (MP_PARSE_NODE_STRUCT_KIND(pns3) == PN_try_stmt_as_name) {
2013-10-04 19:53:11 +01:00
// handler binds the exception to a local
pns_exception_expr = pns3->nodes[0];
qstr_exception_local = MP_PARSE_NODE_LEAF_ARG(pns3->nodes[1]);
2013-10-04 19:53:11 +01:00
}
}
EMIT(dup_top);
compile_node(comp, pns_exception_expr);
EMIT_ARG(binary_op, MP_BINARY_OP_EXCEPTION_MATCH);
EMIT_ARG(pop_jump_if, false, end_finally_label);
2013-10-04 19:53:11 +01:00
}
// either discard or store the exception instance
2013-10-04 19:53:11 +01:00
if (qstr_exception_local == 0) {
EMIT(pop_top);
} else {
compile_store_id(comp, qstr_exception_local);
2013-10-04 19:53:11 +01:00
}
// If the exception is bound to a variable <e> then the <body> of the
// exception handler is wrapped in a try-finally so that the name <e> can
// be deleted (per Python semantics) even if the <body> has an exception.
// In such a case the generated code for the exception handler is:
// try:
// <body>
// finally:
// <e> = None
// del <e>
uint l3 = 0;
2013-10-04 19:53:11 +01:00
if (qstr_exception_local != 0) {
l3 = comp_next_label(comp);
compile_increase_except_level(comp, l3, MP_EMIT_SETUP_BLOCK_FINALLY);
2013-10-04 19:53:11 +01:00
}
compile_node(comp, pns_except->nodes[1]); // the <body>
2013-10-04 19:53:11 +01:00
if (qstr_exception_local != 0) {
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
EMIT_ARG(label_assign, l3);
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
compile_store_id(comp, qstr_exception_local);
compile_delete_id(comp, qstr_exception_local);
compile_decrease_except_level(comp);
2013-10-04 19:53:11 +01:00
}
EMIT_ARG(pop_except_jump, l2, true);
EMIT_ARG(label_assign, end_finally_label);
EMIT_ARG(adjust_stack_size, 1); // stack adjust for the exception instance
2013-10-04 19:53:11 +01:00
}
compile_decrease_except_level(comp);
EMIT(end_except_handler);
EMIT_ARG(label_assign, success_label);
2013-10-04 19:53:11 +01:00
compile_node(comp, pn_else); // else block, can be null
EMIT_ARG(label_assign, l2);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_try_finally(compiler_t *comp, mp_parse_node_t pn_body, int n_except, mp_parse_node_t *pn_except, mp_parse_node_t pn_else, mp_parse_node_t pn_finally) {
uint l_finally_block = comp_next_label(comp);
compile_increase_except_level(comp, l_finally_block, MP_EMIT_SETUP_BLOCK_FINALLY);
2013-10-04 19:53:11 +01:00
if (n_except == 0) {
assert(MP_PARSE_NODE_IS_NULL(pn_else));
EMIT_ARG(adjust_stack_size, 3); // stack adjust for possible UNWIND_JUMP state
2013-10-04 19:53:11 +01:00
compile_node(comp, pn_body);
EMIT_ARG(adjust_stack_size, -3);
2013-10-04 19:53:11 +01:00
} else {
compile_try_except(comp, pn_body, n_except, pn_except, pn_else);
}
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
EMIT_ARG(label_assign, l_finally_block);
2013-10-04 19:53:11 +01:00
compile_node(comp, pn_finally);
compile_decrease_except_level(comp);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_try_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
assert(MP_PARSE_NODE_IS_STRUCT(pns->nodes[1])); // should be
{
mp_parse_node_struct_t *pns2 = (mp_parse_node_struct_t *)pns->nodes[1];
if (MP_PARSE_NODE_STRUCT_KIND(pns2) == PN_try_stmt_finally) {
2013-10-04 19:53:11 +01:00
// just try-finally
compile_try_finally(comp, pns->nodes[0], 0, NULL, MP_PARSE_NODE_NULL, pns2->nodes[0]);
} else if (MP_PARSE_NODE_STRUCT_KIND(pns2) == PN_try_stmt_except_and_more) {
2013-10-04 19:53:11 +01:00
// try-except and possibly else and/or finally
mp_parse_node_t *pn_excepts;
size_t n_except = mp_parse_node_extract_list(&pns2->nodes[0], PN_try_stmt_except_list, &pn_excepts);
if (MP_PARSE_NODE_IS_NULL(pns2->nodes[2])) {
2013-10-04 19:53:11 +01:00
// no finally
compile_try_except(comp, pns->nodes[0], n_except, pn_excepts, pns2->nodes[1]);
} else {
// have finally
compile_try_finally(comp, pns->nodes[0], n_except, pn_excepts, pns2->nodes[1], ((mp_parse_node_struct_t *)pns2->nodes[2])->nodes[0]);
2013-10-04 19:53:11 +01:00
}
} else {
// just try-except
mp_parse_node_t *pn_excepts;
size_t n_except = mp_parse_node_extract_list(&pns->nodes[1], PN_try_stmt_except_list, &pn_excepts);
compile_try_except(comp, pns->nodes[0], n_except, pn_excepts, MP_PARSE_NODE_NULL);
2013-10-04 19:53:11 +01:00
}
}
}
STATIC void compile_with_stmt_helper(compiler_t *comp, size_t n, mp_parse_node_t *nodes, mp_parse_node_t body) {
2013-10-04 19:53:11 +01:00
if (n == 0) {
// no more pre-bits, compile the body of the with
compile_node(comp, body);
} else {
uint l_end = comp_next_label(comp);
if (MP_PARSE_NODE_IS_STRUCT_KIND(nodes[0], PN_with_item)) {
2013-10-04 19:53:11 +01:00
// this pre-bit is of the form "a as b"
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)nodes[0];
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]);
compile_increase_except_level(comp, l_end, MP_EMIT_SETUP_BLOCK_WITH);
2013-10-04 19:53:11 +01:00
c_assign(comp, pns->nodes[1], ASSIGN_STORE);
} else {
// this pre-bit is just an expression
compile_node(comp, nodes[0]);
compile_increase_except_level(comp, l_end, MP_EMIT_SETUP_BLOCK_WITH);
2013-10-04 19:53:11 +01:00
EMIT(pop_top);
}
// compile additional pre-bits and the body
compile_with_stmt_helper(comp, n - 1, nodes + 1, body);
// finish this with block
EMIT_ARG(with_cleanup, l_end);
reserve_labels_for_native(comp, 3); // used by native's with_cleanup
compile_decrease_except_level(comp);
2013-10-04 19:53:11 +01:00
}
}
STATIC void compile_with_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
// get the nodes for the pre-bit of the with (the a as b, c as d, ... bit)
mp_parse_node_t *nodes;
size_t n = mp_parse_node_extract_list(&pns->nodes[0], PN_with_stmt_list, &nodes);
2013-10-04 19:53:11 +01:00
assert(n > 0);
// compile in a nested fashion
compile_with_stmt_helper(comp, n, nodes, pns->nodes[1]);
}
STATIC void compile_yield_from(compiler_t *comp) {
EMIT_ARG(get_iter, false);
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
EMIT_ARG(yield, MP_EMIT_YIELD_FROM);
reserve_labels_for_native(comp, 3);
}
#if MICROPY_PY_ASYNC_AWAIT
STATIC void compile_await_object_method(compiler_t *comp, qstr method) {
EMIT_ARG(load_method, method, false);
EMIT_ARG(call_method, 0, 0, 0);
compile_yield_from(comp);
}
STATIC void compile_async_for_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
// comp->break_label |= MP_EMIT_BREAK_FROM_FOR;
qstr context = MP_PARSE_NODE_LEAF_ARG(pns->nodes[1]);
uint while_else_label = comp_next_label(comp);
uint try_exception_label = comp_next_label(comp);
uint try_else_label = comp_next_label(comp);
uint try_finally_label = comp_next_label(comp);
compile_node(comp, pns->nodes[1]); // iterator
EMIT_ARG(load_method, MP_QSTR___aiter__, false);
EMIT_ARG(call_method, 0, 0, 0);
compile_store_id(comp, context);
START_BREAK_CONTINUE_BLOCK
EMIT_ARG(label_assign, continue_label);
compile_increase_except_level(comp, try_exception_label, MP_EMIT_SETUP_BLOCK_EXCEPT);
compile_load_id(comp, context);
compile_await_object_method(comp, MP_QSTR___anext__);
c_assign(comp, pns->nodes[0], ASSIGN_STORE); // variable
EMIT_ARG(pop_except_jump, try_else_label, false);
EMIT_ARG(label_assign, try_exception_label);
EMIT(start_except_handler);
EMIT(dup_top);
EMIT_LOAD_GLOBAL(MP_QSTR_StopAsyncIteration);
EMIT_ARG(binary_op, MP_BINARY_OP_EXCEPTION_MATCH);
EMIT_ARG(pop_jump_if, false, try_finally_label);
EMIT(pop_top); // pop exception instance
EMIT_ARG(pop_except_jump, while_else_label, true);
EMIT_ARG(label_assign, try_finally_label);
EMIT_ARG(adjust_stack_size, 1); // if we jump here, the exc is on the stack
compile_decrease_except_level(comp);
EMIT(end_except_handler);
EMIT_ARG(label_assign, try_else_label);
compile_node(comp, pns->nodes[2]); // body
EMIT_ARG(jump, continue_label);
// break/continue apply to outer loop (if any) in the else block
END_BREAK_CONTINUE_BLOCK
EMIT_ARG(label_assign, while_else_label);
compile_node(comp, pns->nodes[3]); // else
EMIT_ARG(label_assign, break_label);
}
STATIC void compile_async_with_stmt_helper(compiler_t *comp, size_t n, mp_parse_node_t *nodes, mp_parse_node_t body) {
if (n == 0) {
// no more pre-bits, compile the body of the with
compile_node(comp, body);
} else {
uint l_finally_block = comp_next_label(comp);
uint l_aexit_no_exc = comp_next_label(comp);
uint l_ret_unwind_jump = comp_next_label(comp);
uint l_end = comp_next_label(comp);
if (MP_PARSE_NODE_IS_STRUCT_KIND(nodes[0], PN_with_item)) {
// this pre-bit is of the form "a as b"
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)nodes[0];
compile_node(comp, pns->nodes[0]);
EMIT(dup_top);
compile_await_object_method(comp, MP_QSTR___aenter__);
c_assign(comp, pns->nodes[1], ASSIGN_STORE);
} else {
// this pre-bit is just an expression
compile_node(comp, nodes[0]);
EMIT(dup_top);
compile_await_object_method(comp, MP_QSTR___aenter__);
EMIT(pop_top);
}
// To keep the Python stack size down, and because we can't access values on
// this stack further down than 3 elements (via rot_three), we don't preload
// __aexit__ (as per normal with) but rather wait until we need it below.
// Start the try-finally statement
compile_increase_except_level(comp, l_finally_block, MP_EMIT_SETUP_BLOCK_FINALLY);
// Compile any additional pre-bits of the "async with", and also the body
EMIT_ARG(adjust_stack_size, 3); // stack adjust for possible UNWIND_JUMP state
compile_async_with_stmt_helper(comp, n - 1, nodes + 1, body);
EMIT_ARG(adjust_stack_size, -3);
py: Fix VM crash with unwinding jump out of a finally block. This patch fixes a bug in the VM when breaking within a try-finally. The bug has to do with executing a break within the finally block of a try-finally statement. For example: def f(): for x in (1,): print('a', x) try: raise Exception finally: print(1) break print('b', x) f() Currently in uPy the above code will print: a 1 1 1 segmentation fault (core dumped) micropython Not only is there a seg fault, but the "1" in the finally block is printed twice. This is because when the VM executes a finally block it doesn't really know if that block was executed due to a fall-through of the try (no exception raised), or because an exception is active. In particular, for nested finallys the VM has no idea which of the nested ones have active exceptions and which are just fall-throughs. So when a break (or continue) is executed it tries to unwind all of the finallys, when in fact only some may be active. It's questionable whether break (or return or continue) should be allowed within a finally block, because they implicitly swallow any active exception, but nevertheless it's allowed by CPython (although almost never used in the standard library). And uPy should at least not crash in such a case. The solution here relies on the fact that exception and finally handlers always appear in the bytecode after the try body. Note: there was a similar bug with a return in a finally block, but that was previously fixed in b735208403a54774f9fd3d966f7c1a194c41870f
2019-01-02 06:48:43 +00:00
// We have now finished the "try" block and fall through to the "finally"
// At this point, after the with body has executed, we have 3 cases:
// 1. no exception, we just fall through to this point; stack: (..., ctx_mgr)
// 2. exception propagating out, we get to the finally block; stack: (..., ctx_mgr, exc)
// 3. return or unwind jump, we get to the finally block; stack: (..., ctx_mgr, X, INT)
// Handle case 1: call __aexit__
// Stack: (..., ctx_mgr)
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE); // to tell end_finally there's no exception
EMIT(rot_two);
EMIT_ARG(jump, l_aexit_no_exc); // jump to code below to call __aexit__
// Start of "finally" block
// At this point we have case 2 or 3, we detect which one by the TOS being an exception or not
EMIT_ARG(label_assign, l_finally_block);
// Detect if TOS an exception or not
EMIT(dup_top);
EMIT_LOAD_GLOBAL(MP_QSTR_BaseException);
EMIT_ARG(binary_op, MP_BINARY_OP_EXCEPTION_MATCH);
EMIT_ARG(pop_jump_if, false, l_ret_unwind_jump); // if not an exception then we have case 3
// Handle case 2: call __aexit__ and either swallow or re-raise the exception
// Stack: (..., ctx_mgr, exc)
EMIT(dup_top);
EMIT(rot_three);
EMIT(rot_two);
EMIT_ARG(load_method, MP_QSTR___aexit__, false);
EMIT(rot_three);
EMIT(rot_three);
EMIT(dup_top);
#if MICROPY_CPYTHON_COMPAT
EMIT_ARG(attr, MP_QSTR___class__, MP_EMIT_ATTR_LOAD); // get type(exc)
#else
compile_load_id(comp, MP_QSTR_type);
EMIT(rot_two);
EMIT_ARG(call_function, 1, 0, 0); // get type(exc)
#endif
EMIT(rot_two);
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE); // dummy traceback value
// Stack: (..., exc, __aexit__, ctx_mgr, type(exc), exc, None)
EMIT_ARG(call_method, 3, 0, 0);
compile_yield_from(comp);
EMIT_ARG(pop_jump_if, false, l_end);
EMIT(pop_top); // pop exception
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE); // replace with None to swallow exception
EMIT_ARG(jump, l_end);
EMIT_ARG(adjust_stack_size, 2);
// Handle case 3: call __aexit__
// Stack: (..., ctx_mgr, X, INT)
EMIT_ARG(label_assign, l_ret_unwind_jump);
EMIT(rot_three);
EMIT(rot_three);
EMIT_ARG(label_assign, l_aexit_no_exc);
EMIT_ARG(load_method, MP_QSTR___aexit__, false);
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
EMIT(dup_top);
EMIT(dup_top);
EMIT_ARG(call_method, 3, 0, 0);
compile_yield_from(comp);
EMIT(pop_top);
EMIT_ARG(adjust_stack_size, -1);
// End of "finally" block
// Stack can have one of three configurations:
// a. (..., None) - from either case 1, or case 2 with swallowed exception
// b. (..., exc) - from case 2 with re-raised exception
// c. (..., X, INT) - from case 3
EMIT_ARG(label_assign, l_end);
compile_decrease_except_level(comp);
}
}
STATIC void compile_async_with_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
// get the nodes for the pre-bit of the with (the a as b, c as d, ... bit)
mp_parse_node_t *nodes;
size_t n = mp_parse_node_extract_list(&pns->nodes[0], PN_with_stmt_list, &nodes);
assert(n > 0);
// compile in a nested fashion
compile_async_with_stmt_helper(comp, n, nodes, pns->nodes[1]);
}
STATIC void compile_async_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
assert(MP_PARSE_NODE_IS_STRUCT(pns->nodes[0]));
mp_parse_node_struct_t *pns0 = (mp_parse_node_struct_t *)pns->nodes[0];
if (MP_PARSE_NODE_STRUCT_KIND(pns0) == PN_funcdef) {
// async def
compile_funcdef(comp, pns0);
scope_t *fscope = (scope_t *)pns0->nodes[4];
fscope->scope_flags |= MP_SCOPE_FLAG_GENERATOR;
} else {
// async for/with; first verify the scope is a generator
int scope_flags = comp->scope_cur->scope_flags;
if (!(scope_flags & MP_SCOPE_FLAG_GENERATOR)) {
compile_syntax_error(comp, (mp_parse_node_t)pns0,
MP_ERROR_TEXT("async for/with outside async function"));
return;
}
if (MP_PARSE_NODE_STRUCT_KIND(pns0) == PN_for_stmt) {
// async for
compile_async_for_stmt(comp, pns0);
} else {
// async with
assert(MP_PARSE_NODE_STRUCT_KIND(pns0) == PN_with_stmt);
compile_async_with_stmt(comp, pns0);
}
}
}
#endif
STATIC void compile_expr_stmt(compiler_t *comp, mp_parse_node_struct_t *pns) {
mp_parse_node_t pn_rhs = pns->nodes[1];
if (MP_PARSE_NODE_IS_NULL(pn_rhs)) {
2013-10-18 19:58:12 +01:00
if (comp->is_repl && comp->scope_cur->kind == SCOPE_MODULE) {
// for REPL, evaluate then print the expression
compile_load_id(comp, MP_QSTR___repl_print__);
2013-10-18 19:58:12 +01:00
compile_node(comp, pns->nodes[0]);
EMIT_ARG(call_function, 1, 0, 0);
2013-10-18 19:58:12 +01:00
EMIT(pop_top);
2013-10-04 19:53:11 +01:00
} else {
2013-10-18 19:58:12 +01:00
// for non-REPL, evaluate then discard the expression
if ((MP_PARSE_NODE_IS_LEAF(pns->nodes[0]) && !MP_PARSE_NODE_IS_ID(pns->nodes[0]))
|| MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_const_object)) {
2013-10-18 19:58:12 +01:00
// do nothing with a lonely constant
} else {
compile_node(comp, pns->nodes[0]); // just an expression
EMIT(pop_top); // discard last result since this is a statement and leaves nothing on the stack
}
2013-10-04 19:53:11 +01:00
}
} else if (MP_PARSE_NODE_IS_STRUCT(pn_rhs)) {
mp_parse_node_struct_t *pns1 = (mp_parse_node_struct_t *)pn_rhs;
int kind = MP_PARSE_NODE_STRUCT_KIND(pns1);
if (kind == PN_annassign) {
// the annotation is in pns1->nodes[0] and is ignored
if (MP_PARSE_NODE_IS_NULL(pns1->nodes[1])) {
// an annotation of the form "x: y"
// inside a function this declares "x" as a local
if (comp->scope_cur->kind == SCOPE_FUNCTION) {
if (MP_PARSE_NODE_IS_ID(pns->nodes[0])) {
qstr lhs = MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]);
scope_find_or_add_id(comp->scope_cur, lhs, ID_INFO_KIND_LOCAL);
}
}
} else {
// an assigned annotation of the form "x: y = z"
pn_rhs = pns1->nodes[1];
goto plain_assign;
}
} else if (kind == PN_expr_stmt_augassign) {
2013-10-04 19:53:11 +01:00
c_assign(comp, pns->nodes[0], ASSIGN_AUG_LOAD); // lhs load for aug assign
compile_node(comp, pns1->nodes[1]); // rhs
assert(MP_PARSE_NODE_IS_TOKEN(pns1->nodes[0]));
mp_token_kind_t tok = MP_PARSE_NODE_LEAF_ARG(pns1->nodes[0]);
mp_binary_op_t op = MP_BINARY_OP_INPLACE_OR + (tok - MP_TOKEN_DEL_PIPE_EQUAL);
EMIT_ARG(binary_op, op);
2013-10-04 19:53:11 +01:00
c_assign(comp, pns->nodes[0], ASSIGN_AUG_STORE); // lhs store for aug assign
} else if (kind == PN_expr_stmt_assign_list) {
int rhs = MP_PARSE_NODE_STRUCT_NUM_NODES(pns1) - 1;
compile_node(comp, pns1->nodes[rhs]); // rhs
2013-10-04 19:53:11 +01:00
// following CPython, we store left-most first
if (rhs > 0) {
EMIT(dup_top);
}
c_assign(comp, pns->nodes[0], ASSIGN_STORE); // lhs store
for (int i = 0; i < rhs; i++) {
if (i + 1 < rhs) {
EMIT(dup_top);
}
c_assign(comp, pns1->nodes[i], ASSIGN_STORE); // middle store
2013-10-04 19:53:11 +01:00
}
} else {
plain_assign:
#if MICROPY_COMP_DOUBLE_TUPLE_ASSIGN
if (MP_PARSE_NODE_IS_STRUCT_KIND(pn_rhs, PN_testlist_star_expr)
&& MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_testlist_star_expr)) {
mp_parse_node_struct_t *pns0 = (mp_parse_node_struct_t *)pns->nodes[0];
pns1 = (mp_parse_node_struct_t *)pn_rhs;
uint32_t n_pns0 = MP_PARSE_NODE_STRUCT_NUM_NODES(pns0);
// Can only optimise a tuple-to-tuple assignment when all of the following hold:
// - equal number of items in LHS and RHS tuples
// - 2 or 3 items in the tuples
// - there are no star expressions in the LHS tuple
if (n_pns0 == MP_PARSE_NODE_STRUCT_NUM_NODES(pns1)
&& (n_pns0 == 2
#if MICROPY_COMP_TRIPLE_TUPLE_ASSIGN
|| n_pns0 == 3
#endif
)
&& !MP_PARSE_NODE_IS_STRUCT_KIND(pns0->nodes[0], PN_star_expr)
&& !MP_PARSE_NODE_IS_STRUCT_KIND(pns0->nodes[1], PN_star_expr)
#if MICROPY_COMP_TRIPLE_TUPLE_ASSIGN
&& (n_pns0 == 2 || !MP_PARSE_NODE_IS_STRUCT_KIND(pns0->nodes[2], PN_star_expr))
#endif
) {
// Optimisation for a, b = c, d or a, b, c = d, e, f
compile_node(comp, pns1->nodes[0]); // rhs
compile_node(comp, pns1->nodes[1]); // rhs
#if MICROPY_COMP_TRIPLE_TUPLE_ASSIGN
if (n_pns0 == 3) {
compile_node(comp, pns1->nodes[2]); // rhs
EMIT(rot_three);
}
#endif
EMIT(rot_two);
c_assign(comp, pns0->nodes[0], ASSIGN_STORE); // lhs store
c_assign(comp, pns0->nodes[1], ASSIGN_STORE); // lhs store
#if MICROPY_COMP_TRIPLE_TUPLE_ASSIGN
if (n_pns0 == 3) {
c_assign(comp, pns0->nodes[2], ASSIGN_STORE); // lhs store
}
#endif
return;
}
2013-10-04 19:53:11 +01:00
}
#endif
compile_node(comp, pn_rhs); // rhs
c_assign(comp, pns->nodes[0], ASSIGN_STORE); // lhs store
2013-10-04 19:53:11 +01:00
}
} else {
goto plain_assign;
2013-10-04 19:53:11 +01:00
}
}
STATIC void compile_test_if_expr(compiler_t *comp, mp_parse_node_struct_t *pns) {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[1], PN_test_if_else));
mp_parse_node_struct_t *pns_test_if_else = (mp_parse_node_struct_t *)pns->nodes[1];
2013-10-04 19:53:11 +01:00
uint l_fail = comp_next_label(comp);
uint l_end = comp_next_label(comp);
2013-10-04 19:53:11 +01:00
c_if_cond(comp, pns_test_if_else->nodes[0], false, l_fail); // condition
compile_node(comp, pns->nodes[0]); // success value
EMIT_ARG(jump, l_end);
EMIT_ARG(label_assign, l_fail);
EMIT_ARG(adjust_stack_size, -1); // adjust stack size
2013-10-04 19:53:11 +01:00
compile_node(comp, pns_test_if_else->nodes[1]); // failure value
EMIT_ARG(label_assign, l_end);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_lambdef(compiler_t *comp, mp_parse_node_struct_t *pns) {
if (comp->pass == MP_PASS_SCOPE) {
2013-10-04 19:53:11 +01:00
// create a new scope for this lambda
scope_t *s = scope_new_and_link(comp, SCOPE_LAMBDA, (mp_parse_node_t)pns, comp->scope_cur->emit_options);
2013-10-04 19:53:11 +01:00
// store the lambda scope so the compiling function (this one) can use it at each pass
pns->nodes[2] = (mp_parse_node_t)s;
2013-10-04 19:53:11 +01:00
}
// get the scope for this lambda
scope_t *this_scope = (scope_t *)pns->nodes[2];
// compile the lambda definition
compile_funcdef_lambdef(comp, this_scope, pns->nodes[0], PN_varargslist);
2013-10-04 19:53:11 +01:00
}
#if MICROPY_PY_ASSIGN_EXPR
STATIC void compile_namedexpr_helper(compiler_t *comp, mp_parse_node_t pn_name, mp_parse_node_t pn_expr) {
if (!MP_PARSE_NODE_IS_ID(pn_name)) {
compile_syntax_error(comp, (mp_parse_node_t)pn_name, MP_ERROR_TEXT("can't assign to expression"));
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
return; // because pn_name is not a valid qstr (in compile_store_id below)
}
compile_node(comp, pn_expr);
EMIT(dup_top);
scope_t *old_scope = comp->scope_cur;
if (SCOPE_IS_COMP_LIKE(comp->scope_cur->kind)) {
// Use parent's scope for assigned value so it can "escape"
comp->scope_cur = comp->scope_cur->parent;
}
compile_store_id(comp, MP_PARSE_NODE_LEAF_ARG(pn_name));
comp->scope_cur = old_scope;
}
STATIC void compile_namedexpr(compiler_t *comp, mp_parse_node_struct_t *pns) {
compile_namedexpr_helper(comp, pns->nodes[0], pns->nodes[1]);
}
#endif
STATIC void compile_or_and_test(compiler_t *comp, mp_parse_node_struct_t *pns) {
bool cond = MP_PARSE_NODE_STRUCT_KIND(pns) == PN_or_test;
uint l_end = comp_next_label(comp);
int n = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
2013-10-04 19:53:11 +01:00
for (int i = 0; i < n; i += 1) {
compile_node(comp, pns->nodes[i]);
if (i + 1 < n) {
EMIT_ARG(jump_if_or_pop, cond, l_end);
2013-10-04 19:53:11 +01:00
}
}
EMIT_ARG(label_assign, l_end);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_not_test_2(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]);
EMIT_ARG(unary_op, MP_UNARY_OP_NOT);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_comparison(compiler_t *comp, mp_parse_node_struct_t *pns) {
int num_nodes = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]);
bool multi = (num_nodes > 3);
uint l_fail = 0;
2013-10-04 19:53:11 +01:00
if (multi) {
l_fail = comp_next_label(comp);
2013-10-04 19:53:11 +01:00
}
for (int i = 1; i + 1 < num_nodes; i += 2) {
compile_node(comp, pns->nodes[i + 1]);
if (i + 2 < num_nodes) {
EMIT(dup_top);
EMIT(rot_three);
}
if (MP_PARSE_NODE_IS_TOKEN(pns->nodes[i])) {
mp_token_kind_t tok = MP_PARSE_NODE_LEAF_ARG(pns->nodes[i]);
mp_binary_op_t op;
if (tok == MP_TOKEN_KW_IN) {
op = MP_BINARY_OP_IN;
} else {
op = MP_BINARY_OP_LESS + (tok - MP_TOKEN_OP_LESS);
}
EMIT_ARG(binary_op, op);
} else {
assert(MP_PARSE_NODE_IS_STRUCT(pns->nodes[i])); // should be
mp_parse_node_struct_t *pns2 = (mp_parse_node_struct_t *)pns->nodes[i];
int kind = MP_PARSE_NODE_STRUCT_KIND(pns2);
2013-10-04 19:53:11 +01:00
if (kind == PN_comp_op_not_in) {
EMIT_ARG(binary_op, MP_BINARY_OP_NOT_IN);
} else {
assert(kind == PN_comp_op_is); // should be
if (MP_PARSE_NODE_IS_NULL(pns2->nodes[0])) {
EMIT_ARG(binary_op, MP_BINARY_OP_IS);
2013-10-04 19:53:11 +01:00
} else {
EMIT_ARG(binary_op, MP_BINARY_OP_IS_NOT);
2013-10-04 19:53:11 +01:00
}
}
}
if (i + 2 < num_nodes) {
EMIT_ARG(jump_if_or_pop, false, l_fail);
2013-10-04 19:53:11 +01:00
}
}
if (multi) {
uint l_end = comp_next_label(comp);
EMIT_ARG(jump, l_end);
EMIT_ARG(label_assign, l_fail);
EMIT_ARG(adjust_stack_size, 1);
2013-10-04 19:53:11 +01:00
EMIT(rot_two);
EMIT(pop_top);
EMIT_ARG(label_assign, l_end);
2013-10-04 19:53:11 +01:00
}
}
STATIC void compile_star_expr(compiler_t *comp, mp_parse_node_struct_t *pns) {
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("*x must be assignment target"));
2013-10-04 19:53:11 +01:00
}
STATIC void compile_binary_op(compiler_t *comp, mp_parse_node_struct_t *pns) {
MP_STATIC_ASSERT(MP_BINARY_OP_OR + PN_xor_expr - PN_expr == MP_BINARY_OP_XOR);
MP_STATIC_ASSERT(MP_BINARY_OP_OR + PN_and_expr - PN_expr == MP_BINARY_OP_AND);
mp_binary_op_t binary_op = MP_BINARY_OP_OR + MP_PARSE_NODE_STRUCT_KIND(pns) - PN_expr;
int num_nodes = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
compile_node(comp, pns->nodes[0]);
for (int i = 1; i < num_nodes; ++i) {
compile_node(comp, pns->nodes[i]);
EMIT_ARG(binary_op, binary_op);
}
2013-10-04 19:53:11 +01:00
}
STATIC void compile_term(compiler_t *comp, mp_parse_node_struct_t *pns) {
int num_nodes = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]);
for (int i = 1; i + 1 < num_nodes; i += 2) {
compile_node(comp, pns->nodes[i + 1]);
mp_token_kind_t tok = MP_PARSE_NODE_LEAF_ARG(pns->nodes[i]);
mp_binary_op_t op = MP_BINARY_OP_LSHIFT + (tok - MP_TOKEN_OP_DBL_LESS);
EMIT_ARG(binary_op, op);
2013-10-04 19:53:11 +01:00
}
}
STATIC void compile_factor_2(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[1]);
mp_token_kind_t tok = MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]);
mp_unary_op_t op;
if (tok == MP_TOKEN_OP_TILDE) {
op = MP_UNARY_OP_INVERT;
} else {
assert(tok == MP_TOKEN_OP_PLUS || tok == MP_TOKEN_OP_MINUS);
op = MP_UNARY_OP_POSITIVE + (tok - MP_TOKEN_OP_PLUS);
2013-10-04 19:53:11 +01:00
}
EMIT_ARG(unary_op, op);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_atom_expr_normal(compiler_t *comp, mp_parse_node_struct_t *pns) {
// compile the subject of the expression
compile_node(comp, pns->nodes[0]);
2014-02-05 00:51:47 +00:00
// compile_atom_expr_await may call us with a NULL node
if (MP_PARSE_NODE_IS_NULL(pns->nodes[1])) {
return;
}
// get the array of trailers (known to be an array of PARSE_NODE_STRUCT)
size_t num_trail = 1;
mp_parse_node_struct_t **pns_trail = (mp_parse_node_struct_t **)&pns->nodes[1];
if (MP_PARSE_NODE_STRUCT_KIND(pns_trail[0]) == PN_atom_expr_trailers) {
num_trail = MP_PARSE_NODE_STRUCT_NUM_NODES(pns_trail[0]);
pns_trail = (mp_parse_node_struct_t **)&pns_trail[0]->nodes[0];
}
2014-02-05 00:51:47 +00:00
// the current index into the array of trailers
size_t i = 0;
2013-10-04 19:53:11 +01:00
// handle special super() call
if (comp->scope_cur->kind == SCOPE_FUNCTION
&& MP_PARSE_NODE_IS_ID(pns->nodes[0])
&& MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]) == MP_QSTR_super
&& MP_PARSE_NODE_STRUCT_KIND(pns_trail[0]) == PN_trailer_paren
&& MP_PARSE_NODE_IS_NULL(pns_trail[0]->nodes[0])) {
// at this point we have matched "super()" within a function
// load the class for super to search for a parent
compile_load_id(comp, MP_QSTR___class__);
// look for first argument to function (assumes it's "self")
bool found = false;
id_info_t *id = &comp->scope_cur->id_info[0];
for (size_t n = comp->scope_cur->id_info_len; n > 0; --n, ++id) {
if (id->flags & ID_FLAG_IS_PARAM) {
// first argument found; load it
compile_load_id(comp, id->qst);
found = true;
break;
2014-02-05 00:51:47 +00:00
}
}
if (!found) {
compile_syntax_error(comp, (mp_parse_node_t)pns_trail[0],
MP_ERROR_TEXT("super() can't find self")); // really a TypeError
return;
}
if (num_trail >= 3
&& MP_PARSE_NODE_STRUCT_KIND(pns_trail[1]) == PN_trailer_period
&& MP_PARSE_NODE_STRUCT_KIND(pns_trail[2]) == PN_trailer_paren) {
// optimisation for method calls super().f(...), to eliminate heap allocation
mp_parse_node_struct_t *pns_period = pns_trail[1];
mp_parse_node_struct_t *pns_paren = pns_trail[2];
EMIT_ARG(load_method, MP_PARSE_NODE_LEAF_ARG(pns_period->nodes[0]), true);
compile_trailer_paren_helper(comp, pns_paren->nodes[0], true, 0);
i = 3;
} else {
// a super() call
EMIT_ARG(call_function, 2, 0, 0);
i = 1;
}
#if MICROPY_COMP_CONST_LITERAL && MICROPY_PY_COLLECTIONS_ORDEREDDICT
// handle special OrderedDict constructor
} else if (MP_PARSE_NODE_IS_ID(pns->nodes[0])
&& MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]) == MP_QSTR_OrderedDict
&& MP_PARSE_NODE_STRUCT_KIND(pns_trail[0]) == PN_trailer_paren
&& MP_PARSE_NODE_IS_STRUCT_KIND(pns_trail[0]->nodes[0], PN_atom_brace)) {
// at this point we have matched "OrderedDict({...})"
EMIT_ARG(call_function, 0, 0, 0);
mp_parse_node_struct_t *pns_dict = (mp_parse_node_struct_t *)pns_trail[0]->nodes[0];
compile_atom_brace_helper(comp, pns_dict, false);
i = 1;
#endif
}
// compile the remaining trailers
for (; i < num_trail; i++) {
if (i + 1 < num_trail
&& MP_PARSE_NODE_STRUCT_KIND(pns_trail[i]) == PN_trailer_period
&& MP_PARSE_NODE_STRUCT_KIND(pns_trail[i + 1]) == PN_trailer_paren) {
// optimisation for method calls a.f(...), following PyPy
mp_parse_node_struct_t *pns_period = pns_trail[i];
mp_parse_node_struct_t *pns_paren = pns_trail[i + 1];
EMIT_ARG(load_method, MP_PARSE_NODE_LEAF_ARG(pns_period->nodes[0]), false);
compile_trailer_paren_helper(comp, pns_paren->nodes[0], true, 0);
i += 1;
} else {
// node is one of: trailer_paren, trailer_bracket, trailer_period
compile_node(comp, (mp_parse_node_t)pns_trail[i]);
}
2014-02-05 00:51:47 +00:00
}
}
STATIC void compile_power(compiler_t *comp, mp_parse_node_struct_t *pns) {
compile_generic_all_nodes(comp, pns); // 2 nodes, arguments of power
EMIT_ARG(binary_op, MP_BINARY_OP_POWER);
}
STATIC void compile_trailer_paren_helper(compiler_t *comp, mp_parse_node_t pn_arglist, bool is_method_call, int n_positional_extra) {
// function to call is on top of stack
2014-02-05 00:51:47 +00:00
// get the list of arguments
mp_parse_node_t *args;
size_t n_args = mp_parse_node_extract_list(&pn_arglist, PN_arglist, &args);
// compile the arguments
// Rather than calling compile_node on the list, we go through the list of args
// explicitly here so that we can count the number of arguments and give sensible
// error messages.
int n_positional = n_positional_extra;
uint n_keyword = 0;
uint star_flags = 0;
mp_parse_node_struct_t *star_args_node = NULL, *dblstar_args_node = NULL;
for (size_t i = 0; i < n_args; i++) {
if (MP_PARSE_NODE_IS_STRUCT(args[i])) {
mp_parse_node_struct_t *pns_arg = (mp_parse_node_struct_t *)args[i];
if (MP_PARSE_NODE_STRUCT_KIND(pns_arg) == PN_arglist_star) {
if (star_flags & MP_EMIT_STAR_FLAG_SINGLE) {
compile_syntax_error(comp, (mp_parse_node_t)pns_arg, MP_ERROR_TEXT("can't have multiple *x"));
return;
}
star_flags |= MP_EMIT_STAR_FLAG_SINGLE;
star_args_node = pns_arg;
} else if (MP_PARSE_NODE_STRUCT_KIND(pns_arg) == PN_arglist_dbl_star) {
if (star_flags & MP_EMIT_STAR_FLAG_DOUBLE) {
compile_syntax_error(comp, (mp_parse_node_t)pns_arg, MP_ERROR_TEXT("can't have multiple **x"));
return;
}
star_flags |= MP_EMIT_STAR_FLAG_DOUBLE;
dblstar_args_node = pns_arg;
} else if (MP_PARSE_NODE_STRUCT_KIND(pns_arg) == PN_argument) {
#if MICROPY_PY_ASSIGN_EXPR
if (MP_PARSE_NODE_IS_STRUCT_KIND(pns_arg->nodes[1], PN_argument_3)) {
compile_namedexpr_helper(comp, pns_arg->nodes[0], ((mp_parse_node_struct_t *)pns_arg->nodes[1])->nodes[0]);
n_positional++;
} else
#endif
if (!MP_PARSE_NODE_IS_STRUCT_KIND(pns_arg->nodes[1], PN_comp_for)) {
if (!MP_PARSE_NODE_IS_ID(pns_arg->nodes[0])) {
compile_syntax_error(comp, (mp_parse_node_t)pns_arg, MP_ERROR_TEXT("LHS of keyword arg must be an id"));
return;
}
EMIT_ARG(load_const_str, MP_PARSE_NODE_LEAF_ARG(pns_arg->nodes[0]));
compile_node(comp, pns_arg->nodes[1]);
n_keyword += 1;
} else {
compile_comprehension(comp, pns_arg, SCOPE_GEN_EXPR);
n_positional++;
}
} else {
goto normal_argument;
}
} else {
normal_argument:
if (star_flags) {
compile_syntax_error(comp, args[i], MP_ERROR_TEXT("non-keyword arg after */**"));
return;
}
if (n_keyword > 0) {
compile_syntax_error(comp, args[i], MP_ERROR_TEXT("non-keyword arg after keyword arg"));
return;
}
compile_node(comp, args[i]);
n_positional++;
}
2013-10-04 19:53:11 +01:00
}
// compile the star/double-star arguments if we had them
// if we had one but not the other then we load "null" as a place holder
if (star_flags != 0) {
if (star_args_node == NULL) {
EMIT(load_null);
} else {
compile_node(comp, star_args_node->nodes[0]);
}
if (dblstar_args_node == NULL) {
EMIT(load_null);
} else {
compile_node(comp, dblstar_args_node->nodes[0]);
}
}
// emit the function/method call
2013-10-04 19:53:11 +01:00
if (is_method_call) {
EMIT_ARG(call_method, n_positional, n_keyword, star_flags);
2013-10-04 19:53:11 +01:00
} else {
EMIT_ARG(call_function, n_positional, n_keyword, star_flags);
2013-10-04 19:53:11 +01:00
}
}
// pns needs to have 2 nodes, first is lhs of comprehension, second is PN_comp_for node
STATIC void compile_comprehension(compiler_t *comp, mp_parse_node_struct_t *pns, scope_kind_t kind) {
assert(MP_PARSE_NODE_STRUCT_NUM_NODES(pns) == 2);
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[1], PN_comp_for));
mp_parse_node_struct_t *pns_comp_for = (mp_parse_node_struct_t *)pns->nodes[1];
2013-10-04 19:53:11 +01:00
if (comp->pass == MP_PASS_SCOPE) {
2013-10-04 19:53:11 +01:00
// create a new scope for this comprehension
scope_t *s = scope_new_and_link(comp, kind, (mp_parse_node_t)pns, comp->scope_cur->emit_options);
2013-10-04 19:53:11 +01:00
// store the comprehension scope so the compiling function (this one) can use it at each pass
pns_comp_for->nodes[3] = (mp_parse_node_t)s;
2013-10-04 19:53:11 +01:00
}
// get the scope for this comprehension
scope_t *this_scope = (scope_t *)pns_comp_for->nodes[3];
// compile the comprehension
close_over_variables_etc(comp, this_scope, 0, 0);
compile_node(comp, pns_comp_for->nodes[1]); // source of the iterator
if (kind == SCOPE_GEN_EXPR) {
EMIT_ARG(get_iter, false);
}
EMIT_ARG(call_function, 1, 0, 0);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_atom_paren(compiler_t *comp, mp_parse_node_struct_t *pns) {
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
2013-10-04 19:53:11 +01:00
// an empty tuple
EMIT_ARG(build, 0, MP_EMIT_BUILD_TUPLE);
} else {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_testlist_comp));
pns = (mp_parse_node_struct_t *)pns->nodes[0];
if (MP_PARSE_NODE_TESTLIST_COMP_HAS_COMP_FOR(pns)) {
// generator expression
compile_comprehension(comp, pns, SCOPE_GEN_EXPR);
2013-10-04 19:53:11 +01:00
} else {
// tuple with N items
compile_generic_tuple(comp, pns);
2013-10-04 19:53:11 +01:00
}
}
}
STATIC void compile_atom_bracket(compiler_t *comp, mp_parse_node_struct_t *pns) {
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
2013-10-04 19:53:11 +01:00
// empty list
EMIT_ARG(build, 0, MP_EMIT_BUILD_LIST);
} else if (MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_testlist_comp)) {
mp_parse_node_struct_t *pns2 = (mp_parse_node_struct_t *)pns->nodes[0];
if (MP_PARSE_NODE_TESTLIST_COMP_HAS_COMP_FOR(pns2)) {
// list comprehension
compile_comprehension(comp, pns2, SCOPE_LIST_COMP);
2013-10-04 19:53:11 +01:00
} else {
// list with N items
compile_generic_all_nodes(comp, pns2);
EMIT_ARG(build, MP_PARSE_NODE_STRUCT_NUM_NODES(pns2), MP_EMIT_BUILD_LIST);
2013-10-04 19:53:11 +01:00
}
} else {
// list with 1 item
compile_node(comp, pns->nodes[0]);
EMIT_ARG(build, 1, MP_EMIT_BUILD_LIST);
2013-10-04 19:53:11 +01:00
}
}
STATIC void compile_atom_brace_helper(compiler_t *comp, mp_parse_node_struct_t *pns, bool create_map) {
mp_parse_node_t pn = pns->nodes[0];
if (MP_PARSE_NODE_IS_NULL(pn)) {
2013-10-04 19:53:11 +01:00
// empty dict
if (create_map) {
EMIT_ARG(build, 0, MP_EMIT_BUILD_MAP);
}
} else if (MP_PARSE_NODE_IS_STRUCT(pn)) {
pns = (mp_parse_node_struct_t *)pn;
if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_dictorsetmaker_item) {
2013-10-04 19:53:11 +01:00
// dict with one element
if (create_map) {
EMIT_ARG(build, 1, MP_EMIT_BUILD_MAP);
}
2013-10-04 19:53:11 +01:00
compile_node(comp, pn);
EMIT(store_map);
} else if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_dictorsetmaker) {
assert(MP_PARSE_NODE_IS_STRUCT(pns->nodes[1])); // should succeed
mp_parse_node_struct_t *pns1 = (mp_parse_node_struct_t *)pns->nodes[1];
if (MP_PARSE_NODE_STRUCT_KIND(pns1) == PN_dictorsetmaker_list) {
2013-10-04 19:53:11 +01:00
// dict/set with multiple elements
// get tail elements (2nd, 3rd, ...)
mp_parse_node_t *nodes;
size_t n = mp_parse_node_extract_list(&pns1->nodes[0], PN_dictorsetmaker_list2, &nodes);
2013-10-04 19:53:11 +01:00
// first element sets whether it's a dict or set
bool is_dict;
if (!MICROPY_PY_BUILTINS_SET || MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_dictorsetmaker_item)) {
2013-10-04 19:53:11 +01:00
// a dictionary
if (create_map) {
EMIT_ARG(build, 1 + n, MP_EMIT_BUILD_MAP);
}
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]);
EMIT(store_map);
is_dict = true;
} else {
// a set
compile_node(comp, pns->nodes[0]); // 1st value of set
is_dict = false;
}
// process rest of elements
for (size_t i = 0; i < n; i++) {
mp_parse_node_t pn_i = nodes[i];
bool is_key_value = MP_PARSE_NODE_IS_STRUCT_KIND(pn_i, PN_dictorsetmaker_item);
compile_node(comp, pn_i);
2013-10-04 19:53:11 +01:00
if (is_dict) {
if (!is_key_value) {
#if MICROPY_ERROR_REPORTING <= MICROPY_ERROR_REPORTING_TERSE
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("invalid syntax"));
#else
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("expecting key:value for dict"));
#endif
2013-10-04 19:53:11 +01:00
return;
}
EMIT(store_map);
} else {
if (is_key_value) {
#if MICROPY_ERROR_REPORTING <= MICROPY_ERROR_REPORTING_TERSE
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("invalid syntax"));
#else
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("expecting just a value for set"));
#endif
2013-10-04 19:53:11 +01:00
return;
}
}
}
#if MICROPY_PY_BUILTINS_SET
2013-10-04 19:53:11 +01:00
// if it's a set, build it
if (!is_dict) {
EMIT_ARG(build, 1 + n, MP_EMIT_BUILD_SET);
2013-10-04 19:53:11 +01:00
}
#endif
} else {
assert(MP_PARSE_NODE_STRUCT_KIND(pns1) == PN_comp_for); // should be
2013-10-04 19:53:11 +01:00
// dict/set comprehension
if (!MICROPY_PY_BUILTINS_SET || MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_dictorsetmaker_item)) {
2013-10-04 19:53:11 +01:00
// a dictionary comprehension
compile_comprehension(comp, pns, SCOPE_DICT_COMP);
} else {
// a set comprehension
compile_comprehension(comp, pns, SCOPE_SET_COMP);
}
}
} else {
// set with one element
goto set_with_one_element;
}
} else {
// set with one element
set_with_one_element:
#if MICROPY_PY_BUILTINS_SET
2013-10-04 19:53:11 +01:00
compile_node(comp, pn);
EMIT_ARG(build, 1, MP_EMIT_BUILD_SET);
#else
assert(0);
#endif
2013-10-04 19:53:11 +01:00
}
}
STATIC void compile_atom_brace(compiler_t *comp, mp_parse_node_struct_t *pns) {
compile_atom_brace_helper(comp, pns, true);
}
STATIC void compile_trailer_paren(compiler_t *comp, mp_parse_node_struct_t *pns) {
compile_trailer_paren_helper(comp, pns->nodes[0], false, 0);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_trailer_bracket(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
// object who's index we want is on top of stack
compile_node(comp, pns->nodes[0]); // the index
EMIT_ARG(subscr, MP_EMIT_SUBSCR_LOAD);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_trailer_period(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
// object who's attribute we want is on top of stack
EMIT_ARG(attr, MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]), MP_EMIT_ATTR_LOAD); // attribute to get
2013-10-04 19:53:11 +01:00
}
#if MICROPY_PY_BUILTINS_SLICE
STATIC void compile_subscript(compiler_t *comp, mp_parse_node_struct_t *pns) {
if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_subscript_2) {
compile_node(comp, pns->nodes[0]); // start of slice
assert(MP_PARSE_NODE_IS_STRUCT(pns->nodes[1])); // should always be
pns = (mp_parse_node_struct_t *)pns->nodes[1];
} else {
// pns is a PN_subscript_3, load None for start of slice
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
}
assert(MP_PARSE_NODE_STRUCT_KIND(pns) == PN_subscript_3); // should always be
mp_parse_node_t pn = pns->nodes[0];
if (MP_PARSE_NODE_IS_NULL(pn)) {
2013-10-04 19:53:11 +01:00
// [?:]
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
EMIT_ARG(build, 2, MP_EMIT_BUILD_SLICE);
} else if (MP_PARSE_NODE_IS_STRUCT(pn)) {
pns = (mp_parse_node_struct_t *)pn;
if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_subscript_3c) {
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
2013-10-04 19:53:11 +01:00
pn = pns->nodes[0];
if (MP_PARSE_NODE_IS_NULL(pn)) {
2013-10-04 19:53:11 +01:00
// [?::]
EMIT_ARG(build, 2, MP_EMIT_BUILD_SLICE);
2013-10-04 19:53:11 +01:00
} else {
// [?::x]
compile_node(comp, pn);
EMIT_ARG(build, 3, MP_EMIT_BUILD_SLICE);
2013-10-04 19:53:11 +01:00
}
} else if (MP_PARSE_NODE_STRUCT_KIND(pns) == PN_subscript_3d) {
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]);
assert(MP_PARSE_NODE_IS_STRUCT(pns->nodes[1])); // should always be
pns = (mp_parse_node_struct_t *)pns->nodes[1];
assert(MP_PARSE_NODE_STRUCT_KIND(pns) == PN_sliceop); // should always be
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
2013-10-04 19:53:11 +01:00
// [?:x:]
EMIT_ARG(build, 2, MP_EMIT_BUILD_SLICE);
2013-10-04 19:53:11 +01:00
} else {
// [?:x:x]
compile_node(comp, pns->nodes[0]);
EMIT_ARG(build, 3, MP_EMIT_BUILD_SLICE);
2013-10-04 19:53:11 +01:00
}
} else {
// [?:x]
compile_node(comp, pn);
EMIT_ARG(build, 2, MP_EMIT_BUILD_SLICE);
2013-10-04 19:53:11 +01:00
}
} else {
// [?:x]
compile_node(comp, pn);
EMIT_ARG(build, 2, MP_EMIT_BUILD_SLICE);
2013-10-04 19:53:11 +01:00
}
}
#endif // MICROPY_PY_BUILTINS_SLICE
2013-10-04 19:53:11 +01:00
STATIC void compile_dictorsetmaker_item(compiler_t *comp, mp_parse_node_struct_t *pns) {
2013-10-04 19:53:11 +01:00
// if this is called then we are compiling a dict key:value pair
compile_node(comp, pns->nodes[1]); // value
compile_node(comp, pns->nodes[0]); // key
}
STATIC void compile_classdef(compiler_t *comp, mp_parse_node_struct_t *pns) {
qstr cname = compile_classdef_helper(comp, pns, comp->scope_cur->emit_options);
2013-10-04 19:53:11 +01:00
// store class object into class name
compile_store_id(comp, cname);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_yield_expr(compiler_t *comp, mp_parse_node_struct_t *pns) {
2014-04-11 14:10:21 +01:00
if (comp->scope_cur->kind != SCOPE_FUNCTION && comp->scope_cur->kind != SCOPE_LAMBDA) {
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("'yield' outside function"));
2013-10-04 19:53:11 +01:00
return;
}
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
EMIT_ARG(yield, MP_EMIT_YIELD_VALUE);
reserve_labels_for_native(comp, 1);
} else if (MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_yield_arg_from)) {
pns = (mp_parse_node_struct_t *)pns->nodes[0];
2013-10-04 19:53:11 +01:00
compile_node(comp, pns->nodes[0]);
compile_yield_from(comp);
2013-10-04 19:53:11 +01:00
} else {
compile_node(comp, pns->nodes[0]);
EMIT_ARG(yield, MP_EMIT_YIELD_VALUE);
reserve_labels_for_native(comp, 1);
2013-10-04 19:53:11 +01:00
}
}
#if MICROPY_PY_ASYNC_AWAIT
STATIC void compile_atom_expr_await(compiler_t *comp, mp_parse_node_struct_t *pns) {
if (comp->scope_cur->kind != SCOPE_FUNCTION && comp->scope_cur->kind != SCOPE_LAMBDA) {
compile_syntax_error(comp, (mp_parse_node_t)pns, MP_ERROR_TEXT("'await' outside function"));
return;
}
compile_atom_expr_normal(comp, pns);
compile_yield_from(comp);
}
#endif
STATIC mp_obj_t get_const_object(mp_parse_node_struct_t *pns) {
#if MICROPY_OBJ_REPR == MICROPY_OBJ_REPR_D
// nodes are 32-bit pointers, but need to extract 64-bit object
return (uint64_t)pns->nodes[0] | ((uint64_t)pns->nodes[1] << 32);
#else
return (mp_obj_t)pns->nodes[0];
#endif
}
STATIC void compile_const_object(compiler_t *comp, mp_parse_node_struct_t *pns) {
EMIT_ARG(load_const_obj, get_const_object(pns));
}
typedef void (*compile_function_t)(compiler_t *, mp_parse_node_struct_t *);
STATIC const compile_function_t compile_function[] = {
// only define rules with a compile function
2013-10-04 19:53:11 +01:00
#define c(f) compile_##f
#define DEF_RULE(rule, comp, kind, ...) comp,
#define DEF_RULE_NC(rule, kind, ...)
#include "py/grammar.h"
2013-10-04 19:53:11 +01:00
#undef c
#undef DEF_RULE
#undef DEF_RULE_NC
compile_const_object,
2013-10-04 19:53:11 +01:00
};
STATIC void compile_node(compiler_t *comp, mp_parse_node_t pn) {
if (MP_PARSE_NODE_IS_NULL(pn)) {
2013-10-04 19:53:11 +01:00
// pass
} else if (MP_PARSE_NODE_IS_SMALL_INT(pn)) {
mp_int_t arg = MP_PARSE_NODE_LEAF_SMALL_INT(pn);
#if MICROPY_DYNAMIC_COMPILER
mp_uint_t sign_mask = -((mp_uint_t)1 << (mp_dynamic_compiler.small_int_bits - 1));
if ((arg & sign_mask) == 0 || (arg & sign_mask) == sign_mask) {
// integer fits in target runtime's small-int
EMIT_ARG(load_const_small_int, arg);
} else {
// integer doesn't fit, so create a multi-precision int object
// (but only create the actual object on the last pass)
if (comp->pass != MP_PASS_EMIT) {
EMIT_ARG(load_const_obj, mp_const_none);
} else {
EMIT_ARG(load_const_obj, mp_obj_new_int_from_ll(arg));
}
}
#else
EMIT_ARG(load_const_small_int, arg);
#endif
} else if (MP_PARSE_NODE_IS_LEAF(pn)) {
uintptr_t arg = MP_PARSE_NODE_LEAF_ARG(pn);
switch (MP_PARSE_NODE_LEAF_KIND(pn)) {
case MP_PARSE_NODE_ID:
compile_load_id(comp, arg);
break;
case MP_PARSE_NODE_STRING:
EMIT_ARG(load_const_str, arg);
break;
case MP_PARSE_NODE_BYTES:
// only create and load the actual bytes object on the last pass
if (comp->pass != MP_PASS_EMIT) {
EMIT_ARG(load_const_obj, mp_const_none);
} else {
size_t len;
const byte *data = qstr_data(arg, &len);
EMIT_ARG(load_const_obj, mp_obj_new_bytes(data, len));
}
break;
case MP_PARSE_NODE_TOKEN:
default:
if (arg == MP_TOKEN_NEWLINE) {
// this can occur when file_input lets through a NEWLINE (eg if file starts with a newline)
2013-10-18 19:58:12 +01:00
// or when single_input lets through a NEWLINE (user enters a blank line)
// do nothing
} else {
EMIT_ARG(load_const_tok, arg);
}
break;
2013-10-04 19:53:11 +01:00
}
} else {
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
EMIT_ARG(set_source_line, pns->source_line);
assert(MP_PARSE_NODE_STRUCT_KIND(pns) <= PN_const_object);
compile_function_t f = compile_function[MP_PARSE_NODE_STRUCT_KIND(pns)];
f(comp, pns);
2013-10-04 19:53:11 +01:00
}
}
#if MICROPY_EMIT_NATIVE
STATIC int compile_viper_type_annotation(compiler_t *comp, mp_parse_node_t pn_annotation) {
int native_type = MP_NATIVE_TYPE_OBJ;
if (MP_PARSE_NODE_IS_NULL(pn_annotation)) {
// No annotation, type defaults to object
} else if (MP_PARSE_NODE_IS_ID(pn_annotation)) {
qstr type_name = MP_PARSE_NODE_LEAF_ARG(pn_annotation);
native_type = mp_native_type_from_qstr(type_name);
if (native_type < 0) {
comp->compile_error = mp_obj_new_exception_msg_varg(&mp_type_ViperTypeError, MP_ERROR_TEXT("unknown type '%q'"), type_name);
native_type = 0;
}
} else {
compile_syntax_error(comp, pn_annotation, MP_ERROR_TEXT("annotation must be an identifier"));
}
return native_type;
}
#endif
STATIC void compile_scope_func_lambda_param(compiler_t *comp, mp_parse_node_t pn, pn_kind_t pn_name, pn_kind_t pn_star, pn_kind_t pn_dbl_star) {
(void)pn_dbl_star;
// check that **kw is last
if ((comp->scope_cur->scope_flags & MP_SCOPE_FLAG_VARKEYWORDS) != 0) {
compile_syntax_error(comp, pn, MP_ERROR_TEXT("invalid syntax"));
return;
}
qstr param_name = MP_QSTRnull;
uint param_flag = ID_FLAG_IS_PARAM;
mp_parse_node_struct_t *pns = NULL;
if (MP_PARSE_NODE_IS_ID(pn)) {
param_name = MP_PARSE_NODE_LEAF_ARG(pn);
if (comp->have_star) {
// comes after a star, so counts as a keyword-only parameter
comp->scope_cur->num_kwonly_args += 1;
2013-10-04 19:53:11 +01:00
} else {
// comes before a star, so counts as a positional parameter
comp->scope_cur->num_pos_args += 1;
2013-10-04 19:53:11 +01:00
}
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_emit_common_use_qstr(&comp->emit_common, param_name);
} else {
assert(MP_PARSE_NODE_IS_STRUCT(pn));
pns = (mp_parse_node_struct_t *)pn;
if (MP_PARSE_NODE_STRUCT_KIND(pns) == pn_name) {
// named parameter with possible annotation
param_name = MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]);
if (comp->have_star) {
// comes after a star, so counts as a keyword-only parameter
comp->scope_cur->num_kwonly_args += 1;
} else {
// comes before a star, so counts as a positional parameter
comp->scope_cur->num_pos_args += 1;
}
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_emit_common_use_qstr(&comp->emit_common, param_name);
} else if (MP_PARSE_NODE_STRUCT_KIND(pns) == pn_star) {
if (comp->have_star) {
// more than one star
compile_syntax_error(comp, pn, MP_ERROR_TEXT("invalid syntax"));
return;
}
comp->have_star = true;
param_flag = ID_FLAG_IS_PARAM | ID_FLAG_IS_STAR_PARAM;
if (MP_PARSE_NODE_IS_NULL(pns->nodes[0])) {
// bare star
// TODO see http://www.python.org/dev/peps/pep-3102/
// assert(comp->scope_cur->num_dict_params == 0);
pns = NULL;
} else if (MP_PARSE_NODE_IS_ID(pns->nodes[0])) {
// named star
comp->scope_cur->scope_flags |= MP_SCOPE_FLAG_VARARGS;
param_name = MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]);
pns = NULL;
} else {
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_tfpdef)); // should be
// named star with possible annotation
comp->scope_cur->scope_flags |= MP_SCOPE_FLAG_VARARGS;
pns = (mp_parse_node_struct_t *)pns->nodes[0];
param_name = MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]);
}
} else {
// double star with possible annotation
assert(MP_PARSE_NODE_STRUCT_KIND(pns) == pn_dbl_star); // should be
param_name = MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]);
param_flag = ID_FLAG_IS_PARAM | ID_FLAG_IS_DBL_STAR_PARAM;
comp->scope_cur->scope_flags |= MP_SCOPE_FLAG_VARKEYWORDS;
2013-10-04 19:53:11 +01:00
}
}
if (param_name != MP_QSTRnull) {
id_info_t *id_info = scope_find_or_add_id(comp->scope_cur, param_name, ID_INFO_KIND_UNDECIDED);
if (id_info->kind != ID_INFO_KIND_UNDECIDED) {
compile_syntax_error(comp, pn, MP_ERROR_TEXT("argument name reused"));
2013-10-04 19:53:11 +01:00
return;
}
id_info->kind = ID_INFO_KIND_LOCAL;
id_info->flags = param_flag;
#if MICROPY_EMIT_NATIVE
if (comp->scope_cur->emit_options == MP_EMIT_OPT_VIPER && pn_name == PN_typedargslist_name && pns != NULL) {
id_info->flags |= compile_viper_type_annotation(comp, pns->nodes[1]) << ID_FLAG_VIPER_TYPE_POS;
}
#else
(void)pns;
#endif
2013-10-04 19:53:11 +01:00
}
}
STATIC void compile_scope_func_param(compiler_t *comp, mp_parse_node_t pn) {
compile_scope_func_lambda_param(comp, pn, PN_typedargslist_name, PN_typedargslist_star, PN_typedargslist_dbl_star);
2013-10-04 19:53:11 +01:00
}
STATIC void compile_scope_lambda_param(compiler_t *comp, mp_parse_node_t pn) {
compile_scope_func_lambda_param(comp, pn, PN_varargslist_name, PN_varargslist_star, PN_varargslist_dbl_star);
}
STATIC void compile_scope_comp_iter(compiler_t *comp, mp_parse_node_struct_t *pns_comp_for, mp_parse_node_t pn_inner_expr, int for_depth) {
uint l_top = comp_next_label(comp);
uint l_end = comp_next_label(comp);
EMIT_ARG(label_assign, l_top);
EMIT_ARG(for_iter, l_end);
c_assign(comp, pns_comp_for->nodes[0], ASSIGN_STORE);
mp_parse_node_t pn_iter = pns_comp_for->nodes[2];
2013-10-04 19:53:11 +01:00
tail_recursion:
if (MP_PARSE_NODE_IS_NULL(pn_iter)) {
2013-10-04 19:53:11 +01:00
// no more nested if/for; compile inner expression
compile_node(comp, pn_inner_expr);
if (comp->scope_cur->kind == SCOPE_GEN_EXPR) {
EMIT_ARG(yield, MP_EMIT_YIELD_VALUE);
reserve_labels_for_native(comp, 1);
2013-10-04 19:53:11 +01:00
EMIT(pop_top);
} else {
EMIT_ARG(store_comp, comp->scope_cur->kind, 4 * for_depth + 5);
2013-10-04 19:53:11 +01:00
}
} else if (MP_PARSE_NODE_STRUCT_KIND((mp_parse_node_struct_t *)pn_iter) == PN_comp_if) {
2013-10-04 19:53:11 +01:00
// if condition
mp_parse_node_struct_t *pns_comp_if = (mp_parse_node_struct_t *)pn_iter;
2013-10-04 19:53:11 +01:00
c_if_cond(comp, pns_comp_if->nodes[0], false, l_top);
pn_iter = pns_comp_if->nodes[1];
goto tail_recursion;
} else {
assert(MP_PARSE_NODE_STRUCT_KIND((mp_parse_node_struct_t *)pn_iter) == PN_comp_for); // should be
2013-10-04 19:53:11 +01:00
// for loop
mp_parse_node_struct_t *pns_comp_for2 = (mp_parse_node_struct_t *)pn_iter;
2013-10-04 19:53:11 +01:00
compile_node(comp, pns_comp_for2->nodes[1]);
EMIT_ARG(get_iter, true);
compile_scope_comp_iter(comp, pns_comp_for2, pn_inner_expr, for_depth + 1);
2013-10-04 19:53:11 +01:00
}
EMIT_ARG(jump, l_top);
EMIT_ARG(label_assign, l_end);
EMIT(for_iter_end);
2013-10-04 19:53:11 +01:00
}
STATIC void check_for_doc_string(compiler_t *comp, mp_parse_node_t pn) {
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
#if MICROPY_ENABLE_DOC_STRING
2013-10-04 19:53:11 +01:00
// see http://www.python.org/dev/peps/pep-0257/
// look for the first statement
if (MP_PARSE_NODE_IS_STRUCT_KIND(pn, PN_expr_stmt)) {
// a statement; fall through
} else if (MP_PARSE_NODE_IS_STRUCT_KIND(pn, PN_file_input_2)) {
// file input; find the first non-newline node
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
int num_nodes = MP_PARSE_NODE_STRUCT_NUM_NODES(pns);
for (int i = 0; i < num_nodes; i++) {
pn = pns->nodes[i];
if (!(MP_PARSE_NODE_IS_LEAF(pn) && MP_PARSE_NODE_LEAF_KIND(pn) == MP_PARSE_NODE_TOKEN && MP_PARSE_NODE_LEAF_ARG(pn) == MP_TOKEN_NEWLINE)) {
// not a newline, so this is the first statement; finish search
break;
}
}
// if we didn't find a non-newline then it's okay to fall through; pn will be a newline and so doc-string test below will fail gracefully
} else if (MP_PARSE_NODE_IS_STRUCT_KIND(pn, PN_suite_block_stmts)) {
// a list of statements; get the first one
pn = ((mp_parse_node_struct_t *)pn)->nodes[0];
2013-10-04 19:53:11 +01:00
} else {
return;
}
// check the first statement for a doc string
if (MP_PARSE_NODE_IS_STRUCT_KIND(pn, PN_expr_stmt)) {
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)pn;
if ((MP_PARSE_NODE_IS_LEAF(pns->nodes[0])
&& MP_PARSE_NODE_LEAF_KIND(pns->nodes[0]) == MP_PARSE_NODE_STRING)
|| (MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[0], PN_const_object)
&& mp_obj_is_str(get_const_object((mp_parse_node_struct_t *)pns->nodes[0])))) {
// compile the doc string
compile_node(comp, pns->nodes[0]);
// store the doc string
compile_store_id(comp, MP_QSTR___doc__);
2013-10-04 19:53:11 +01:00
}
}
#else
(void)comp;
(void)pn;
#endif
2013-10-04 19:53:11 +01:00
}
STATIC void compile_scope(compiler_t *comp, scope_t *scope, pass_kind_t pass) {
2013-10-04 19:53:11 +01:00
comp->pass = pass;
comp->scope_cur = scope;
comp->next_label = 0;
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_emit_common_start_pass(&comp->emit_common, pass);
EMIT_ARG(start_pass, pass, scope);
reserve_labels_for_native(comp, 6); // used by native's start_pass
2013-10-04 19:53:11 +01:00
if (comp->pass == MP_PASS_SCOPE) {
// reset maximum stack sizes in scope
// they will be computed in this first pass
2013-10-04 19:53:11 +01:00
scope->stack_size = 0;
scope->exc_stack_size = 0;
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
#if MICROPY_EMIT_NATIVE
if (scope->emit_options == MP_EMIT_OPT_NATIVE_PYTHON || scope->emit_options == MP_EMIT_OPT_VIPER) {
// allow native code to perfom basic tasks during the pass scope
// note: the first argument passed here is mp_emit_common_t, not the native emitter context
NATIVE_EMITTER_TABLE->start_pass((void *)&comp->emit_common, comp->pass, scope);
}
#endif
2013-10-04 19:53:11 +01:00
}
// compile
2014-01-15 22:14:03 +00:00
if (MP_PARSE_NODE_IS_STRUCT_KIND(scope->pn, PN_eval_input)) {
assert(scope->kind == SCOPE_MODULE);
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)scope->pn;
compile_node(comp, pns->nodes[0]); // compile the expression
EMIT(return_value);
} else if (scope->kind == SCOPE_MODULE) {
2013-10-18 19:58:12 +01:00
if (!comp->is_repl) {
check_for_doc_string(comp, scope->pn);
}
2013-10-04 19:53:11 +01:00
compile_node(comp, scope->pn);
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
2013-10-04 19:53:11 +01:00
EMIT(return_value);
} else if (scope->kind == SCOPE_FUNCTION) {
assert(MP_PARSE_NODE_IS_STRUCT(scope->pn));
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)scope->pn;
assert(MP_PARSE_NODE_STRUCT_KIND(pns) == PN_funcdef);
2013-10-04 19:53:11 +01:00
// work out number of parameters, keywords and default parameters, and add them to the id_info array
// must be done before compiling the body so that arguments are numbered first (for LOAD_FAST etc)
if (comp->pass == MP_PASS_SCOPE) {
comp->have_star = false;
2013-10-04 19:53:11 +01:00
apply_to_single_or_list(comp, pns->nodes[1], PN_typedargslist, compile_scope_func_param);
#if MICROPY_EMIT_NATIVE
if (scope->emit_options == MP_EMIT_OPT_VIPER) {
// Compile return type; pns->nodes[2] is return/whole function annotation
scope->scope_flags |= compile_viper_type_annotation(comp, pns->nodes[2]) << MP_SCOPE_FLAG_VIPERRET_POS;
}
#endif // MICROPY_EMIT_NATIVE
2013-10-04 19:53:11 +01:00
}
compile_node(comp, pns->nodes[3]); // 3 is function body
// emit return if it wasn't the last opcode
if (!EMIT(last_emit_was_return_value)) {
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
2013-10-04 19:53:11 +01:00
EMIT(return_value);
}
} else if (scope->kind == SCOPE_LAMBDA) {
assert(MP_PARSE_NODE_IS_STRUCT(scope->pn));
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)scope->pn;
assert(MP_PARSE_NODE_STRUCT_NUM_NODES(pns) == 3);
2013-10-04 19:53:11 +01:00
// Set the source line number for the start of the lambda
EMIT_ARG(set_source_line, pns->source_line);
2013-10-04 19:53:11 +01:00
// work out number of parameters, keywords and default parameters, and add them to the id_info array
// must be done before compiling the body so that arguments are numbered first (for LOAD_FAST etc)
if (comp->pass == MP_PASS_SCOPE) {
comp->have_star = false;
2013-10-04 19:53:11 +01:00
apply_to_single_or_list(comp, pns->nodes[0], PN_varargslist, compile_scope_lambda_param);
}
compile_node(comp, pns->nodes[1]); // 1 is lambda body
2014-04-11 14:10:21 +01:00
// if the lambda is a generator, then we return None, not the result of the expression of the lambda
if (scope->scope_flags & MP_SCOPE_FLAG_GENERATOR) {
EMIT(pop_top);
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
}
2013-10-04 19:53:11 +01:00
EMIT(return_value);
} else if (SCOPE_IS_COMP_LIKE(scope->kind)) {
2013-10-04 19:53:11 +01:00
// a bit of a hack at the moment
assert(MP_PARSE_NODE_IS_STRUCT(scope->pn));
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)scope->pn;
assert(MP_PARSE_NODE_STRUCT_NUM_NODES(pns) == 2);
assert(MP_PARSE_NODE_IS_STRUCT_KIND(pns->nodes[1], PN_comp_for));
mp_parse_node_struct_t *pns_comp_for = (mp_parse_node_struct_t *)pns->nodes[1];
2013-10-04 19:53:11 +01:00
// We need a unique name for the comprehension argument (the iterator).
// CPython uses .0, but we should be able to use anything that won't
// clash with a user defined variable. Best to use an existing qstr,
// so we use the blank qstr.
qstr qstr_arg = MP_QSTR_;
if (comp->pass == MP_PASS_SCOPE) {
scope_find_or_add_id(comp->scope_cur, qstr_arg, ID_INFO_KIND_LOCAL);
scope->num_pos_args = 1;
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_emit_common_use_qstr(&comp->emit_common, MP_QSTR__star_);
2013-10-04 19:53:11 +01:00
}
// Set the source line number for the start of the comprehension
EMIT_ARG(set_source_line, pns->source_line);
2013-10-04 19:53:11 +01:00
if (scope->kind == SCOPE_LIST_COMP) {
EMIT_ARG(build, 0, MP_EMIT_BUILD_LIST);
2013-10-04 19:53:11 +01:00
} else if (scope->kind == SCOPE_DICT_COMP) {
EMIT_ARG(build, 0, MP_EMIT_BUILD_MAP);
#if MICROPY_PY_BUILTINS_SET
2013-10-04 19:53:11 +01:00
} else if (scope->kind == SCOPE_SET_COMP) {
EMIT_ARG(build, 0, MP_EMIT_BUILD_SET);
#endif
2013-10-04 19:53:11 +01:00
}
// There are 4 slots on the stack for the iterator, and the first one is
// NULL to indicate that the second one points to the iterator object.
if (scope->kind == SCOPE_GEN_EXPR) {
MP_STATIC_ASSERT(MP_OBJ_ITER_BUF_NSLOTS == 4);
EMIT(load_null);
compile_load_id(comp, qstr_arg);
EMIT(load_null);
EMIT(load_null);
} else {
compile_load_id(comp, qstr_arg);
EMIT_ARG(get_iter, true);
}
compile_scope_comp_iter(comp, pns_comp_for, pns->nodes[0], 0);
2013-10-04 19:53:11 +01:00
if (scope->kind == SCOPE_GEN_EXPR) {
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
2013-10-04 19:53:11 +01:00
}
EMIT(return_value);
} else {
assert(scope->kind == SCOPE_CLASS);
assert(MP_PARSE_NODE_IS_STRUCT(scope->pn));
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)scope->pn;
assert(MP_PARSE_NODE_STRUCT_KIND(pns) == PN_classdef);
2013-10-04 19:53:11 +01:00
if (comp->pass == MP_PASS_SCOPE) {
scope_find_or_add_id(scope, MP_QSTR___class__, ID_INFO_KIND_LOCAL);
2013-10-04 19:53:11 +01:00
}
#if MICROPY_PY_SYS_SETTRACE
EMIT_ARG(set_source_line, pns->source_line);
#endif
compile_load_id(comp, MP_QSTR___name__);
compile_store_id(comp, MP_QSTR___module__);
EMIT_ARG(load_const_str, MP_PARSE_NODE_LEAF_ARG(pns->nodes[0])); // 0 is class name
compile_store_id(comp, MP_QSTR___qualname__);
2013-10-04 19:53:11 +01:00
check_for_doc_string(comp, pns->nodes[2]);
compile_node(comp, pns->nodes[2]); // 2 is class body
id_info_t *id = scope_find(scope, MP_QSTR___class__);
2013-10-04 19:53:11 +01:00
assert(id != NULL);
if (id->kind == ID_INFO_KIND_LOCAL) {
EMIT_ARG(load_const_tok, MP_TOKEN_KW_NONE);
2013-10-04 19:53:11 +01:00
} else {
EMIT_LOAD_FAST(MP_QSTR___class__, id->local_num);
2013-10-04 19:53:11 +01:00
}
EMIT(return_value);
}
EMIT(end_pass);
// make sure we match all the exception levels
assert(comp->cur_except_level == 0);
2013-10-05 23:17:28 +01:00
}
#if MICROPY_EMIT_INLINE_ASM
// requires 3 passes: SCOPE, CODE_SIZE, EMIT
STATIC void compile_scope_inline_asm(compiler_t *comp, scope_t *scope, pass_kind_t pass) {
2013-10-05 23:17:28 +01:00
comp->pass = pass;
comp->scope_cur = scope;
comp->next_label = 0;
2013-10-05 23:17:28 +01:00
if (scope->kind != SCOPE_FUNCTION) {
compile_syntax_error(comp, MP_PARSE_NODE_NULL, MP_ERROR_TEXT("inline assembler must be a function"));
2013-10-05 23:17:28 +01:00
return;
}
if (comp->pass > MP_PASS_SCOPE) {
EMIT_INLINE_ASM_ARG(start_pass, comp->pass, &comp->compile_error);
}
2013-10-05 23:17:28 +01:00
// get the function definition parse node
assert(MP_PARSE_NODE_IS_STRUCT(scope->pn));
mp_parse_node_struct_t *pns = (mp_parse_node_struct_t *)scope->pn;
assert(MP_PARSE_NODE_STRUCT_KIND(pns) == PN_funcdef);
2013-10-05 23:17:28 +01:00
// qstr f_id = MP_PARSE_NODE_LEAF_ARG(pns->nodes[0]); // function name
// parameters are in pns->nodes[1]
if (comp->pass == MP_PASS_CODE_SIZE) {
mp_parse_node_t *pn_params;
size_t n_params = mp_parse_node_extract_list(&pns->nodes[1], PN_typedargslist, &pn_params);
scope->num_pos_args = EMIT_INLINE_ASM_ARG(count_params, n_params, pn_params);
if (comp->compile_error != MP_OBJ_NULL) {
goto inline_asm_error;
}
}
2013-10-05 23:17:28 +01:00
// pns->nodes[2] is function return annotation
mp_uint_t type_sig = MP_NATIVE_TYPE_INT;
mp_parse_node_t pn_annotation = pns->nodes[2];
if (!MP_PARSE_NODE_IS_NULL(pn_annotation)) {
// nodes[2] can be null or a test-expr
if (MP_PARSE_NODE_IS_ID(pn_annotation)) {
qstr ret_type = MP_PARSE_NODE_LEAF_ARG(pn_annotation);
switch (ret_type) {
case MP_QSTR_object:
type_sig = MP_NATIVE_TYPE_OBJ;
break;
case MP_QSTR_bool:
type_sig = MP_NATIVE_TYPE_BOOL;
break;
case MP_QSTR_int:
type_sig = MP_NATIVE_TYPE_INT;
break;
case MP_QSTR_uint:
type_sig = MP_NATIVE_TYPE_UINT;
break;
default:
compile_syntax_error(comp, pn_annotation, MP_ERROR_TEXT("unknown type"));
return;
}
} else {
compile_syntax_error(comp, pn_annotation, MP_ERROR_TEXT("return annotation must be an identifier"));
}
}
2013-10-05 23:17:28 +01:00
mp_parse_node_t pn_body = pns->nodes[3]; // body
mp_parse_node_t *nodes;
size_t num = mp_parse_node_extract_list(&pn_body, PN_suite_block_stmts, &nodes);
2013-10-05 23:17:28 +01:00
for (size_t i = 0; i < num; i++) {
assert(MP_PARSE_NODE_IS_STRUCT(nodes[i]));
mp_parse_node_struct_t *pns2 = (mp_parse_node_struct_t *)nodes[i];
if (MP_PARSE_NODE_STRUCT_KIND(pns2) == PN_pass_stmt) {
// no instructions
continue;
} else if (MP_PARSE_NODE_STRUCT_KIND(pns2) != PN_expr_stmt) {
// not an instruction; error
not_an_instruction:
compile_syntax_error(comp, nodes[i], MP_ERROR_TEXT("expecting an assembler instruction"));
return;
}
// check structure of parse node
assert(MP_PARSE_NODE_IS_STRUCT(pns2->nodes[0]));
if (!MP_PARSE_NODE_IS_NULL(pns2->nodes[1])) {
goto not_an_instruction;
}
pns2 = (mp_parse_node_struct_t *)pns2->nodes[0];
if (MP_PARSE_NODE_STRUCT_KIND(pns2) != PN_atom_expr_normal) {
goto not_an_instruction;
}
if (!MP_PARSE_NODE_IS_ID(pns2->nodes[0])) {
goto not_an_instruction;
}
if (!MP_PARSE_NODE_IS_STRUCT_KIND(pns2->nodes[1], PN_trailer_paren)) {
goto not_an_instruction;
}
// parse node looks like an instruction
// get instruction name and args
qstr op = MP_PARSE_NODE_LEAF_ARG(pns2->nodes[0]);
pns2 = (mp_parse_node_struct_t *)pns2->nodes[1]; // PN_trailer_paren
mp_parse_node_t *pn_arg;
size_t n_args = mp_parse_node_extract_list(&pns2->nodes[0], PN_arglist, &pn_arg);
2013-10-05 23:17:28 +01:00
// emit instructions
if (op == MP_QSTR_label) {
if (!(n_args == 1 && MP_PARSE_NODE_IS_ID(pn_arg[0]))) {
compile_syntax_error(comp, nodes[i], MP_ERROR_TEXT("'label' requires 1 argument"));
2013-10-05 23:17:28 +01:00
return;
}
uint lab = comp_next_label(comp);
if (pass > MP_PASS_SCOPE) {
if (!EMIT_INLINE_ASM_ARG(label, lab, MP_PARSE_NODE_LEAF_ARG(pn_arg[0]))) {
compile_syntax_error(comp, nodes[i], MP_ERROR_TEXT("label redefined"));
return;
}
2013-10-05 23:17:28 +01:00
}
} else if (op == MP_QSTR_align) {
if (!(n_args == 1 && MP_PARSE_NODE_IS_SMALL_INT(pn_arg[0]))) {
compile_syntax_error(comp, nodes[i], MP_ERROR_TEXT("'align' requires 1 argument"));
return;
}
if (pass > MP_PASS_SCOPE) {
mp_asm_base_align((mp_asm_base_t *)comp->emit_inline_asm,
MP_PARSE_NODE_LEAF_SMALL_INT(pn_arg[0]));
}
} else if (op == MP_QSTR_data) {
if (!(n_args >= 2 && MP_PARSE_NODE_IS_SMALL_INT(pn_arg[0]))) {
compile_syntax_error(comp, nodes[i], MP_ERROR_TEXT("'data' requires at least 2 arguments"));
return;
}
if (pass > MP_PASS_SCOPE) {
mp_int_t bytesize = MP_PARSE_NODE_LEAF_SMALL_INT(pn_arg[0]);
for (uint j = 1; j < n_args; j++) {
if (!MP_PARSE_NODE_IS_SMALL_INT(pn_arg[j])) {
compile_syntax_error(comp, nodes[i], MP_ERROR_TEXT("'data' requires integer arguments"));
return;
}
mp_asm_base_data((mp_asm_base_t *)comp->emit_inline_asm,
bytesize, MP_PARSE_NODE_LEAF_SMALL_INT(pn_arg[j]));
}
}
2013-10-05 23:17:28 +01:00
} else {
if (pass > MP_PASS_SCOPE) {
EMIT_INLINE_ASM_ARG(op, op, n_args, pn_arg);
2013-10-05 23:17:28 +01:00
}
}
if (comp->compile_error != MP_OBJ_NULL) {
pns = pns2; // this is the parse node that had the error
goto inline_asm_error;
}
2013-10-05 23:17:28 +01:00
}
if (comp->pass > MP_PASS_SCOPE) {
EMIT_INLINE_ASM_ARG(end_pass, type_sig);
if (comp->pass == MP_PASS_EMIT) {
void *f = mp_asm_base_get_code((mp_asm_base_t *)comp->emit_inline_asm);
mp_emit_glue_assign_native(comp->scope_cur->raw_code, MP_CODE_NATIVE_ASM,
f, mp_asm_base_get_code_size((mp_asm_base_t *)comp->emit_inline_asm),
NULL,
#if MICROPY_PERSISTENT_CODE_SAVE
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
0,
0, 0, NULL,
#endif
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
0, comp->scope_cur->num_pos_args, type_sig);
}
}
if (comp->compile_error != MP_OBJ_NULL) {
// inline assembler had an error; set line for its exception
inline_asm_error:
comp->compile_error_line = pns->source_line;
}
2013-10-04 19:53:11 +01:00
}
#endif
2013-10-04 19:53:11 +01:00
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
STATIC void scope_compute_things(scope_t *scope, mp_emit_common_t *emit_common) {
// in MicroPython we put the *x parameter after all other parameters (except **y)
if (scope->scope_flags & MP_SCOPE_FLAG_VARARGS) {
id_info_t *id_param = NULL;
for (int i = scope->id_info_len - 1; i >= 0; i--) {
id_info_t *id = &scope->id_info[i];
if (id->flags & ID_FLAG_IS_STAR_PARAM) {
if (id_param != NULL) {
// swap star param with last param
id_info_t temp = *id_param;
*id_param = *id;
*id = temp;
}
break;
} else if (id_param == NULL && id->flags == ID_FLAG_IS_PARAM) {
id_param = id;
}
}
}
2013-10-04 19:53:11 +01:00
// in functions, turn implicit globals into explicit globals
2013-12-30 22:32:17 +00:00
// compute the index of each local
2013-10-04 19:53:11 +01:00
scope->num_locals = 0;
for (int i = 0; i < scope->id_info_len; i++) {
id_info_t *id = &scope->id_info[i];
if (scope->kind == SCOPE_CLASS && id->qst == MP_QSTR___class__) {
2013-10-04 19:53:11 +01:00
// __class__ is not counted as a local; if it's used then it becomes a ID_INFO_KIND_CELL
continue;
}
if (SCOPE_IS_FUNC_LIKE(scope->kind) && id->kind == ID_INFO_KIND_GLOBAL_IMPLICIT) {
2013-10-04 19:53:11 +01:00
id->kind = ID_INFO_KIND_GLOBAL_EXPLICIT;
}
py: Fix native functions so they run with their correct globals context. Prior to this commit a function compiled with the native decorator @micropython.native would not work correctly when accessing global variables, because the globals dict was not being set upon function entry. This commit fixes this problem by, upon function entry, setting as the current globals dict the globals dict context the function was defined within, as per normal Python semantics, and as bytecode does. Upon function exit the original globals dict is restored. In order to restore the globals dict when an exception is raised the native function must guard its internals with an nlr_push/nlr_pop pair. Because this push/pop is relatively expensive, in both C stack usage for the nlr_buf_t and CPU execution time, the implementation here optimises things as much as possible. First, the compiler keeps track of whether a function even needs to access global variables. Using this information the native emitter then generates three different kinds of code: 1. no globals used, no exception handlers: no nlr handling code and no setting of the globals dict. 2. globals used, no exception handlers: an nlr_buf_t is allocated on the C stack but it is not used if the globals dict is unchanged, saving execution time because nlr_push/nlr_pop don't need to run. 3. function has exception handlers, may use globals: an nlr_buf_t is allocated and nlr_push/nlr_pop are always called. In the end, native functions that don't access globals and don't have exception handlers will run more efficiently than those that do. Fixes issue #1573.
2018-09-13 13:03:48 +01:00
#if MICROPY_EMIT_NATIVE
if (id->kind == ID_INFO_KIND_GLOBAL_EXPLICIT) {
// This function makes a reference to a global variable
if (scope->emit_options == MP_EMIT_OPT_VIPER
&& mp_native_type_from_qstr(id->qst) >= MP_NATIVE_TYPE_INT) {
// A casting operator in viper mode, not a real global reference
} else {
scope->scope_flags |= MP_SCOPE_FLAG_REFGLOBALS;
}
py: Fix native functions so they run with their correct globals context. Prior to this commit a function compiled with the native decorator @micropython.native would not work correctly when accessing global variables, because the globals dict was not being set upon function entry. This commit fixes this problem by, upon function entry, setting as the current globals dict the globals dict context the function was defined within, as per normal Python semantics, and as bytecode does. Upon function exit the original globals dict is restored. In order to restore the globals dict when an exception is raised the native function must guard its internals with an nlr_push/nlr_pop pair. Because this push/pop is relatively expensive, in both C stack usage for the nlr_buf_t and CPU execution time, the implementation here optimises things as much as possible. First, the compiler keeps track of whether a function even needs to access global variables. Using this information the native emitter then generates three different kinds of code: 1. no globals used, no exception handlers: no nlr handling code and no setting of the globals dict. 2. globals used, no exception handlers: an nlr_buf_t is allocated on the C stack but it is not used if the globals dict is unchanged, saving execution time because nlr_push/nlr_pop don't need to run. 3. function has exception handlers, may use globals: an nlr_buf_t is allocated and nlr_push/nlr_pop are always called. In the end, native functions that don't access globals and don't have exception handlers will run more efficiently than those that do. Fixes issue #1573.
2018-09-13 13:03:48 +01:00
}
#endif
// params always count for 1 local, even if they are a cell
if (id->kind == ID_INFO_KIND_LOCAL || (id->flags & ID_FLAG_IS_PARAM)) {
id->local_num = scope->num_locals++;
2013-12-11 00:41:43 +00:00
}
}
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
// compute the index of cell vars
2013-12-11 00:41:43 +00:00
for (int i = 0; i < scope->id_info_len; i++) {
id_info_t *id = &scope->id_info[i];
// in MicroPython the cells come right after the fast locals
2013-12-30 22:32:17 +00:00
// parameters are not counted here, since they remain at the start
// of the locals, even if they are cell vars
if (id->kind == ID_INFO_KIND_CELL && !(id->flags & ID_FLAG_IS_PARAM)) {
2013-12-30 22:32:17 +00:00
id->local_num = scope->num_locals;
scope->num_locals += 1;
2013-12-11 00:41:43 +00:00
}
}
unix-cpy: Remove unix-cpy. It's no longer needed. unix-cpy was originally written to get semantic equivalent with CPython without writing functional tests. When writing the initial implementation of uPy it was a long way between lexer and functional tests, so the half-way test was to make sure that the bytecode was correct. The idea was that if the uPy bytecode matched CPython 1-1 then uPy would be proper Python if the bytecodes acted correctly. And having matching bytecode meant that it was less likely to miss some deep subtlety in the Python semantics that would require an architectural change later on. But that is all history and it no longer makes sense to retain the ability to output CPython bytecode, because: 1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode changes from version to version, and seems to have changed quite a bit in 3.5. There's no point in changing the bytecode output to match CPython anymore. 2. uPy and CPy do different optimisations to the bytecode which makes it harder to match. 3. The bytecode tests are not run. They were never part of Travis and are not run locally anymore. 4. The EMIT_CPYTHON option needs a lot of extra source code which adds heaps of noise, especially in compile.c. 5. Now that there is an extensive test suite (which tests functionality) there is no need to match the bytecode. Some very subtle behaviour is tested with the test suite and passing these tests is a much better way to stay Python-language compliant, rather than trying to match CPy bytecode.
2015-08-14 12:24:11 +01:00
// compute the index of free vars
2013-12-11 00:41:43 +00:00
// make sure they are in the order of the parent scope
if (scope->parent != NULL) {
int num_free = 0;
for (int i = 0; i < scope->parent->id_info_len; i++) {
id_info_t *id = &scope->parent->id_info[i];
if (id->kind == ID_INFO_KIND_CELL || id->kind == ID_INFO_KIND_FREE) {
for (int j = 0; j < scope->id_info_len; j++) {
id_info_t *id2 = &scope->id_info[j];
if (id2->kind == ID_INFO_KIND_FREE && id->qst == id2->qst) {
assert(!(id2->flags & ID_FLAG_IS_PARAM)); // free vars should not be params
// in MicroPython the frees come first, before the params
2013-12-30 22:32:17 +00:00
id2->local_num = num_free;
2013-12-11 00:41:43 +00:00
num_free += 1;
}
}
}
2013-10-04 19:53:11 +01:00
}
// in MicroPython shift all other locals after the free locals
2013-12-30 22:32:17 +00:00
if (num_free > 0) {
for (int i = 0; i < scope->id_info_len; i++) {
id_info_t *id = &scope->id_info[i];
if (id->kind != ID_INFO_KIND_FREE || (id->flags & ID_FLAG_IS_PARAM)) {
2013-12-30 22:32:17 +00:00
id->local_num += num_free;
}
}
scope->num_pos_args += num_free; // free vars are counted as params for passing them into the function
2013-12-30 22:32:17 +00:00
scope->num_locals += num_free;
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_emit_common_use_qstr(emit_common, MP_QSTR__star_);
2013-12-30 22:32:17 +00:00
}
2013-10-04 19:53:11 +01:00
}
}
#if !MICROPY_PERSISTENT_CODE_SAVE
STATIC
#endif
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_compiled_module_t mp_compile_to_raw_code(mp_parse_tree_t *parse_tree, qstr source_file, bool is_repl, mp_module_context_t *context) {
// put compiler state on the stack, it's relatively small
compiler_t comp_state = {0};
compiler_t *comp = &comp_state;
2013-10-18 19:58:12 +01:00
comp->is_repl = is_repl;
comp->break_label = INVALID_LABEL;
comp->continue_label = INVALID_LABEL;
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_emit_common_init(&comp->emit_common, source_file);
2013-10-04 19:53:11 +01:00
// create the module scope
#if MICROPY_EMIT_NATIVE
const uint emit_opt = MP_STATE_VM(default_emit_opt);
#else
const uint emit_opt = MP_EMIT_OPT_NONE;
#endif
scope_t *module_scope = scope_new_and_link(comp, SCOPE_MODULE, parse_tree->root, emit_opt);
// create standard emitter; it's used at least for MP_PASS_SCOPE
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
emit_t *emit_bc = emit_bc_new(&comp->emit_common);
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
// compile MP_PASS_SCOPE
comp->emit = emit_bc;
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
comp->emit_bc = emit_bc;
#if MICROPY_EMIT_NATIVE
comp->emit_method_table = &emit_bc_method_table;
#endif
2013-10-05 23:17:28 +01:00
uint max_num_labels = 0;
for (scope_t *s = comp->scope_head; s != NULL && comp->compile_error == MP_OBJ_NULL; s = s->next) {
#if MICROPY_EMIT_INLINE_ASM
if (s->emit_options == MP_EMIT_OPT_ASM) {
compile_scope_inline_asm(comp, s, MP_PASS_SCOPE);
} else
#endif
{
compile_scope(comp, s, MP_PASS_SCOPE);
// Check if any implicitly declared variables should be closed over
for (size_t i = 0; i < s->id_info_len; ++i) {
id_info_t *id = &s->id_info[i];
if (id->kind == ID_INFO_KIND_GLOBAL_IMPLICIT) {
scope_check_to_close_over(s, id);
}
}
2013-10-05 23:17:28 +01:00
}
// update maximim number of labels needed
if (comp->next_label > max_num_labels) {
max_num_labels = comp->next_label;
}
2013-10-04 19:53:11 +01:00
}
2013-10-05 23:17:28 +01:00
// compute some things related to scope and identifiers
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
bool has_native_code = false;
for (scope_t *s = comp->scope_head; s != NULL && comp->compile_error == MP_OBJ_NULL; s = s->next) {
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
#if MICROPY_EMIT_NATIVE
if (s->emit_options == MP_EMIT_OPT_NATIVE_PYTHON || s->emit_options == MP_EMIT_OPT_VIPER) {
has_native_code = true;
}
#endif
scope_compute_things(s, &comp->emit_common);
2013-10-04 19:53:11 +01:00
}
// set max number of labels now that it's calculated
emit_bc_set_max_num_labels(emit_bc, max_num_labels);
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
// finalise and allocate the constant table
mp_emit_common_finalise(&comp->emit_common, has_native_code);
// compile MP_PASS_STACK_SIZE, MP_PASS_CODE_SIZE, MP_PASS_EMIT
#if MICROPY_EMIT_NATIVE
emit_t *emit_native = NULL;
#endif
for (scope_t *s = comp->scope_head; s != NULL && comp->compile_error == MP_OBJ_NULL; s = s->next) {
#if MICROPY_EMIT_INLINE_ASM
if (s->emit_options == MP_EMIT_OPT_ASM) {
// inline assembly
if (comp->emit_inline_asm == NULL) {
comp->emit_inline_asm = ASM_EMITTER(new)(max_num_labels);
2013-10-05 23:17:28 +01:00
}
comp->emit = NULL;
comp->emit_inline_asm_method_table = ASM_EMITTER_TABLE;
compile_scope_inline_asm(comp, s, MP_PASS_CODE_SIZE);
#if MICROPY_EMIT_INLINE_XTENSA
// Xtensa requires an extra pass to compute size of l32r const table
// TODO this can be improved by calculating it during SCOPE pass
// but that requires some other structural changes to the asm emitters
#if MICROPY_DYNAMIC_COMPILER
if (mp_dynamic_compiler.native_arch == MP_NATIVE_ARCH_XTENSA)
#endif
{
compile_scope_inline_asm(comp, s, MP_PASS_CODE_SIZE);
}
#endif
if (comp->compile_error == MP_OBJ_NULL) {
compile_scope_inline_asm(comp, s, MP_PASS_EMIT);
}
} else
#endif
{
// choose the emit type
2013-10-05 23:17:28 +01:00
switch (s->emit_options) {
#if MICROPY_EMIT_NATIVE
case MP_EMIT_OPT_NATIVE_PYTHON:
case MP_EMIT_OPT_VIPER:
if (emit_native == NULL) {
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
emit_native = NATIVE_EMITTER(new)(&comp->emit_common, &comp->compile_error, &comp->next_label, max_num_labels);
}
comp->emit_method_table = NATIVE_EMITTER_TABLE;
comp->emit = emit_native;
2013-10-07 00:02:49 +01:00
break;
#endif // MICROPY_EMIT_NATIVE
2013-10-07 00:02:49 +01:00
2013-10-05 23:17:28 +01:00
default:
comp->emit = emit_bc;
#if MICROPY_EMIT_NATIVE
2013-10-05 23:17:28 +01:00
comp->emit_method_table = &emit_bc_method_table;
#endif
2013-10-05 23:17:28 +01:00
break;
}
// need a pass to compute stack size
compile_scope(comp, s, MP_PASS_STACK_SIZE);
// second last pass: compute code size
if (comp->compile_error == MP_OBJ_NULL) {
compile_scope(comp, s, MP_PASS_CODE_SIZE);
}
// final pass: emit code
if (comp->compile_error == MP_OBJ_NULL) {
compile_scope(comp, s, MP_PASS_EMIT);
}
}
2013-10-04 19:53:11 +01:00
}
if (comp->compile_error != MP_OBJ_NULL) {
// if there is no line number for the error then use the line
// number for the start of this scope
compile_error_set_line(comp, comp->scope_cur->pn);
// add a traceback to the exception using relevant source info
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_obj_exception_add_traceback(comp->compile_error, source_file,
comp->compile_error_line, comp->scope_cur->simple_name);
}
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
// construct the global qstr/const table for this module
mp_compiled_module_t cm;
cm.rc = module_scope->raw_code;
cm.context = context;
#if MICROPY_PERSISTENT_CODE_SAVE
cm.has_native = has_native_code;
cm.n_qstr = comp->emit_common.qstr_map.used;
cm.n_obj = comp->emit_common.ct_cur_obj;
#endif
if (comp->compile_error == MP_OBJ_NULL) {
mp_emit_common_populate_module_context(&comp->emit_common, source_file, context);
#if MICROPY_DEBUG_PRINTERS
// now that the module context is valid, the raw codes can be printed
if (mp_verbose_flag >= 2) {
for (scope_t *s = comp->scope_head; s != NULL; s = s->next) {
mp_raw_code_t *rc = s->raw_code;
mp_bytecode_print(&mp_plat_print, rc, rc->fun_data, rc->fun_data_len, &cm.context->constants);
}
}
#endif
}
// free the emitters
emit_bc_free(emit_bc);
#if MICROPY_EMIT_NATIVE
if (emit_native != NULL) {
NATIVE_EMITTER(free)(emit_native);
}
#endif
#if MICROPY_EMIT_INLINE_ASM
if (comp->emit_inline_asm != NULL) {
ASM_EMITTER(free)(comp->emit_inline_asm);
}
#endif
// free the parse tree
mp_parse_tree_clear(parse_tree);
// free the scopes
for (scope_t *s = module_scope; s;) {
scope_t *next = s->next;
scope_free(s);
s = next;
}
2013-10-18 19:58:12 +01:00
if (comp->compile_error != MP_OBJ_NULL) {
nlr_raise(comp->compile_error);
}
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
return cm;
2013-10-04 19:53:11 +01:00
}
mp_obj_t mp_compile(mp_parse_tree_t *parse_tree, qstr source_file, bool is_repl) {
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
mp_module_context_t *context = m_new_obj(mp_module_context_t);
context->module.globals = mp_globals_get();
mp_compiled_module_t cm = mp_compile_to_raw_code(parse_tree, source_file, is_repl, context);
// return function that executes the outer module
py: Rework bytecode and .mpy file format to be mostly static data. Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk. But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM). The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated). This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware). The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function). In more detail, in the VM what used to be (schematically): qst = DECODE_QSTR_VALUE; is now (schematically): idx = DECODE_QSTR_INDEX; qst = qstr_table[idx]; That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before. The following changes are measured for this commit compared to the previous (the baseline): - average 7%-9% reduction in size of .mpy files - frozen code size is reduced by about 5%-7% - importing .py files uses about 5% less RAM in total - importing .mpy files uses about 4% less RAM in total - importing .py and .mpy files takes about the same time as before The qstr indirection in the bytecode has only a small impact on VM performance. For stm32 on PYBv1.0 the performance change of this commit is: diff of scores (higher is better) N=100 M=100 baseline -> this-commit diff diff% (error%) bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%) bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%) bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%) bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%) bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%) bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%) bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%) core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%) core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%) core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%) core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%) misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%) misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%) misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%) misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%) viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%) viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%) viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%) viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%) viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%) viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%) And for unix on x64: diff of scores (higher is better) N=2000 M=2000 baseline -> this-commit diff diff% (error%) bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%) bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%) bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%) bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%) bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%) bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%) bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%) misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%) misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%) misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%) misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%) The code size change is (firmware with a lot of frozen code benefits the most): bare-arm: +396 +0.697% minimal x86: +1595 +0.979% [incl +32(data)] unix x64: +2408 +0.470% [incl +800(data)] unix nanbox: +1396 +0.309% [incl -96(data)] stm32: -1256 -0.318% PYBV10 cc3200: +288 +0.157% esp8266: -260 -0.037% GENERIC esp32: -216 -0.014% GENERIC[incl -1072(data)] nrf: +116 +0.067% pca10040 rp2: -664 -0.135% PICO samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files. In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory- mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications. Signed-off-by: Damien George <damien@micropython.org>
2021-10-22 12:22:47 +01:00
return mp_make_function_from_raw_code(cm.rc, cm.context, NULL);
}
#endif // MICROPY_ENABLE_COMPILER