micropython

Commit Graph

Author	SHA1	Message	Date
stijn	795370ca23	py/bc.h: Fix C++ compilation of public API. Casts between unrelated types must be explicit. Regression in `f2040bfc7e`	2022-03-01 16:17:30 +11:00
Damien George	f2040bfc7e	py: Rework bytecode and .mpy file format to be mostly static data.

Background: .mpy files are precompiled .py files, built using mpy-cross, that contain compiled bytecode functions (and can also contain machine code). The benefit of using an .mpy file over a .py file is that they are faster to import and take less memory when importing. They are also smaller on disk.

But the real benefit of .mpy files comes when they are frozen into the firmware. This is done by loading the .mpy file during compilation of the firmware and turning it into a set of big C data structures (the job of mpy-tool.py), which are then compiled and downloaded into the ROM of a device. These C data structures can be executed in-place, ie directly from ROM. This makes importing even faster because there is very little to do, and also means such frozen modules take up much less RAM (because their bytecode stays in ROM).

The downside of frozen code is that it requires recompiling and reflashing the entire firmware. This can be a big barrier to entry, slows down development time, and makes it harder to do OTA updates of frozen code (because the whole firmware must be updated).

This commit attempts to solve this problem by providing a solution that sits between loading .mpy files into RAM and freezing them into the firmware. The .mpy file format has been reworked so that it consists of data and bytecode which is mostly static and ready to run in-place. If these new .mpy files are located in flash/ROM which is memory addressable, the .mpy file can be executed (mostly) in-place. With this approach there is still a small amount of unpacking and linking of the .mpy file that needs to be done when it's imported, but it's still much better than loading an .mpy from disk into RAM (although not as good as freezing .mpy files into the firmware).

The main trick to make static .mpy files is to adjust the bytecode so any qstrs that it references now go through a lookup table to convert from local qstr number in the module to global qstr number in the firmware. That means the bytecode does not need linking/rewriting of qstrs when it's loaded. Instead only a small qstr table needs to be built (and put in RAM) at import time. This means the bytecode itself is static/constant and can be used directly if it's in addressable memory. Also the qstr string data in the .mpy file, and some constant object data, can be used directly. Note that the qstr table is global to the module (ie not per function).

In more detail, in the VM what used to be (schematically):

    qst = DECODE_QSTR_VALUE;

is now (schematically):

    idx = DECODE_QSTR_INDEX;
    qst = qstr_table[idx];

That allows the bytecode to be fixed at compile time and not need relinking/rewriting of the qstr values. Only qstr_table needs to be linked when the .mpy is loaded. Incidentally, this helps to reduce the size of bytecode because what used to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices. If the module uses the same qstr more than two times then the bytecode is smaller than before.

The following changes are measured for this commit compared to the previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before

The qstr indirection in the bytecode has only a small impact on VM performance.
For stm32 on PYBv1.0 the performance change of this commit is:

    diff of scores (higher is better)
    N=100 M=100                baseline -> this-commit     diff    diff% (error%)
    bm_chaos.py                  371.07 ->    357.39 :   -13.68 = -3.687% (+/-0.02%)
    bm_fannkuch.py                78.72 ->     77.49 :    -1.23 = -1.563% (+/-0.01%)
    bm_fft.py                   2591.73 ->   2539.28 :   -52.45 = -2.024% (+/-0.00%)
    bm_float.py                 6034.93 ->   5908.30 :  -126.63 = -2.098% (+/-0.01%)
    bm_hexiom.py                  48.96 ->     47.93 :    -1.03 = -2.104% (+/-0.00%)
    bm_nqueens.py               4510.63 ->   4459.94 :   -50.69 = -1.124% (+/-0.00%)
    bm_pidigits.py               650.28 ->    644.96 :    -5.32 = -0.818% (+/-0.23%)
    core_import_mpy_multi.py     564.77 ->    581.49 :   +16.72 = +2.960% (+/-0.01%)
    core_import_mpy_single.py     68.67 ->     67.16 :    -1.51 = -2.199% (+/-0.01%)
    core_qstr.py                  64.16 ->     64.12 :    -0.04 = -0.062% (+/-0.00%)
    core_yield_from.py           362.58 ->    354.50 :    -8.08 = -2.228% (+/-0.00%)
    misc_aes.py                  429.69 ->    405.59 :   -24.10 = -5.609% (+/-0.01%)
    misc_mandel.py              3485.13 ->   3416.51 :   -68.62 = -1.969% (+/-0.00%)
    misc_pystone.py             2496.53 ->   2405.56 :   -90.97 = -3.644% (+/-0.01%)
    misc_raytrace.py             381.47 ->    374.01 :    -7.46 = -1.956% (+/-0.01%)
    viper_call0.py               576.73 ->    572.49 :    -4.24 = -0.735% (+/-0.04%)
    viper_call1a.py              550.37 ->    546.21 :    -4.16 = -0.756% (+/-0.09%)
    viper_call1b.py              438.23 ->    435.68 :    -2.55 = -0.582% (+/-0.06%)
    viper_call1c.py              442.84 ->    440.04 :    -2.80 = -0.632% (+/-0.08%)
    viper_call2a.py              536.31 ->    532.35 :    -3.96 = -0.738% (+/-0.06%)
    viper_call2b.py              382.34 ->    377.07 :    -5.27 = -1.378% (+/-0.03%)

And for unix on x64:

    diff of scores (higher is better)
    N=2000 M=2000              baseline -> this-commit     diff    diff% (error%)
    bm_chaos.py                13594.20 ->  13073.84 :  -520.36 = -3.828% (+/-5.44%)
    bm_fannkuch.py                60.63 ->     59.58 :    -1.05 = -1.732% (+/-3.01%)
    bm_fft.py                 112009.15 -> 111603.32 :  -405.83 = -0.362% (+/-4.03%)
    bm_float.py               246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%)
    bm_hexiom.py                 615.65 ->    617.21 :    +1.56 = +0.253% (+/-1.64%)
    bm_nqueens.py             215807.95 -> 215600.96 :  -206.99 = -0.096% (+/-3.52%)
    bm_pidigits.py              8246.74 ->   8422.82 :  +176.08 = +2.135% (+/-3.64%)
    misc_aes.py                16133.00 ->  16452.74 :  +319.74 = +1.982% (+/-1.50%)
    misc_mandel.py            128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%)
    misc_pystone.py            83811.49 ->  83124.85 :  -686.64 = -0.819% (+/-1.03%)
    misc_raytrace.py           21688.02 ->  21385.10 :  -302.92 = -1.397% (+/-3.20%)

The code size change is (firmware with a lot of frozen code benefits the most):

       bare-arm:  +396 +0.697%
    minimal x86: +1595 +0.979% [incl +32(data)]
       unix x64: +2408 +0.470% [incl +800(data)]
    unix nanbox: +1396 +0.309% [incl -96(data)]
          stm32: -1256 -0.318% PYBV10
         cc3200:  +288 +0.157%
        esp8266:  -260 -0.037% GENERIC
          esp32:  -216 -0.014% GENERIC [incl -1072(data)]
            nrf:  +116 +0.067% pca10040
            rp2:  -664 -0.135% PICO
           samd:  +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS

As part of this change the .mpy file format version is bumped to version 6. And mpy-tool.py has been improved to provide a good visualisation of the contents of .mpy files.

In summary: this commit changes the bytecode to use qstr indirection, and reworks the .mpy file format to be simpler and allow .mpy files to be executed in-place. Performance is not impacted too much. Eventually it will be possible to store such .mpy files in a linear, read-only, memory-mappable filesystem so they can be executed from flash/ROM. This will essentially be able to replace frozen code for most applications.

Signed-off-by: Damien George <damien@micropython.org>	2022-02-24 18:08:43 +11:00
Damien George	cfd08448a1	py: Mark unused arguments from bytecode decoding macros. Signed-off-by: Damien George <damien@micropython.org>	2021-06-25 10:58:22 +10:00
Damien George	85f2b239d8	py/showbc: Pass in an mp_print_t struct to all bytecode-print functions. So the output can be redirected if needed. Signed-off-by: Damien George <damien@micropython.org>	2020-09-11 17:22:28 +10:00
stijn	84fa3312cf	all: Format code to add space after C++-style comment start. Note: the uncrustify configuration is explicitly set to 'add' instead of 'force' in order not to alter the comments which use extra spaces after // as a means of indenting text for clarity.	2020-04-23 11:24:25 +10:00
Damien George	69661f3343	all: Reformat C and Python source code with tools/codeformat.py. This is run with uncrustify 0.70.1, and black 19.10b0.	2020-02-28 10:33:03 +11:00
Damien George	b47e155bd0	py/persistentcode: Add ability to relocate loaded native code. Implements text, rodata and bss generalised relocations, as well as generic qstr-object linking. This allows importing dynamic native modules on all supported architectures in a unified way.	2019-12-12 20:15:28 +11:00
Damien George	c8c0fd4ca3	py: Rework and compress second part of bytecode prelude. This patch compresses the second part of the bytecode prelude which contains the source file name, function name, source-line-number mapping and cell closure information. This part of the prelude now begins with a single variable-length unsigned integer which encodes 2 numbers, being the byte-size of the following 2 sections in the header: the "source info section" and the "closure section". After decoding this variable unsigned integer it's possible to skip over one or both of these sections very easily. This scheme saves about 2 bytes for most functions compared to the original format: one in the case that there are no closure cells, and one because padding was eliminated.	2019-10-01 12:26:22 +10:00
Damien George	b5ebfadbd6	py: Compress first part of bytecode prelude. The start of the bytecode prelude contains 6 numbers telling the amount of stack needed for the Python values and exceptions, and the signature of the function. Prior to this patch these numbers were all encoded one after the other (2x variable unsigned integers, then 4x bytes), but using so many bytes is unnecessary. An entropy analysis of around 150,000 bytecode functions from the CPython standard library showed that the optimal Shannon coding would need about 7.1 bits on average to encode these 6 numbers, compared to the existing 48 bits. This patch attempts to get close to this optimal value by packing the 6 numbers into a single, variable-length unsigned integer via bit-wise interleaving. The interleaving scheme is chosen to minimise the average number of bytes needed, and at the same time keep the scheme simple enough so it can be implemented without too much overhead in code size or speed. The scheme requires about 10.5 bits on average to store the 6 numbers. As a result most functions which originally took 6 bytes to encode these 6 numbers now need only 1 byte (in 80% of cases).	2019-10-01 12:26:22 +10:00
Damien George	81d04a0200	py: Add n_state to mp_code_state_t struct. This value is used often enough that it is better to cache it instead of decode it each time.	2019-10-01 12:26:22 +10:00
Damien George	4c5e1a0368	py/bc: Change mp_code_state_t.exc_sp to exc_sp_idx. Change from a pointer to an index, to make space in mp_code_state_t.	2019-10-01 12:26:22 +10:00
Damien George	1d7afcce49	py/bc: Remove comments referring to obsolete currently_in_except_block. It was made obsolete in `6f9e3ff719`	2019-10-01 12:26:22 +10:00
Damien George	1f7202d122	py/bc: Replace big opcode format table with simple macro.	2019-09-26 15:27:11 +10:00
Milan Rossa	310b3d1b81	py: Integrate sys.settrace feature into the VM and runtime. This commit adds support for sys.settrace, allowing Python handlers to be installed to trace execution of Python code. The interface follows CPython as closely as possible. The feature is disabled by default and can be enabled via MICROPY_PY_SYS_SETTRACE.	2019-08-30 16:44:12 +10:00
Damien George	dbf35d3da3	py/bc: Factor out code to get bytecode line number info into new func.	2019-08-30 16:43:46 +10:00
Paul Sokolovsky	016d9a40fe	various: Add and update my copyright line based on git history. For modules I initially created or made substantial contributions to.	2019-05-17 18:04:15 +10:00
Damien George	992a6e1dea	py/persistentcode: Pack qstrs directly in bytecode to reduce mpy size. Instead of emitting two bytes in the bytecode for where the linked qstr should be written to, it is now replaced by the actual qstr data, or a reference into the qstr window. Reduces mpy file size by about 10%.	2019-03-05 16:27:34 +11:00
Damien George	a3dc1b1957	all: Remove inclusion of internal py header files. Header files that are considered internal to the py core and should not normally be included directly are: py/nlr.h - internal nlr configuration and declarations py/bc0.h - contains bytecode macro definitions py/runtime0.h - contains basic runtime enums Instead, the top-level header files to include are one of: py/obj.h - includes runtime0.h and defines everything to use the mp_obj_t type py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h, and defines everything to use the general runtime support functions Additional, specific headers (eg py/objlist.h) can be included if needed.	2017-10-04 12:37:50 +11:00
Alexander Steffen	55f33240f3	all: Use the name MicroPython consistently in comments. There were several different spellings of MicroPython present in comments, when there should be only one.	2017-07-31 18:35:40 +10:00
Alexander Steffen	299bc62586	all: Unify header guard usage. The code conventions suggest using header guards, but do not define what those should look like and instead point to existing files. However, not all existing files follow the same scheme, sometimes omitting header guards altogether, sometimes using non-standard names, making it easy to accidentally pick a "wrong" example. This commit ensures that all header files of the MicroPython project (that were not simply copied from somewhere else) follow the same pattern, that was already present in the majority of files, especially in the py folder. The rules are as follows. Naming convention: * start with the words MICROPY_INCLUDED * contain the full path to the file * replace special characters with _ In addition, there are no empty lines before #ifndef, between #ifndef and one empty line before #endif. #endif is followed by a comment containing the name of the guard macro. py/grammar.h cannot use header guards by design, since it has to be included multiple times in a single C file. Several other files also do not need header guards as they are only used internally and guaranteed to be included only once: * MICROPY_MPHALPORT_H * mpconfigboard.h * mpconfigport.h * mpthreadport.h * pin_defs_*.h * qstrdefs*.h	2017-07-18 11:57:39 +10:00
Damien George	a8a5d1e8c8	py: Provide mp_decode_uint_skip() to help reduce stack usage. Taking the address of a local variable leads to increased stack usage, so the mp_decode_uint_skip() function is added to reduce the need for taking addresses. The changes in this patch reduce stack usage of a Python call by 8 bytes on ARM Thumb, by 16 bytes on non-windowing Xtensa archs, and by 16 bytes on x86-64. Code size is also slightly reduced on most archs by around 32 bytes.	2017-06-09 13:36:33 +10:00
Damien George	5640e6dacd	py: Provide mp_decode_uint_value to help optimise stack usage. This has a noticeable improvement on x86-64 and Thumb2 archs, where stack usage is reduced by 2 machine words in the VM.	2017-03-17 16:50:19 +11:00
Damien George	71a3d6ec3b	py: Reduce size of mp_code_state_t structure. Instead of caching data that is constant (code_info, const_table and n_state), store just a pointer to the underlying function object from which this data can be derived. This helps reduce stack usage for the case when the mp_code_state_t structure is stored on the stack, as well as heap usage when it's stored on the heap. The downside is that the VM becomes a little more complex because it now needs to derive the data from the underlying function object. But this doesn't impact the performance by much (if at all) because most of the decoding of data is done outside the main opcode loop. Measurements using pystone show that little to no performance is lost. This patch also fixes a nasty bug whereby the bytecode can be reclaimed by the GC during execution. With this patch there is always a pointer to the function object held by the VM during execution, since it's stored in the mp_code_state_t structure.	2017-03-17 16:39:13 +11:00
Damien George	cc4c1adf6e	py/showbc: Make sure to set the const_table before printing bytecode.	2017-01-27 12:34:09 +11:00
Damien George	f4ee9f8853	py/bc.h: Rename _mp_code_state to _mp_code_state_t. This rename was missed in the previous patch.	2016-08-27 23:23:51 +10:00
Damien George	581a59a456	py: Rename struct mp_code_state to mp_code_state_t. Also add _t to mp_exc_stack pre-declaration in struct typedef.	2016-08-27 23:21:00 +10:00
Damien George	1d899e1783	py/bc: Use size_t instead of mp_uint_t to count size of state and args.	2015-12-17 12:33:42 +00:00
Damien George	999cedb90f	py: Wrap all obj-ptr conversions in MP_OBJ_TO_PTR/MP_OBJ_FROM_PTR. This allows the mp_obj_t type to be configured to something other than a pointer-sized primitive type. This patch also includes additional changes to allow the code to compile when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of mp_uint_t, and various casts.	2015-11-29 14:25:35 +00:00
Damien George	254cfa6c31	py: Use uintptr_t instead of mp_uint_t in MP_TAGPTR_* macros.	2015-11-29 14:25:04 +00:00
Damien George	9f6976b74e	py: Make mp_setup_code_state take concrete pointer for func arg.	2015-11-29 14:25:04 +00:00
Damien George	d8c834c95d	py: Add MICROPY_PERSISTENT_CODE_LOAD/SAVE to load/save bytecode. MICROPY_PERSISTENT_CODE must be enabled, and then enabling MICROPY_PERSISTENT_CODE_LOAD/SAVE (either or both) will allow loading and/or saving of code (at the moment just bytecode) from/to a .mpy file.	2015-11-13 12:49:18 +00:00
Damien George	713ea1800d	py: Add constant table to bytecode. Contains just argument names at the moment but makes it easy to add arbitrary constants.	2015-11-13 12:49:18 +00:00
Damien George	3a3db4dcf0	py: Put all bytecode state (arg count, etc) in bytecode.	2015-11-13 12:49:18 +00:00
Damien George	9b7f583b0c	py: Reorganise bytecode layout so it's more structured, easier to edit.	2015-11-13 12:49:18 +00:00
Paul Sokolovsky	2039757b85	vm: Initial support for calling bytecode functions w/o C stack ("stackless").	2015-04-03 00:03:07 +03:00
Paul Sokolovsky	53e5e0fa28	py: Make old_globals part of mp_code_state structure. Conceptually it is part of code state, so let it be allocated in the same way as the rest of state.	2015-02-15 19:24:15 +03:00
Damien George	51dfcb4bb7	py: Move to guarded includes, everywhere in py/ core. Addresses issue #1022.	2015-01-01 20:32:09 +00:00
Paul Sokolovsky	343266ea51	showbc: Refactor to allow inline instruction printing.	2014-12-27 05:01:21 +02:00
Damien George	74eb44c392	py: Reduce size of VM exception stack element by 1 machine word. This optimisation reduces the VM exception stack element (mp_exc_stack_t) by 1 word, by using bit 1 of a pointer to store whether the opcode was a FINALLY or WITH opcode. This optimisation was pending, waiting for maturity of the exception handling code, which has now proven itself. Saves 1 machine word RAM for each exception (4->3 words per exception). Increases stmhal code by 4 bytes, and decreases unix x64 code by 32 bytes.	2014-12-22 12:49:57 +00:00
Damien George	1084b0f9c2	py: Store bytecode arg names in bytecode (were in own array). This saves a lot of RAM for 2 reasons: 1. For functions that don't have default values, var args or var kw args (which is a large number of functions in the general case), the mp_obj_fun_bc_t type now fits in 1 GC block (previously needed 2 because of the extra pointer to point to the arg_names array). So this saves 16 bytes per function (32 bytes on 64-bit machines). 2. Combining separate memory regions generally saves RAM because the unused bytes at the end of the GC block are saved for 1 of the blocks (since that block doesn't exist on its own anymore). So generally this saves 8 bytes per function. Tested by importing lots of modules: - 64-bit Linux gave about an 8% RAM saving for 86k of used RAM. - pyboard gave about a 6% RAM saving for 31k of used RAM.	2014-10-25 20:23:13 +01:00
Damien George	39dc145478	py: Change [u]int to mp_[u]int_t in qstr.[ch], and some other places. This should pretty much resolve issue #50.	2014-10-03 19:52:22 +01:00
Damien George	42f3de924b	py: Convert [u]int to mp_[u]int_t where appropriate. Addressing issue #50.	2014-10-03 17:44:14 +00:00
Damien George	b534e1b9f1	py: Use variable length encoded uints in more places in bytecode. Code-info size, block name, source name, n_state and n_exc_stack now use variable length encoded uints. This saves 7-9 bytes per bytecode function for most functions.	2014-09-04 14:44:01 +01:00
Damien George	3c658a4e75	py: Fix bug where GC collected native/viper/asm function data. Because (for Thumb) a function pointer has the LSB set, pointers to dynamic functions in RAM (eg native, viper or asm functions) were not being traced by the GC. This patch is a comprehensive fix for this. Addresses issue #820.	2014-08-24 16:28:17 +01:00
Damien George	40f3c02682	Rename machine_(u)int_t to mp_(u)int_t. See discussion in issue #50.	2014-07-03 13:25:24 +01:00
Paul Sokolovsky	f77d0c5bb3	objgenerator: First iteration of refactor to use mp_setup_code_state().	2014-06-11 20:43:47 +03:00
Damien George	aabd83ea20	py: Merge mp_execute_bytecode into fun_bc_call. This reduces stack usage by 16 words (64 bytes) for stmhal/ port. See issue #640.	2014-06-07 14:16:08 +01:00
Paul Sokolovsky	a4ac5b9f05	showbc: Make sure it's possible to trace MAKE_FUNCTION arg to actual bytecode.	2014-06-03 01:26:51 +03:00
Paul Sokolovsky	b4ebad3310	vm: Factor out structure with code execution state and pass it around. This improves stack usage in callers to mp_execute_bytecode2, and is a step forward towards unifying the execution interface for functions and generators (which is important because generators don't even support full forms of argument passing (keywords, etc.)).	2014-05-31 18:22:01 +03:00
Damien George	3417bc2f25	py: Rename byte_code to bytecode everywhere. bytecode is the more widely used. See issue #590.	2014-05-10 10:36:38 +01:00
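
The qstr indirection described in commit f2040bfc7e (a module-wide table mapping local qstr indices to global qstr numbers) can be illustrated with a small toy model. This is only a sketch in Python, not MicroPython's implementation: `QstrPool`, `link_module` and `decode_qstr` are hypothetical names invented here, and the "bytecode" is just a sequence of one-byte local indices.

```python
# Toy model of qstr indirection. All names are hypothetical and exist
# only for illustration; the real logic lives in MicroPython's C code.

class QstrPool:
    """Firmware-wide pool mapping strings to global qstr numbers."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def intern(self, s):
        # Return the existing global number, or allocate the next one.
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def lookup(self, qst):
        return self._strings[qst]


def link_module(pool, local_qstrs):
    """Build the per-module table: local index -> global qstr number.

    This is the only part that has to be created in RAM at import time.
    """
    return [pool.intern(s) for s in local_qstrs]


def decode_qstr(bytecode, ip, qstr_table):
    """Mirror of: idx = DECODE_QSTR_INDEX; qst = qstr_table[idx];"""
    idx = bytecode[ip]
    return qstr_table[idx]


pool = QstrPool()
pool.intern("print")  # already known to the firmware
pool.intern("len")    # already known to the firmware

module_qstrs = ["foo", "print"]  # qstr strings stored in the module
bytecode = bytes([1, 0, 1])      # immutable: holds local indices only

qstr_table = link_module(pool, module_qstrs)
names = [
    pool.lookup(decode_qstr(bytecode, ip, qstr_table))
    for ip in range(len(bytecode))
]
```

The `bytes` object is never modified after it is created, which is the property that lets the bytecode stay in read-only memory. Only `qstr_table` depends on the state of the global pool.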


72 Commits
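
Several of the commits above (b534e1b9f1, c8c0fd4ca3, b5ebfadbd6) rely on variable-length encoded unsigned integers. A minimal sketch of one such encoding follows. It assumes 7 data bits per byte, most significant group first, with the top bit set on every byte except the last. That layout is an assumption made for this illustration, and `encode_uint` and `decode_uint` are hypothetical helper names, not MicroPython functions.

```python
def encode_uint(value):
    """Encode a non-negative int as 7-bit groups, most significant first.

    Every byte except the last has its top bit (0x80) set as a
    continuation marker. (Assumed layout, for illustration only.)
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = [value & 0x7F]  # final byte: continuation bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


def decode_uint(buf, pos=0):
    """Decode one value starting at pos; return (value, next_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # last byte of this integer
            return value, pos


small = encode_uint(5)    # fits in a single byte
large = encode_uint(300)  # needs two bytes
value, next_pos = decode_uint(large)
```

Values below 128 take a single byte, which is why small counts such as stack sizes and argument counts are cheap to store this way. Returning the next position from `decode_uint` also makes it easy to skip over a field without keeping its value.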