micropython/tests/cmdline/cmd_verbose.py.exp

File cmdline/cmd_verbose.py, code block '<module>' (descriptor: \.\+, bytecode \.\+ bytes)
Raw bytecode (code_info_size=\\d\+, bytecode_size=\\d\+):
 08 \.\+
########
\.\+63
arg names:
(N_STATE 2)
(N_EXC_STACK 0)
  bc=0 line=1
  bc=0 line=3
00 LOAD_NAME print
03 LOAD_CONST_SMALL_INT 1
04 CALL_FUNCTION n=1 nkw=0
06 POP_TOP
07 LOAD_CONST_NONE
08 RETURN_VALUE
1
mem: total=\\d\+, current=\\d\+, peak=\\d\+
stack: \\d\+ out of \\d\+
GC: total: \\d\+, used: \\d\+, free: \\d\+
 No. of 1-blocks: \\d\+, 2-blocks: \\d\+, max blk sz: \\d\+, max free sz: \\d\+
tests: Make cmdline tests more stable by using regex for matching. 2015-03-20 17:25:25 +00:00			`File cmdline/cmd_verbose.py, code block '<module>' (descriptor: \.\+, bytecode \.\+ bytes)`
			`Raw bytecode (code_info_size=\\d\+, bytecode_size=\\d\+):`
py: Compress first part of bytecode prelude. The start of the bytecode prelude contains 6 numbers telling the amount of stack needed for the Python values and exceptions, and the signature of the function. Prior to this patch these numbers were all encoded one after the other (2x variable unsigned integers, then 4x bytes), but using so many bytes is unnecessary. An entropy analysis of around 150,000 bytecode functions from the CPython standard library showed that the optimal Shannon coding would need about 7.1 bits on average to encode these 6 numbers, compared to the existing 48 bits. This patch attempts to get close to this optimal value by packing the 6 numbers into a single, varible-length unsigned integer via bit-wise interleaving. The interleaving scheme is chosen to minimise the average number of bytes needed, and at the same time keep the scheme simple enough so it can be implemented without too much overhead in code size or speed. The scheme requires about 10.5 bits on average to store the 6 numbers. As a result most functions which originally took 6 bytes to encode these 6 numbers now need only 1 byte (in 80% of cases). 2019-09-16 13:12:59 +01:00			`08 \.\+`
tests: Make cmdline tests more stable by using regex for matching. 2015-03-20 17:25:25 +00:00			`########`
tests: Update tests for changes to opcode ordering. 2019-09-02 12:35:17 +01:00			`\.\+63`
tests: Add ability to test uPy cmdline executable. This allows to test options passed to cmdline executable, as well as the behaviour of the REPL. 2015-03-13 10:58:34 +00:00			`arg names:`
			`(N_STATE 2)`
			`(N_EXC_STACK 0)`
py: Rework and compress second part of bytecode prelude. This patch compresses the second part of the bytecode prelude which contains the source file name, function name, source-line-number mapping and cell closure information. This part of the prelude now begins with a single varible length unsigned integer which encodes 2 numbers, being the byte-size of the following 2 sections in the header: the "source info section" and the "closure section". After decoding this variable unsigned integer it's possible to skip over one or both of these sections very easily. This scheme saves about 2 bytes for most functions compared to the original format: one in the case that there are no closure cells, and one because padding was eliminated. 2019-09-25 06:45:47 +01:00			`bc=0 line=1`
tests: Add ability to test uPy cmdline executable. This allows to test options passed to cmdline executable, as well as the behaviour of the REPL. 2015-03-13 10:58:34 +00:00			`bc=0 line=3`
all: Remove MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE. This commit removes all parts of code associated with the existing MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the -mcache-lookup-bc option to mpy-cross. This feature originally provided a significant performance boost for Unix, but wasn't able to be enabled for MCU targets (due to frozen bytecode), and added significant extra complexity to generating and distributing .mpy files. The equivalent performance gain is now provided by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has been enabled on the unix port in the previous commit). It's hard to provide precise performance numbers, but tests have been run on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V, xtensa) and they all generally agree on the qualitative improvements seen by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE. For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined with MAP_LOOKUP_CACHE is: diff of scores (higher is better) N=2000 M=2000 bccache -> attrmapcache diff diff% (error%) bm_chaos.py 13742.56 -> 13905.67 : +163.11 = +1.187% (+/-3.75%) bm_fannkuch.py 60.13 -> 61.34 : +1.21 = +2.012% (+/-2.11%) bm_fft.py 113083.20 -> 114793.68 : +1710.48 = +1.513% (+/-1.57%) bm_float.py 256552.80 -> 243908.29 : -12644.51 = -4.929% (+/-1.90%) bm_hexiom.py 521.93 -> 625.41 : +103.48 = +19.826% (+/-0.40%) bm_nqueens.py 197544.25 -> 217713.12 : +20168.87 = +10.210% (+/-3.01%) bm_pidigits.py 8072.98 -> 8198.75 : +125.77 = +1.558% (+/-3.22%) misc_aes.py 17283.45 -> 16480.52 : -802.93 = -4.646% (+/-0.82%) misc_mandel.py 99083.99 -> 128939.84 : +29855.85 = +30.132% (+/-5.88%) misc_pystone.py 83860.10 -> 82592.56 : -1267.54 = -1.511% (+/-2.27%) misc_raytrace.py 21490.40 -> 22227.23 : +736.83 = +3.429% (+/-1.88%) This shows that the new optimisations are at least as good as the existing inline-bytecode-caching, and are sometimes much better (because the new ones apply caching to a wider variety of map lookups). The new optimisations can also benefit code generated by the native emitter, because they apply to the runtime rather than the generated code. The improvement for the native emitter when LOAD_ATTR_FAST_PATH and MAP_LOOKUP_CACHE are enabled is (same Linux environment as above): diff of scores (higher is better) N=2000 M=2000 native -> nat-attrmapcache diff diff% (error%) bm_chaos.py 14130.62 -> 15464.68 : +1334.06 = +9.441% (+/-7.11%) bm_fannkuch.py 74.96 -> 76.16 : +1.20 = +1.601% (+/-1.80%) bm_fft.py 166682.99 -> 168221.86 : +1538.87 = +0.923% (+/-4.20%) bm_float.py 233415.23 -> 265524.90 : +32109.67 = +13.756% (+/-2.57%) bm_hexiom.py 628.59 -> 734.17 : +105.58 = +16.796% (+/-1.39%) bm_nqueens.py 225418.44 -> 232926.45 : +7508.01 = +3.331% (+/-3.10%) bm_pidigits.py 6322.00 -> 6379.52 : +57.52 = +0.910% (+/-5.62%) misc_aes.py 20670.10 -> 27223.18 : +6553.08 = +31.703% (+/-1.56%) misc_mandel.py 138221.11 -> 152014.01 : +13792.90 = +9.979% (+/-2.46%) misc_pystone.py 85032.14 -> 105681.44 : +20649.30 = +24.284% (+/-2.25%) misc_raytrace.py 19800.01 -> 23350.73 : +3550.72 = +17.933% (+/-2.79%) In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options: - are simpler; - take less code size; - are faster (generally); - work with code generated by the native emitter; - can be used on embedded targets with a small and constant RAM overhead; - allow the same .mpy bytecode to run on all targets. See #7680 for further discussion. And see also #7653 for a discussion about simplifying mpy-cross options. Signed-off-by: Jim Mussared <jim.mussared@gmail.com> 2021-09-06 03:28:06 +01:00			`00 LOAD_NAME print`
			`03 LOAD_CONST_SMALL_INT 1`
			`04 CALL_FUNCTION n=1 nkw=0`
			`06 POP_TOP`
			`07 LOAD_CONST_NONE`
			`08 RETURN_VALUE`
tests: Get cmdline verbose tests running again. The showbc function now no longer uses the system printf so works correctly. 2016-09-20 02:33:19 +01:00			`1`
tests: Make cmdline tests more stable by using regex for matching. 2015-03-20 17:25:25 +00:00			`mem: total=\\d\+, current=\\d\+, peak=\\d\+`
			`stack: \\d\+ out of \\d\+`
			`GC: total: \\d\+, used: \\d\+, free: \\d\+`
tests: Get cmdline verbose tests running again. The showbc function now no longer uses the system printf so works correctly. 2016-09-20 02:33:19 +01:00			`No. of 1-blocks: \\d\+, 2-blocks: \\d\+, max blk sz: \\d\+, max free sz: \\d\+`