Commit Graph

84 Commits

Author SHA1 Message Date
Jim Mussared 154b4eb354 py: Implement "common word" compression scheme for error messages.
The idea here is that there's a moderate amount of ROM used up by exception
text.  Obviously we try to keep the messages short, and the code can enable
terse errors, but it still adds up.  Listed below is the total string data
size for various ports:

    bare-arm 2860
    minimal 2876
    stm32 8926  (PYBV11)
    cc3200 3751
    esp32 5721

This commit implements compression of these strings.  It takes advantage of
the fact that these strings are all 7-bit ascii and extracts the top 128
frequently used words from the messages and stores them packed (dropping
their null-terminator), then uses (0x80 | index) inside strings to refer to
these common words.  Spaces are automatically added around words, saving
more bytes.  This happens transparently in the build process, mirroring the
steps that are used to generate the QSTR data.  The MP_COMPRESSED_ROM_TEXT
macro wraps any literal string that should compressed, and it's
automatically decompressed in mp_decompress_rom_string.

There are many schemes that could be used for the compression, and some are
included in py/makecompresseddata.py for reference (space, Huffman, ngram,
common word).  Results showed that the common-word compression gets better
results.  This is before counting the increased cost of the Huffman
decoder.  This might be slightly counter-intuitive, but this data is
extremely repetitive at a word-level, and the byte-level entropy coder
can't quite exploit that as efficiently.  Ideally one would combine both
approaches, but for now the common-word approach is the one that is used.

For additional comparison, the size of the raw data compressed with gzip
and zlib is calculated, as a sort of proxy for a lower entropy bound.  With
this scheme we come within 15% on stm32, and 30% on bare-arm (i.e. we use
x% more bytes than the data compressed with gzip -- not counting the code
overhead of a decoder, and how this would be hypothetically implemented).

The feature is disabled by default and can be enabled by setting
MICROPY_ROM_TEXT_COMPRESSION at the Makefile-level.
2020-04-05 14:20:57 +10:00
Damien George 69661f3343 all: Reformat C and Python source code with tools/codeformat.py.
This is run with uncrustify 0.70.1, and black 19.10b0.
2020-02-28 10:33:03 +11:00
Damien George ce39c958ef py: Factor out definition of mp_float_union_t to one location. 2020-02-18 13:04:36 +11:00
Yonatan Goldschmidt df5c3bd976 py/unicode: Add unichar_isalnum(). 2020-01-12 13:03:57 +11:00
Damien George c0a1de3c21 py/misc.h: Rename _MP_STRINGIFY to not use leading underscore in ident.
Macro identifiers with a leading underscore are reserved.
2019-05-09 17:11:33 +10:00
Damien George 43d08d6dd6 py/misc.h: Add MP_STATIC_ASSERT macro to do static assertions. 2018-05-18 23:31:00 +10:00
Damien George d4b55eff44 py/misc.h: Remove unused count_lead_ones() inline function.
This function was never used for unicode/utf8 handling code, or anything
else, so remove it to keep things clean.
2018-03-13 13:23:30 +11:00
Damien George 19aee9438a py/unicode: Clean up utf8 funcs and provide non-utf8 inline versions.
This patch provides inline versions of the utf8 helper functions for the
case when unicode is disabled (MICROPY_PY_BUILTINS_STR_UNICODE set to 0).
This saves code size.

The unichar_charlen function is also renamed to utf8_charlen to match the
other utf8 helper functions, and the signature of this function is adjusted
for consistency (const char* -> const byte*, mp_uint_t -> size_t).
2018-02-14 18:19:22 +11:00
Paul Sokolovsky 75d3c046da py/misc.h: Add m_new_obj_var_with_finaliser().
Similar to existing m_new_obj_with_finaliser().
2017-12-04 11:05:49 +02:00
Damien George ca21aed0a1 py: Make m_malloc_fail() have void return type, since it doesn't return. 2017-08-31 17:00:14 +10:00
Alexander Steffen 55f33240f3 all: Use the name MicroPython consistently in comments
There were several different spellings of MicroPython present in comments,
when there should be only one.
2017-07-31 18:35:40 +10:00
Alexander Steffen 299bc62586 all: Unify header guard usage.
The code conventions suggest using header guards, but do not define how
those should look like and instead point to existing files. However, not
all existing files follow the same scheme, sometimes omitting header guards
altogether, sometimes using non-standard names, making it easy to
accidentally pick a "wrong" example.

This commit ensures that all header files of the MicroPython project (that
were not simply copied from somewhere else) follow the same pattern, that
was already present in the majority of files, especially in the py folder.

The rules are as follows.

Naming convention:
* start with the words MICROPY_INCLUDED
* contain the full path to the file
* replace special characters with _

In addition, there are no empty lines before #ifndef, between #ifndef and
one empty line before #endif. #endif is followed by a comment containing
the name of the guard macro.

py/grammar.h cannot use header guards by design, since it has to be
included multiple times in a single C file. Several other files also do not
need header guards as they are only used internally and guaranteed to be
included only once:
* MICROPY_MPHALPORT_H
* mpconfigboard.h
* mpconfigport.h
* mpthreadport.h
* pin_defs_*.h
* qstrdefs*.h
2017-07-18 11:57:39 +10:00
Damien George 2138258fea py/runtime: Mark m_malloc_fail() as NORETURN. 2017-07-04 02:12:36 +10:00
Ville Skyttä ca16c38210 various: Spelling fixes 2017-05-29 11:36:05 +03:00
Paul Sokolovsky 25f44c19f1 cc3200: Re-add support for UART REPL (MICROPY_STDIO_UART setting).
UART REPL support was lost in os.dupterm() refactorings, etc. As
os.dupterm() is there, implement UART REPL support at the high level -
if MICROPY_STDIO_UART is set, make default boot.py contain os.dupterm()
call for a UART. This means that changing MICROPY_STDIO_UART value will
also require erasing flash on a module to force boot.py re-creation.
2016-12-27 01:05:37 +03:00
Paul Sokolovsky cf96be60dc py/misc.h: Typo fix in comment. 2016-12-27 01:05:30 +03:00
Damien George 824f5c5a32 py/vstr: Combine vstr_new_size with vstr_new since they are rarely used.
Now there is just one function to allocate a new vstr, namely vstr_new
(in addition to vstr_init etc).  The caller of this function should know
what initial size to allocate for the buffer, or at least have some policy
or config option, instead of leaving it to a default (as it was before).
2016-10-14 16:46:34 +11:00
Damien George 5da0d29d3c py/vstr: Remove vstr.had_error flag and inline basic vstr functions.
The vstr.had_error flag was a relic from the very early days which assumed
that the malloc functions (eg m_new, m_renew) returned NULL if they failed
to allocate.  But that's no longer the case: these functions will raise an
exception if they fail.

Since it was impossible for had_error to be set, this patch introduces no
change in behaviour.

An alternative option would be to change the malloc calls to the _maybe
variants, which return NULL instead of raising, but then a lot of code
will need to explicitly check if the vstr had an error and raise if it
did.

The code-size savings for this patch are, in bytes: bare-arm:188,
minimal:456, unix(NDEBUG,x86-64):368, stmhal:228, esp8266:360.
2016-09-19 12:28:55 +10:00
Alex March 69d9e7d27d py/repl: Check for an identifier char after the keyword.
- As described in the #1850.
- Add cmdline tests.
2016-02-17 08:56:15 +00:00
Paul Sokolovsky 946f870e3c py/misc.h: Include stdint.h only once (unconditionally at the top). 2015-12-08 02:23:58 +02:00
Paul Sokolovsky 9f001b09a8 py/misc.h: Include stdint.h, as large share of code now depends on it. 2015-12-07 20:08:07 +02:00
Damien George 999cedb90f py: Wrap all obj-ptr conversions in MP_OBJ_TO_PTR/MP_OBJ_FROM_PTR.
This allows the mp_obj_t type to be configured to something other than a
pointer-sized primitive type.

This patch also includes additional changes to allow the code to compile
when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of
mp_uint_t, and various casts.
2015-11-29 14:25:35 +00:00
Damien George ade9a05236 py: Improve allocation policy of qstr data.
Previous to this patch all interned strings lived in their own malloc'd
chunk.  On average this wastes N/2 bytes per interned string, where N is
the number-of-bytes for a quanta of the memory allocator (16 bytes on 32
bit archs).

With this patch interned strings are concatenated into the same malloc'd
chunk when possible.  Such chunks are enlarged inplace when possible,
and shrunk to fit when a new chunk is needed.

RAM savings with this patch are highly varied, but should always show an
improvement (unless only 3 or 4 strings are interned).  New version
typically uses about 70% of previous memory for the qstr data, and can
lead to savings of around 10% of total memory footprint of a running
script.

Costs about 120 bytes code size on Thumb2 archs (depends on how many
calls to gc_realloc are made).
2015-07-14 22:56:32 +01:00
Dave Hylands 3ad94d6072 extmod: Add ubinascii.unhexlify
This also pulls out hex_digit from py/lexer.c and makes unichar_hex_digit
2015-05-20 09:29:22 +01:00
Damien George 7f9d1d6ab9 py: Overhaul and simplify printf/pfenv mechanism.
Previous to this patch the printing mechanism was a bit of a tangled
mess.  This patch attempts to consolidate printing into one interface.

All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf.  All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.

Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined pert port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how string are printed
on the platform.  The former is only used when MICROPY_PY_IO is defined.

With this new scheme printing is generally more efficient (less layers
to go through, less arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...).  Code size is also reduced by around 200 bytes on
Thumb2 archs.
2015-04-16 14:30:16 +00:00
Damien George d891452a73 py: Add MICROPY_MALLOC_USES_ALLOCATED_SIZE to allow simpler malloc API. 2015-03-03 21:23:13 +00:00
Damien George 827b0f747b py: Change vstr_null_terminate -> vstr_null_terminated_str, returns str. 2015-01-29 13:57:23 +00:00
Damien George 0d3cb6726d py: Change vstr so that it doesn't null terminate buffer by default.
This cleans up vstr so that it's a pure "variable buffer", and the user
can decide whether they need to add a terminating null byte.  In most
places where vstr is used, the vstr did not need to be null terminated
and so this patch saves code size, a tiny bit of RAM, and makes vstr
usage more efficient.  When null termination is needed it must be
done explicitly using vstr_null_terminate.
2015-01-28 23:43:01 +00:00
Damien George 16677ce311 py: Be more precise about unicode type and disabled unicode behaviour. 2015-01-28 14:07:11 +00:00
David Steinberg c585ad1020 py: Move mp_float_t related defines to misc.h 2015-01-24 20:54:28 +00:00
Damien George 05005f679e py: Remove mp_obj_str_builder and use vstr instead.
With this patch str/bytes construction is streamlined.  Always use a
vstr to build a str/bytes object.  If the size is known beforehand then
use vstr_init_len to allocate only required memory.  Otherwise use
vstr_init and the vstr will grow as needed.  Then use
mp_obj_new_str_from_vstr to create a str/bytes object using the vstr
memory.

Saves code ROM: 68 bytes on stmhal, 108 bytes on bare-arm, and 336 bytes
on unix x64.
2015-01-21 23:18:02 +00:00
Damien George 0b9ee86133 py: Add mp_obj_new_str_from_vstr, and use it where relevant.
This patch allows to reuse vstr memory when creating str/bytes object.
This improves memory usage.

Also saves code ROM: 128 bytes on stmhal, 92 bytes on bare-arm, and 88
bytes on unix x64.
2015-01-21 23:17:27 +00:00
Damien George 9ddbe291c4 py: Add include guards to mpconfig,misc,qstr,obj,runtime,parsehelper. 2014-12-29 01:02:19 +00:00
Damien George 9bf5f2857d py: Add further checks for failed malloc in lexer init functions. 2014-10-09 16:53:37 +01:00
bvernoux f6f248b464 Fix error: unknown type name 'size_t' 2014-09-28 09:54:35 +02:00
Damien George b0261341d3 py: For malloc and vstr functions, use size_t exclusively for int type.
It seems most sensible to use size_t for measuring "number of bytes" in
malloc and vstr functions (since that's what size_t is for).  We don't
use mp_uint_t because malloc and vstr are not Micro Python specific.
2014-09-25 15:49:26 +01:00
Paul Sokolovsky 564e46452d py: Add generic helper to align a pointer. 2014-07-12 15:57:28 +03:00
Damien George 40f3c02682 Rename machine_(u)int_t to mp_(u)int_t.
See discussion in issue #50.
2014-07-03 13:25:24 +01:00
Paul Sokolovsky 9e215fa4c2 py: Make unichar_charlen() accept/return machine_uint_t. 2014-06-28 23:15:29 +03:00
Damien George e04a44e2f6 py: Small comments, name changes, use of machine_int_t. 2014-06-28 10:27:23 +01:00
Paul Sokolovsky ce81312d8a misc: Add count_lead_ones() function, useful for UTF-8 handling. 2014-06-27 00:04:20 +03:00
Chris Angelico c88987c1af py: Implement basic unicode functions. 2014-06-27 00:04:17 +03:00
Emmanuel Blot f6932d6506 Prefix ARRAY_SIZE with micropython prefix MP_ 2014-06-19 18:54:34 +02:00
Paul Sokolovsky 7ddbd1bee7 unicode: Add trivial implementation of unichar_charlen(). 2014-06-14 06:30:30 +03:00
Paul Sokolovsky b0bb458810 unicode: String API is const byte*.
We still have that char vs byte dichotomy, but majority of string operations
now use byte.
2014-06-14 06:22:11 +03:00
Kim Bauters a3f4b83018 add methods isspace(), isalpha(), isdigit(), isupper() and islower() to str 2014-05-31 07:30:57 +01:00
Paul Sokolovsky 6913521911 objstr: Implement .lower() and .upper(). 2014-05-10 19:49:07 +03:00
Damien George 1b82e9af5c py: Improve handling of memory error in parser.
Parser shouldn't raise exceptions, so needs to check when memory
allocation fails.  This patch does that for the initial set up of the
parser state.

Also, we now put the parser object on the stack.  It's small enough to
go there instead of on the heap.

This partially addresses issue #558.
2014-05-10 17:36:41 +01:00
Paul Sokolovsky 6b344d7816 py, unix: Add -v option, print bytecode dump if used.
This will work if MICROPY_DEBUG_PRINTERS is defined, which is only for
unix/windows ports. This makes it convenient to user uPy normally, but
easily get bytecode dump on the spot if needed, without constant recompiles
back and forth.

TODO: Add more useful debug output, adjust verbosity level on which
specifically bytecode dump happens.
2014-05-05 00:57:00 +03:00
Damien George 04b9147e15 Add license header to (almost) all files.
Blanket wide to all .c and .h files.  Some files originating from ST are
difficult to deal with (license wise) so it was left out of those.

Also merged modpyb.h, modos.h, modstm.h and modtime.h in stmhal/.
2014-05-03 23:27:38 +01:00