micropython/py/makeqstrdata.py

"""
Process raw qstr file and output qstr data with length, hash and data bytes.

This script works with Python 2.6, 2.7, 3.3 and 3.4.
"""

from __future__ import print_function

import re
import sys

# codepoint2name is imported from a different module in Python 2 than in Python 3
import platform
if platform.python_version_tuple()[0] == '2':
    from htmlentitydefs import codepoint2name
elif platform.python_version_tuple()[0] == '3':
    from html.entities import codepoint2name
codepoint2name[ord('-')] = 'hyphen'

# add some custom names to map characters that aren't in HTML
codepoint2name[ord(' ')] = 'space'
codepoint2name[ord('\'')] = 'squot'
codepoint2name[ord(',')] = 'comma'
codepoint2name[ord('.')] = 'dot'
codepoint2name[ord(':')] = 'colon'
codepoint2name[ord('/')] = 'slash'
codepoint2name[ord('%')] = 'percent'
codepoint2name[ord('#')] = 'hash'
codepoint2name[ord('(')] = 'paren_open'
codepoint2name[ord(')')] = 'paren_close'
codepoint2name[ord('[')] = 'bracket_open'
codepoint2name[ord(']')] = 'bracket_close'
codepoint2name[ord('{')] = 'brace_open'
codepoint2name[ord('}')] = 'brace_close'
codepoint2name[ord('*')] = 'star'
codepoint2name[ord('!')] = 'bang'
codepoint2name[ord('\\')] = 'backslash'

# this must match the equivalent function in qstr.c
def compute_hash(qstr, bytes_hash):
    hash = 5381
    for char in qstr:
        hash = (hash * 33) ^ ord(char)
    # Make sure that a valid hash is never zero; zero means "hash not computed"
    return (hash & ((1 << (8 * bytes_hash)) - 1)) or 1

def do_work(infiles):
    # read the qstrs in from the input files
    qcfgs = {}
    qstrs = {}
    for infile in infiles:
        with open(infile, 'rt') as f:
            for line in f:
                line = line.strip()

                # is this a config line?
                match = re.match(r'^QCFG\((.+), (.+)\)', line)
                if match:
                    value = match.group(2)
                    if value[0] == '(' and value[-1] == ')':
                        # strip enclosing parentheses from config value
                        value = value[1:-1]
                    qcfgs[match.group(1)] = value
                    continue

                # is this a QSTR line?
                match = re.match(r'^Q\((.*)\)$', line)
                if not match:
                    continue

                # get the qstr value
                qstr = match.group(1)
                ident = re.sub(r'[^A-Za-z0-9_]', lambda s: "_" + codepoint2name[ord(s.group(0))] + "_", qstr)

                # don't add duplicates
                if ident in qstrs:
                    continue

                # add the qstr to the list, with order number to retain original order in file
                qstrs[ident] = (len(qstrs), ident, qstr)

    # get config variables
    cfg_bytes_len = int(qcfgs['BYTES_IN_LEN'])
    cfg_bytes_hash = int(qcfgs['BYTES_IN_HASH'])
    cfg_max_len = 1 << (8 * cfg_bytes_len)

    # print out the start of the generated C header file
    print('// This file was automatically generated by makeqstrdata.py')
    print('')

    # add NULL qstr with no hash or data
    print('QDEF(MP_QSTR_NULL, (const byte*)"%s%s" "")' % ('\\x00' * cfg_bytes_hash, '\\x00' * cfg_bytes_len))

    # go through each qstr and print it out
    for order, ident, qstr in sorted(qstrs.values(), key=lambda x: x[0]):
        qhash = compute_hash(qstr, cfg_bytes_hash)
        # Calculate len of str, taking escapes into account
        qlen = len(qstr.replace("\\\\", "-").replace("\\", ""))
        qdata = qstr.replace('"', '\\"')
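        if qlen >= cfg_max_len:
            # report on stderr so the message does not end up in the generated header,
            # and exit explicitly because asserts are skipped under "python -O"
            print('qstr is too long:', qstr, file=sys.stderr)
            sys.exit(1)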
        qlen_str = ('\\x%02x' * cfg_bytes_len) % tuple(((qlen >> (8 * i)) & 0xff) for i in range(cfg_bytes_len))
        qhash_str = ('\\x%02x' * cfg_bytes_hash) % tuple(((qhash >> (8 * i)) & 0xff) for i in range(cfg_bytes_hash))
        print('QDEF(MP_QSTR_%s, (const byte*)"%s%s" "%s")' % (ident, qhash_str, qlen_str, qdata))

if __name__ == "__main__":
    do_work(sys.argv[1:])
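
To make the generated output concrete, here is a minimal sketch of how a single `QDEF` line is put together from a qstr. It repeats `compute_hash` from the script and adds two helpers, `encode_le` and `qdef_line`, which are hypothetical names introduced only for this illustration. The sizes (2 hash bytes, 1 length byte) are assumed values, and the script's handling of backslash escapes and double quotes is left out.

```python
def compute_hash(qstr, bytes_hash):
    # Same multiply-by-33-and-XOR hash as in the script above.
    hash = 5381
    for char in qstr:
        hash = (hash * 33) ^ ord(char)
    # Keep only the low bytes_hash bytes; map 0 to 1 so that 0 can mean "not computed".
    return (hash & ((1 << (8 * bytes_hash)) - 1)) or 1


def encode_le(value, nbytes):
    # Hypothetical helper: render value as nbytes C hex escapes,
    # least significant byte first.
    return ''.join('\\x%02x' % ((value >> (8 * i)) & 0xff) for i in range(nbytes))


def qdef_line(ident, qstr, bytes_hash=2, bytes_len=1):
    # Hypothetical helper mirroring the final print() in do_work:
    # hash bytes, then length bytes, then the string data.
    qhash = compute_hash(qstr, bytes_hash)
    return 'QDEF(MP_QSTR_%s, (const byte*)"%s%s" "%s")' % (
        ident, encode_le(qhash, bytes_hash), encode_le(len(qstr), bytes_len), qstr)


if __name__ == "__main__":
    # prints: QDEF(MP_QSTR_a, (const byte*)"\xc4\xb5\x01" "a")
    print(qdef_line('a', 'a'))
    # The low byte of this hash is zero, so the function returns 1 instead.
    print(compute_hash(chr(0xA5), 1))
```

For the qstr `a` the full hash is `0x2b5c4`; masking to two bytes gives `0xb5c4`, which is written low byte first as `\xc4\xb5`, followed by the length byte `\x01`.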
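
The input handling in `do_work` can be tried in isolation. The sketch below assumes Python 3 (it imports from `html.entities` only) and uses the same two regular expressions and the same substitution as the script. The names `names`, `make_ident` and `parse_lines`, as well as the sample input lines, are made up for this illustration and are not part of the script.

```python
import re
from html.entities import codepoint2name

# Work on a copy so the standard table is left untouched, and register
# a few of the custom names that the script adds.
names = dict(codepoint2name)
names.update({ord(':'): 'colon', ord('/'): 'slash', ord(' '): 'space'})


def make_ident(qstr):
    # Replace each character that is not valid in a C identifier with _name_.
    return re.sub(r'[^A-Za-z0-9_]',
                  lambda m: '_' + names[ord(m.group(0))] + '_', qstr)


def parse_lines(lines):
    # Hypothetical helper: collect config values and qstrs, ignoring
    # every line that matches neither pattern.
    qcfgs, qstrs = {}, []
    for line in lines:
        line = line.strip()
        match = re.match(r'^QCFG\((.+), (.+)\)', line)
        if match:
            value = match.group(2)
            if value[0] == '(' and value[-1] == ')':
                # strip enclosing parentheses from config value
                value = value[1:-1]
            qcfgs[match.group(1)] = value
            continue
        match = re.match(r'^Q\((.*)\)$', line)
        if match:
            qstrs.append(match.group(1))
    return qcfgs, qstrs


if __name__ == "__main__":
    sample = ['QCFG(BYTES_IN_LEN, (1))', 'int unrelated;',
              'Q(<module>)', 'Q(0:/lib)']
    cfg, found = parse_lines(sample)
    print(cfg)
    print([make_ident(q) for q in found])
```

With this sample, `<module>` becomes `_lt_module_gt_` and `0:/lib` becomes `0_colon__slash_lib`, while the unrelated C declaration is skipped.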