docs/develop/qstr.rst: Add documentation for string interning.
This commit is contained in:
parent
25a9bccdee
commit
4ddd46e6cf
|
@ -10,3 +10,4 @@ See the `getting started guide
|
||||||
:maxdepth: 1
|
:maxdepth: 1
|
||||||
|
|
||||||
cmodules.rst
|
cmodules.rst
|
||||||
|
qstr.rst
|
||||||
|
|
|
@ -0,0 +1,112 @@
|
||||||
|
MicroPython string interning
|
||||||
|
============================
|
||||||
|
|
||||||
|
MicroPython uses `string interning`_ to save both RAM and ROM. This avoids
|
||||||
|
having to store duplicate copies of the same string. Primarily, this applies to
|
||||||
|
identifiers in your code, as something like a function or variable name is very
|
||||||
|
likely to appear in multiple places in the code. In MicroPython an interned
|
||||||
|
string is called a QSTR (uniQue STRing).
|
||||||
|
|
||||||
|
A QSTR value (with type ``qstr``) is a index into a linked list of QSTR pools.
|
||||||
|
QSTRs store their length and a hash of their contents for fast comparison during
|
||||||
|
the de-duplication process. All bytecode operations that work with strings use
|
||||||
|
a QSTR argument.
|
||||||
|
|
||||||
|
Compile-time QSTR generation
|
||||||
|
----------------------------
|
||||||
|
|
||||||
|
In the MicroPython C code, any strings that should be interned in the final
|
||||||
|
firmware are written as ``MP_QSTR_Foo``. At compile time this will evaluate to
|
||||||
|
a ``qstr`` value that points to the index of ``"Foo"`` in the QSTR pool.
|
||||||
|
|
||||||
|
A multi-step process in the ``Makefile`` makes this work. In summary this
|
||||||
|
process has three parts:
|
||||||
|
|
||||||
|
1. Find all ``MP_QSTR_Foo`` tokens in the code.
|
||||||
|
|
||||||
|
2. Generate a static QSTR pool containing all the string data (including lengths
|
||||||
|
and hashes).
|
||||||
|
|
||||||
|
3. Replace all ``MP_QSTR_Foo`` (via the preprocessor) with their corresponding
|
||||||
|
index.
|
||||||
|
|
||||||
|
``MP_QSTR_Foo`` tokens are searched for in two sources:
|
||||||
|
|
||||||
|
1. All files referenced in ``$(SRC_QSTR)``. This is all C code (i.e. ``py``,
|
||||||
|
``extmod``, ``ports/stm32``) but not including third-party code such as
|
||||||
|
``lib``.
|
||||||
|
|
||||||
|
2. Additional ``$(QSTR_GLOBAL_DEPENDENCIES)`` (which includes ``mpconfig*.h``).
|
||||||
|
|
||||||
|
*Note:* ``frozen_mpy.c`` (generated by mpy-tool.py) has its own QSTR generation
|
||||||
|
and pool.
|
||||||
|
|
||||||
|
Some additional strings that can't be expressed using the ``MP_QSTR_Foo`` syntax
|
||||||
|
(e.g. they contain non-alphanumeric characters) are explicitly provided in
|
||||||
|
``qstrdefs.h`` and ``qstrdefsport.h`` via the ``$(QSTR_DEFS)`` variable.
|
||||||
|
|
||||||
|
Processing happens in the following stages:
|
||||||
|
|
||||||
|
1. ``qstr.i.last`` is the concatenation of putting every single input file
|
||||||
|
through the C pre-processor. This means that any conditionally disabled code
|
||||||
|
will be removed, and macros expanded. This means we don't add strings to the
|
||||||
|
pool that won't be used in the final firmware. Because at this stage (thanks
|
||||||
|
to the ``NO_QSTR`` macro added by ``QSTR_GEN_EXTRA_CFLAGS``) there is no
|
||||||
|
definition for ``MP_QSTR_Foo`` it passes through this stage unaffected. This
|
||||||
|
file also includes comments from the preprocessor that include line number
|
||||||
|
information. Note that this step only uses files that have changed, which
|
||||||
|
means that ``qstr.i.last`` will only contain data from files that have
|
||||||
|
changed since the last compile.
|
||||||
|
2. ``qstr.split`` is an empty file created after running ``makeqstrdefs.py split``
|
||||||
|
on qstr.i.last. It's just used as a dependency to indicate that the step ran.
|
||||||
|
This script outputs one file per input C file, ``genhdr/qstr/...file.c.qstr``,
|
||||||
|
which contains only the matched QSTRs. Each QSTR is printed as ``Q(Foo)``.
|
||||||
|
This step is necessary to combine the existing files with the new data
|
||||||
|
generated from the incremental update in ``qstr.i.last``.
|
||||||
|
|
||||||
|
3. ``qstrdefs.collected.h`` is the output of concatenating ``genhdr/qstr/*``
|
||||||
|
using ``makeqstrdefs.py cat``. This is now the full set of ``MP_QSTR_Foo``'s
|
||||||
|
found in the code, now formatted as ``Q(Foo)``, one-per-line, with duplicates.
|
||||||
|
This file is only updated if the set of qstrs has changed. A hash of the QSTR
|
||||||
|
data is written to another file (``qstrdefs.collected.h.hash``) which allows
|
||||||
|
it to track changes across builds.
|
||||||
|
|
||||||
|
4. ``qstrdefs.preprocessed.h`` adds in the QSTRs from qstrdefs*. It
|
||||||
|
concatenates ``qstrdefs.collected.h`` with ``qstrdefs*.h``, then it transforms
|
||||||
|
each line from ``Q(Foo)`` to ``"Q(Foo)"`` so they pass through the preprocessor
|
||||||
|
unchanged. Then the preprocessor is used to deal with any conditional
|
||||||
|
compilation in ``qstrdefs*.h``. Then the transformation is undone back to
|
||||||
|
``Q(Foo)``, and saved as ``qstrdefs.preprocessed.h``.
|
||||||
|
|
||||||
|
5. ``qstrdefs.generated.h`` is the output of ``makeqstrdata.py``. For each
|
||||||
|
``Q(Foo)`` in qstrdefs.preprocessed.h (plus some extra hard-coded ones), it outputs
|
||||||
|
``QDEF(MP_QSTR_Foo, (const byte*)"hash" "Foo")``.
|
||||||
|
|
||||||
|
Then in the main compile, two things happen with ``qstrdefs.generated.h``:
|
||||||
|
|
||||||
|
1. In qstr.h, each QDEF becomes an entry in an enum, which makes ``MP_QSTR_Foo``
|
||||||
|
available to code and equal to the index of that string in the QSTR table.
|
||||||
|
|
||||||
|
2. In qstr.c, the actual QSTR data table is generated as elements of the
|
||||||
|
``mp_qstr_const_pool->qstrs``.
|
||||||
|
|
||||||
|
.. _`string interning`: https://en.wikipedia.org/wiki/String_interning
|
||||||
|
|
||||||
|
Run-time QSTR generation
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
Additional QSTR pools can be created at runtime so that strings can be added to
|
||||||
|
them. For example, the code::
|
||||||
|
|
||||||
|
foo[x] = 3
|
||||||
|
|
||||||
|
Will need to create a QSTR for the value of ``x`` so it can be used by the
|
||||||
|
"load attr" bytecode.
|
||||||
|
|
||||||
|
Also, when compiling Python code, identifiers and literals need to have QSTRs
|
||||||
|
created. Note: only literals shorter than 10 characters become QSTRs. This is
|
||||||
|
because a regular string on the heap always takes up a minimum of 16 bytes (one
|
||||||
|
GC block), whereas QSTRs allow them to be packed more efficiently into the pool.
|
||||||
|
|
||||||
|
QSTR pools (and the underlying "chunks" that store the string data) are allocated
|
||||||
|
on-demand on the heap with a minimum size.
|
|
@ -70,6 +70,7 @@ $(OBJ): | $(HEADER_BUILD)/qstrdefs.generated.h $(HEADER_BUILD)/mpversion.h
|
||||||
# - if anything in QSTR_GLOBAL_DEPENDENCIES is newer, then process all source files ($^)
|
# - if anything in QSTR_GLOBAL_DEPENDENCIES is newer, then process all source files ($^)
|
||||||
# - else, if list of newer prerequisites ($?) is not empty, then process just these ($?)
|
# - else, if list of newer prerequisites ($?) is not empty, then process just these ($?)
|
||||||
# - else, process all source files ($^) [this covers "make -B" which can set $? to empty]
|
# - else, process all source files ($^) [this covers "make -B" which can set $? to empty]
|
||||||
|
# See more information about this process in docs/develop/qstr.rst.
|
||||||
$(HEADER_BUILD)/qstr.i.last: $(SRC_QSTR) $(QSTR_GLOBAL_DEPENDENCIES) | $(QSTR_GLOBAL_REQUIREMENTS)
|
$(HEADER_BUILD)/qstr.i.last: $(SRC_QSTR) $(QSTR_GLOBAL_DEPENDENCIES) | $(QSTR_GLOBAL_REQUIREMENTS)
|
||||||
$(ECHO) "GEN $@"
|
$(ECHO) "GEN $@"
|
||||||
$(Q)$(CPP) $(QSTR_GEN_EXTRA_CFLAGS) $(CFLAGS) $(if $(filter $?,$(QSTR_GLOBAL_DEPENDENCIES)),$^,$(if $?,$?,$^)) >$(HEADER_BUILD)/qstr.i.last
|
$(Q)$(CPP) $(QSTR_GEN_EXTRA_CFLAGS) $(CFLAGS) $(if $(filter $?,$(QSTR_GLOBAL_DEPENDENCIES)),$^,$(if $?,$?,$^)) >$(HEADER_BUILD)/qstr.i.last
|
||||||
|
|
1
py/py.mk
1
py/py.mk
|
@ -231,6 +231,7 @@ MPCONFIGPORT_MK = $(wildcard mpconfigport.mk)
|
||||||
# created before we run the script to generate the .h
|
# created before we run the script to generate the .h
|
||||||
# Note: we need to protect the qstr names from the preprocessor, so we wrap
|
# Note: we need to protect the qstr names from the preprocessor, so we wrap
|
||||||
# the lines in "" and then unwrap after the preprocessor is finished.
|
# the lines in "" and then unwrap after the preprocessor is finished.
|
||||||
|
# See more information about this process in docs/develop/qstr.rst.
|
||||||
$(HEADER_BUILD)/qstrdefs.generated.h: $(PY_QSTR_DEFS) $(QSTR_DEFS) $(QSTR_DEFS_COLLECTED) $(PY_SRC)/makeqstrdata.py mpconfigport.h $(MPCONFIGPORT_MK) $(PY_SRC)/mpconfig.h | $(HEADER_BUILD)
|
$(HEADER_BUILD)/qstrdefs.generated.h: $(PY_QSTR_DEFS) $(QSTR_DEFS) $(QSTR_DEFS_COLLECTED) $(PY_SRC)/makeqstrdata.py mpconfigport.h $(MPCONFIGPORT_MK) $(PY_SRC)/mpconfig.h | $(HEADER_BUILD)
|
||||||
$(ECHO) "GEN $@"
|
$(ECHO) "GEN $@"
|
||||||
$(Q)$(CAT) $(PY_QSTR_DEFS) $(QSTR_DEFS) $(QSTR_DEFS_COLLECTED) | $(SED) 's/^Q(.*)/"&"/' | $(CPP) $(CFLAGS) - | $(SED) 's/^\"\(Q(.*)\)\"/\1/' > $(HEADER_BUILD)/qstrdefs.preprocessed.h
|
$(Q)$(CAT) $(PY_QSTR_DEFS) $(QSTR_DEFS) $(QSTR_DEFS_COLLECTED) | $(SED) 's/^Q(.*)/"&"/' | $(CPP) $(CFLAGS) - | $(SED) 's/^\"\(Q(.*)\)\"/\1/' > $(HEADER_BUILD)/qstrdefs.preprocessed.h
|
||||||
|
|
Loading…
Reference in New Issue