Commit Graph

347 Commits

Author SHA1 Message Date
Paul Sokolovsky fc9a6dd09e py/objstr: strip: Don't strip "\0" by default.
An issue was due to incorrectly taking size of default strip characters
set.
2017-09-19 21:21:12 +03:00
tll 68c28174d0 py/objstr: Add check for valid UTF-8 when making a str from bytes.
This patch adds a function utf8_check() to check for a valid UTF-8 encoded
string, and calls it when constructing a str from raw bytes.  The feature
is selectable at compile time via MICROPY_PY_BUILTINS_STR_UNICODE_CHECK and
is enabled if unicode is enabled.  It costs about 110 bytes on Thumb-2, 150
bytes on Xtensa and 170 bytes on x86-64.
2017-09-06 16:43:09 +10:00
Damien George 58321dd985 all: Convert mp_uint_t to mp_unary_op_t/mp_binary_op_t where appropriate
The unary-op/binary-op enums are already defined, and there are no
arithmetic tricks used with these types, so it makes sense to use the
correct enum type for arguments that take these values.  It also reduces
code size quite a bit for nan-boxing builds.
2017-08-29 13:16:30 +10:00
Paul Sokolovsky 37379a2974 py/objstr: startswith, endswith: Check arg to be a string.
Otherwise, it will silently get incorrect result on other values types,
including CPython tuple form like "foo.png".endswith(("png", "jpg"))
(which MicroPython doesn't support for unbloatedness).
2017-08-29 00:06:21 +03:00
Javier Candeira 35a1fea90b all: Raise exceptions via mp_raise_XXX
- Changed: ValueError, TypeError, NotImplementedError
  - OSError invocations unchanged, because the corresponding utility
    function takes ints, not strings like the long form invocation.
  - OverflowError, IndexError and RuntimeError etc. not changed for now
    until we decide whether to add new utility functions.
2017-08-13 22:52:33 +10:00
Damien George 3d25d9c7d9 py/objstr: Raise an exception for wrong type on RHS of str binary op.
The main case to catch is invalid types for the containment operator, of
the form str.__contains__(non-str).
2017-08-09 21:25:48 +10:00
Alexander Steffen 55f33240f3 all: Use the name MicroPython consistently in comments
There were several different spellings of MicroPython present in comments,
when there should be only one.
2017-07-31 18:35:40 +10:00
Damien George 9d2c72ad4f py/objstr: Remove unnecessary "sign" variable in formatting code. 2017-07-04 02:13:27 +10:00
Damien George 65417c5ad9 py/objstr: Move uPy function wrappers to just after the C function.
This matches the coding/layout style of all the other objects.
2017-07-02 23:35:42 +10:00
Damien George 326e8860ab py/objstr: Allow to compile with obj-repr D, and unicode disabled. 2017-06-08 00:40:38 +10:00
Damien George 9f85c4fe48 py/objstr: Catch case of negative "maxsplit" arg to str.rsplit().
Negative values mean no limit on the number of splits so should delegate to
the .split() method.
2017-06-02 13:07:22 +10:00
Ville Skyttä ca16c38210 various: Spelling fixes 2017-05-29 11:36:05 +03:00
Paul Sokolovsky 9a973977bb py/objstr: Use MICROPY_FULL_CHECKS for range checking when constructing bytes.
Split this setting from MICROPY_CPYTHON_COMPAT. The idea is to be able to
keep MICROPY_CPYTHON_COMPAT disabled, but still pass more of regression
testsuite. In particular, this fixes last failing test in basics/ for
Zephyr port.
2017-04-02 21:20:07 +03:00
Damien George 6b34107537 py: Change mp_uint_t to size_t for mp_obj_str_get_data len arg. 2017-03-29 12:56:45 +11:00
Damien George 6213ad7f46 py: Convert mp_uint_t to size_t for tuple/list accessors.
This patch changes mp_uint_t to size_t for the len argument of the
following public facing C functions:

mp_obj_tuple_get
mp_obj_list_get
mp_obj_get_array

These functions take a pointer to the len argument (to be filled in by the
function) and callers of these functions should update their code so the
type of len is changed to size_t.  For ports that don't use nan-boxing
there should be no change in generate code because the size of the type
remains the same (word sized), and in a lot of cases there won't even be a
compiler warning if the type remains as mp_uint_t.

The reason for this change is to standardise on the use of size_t for
variables that count memory (or memory related) sizes/lengths.  It helps
builds that use nan-boxing.
2017-03-29 12:56:17 +11:00
Damien George c88cfe165b py: Use size_t as len argument and return type of mp_get_index.
These values are used to compute memory addresses and so size_t is the
more appropriate type to use.
2017-03-23 16:17:40 +11:00
stijn bf29fe2e13 py/objstr: Use better msg in bad implicit str/bytes conversion exception
Instead of always reporting some object cannot be implicitly be converted
to a 'str', even when it is a 'bytes' object, adjust the logic so that
when trying to convert str to bytes it is shown like that.
This will still report bad implicit conversion from e.g. 'int to bytes'
as 'int to str' but it will not result in the confusing
'can't convert 'str' object to str implicitly' anymore for calls like
b'somestring'.count('a').
2017-03-20 15:11:45 +11:00
Damien George d279bcff8a py/objstr: Fix eager optimisation of str/bytes addition.
The RHS can only be returned if it is the same type as the LHS.
2017-03-16 14:30:04 +11:00
Krzysztof Blazewicz 7e480e8a30 py: Use mp_obj_get_array where sequence may be a tuple or a list. 2017-03-07 16:48:16 +11:00
Damien George ae8d867586 py: Add iter_buf to getiter type method.
Allows to iterate over the following without allocating on the heap:
- tuple
- list
- string, bytes
- bytearray, array
- dict (not dict.keys, dict.values, dict.items)
- set, frozenset

Allows to call the following without heap memory:
- all, any, min, max, sum

TODO: still need to allocate stack memory in bytecode for iter_buf.
2017-02-16 18:38:06 +11:00
Damien George c0d9500eee py/objstr: Convert mp_uint_t to size_t (and use int) where appropriate. 2017-02-16 16:51:16 +11:00
Damien George 90ab191b65 py/objstr: Convert some instances of mp_uint_t to size_t. 2017-02-03 13:04:56 +11:00
Damien George 7317e34383 py/objstr: Give correct behaviour when passing a dict to %-formatting.
This patch fixes two main things:
- dicts can be printed directly using '%s' % dict
- %-formatting should not crash when passed a non-dict to, eg, '%(foo)s'
2017-02-03 12:13:44 +11:00
Paul Sokolovsky e2e663291d py/objstr: Optimize string concatenation with empty string.
In this, don't allocate copy, just return non-empty string. This helps
with a standard pattern of buffering data in case of short reads:

    buf = b""
    while ...:
        s = f.read(...)
        buf += s
        ...

For a typical case when single read returns all data needed, there won't
be extra allocation. This optimization helps uasyncio.
2017-01-27 00:49:39 +03:00
Damien George 897129a7ff py/objstr: Remove unreachable function used only for terse error msgs. 2016-09-27 15:45:42 +10:00
Damien George 5f3bda422a py: If str/bytes hash is 0 then explicitly compute it. 2016-09-02 14:49:50 +10:00
Damien George 2196799051 py/objstr: Use mp_raise_{Type,Value}Error instead of mp_raise_msg.
This patch does further refactoring using the new mp_raise_TypeError
and mp_raise_ValueError functions.
2016-08-14 16:51:54 +10:00
Paul Sokolovsky c4a8004933 py: Get rid of assert() in method argument checking functions.
Checks for number of args removes where guaranteed by function descriptor,
self checking is replaced with mp_check_self(). In few cases, exception
is raised instead of assert.
2016-08-12 22:39:03 +03:00
Paul Sokolovsky 9e1b61dedd py/runtime: Factor out exception raising helpers.
Introduce mp_raise_msg(), mp_raise_ValueError(), mp_raise_TypeError()
instead of previous pattern nlr_raise(mp_obj_new_exception_msg(...)).
Save few bytes on each call, which are many.
2016-08-12 21:28:45 +03:00
Paul Sokolovsky 1563388001 py/objstr,objstrunicode: Fix inconistent #if indentation. 2016-08-07 15:24:57 +03:00
Paul Sokolovsky 56eb25f049 py/objstr: Make .partition()/.rpartition() methods configurable.
Default is disabled, enabled for unix port. Saves 600 bytes on x86.
2016-08-07 06:46:55 +03:00
Paul Sokolovsky 9dde6062cc py/objstr: Fix mix-signed comparison in str.center(). 2016-05-22 02:22:14 +03:00
Dave Hylands 6a60fb3cf4 py/objstr*: Properly ifdef str.center(). 2016-05-22 01:54:41 +03:00
Paul Sokolovsky 1b5abfcaae py/objstr: Implement str.center().
Disabled by default, enabled in unix port. Need for this method easily
pops up when working with text UI/reporting, and coding workalike
manually again and again counter-productive.
2016-05-22 00:13:44 +03:00
Damien George cc80c4dd59 py/objstr: Make dedicated splitlines function, supporting diff newlines.
It now supports \n, \r and \r\n as newline separators.

Adds 56 bytes to stmhal and 80 bytes to unix x86-64.

Fixes issue #1689.
2016-05-13 12:21:32 +01:00
Paul Sokolovsky 40f0096ee7 Revert "py/objstr: .format(): Avoid call to vstr_null_terminated_str()."
This reverts commit 6de8dbb488. The change
was incorrect (correct change would require comparing with end pointer in
each if statement in the block).
2016-05-09 23:42:42 +03:00
Paul Sokolovsky 6de8dbb488 py/objstr: .format(): Avoid call to vstr_null_terminated_str().
By comparing with string end pointer instead of checking for NUL byte.
Should alleviate reallocations and fragmentation a tiny bit.
2016-05-09 21:55:09 +03:00
Damien George 12dd8df375 py/objstr: Binary type of str/bytes for buffer protocol is 'B'.
The type is an unsigned 8-bit value, since bytes objects are exactly
that.  And it's also sensible for unicode strings to return unsigned
values when accessed in a byte-wise manner (CPython does not allow this).
2016-05-07 21:18:17 +01:00
Damien George a649d72606 py/makeqstrdata: Add special case to handle \n qstr. 2016-04-14 15:22:36 +01:00
Paul Sokolovsky c38809e26b py/objarray: Implement "in" operator for bytearray. 2016-02-14 18:57:11 +02:00
Damien George 086d98cbde py/objstr: Make mp_obj_str_format_helper static. 2016-02-02 16:51:52 +00:00
Damien George 87e07ea943 py/objstr: For str.format, don't allocate on the heap for field name. 2016-02-02 16:26:21 +00:00
pohmelie e3a29de1dc py/objstr: For str.format, add nested/computed fields support.
Eg: '{:{}}'.format(123, '>20')

@pohmelie was the original author of this patch, but @dpgeorge made
significant changes to reduce code size and improve efficiency.
2016-02-02 16:25:24 +00:00
Damien George 22d85ec5be py: Use new code pattern for parsing kw args with mp_arg_parse_all.
Makes code easier to read and more maintainable.
2016-01-13 15:47:56 +00:00
Damien George 5b3f0b7f39 py: Change first arg of type.make_new from mp_obj_t to mp_obj_type_t*.
The first argument to the type.make_new method is naturally a uPy type,
and all uses of this argument cast it directly to a pointer to a type
structure.  So it makes sense to just have it a pointer to a type from
the very beginning (and a const pointer at that).  This patch makes
such a change, and removes all unnecessary casting to/from mp_obj_t.
2016-01-11 00:49:27 +00:00
Damien George 4b72b3a133 py: Change type signature of builtin funs that take variable or kw args.
With this patch the n_args parameter is changed type from mp_uint_t to
size_t.
2016-01-11 00:49:27 +00:00
Damien George a0c97814df py: Change type of .make_new and .call args: mp_uint_t becomes size_t.
This patch changes the type signature of .make_new and .call object method
slots to use size_t for n_args and n_kw (was mp_uint_t.  Makes code more
efficient when mp_uint_t is larger than a machine word.  Doesn't affect
ports when size_t and mp_uint_t have the same size.
2016-01-11 00:48:41 +00:00
Damien George d4df8f4925 py/objstr: In str.format, handle case of no format spec for string arg.
Handles, eg, "{:>20}".format("foo"), where there is no explicit spec for
the type of the argument.
2016-01-04 13:13:39 +00:00
Damien George 8212d97317 py: Use polymorphic iterator type where possible to reduce code size.
Only types whose iterator instances still fit in 4 machine words have
been changed to use the polymorphic iterator.

Reduces Thumb2 arch code size by 264 bytes.
2016-01-03 16:27:55 +00:00
Paul Sokolovsky d50f649cf8 py/objstr: Applying % (format) operator to bytes should return bytes, not str. 2015-12-20 16:52:11 +02:00
Paul Sokolovsky ef63ab5724 py/objstr: Make sure that b"%s" % b"foo" uses undecorated bytes value.
I.e. the expected result for above is b"foo", whereas previously we got
b"b'foo'".
2015-12-20 16:51:59 +02:00
Damien George 999cedb90f py: Wrap all obj-ptr conversions in MP_OBJ_TO_PTR/MP_OBJ_FROM_PTR.
This allows the mp_obj_t type to be configured to something other than a
pointer-sized primitive type.

This patch also includes additional changes to allow the code to compile
when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of
mp_uint_t, and various casts.
2015-11-29 14:25:35 +00:00
Damien George cbf7674025 py: Add MP_ROM_* macros and mp_rom_* types and use them. 2015-11-29 14:25:04 +00:00
Damien George c3f64d9799 py: Change qstr_* functions to use size_t as the type for str len arg. 2015-11-29 14:25:04 +00:00
Damien George 04353cc85e py: With obj repr "C", change raw str accessor from macro to function.
This saves around 1000 bytes (Thumb2 arch) because in repr "C" it is
costly to check and extract a qstr.  So making such check/extract a
function instead of a macro saves lots of code space.
2015-10-20 12:38:54 +01:00
Damien George aaef1851a7 py: Add mp_obj_is_float function (macro) and use it where appropriate. 2015-10-20 12:35:17 +01:00
Paul Sokolovsky 1b586f3a73 py: Rename MP_BOOL() to mp_obj_new_bool() for consistency in naming. 2015-10-11 15:18:15 +03:00
Damien George 3a2171e406 py: Eliminate some cases which trigger unused parameter warnings. 2015-09-04 16:53:46 +01:00
Damien George 42cec5c893 py/objstr: Check for keyword args before checking for no posn args.
Otherwise something like bytes(abc=123) will succeed.
2015-09-04 16:51:55 +01:00
Damien George 55b11e6d38 py/objstr: For str.endswith(s, start) raise NotImpl instead of assert. 2015-09-04 16:49:56 +01:00
Damien George 821b7f22fe py: Use mp_not_implemented consistently for not implemented features. 2015-09-03 23:14:06 +01:00
Damien George e2aa117798 py/objstr: Simplify printing of bytes objects when unicode enabled. 2015-09-03 23:03:57 +01:00
Damien George 516982242d py: Inline single use of mp_obj_str_get_len in mp_obj_len_maybe.
Gets rid of redundant double check for string type.

Also remove obsolete declaration of mp_obj_str_get_hash.
2015-09-03 23:01:07 +01:00
Damien George 22602cc37b py/objstr: Make str.rsplit(None,n) raise NotImpl instead of assert(0). 2015-09-01 15:35:31 +01:00
Damien George 000730ecaa py/objstr: Simplify error handling for bad conversion specifier. 2015-08-30 12:43:21 +01:00
Damien George b648e98ad0 py/objstr: Fix error reporting for unexpected end of modulo format str. 2015-08-29 23:13:51 +01:00
Damien George 7ef75f9f75 py/objstr: Fix error type for badly formatted format specifier.
Was KeyError, should be ValueError.
2015-08-29 23:13:51 +01:00
Damien George 51b9a0d0c4 py/objstr: Make string formatting 8-bit clean. 2015-08-29 23:13:51 +01:00
Dave Hylands 9f76dcd682 py: Prevent many extra vstr allocations.
I checked the entire codebase, and every place that vstr_init_len
was called, there was a call to mp_obj_new_str_from_vstr after it.

mp_obj_new_str_from_vstr always tries to reallocate a new buffer
1 byte larger than the original to store the terminating null
character.

In many cases, if we allocated the initial buffer to be 1 byte
longer, we can prevent this extra allocation, and just reuse
the originally allocated buffer.

Asking to read 256 bytes and only getting 100 will still cause
the extra allocation, but if you ask to read 256 and get 256
then the extra allocation will be optimized away.

Yes - the reallocation is optimized in the heap to try and reuse
the buffer if it can, but it takes quite a few cycles to figure
this out.

Note by Damien: vstr_init_len should now be considered as a
string-init convenience function and used only when creating
null-terminated objects.
2015-07-06 17:29:27 +01:00
Paul Sokolovsky f44cc517a2 objstr: Add note that replace() is nicely optimized.
Doesn't allocate memory and returns original string if no replacements are
to be made.
2015-06-26 17:35:12 +03:00
Damien George 79474c6b16 py: Remove unnecessary extra handling of padding of nan/inf.
C's printf will pad nan/inf differently to CPython.  Our implementation
originally conformed to C, now it conforms to CPython's way.

Tests for this are also added in this patch.
2015-05-28 14:22:12 +00:00
Damien George 44e7cbf019 py: Clean up declarations of str type/funcs that are also in unicode.
Background: trying to make an amalgamation of all the code gave some
errors with redefined types and inconsistent use of static.
2015-05-17 16:44:24 +01:00
Damien George c2a4e4effc py: Convert hash API to use MP_UNARY_OP_HASH instead of ad-hoc function.
Hashing is now done using mp_unary_op function with MP_UNARY_OP_HASH as
the operator argument.  Hashing for int, str and bytes still go via
fast-path in mp_unary_op since they are the most common objects which
need to be hashed.

This lead to quite a bit of code cleanup, and should be more efficient
if anything.  It saves 176 bytes code space on Thumb2, and 360 bytes on
x86.

The only loss is that the error message "unhashable type" is now the
more generic "unsupported type for __hash__".
2015-05-12 22:46:02 +01:00
Damien George ede0f3ab3d py: Add optional code to check bytes constructor values are in range.
Compiled in only if MICROPY_CPYTHON_COMPAT is set.

Addresses issue #1093.
2015-04-23 15:28:18 +01:00
Damien George 7f9d1d6ab9 py: Overhaul and simplify printf/pfenv mechanism.
Previous to this patch the printing mechanism was a bit of a tangled
mess.  This patch attempts to consolidate printing into one interface.

All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf.  All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.

Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined pert port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how string are printed
on the platform.  The former is only used when MICROPY_PY_IO is defined.

With this new scheme printing is generally more efficient (less layers
to go through, less arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...).  Code size is also reduced by around 200 bytes on
Thumb2 archs.
2015-04-16 14:30:16 +00:00
Paul Sokolovsky 8b7faa31e1 objstr: split(None): Fix whitespace properly. 2015-04-12 00:17:57 +03:00
Damien George 2801e6fad8 py: Some trivial cosmetic changes, for code style consistency. 2015-04-04 15:53:11 +01:00
Paul Sokolovsky 7f59b4b2ca objstr: Fix bugs introduced by inability to have shadow variables.
Warnings lead to programming errors - as expected.
2015-04-04 01:55:40 +03:00
Paul Sokolovsky acf6aec71c objstr: Avoid variable shadowing. 2015-04-04 01:24:59 +03:00
Paul Sokolovsky ac2f7a7f6a objstr: Add .splitlines() method.
splitlines() occurs ~179 times in CPython3 standard library, so was
deemed worthy to implement. The method has subtle semantic differences
from just .split("\n"). It is also defined as working for any end-of-line
combination, but this is currently not implemented - it works only with
LF line-endings (which should be OK for text strings on any platforms,
but not OK for bytes).
2015-04-04 00:09:48 +03:00
Paul Sokolovsky 8705171233 objstr: Expose mp_obj_str_split() for reuse in other modules. 2015-03-23 22:43:37 +02:00
Damien George fa1edff006 py: Remove unnecessary and unused sgn argument from pfenv_print_mp_int. 2015-03-14 22:32:40 +00:00
Paul Sokolovsky 194117a066 objstr: Fix bytes creation from array of long ints. 2015-02-09 12:11:49 +08:00
Damien George 827b0f747b py: Change vstr_null_terminate -> vstr_null_terminated_str, returns str. 2015-01-29 13:57:23 +00:00
Damien George 0d3cb6726d py: Change vstr so that it doesn't null terminate buffer by default.
This cleans up vstr so that it's a pure "variable buffer", and the user
can decide whether they need to add a terminating null byte.  In most
places where vstr is used, the vstr did not need to be null terminated
and so this patch saves code size, a tiny bit of RAM, and makes vstr
usage more efficient.  When null termination is needed it must be
done explicitly using vstr_null_terminate.
2015-01-28 23:43:01 +00:00
Paul Sokolovsky bbd9251bac py: bytes(): Make sure we add values as bytes, not as chars. 2015-01-28 22:29:07 +02:00
Damien George 98e3a64694 py: Remove duplicated mp_obj_str_make_new function from objstrunicode.c. 2015-01-28 14:14:57 +00:00
Paul Sokolovsky 344e15b1ae objstr: Remove code duplication and unbreak Windows build.
There was really weird warning (promoted to error) when building Windows
port. Exact cause is still unknown, but it uncovered another issue:
8-bit and unicode str_make_new implementations should be mutually exclusive,
and not built at the same time. What we had is that bytes_decode() pulled
8-bit str_make_new() even for unicode build.
2015-01-23 02:15:56 +02:00
Paul Sokolovsky 6113eb2f33 objstr*: Use separate names for locals_dict of 8-bit and unicode str's.
To somewhat unbreak -DSTATIC="" compile.
2015-01-23 02:05:58 +02:00
Damien George 77089bebd4 py: Add comments for vstr_init and mp_obj_new_str. 2015-01-21 23:18:02 +00:00
Damien George 05005f679e py: Remove mp_obj_str_builder and use vstr instead.
With this patch str/bytes construction is streamlined.  Always use a
vstr to build a str/bytes object.  If the size is known beforehand then
use vstr_init_len to allocate only required memory.  Otherwise use
vstr_init and the vstr will grow as needed.  Then use
mp_obj_new_str_from_vstr to create a str/bytes object using the vstr
memory.

Saves code ROM: 68 bytes on stmhal, 108 bytes on bare-arm, and 336 bytes
on unix x64.
2015-01-21 23:18:02 +00:00
Damien George 0b9ee86133 py: Add mp_obj_new_str_from_vstr, and use it where relevant.
This patch allows to reuse vstr memory when creating str/bytes object.
This improves memory usage.

Also saves code ROM: 128 bytes on stmhal, 92 bytes on bare-arm, and 88
bytes on unix x64.
2015-01-21 23:17:27 +00:00
Damien George ff8dd3f486 py, unix: Allow to compile with -Wunused-parameter.
See issue #699.
2015-01-20 12:47:20 +00:00
Damien George 50912e7f5d py, unix, stmhal: Allow to compile with -Wshadow.
See issue #699.
2015-01-20 11:55:10 +00:00
Damien George 963a5a3e82 py, unix: Allow to compile with -Wsign-compare.
See issue #699.
2015-01-16 17:47:07 +00:00
Damien George 0178aa9a11 py, unix: Allow to compile with -Wdouble-promotion.
Ref issue #699.
2015-01-12 21:56:35 +00:00
Damien George e233a55a29 py: Remove unnecessary BINARY_OP_EQUAL code that just checks pointers.
Previous patch c38dc3ccc7 allowed any
object to be compared with any other, using pointer comparison for a
fallback.  As such, existing code which checked for this case is no
longer needed.
2015-01-11 21:07:15 +00:00
Paul Sokolovsky ff8e35b42e objstr: Common subexpression elimination for vstr_str(field_name). 2015-01-04 13:23:44 +02:00
Paul Sokolovsky c114496641 objstr: Implement kwargs support for str.format(). 2015-01-04 00:26:31 +02:00
Damien George 51dfcb4bb7 py: Move to guarded includes, everywhere in py/ core.
Addresses issue #1022.
2015-01-01 20:32:09 +00:00