Commit Graph

21 Commits

Author SHA1 Message Date
Jeff Epler e90b85cc98 extmod/modure: Convert byte offsets to unicode indices when necessary.
And add a test.

Fixes issue #9202.

Signed-off-by: Jeff Epler <jepler@gmail.com>
2022-09-06 17:08:18 +10:00
Jim Mussared bd4e45fd68 tests/unicode: Add test for invalid utf-8 file contents.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
2022-08-26 16:47:27 +10:00
David Lechner 3dc324d3f1 tests: Format all Python code with black, except tests in basics subdir.
This adds the Python files in the tests/ directory to be formatted with
./tools/codeformat.py.  The basics/ subdirectory is excluded for now so we
aren't changing too much at once.

In a few places `# fmt: off`/`# fmt: on` was used where the code had
special formatting for readability or where the test was actually testing
the specific formatting.
2020-03-30 13:21:58 +11:00
Nicko van Someren 4c93955b7b py/objslice: Add support for indices() method on slice objects.
Instances of the slice class are passed to __getitem__() on objects when
the user indexes them with a slice.  In practice the majority of the time
(other than passing it on untouched) is to work out what the slice means in
the context of an array dimension of a particular length.  Since Python 2.3
there has been a method on the slice class, indices(), that takes a
dimension length and returns the real start, stop and step, accounting for
missing or negative values in the slice spec.  This commit implements such
a indices() method on the slice class.

It is configurable at compile-time via MICROPY_PY_BUILTINS_SLICE_INDICES,
disabled by default, enabled on unix, stm32 and esp32 ports.

This commit also adds new tests for slice indices and for slicing unicode
strings.
2019-12-28 23:55:15 +11:00
Damien George 7c85c7c210 py/unicode: Fix check for valid utf8 being stricter about contn chars. 2018-11-26 16:13:08 +11:00
tll 68c28174d0 py/objstr: Add check for valid UTF-8 when making a str from bytes.
This patch adds a function utf8_check() to check for a valid UTF-8 encoded
string, and calls it when constructing a str from raw bytes.  The feature
is selectable at compile time via MICROPY_PY_BUILTINS_STR_UNICODE_CHECK and
is enabled if unicode is enabled.  It costs about 110 bytes on Thumb-2, 150
bytes on Xtensa and 170 bytes on x86-64.
2017-09-06 16:43:09 +10:00
Damien George e9404e5f5f tests: Improve coverage of array, range, dict, slice, exc, unicode. 2016-10-17 11:43:47 +11:00
Paul Sokolovsky d1771bbae0 tests/unicode_subscr.py: Detailed test for subscripting unicode strings. 2016-07-25 19:28:19 +03:00
Damien George 75a811a6df tests: Move int+unicode test to unicode-specific test directory. 2015-09-07 21:36:24 +01:00
Damien George 0be3c70cd8 py/lexer: Raise SyntaxError when unicode char point out of range. 2015-09-07 17:19:17 +01:00
Damien George 51b9a0d0c4 py/objstr: Make string formatting 8-bit clean. 2015-08-29 23:13:51 +01:00
Damien George 7ed58cb663 py: Support unicode (utf-8 encoded) identifiers in Python source.
Enabled simply by making the identifier lexing code 8-bit clean.
2015-06-09 10:58:07 +00:00
Damien George 9dd3640464 tests: Add missing tests for builtins, and many other things. 2015-04-04 22:05:30 +01:00
Damien George 92ab95f215 tests: Add some tests to improve coverage. 2015-01-29 14:56:09 +00:00
stijn a3efe04dce Use mode/encoding kwargs in io and unicode tests
mode argument is used to assert it works
encoding argument is used to make sure CPython uses the correct encoding
as it does not automatically use utf-8
2014-10-21 22:10:38 +03:00
Damien George 1694bc733d py: Add stream reading of n unicode chars; unicode support by default.
With unicode enabled, this patch allows reading a fixed number of
characters from text-mode streams; eg file.read(5) will read 5 unicode
chars, which can made of more than 5 bytes.

For an ASCII stream (ie no chars > 127) it only needs to do 1 read.  If
there are lots of non-ASCII chars in a stream, then it needs multiple
reads of the underlying object.

Adds a new test for this case.  Enables unicode support by default on
unix and stmhal ports.
2014-07-19 18:34:04 +01:00
Paul Sokolovsky ed07d035d5 tests: Add basic test for unicode file i/o. 2014-06-27 00:04:20 +03:00
Paul Sokolovsky 63143c94ce tests: Test for explicit start/end args to str methods for unicode. 2014-06-27 00:04:20 +03:00
Paul Sokolovsky b1949e4c09 tests: Add tests for unicode find()/rfind()/index(). 2014-06-27 00:04:19 +03:00
Paul Sokolovsky 17994d1bd3 tests: Add test for unicode string iteration. 2014-06-27 00:04:19 +03:00
Chris Angelico 1e3781bc35 tests: Add unicode test. 2014-06-27 00:04:17 +03:00