micropython/tests/unicode/unicode.py

# Test a UTF-8 encoded literal
s = "asdf©qwer"
for i in range(len(s)):
    print("s[%d]: %s   %X" % (i, s[i], ord(s[i])))

# Test all three forms of Unicode escape, and
# all blocks of UTF-8 byte patterns
s = "a\xA9\xFF\u0123\u0800\uFFEE\U0001F44C"
for i in range(-len(s), len(s)):
    print("s[%d]: %s   %X" % (i, s[i], ord(s[i])))
    print("s[:%d]: %d chars, '%s'" % (i, len(s[:i]), s[:i]))
    for j in range(i, len(s)):
        print("s[%d:%d]: %d chars, '%s'" % (i, j, len(s[i:j]), s[i:j]))
    print("s[%d:]: %d chars, '%s'" % (i, len(s[i:]), s[i:]))
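
# Illustrative sketch (not part of the upstream test): the escapes above appear
# to be chosen so that the string covers 1-, 2-, 3- and 4-byte UTF-8 sequences.
# The names `sample` and `utf8_lengths` are hypothetical, introduced only for
# this example. Assertions are used instead of print() so that the text this
# test prints stays the same.
sample = "a\xA9\xFF\u0123\u0800\uFFEE\U0001F44C"
utf8_lengths = [len(ch.encode()) for ch in sample]  # encoded size per character
assert utf8_lengths == [1, 2, 2, 2, 3, 3, 4]
assert len(sample.encode()) == sum(utf8_lengths)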

# Test UTF-8 encode and decode
enc = s.encode()
print(enc, enc.decode() == s)

# printing of unicode chars using repr
# NOTE: for some characters (e.g. \u10ff) our output differs from CPython's
print(repr("a\uffff"))
print(repr("a\U0001ffff"))

# test invalid escape code
try:
    eval('"\\U00110000"')
except SyntaxError:
    print("SyntaxError")
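
# Illustrative sketch (not part of the upstream test): U+10FFFF is the highest
# Unicode code point, so the escape rejected above (0x110000) is exactly one
# past the valid range. The name `last_valid` is hypothetical, used only for
# this example; assertions keep the printed output unchanged.
last_valid = eval('"\\U0010FFFF"')
assert len(last_valid) == 1
assert ord(last_valid) == 0x10FFFF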

# test unicode string given to int
try:
    int("\u0200")
except ValueError:
    print("ValueError")

# test invalid UTF-8 string
try:
    str(b"ab\xa1", "utf8")
except UnicodeError:
    print("UnicodeError")
try:
    str(b"ab\xf8", "utf8")
except UnicodeError:
    print("UnicodeError")
try:
    str(bytearray(b"ab\xc0a"), "utf8")
except UnicodeError:
    print("UnicodeError")
try:
    str(b"\xf0\xe0\xed\xe8", "utf8")
except UnicodeError:
    print("UnicodeError")
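
# Illustrative sketch (not part of the upstream test, and not MicroPython's
# actual utf8_check() implementation): a minimal structural check that shows
# why each byte string above is rejected. The helper name `utf8_structure_ok`
# is hypothetical. It only verifies lead/continuation byte structure; it does
# not reject overlong encodings or surrogates, which a full validator may do.
def utf8_structure_ok(data):
    need = 0  # continuation bytes still expected for the current character
    for b in data:
        if need:
            if (b & 0xC0) != 0x80:
                return False  # expected a continuation byte (10xxxxxx)
            need -= 1
        elif b < 0x80:
            pass  # single-byte (ASCII) character
        elif (b & 0xE0) == 0xC0:
            need = 1  # lead byte of a 2-byte sequence
        elif (b & 0xF0) == 0xE0:
            need = 2  # lead byte of a 3-byte sequence
        elif (b & 0xF8) == 0xF0:
            need = 3  # lead byte of a 4-byte sequence
        else:
            return False  # stray continuation byte or invalid lead byte
    return need == 0  # a truncated trailing sequence is also invalid


assert not utf8_structure_ok(b"ab\xa1")  # continuation byte with no lead byte
assert not utf8_structure_ok(b"ab\xf8")  # 0xf8 is not a valid lead byte
assert not utf8_structure_ok(bytearray(b"ab\xc0a"))  # lead byte, then ASCII
assert not utf8_structure_ok(b"\xf0\xe0\xed\xe8")  # 4-byte lead, no continuations
assert utf8_structure_ok("a\xA9\u0800\U0001F44C".encode())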

# Commit history for this file (from the blame view, oldest first):
#
# 2014-06-03 20:28:12 +01:00  tests: Add unicode test.
# 2015-04-04 22:05:30 +01:00  tests: Add missing tests for builtins, and many other things.
# 2015-09-07 17:19:17 +01:00  py/lexer: Raise SyntaxError when unicode char point out of range.
# 2015-09-07 21:36:24 +01:00  tests: Move int+unicode test to unicode-specific test directory.
# 2016-10-17 01:43:47 +01:00  tests: Improve coverage of array, range, dict, slice, exc, unicode.
# 2017-06-24 01:38:32 +01:00  py/objstr: Add check for valid UTF-8 when making a str from bytes.
#     This patch adds a function utf8_check() to check for a valid UTF-8
#     encoded string, and calls it when constructing a str from raw bytes.
#     The feature is selectable at compile time via
#     MICROPY_PY_BUILTINS_STR_UNICODE_CHECK and is enabled if unicode is
#     enabled. It costs about 110 bytes on Thumb-2, 150 bytes on Xtensa and
#     170 bytes on x86-64.
# 2018-11-26 05:13:08 +00:00  py/unicode: Fix check for valid utf8 being stricter about contn chars.
# 2020-03-23 02:26:08 +00:00  tests: Format all Python code with black, except tests in basics subdir.
#     This adds the Python files in the tests/ directory to be formatted with
#     ./tools/codeformat.py. The basics/ subdirectory is excluded for now so
#     we aren't changing too much at once. In a few places
#     `# fmt: off`/`# fmt: on` was used where the code had special formatting
#     for readability or where the test was actually testing the specific
#     formatting.