Commit Graph

27 Commits

Author SHA1 Message Date
Damien George 0528c5a22a py: In str unicode, str_subscr will never be passed a bytes object. 2015-04-04 19:42:03 +01:00
Paul Sokolovsky ac2f7a7f6a objstr: Add .splitlines() method.
splitlines() occurs ~179 times in CPython3 standard library, so was
deemed worthy to implement. The method has subtle semantic differences
from just .split("\n"). It is also defined as working for any end-of-line
combination, but this is currently not implemented - it works only with
LF line-endings (which should be OK for text strings on any platforms,
but not OK for bytes).
2015-04-04 00:09:48 +03:00
Damien George 2e2e404ff7 py: Allow to compile with extra warnings (sign-compare, unused-param). 2015-03-19 00:25:33 +00:00
Damien George 98e3a64694 py: Remove duplicated mp_obj_str_make_new function from objstrunicode.c. 2015-01-28 14:14:57 +00:00
Paul Sokolovsky 344e15b1ae objstr: Remove code duplication and unbreak Windows build.
There was really weird warning (promoted to error) when building Windows
port. Exact cause is still unknown, but it uncovered another issue:
8-bit and unicode str_make_new implementations should be mutually exclusive,
and not built at the same time. What we had is that bytes_decode() pulled
8-bit str_make_new() even for unicode build.
2015-01-23 02:15:56 +02:00
Paul Sokolovsky 6113eb2f33 objstr*: Use separate names for locals_dict of 8-bit and unicode str's.
To somewhat unbreak -DSTATIC="" compile.
2015-01-23 02:05:58 +02:00
Damien George 0b9ee86133 py: Add mp_obj_new_str_from_vstr, and use it where relevant.
This patch allows to reuse vstr memory when creating str/bytes object.
This improves memory usage.

Also saves code ROM: 128 bytes on stmhal, 92 bytes on bare-arm, and 88
bytes on unix x64.
2015-01-21 23:17:27 +00:00
Damien George ff8dd3f486 py, unix: Allow to compile with -Wunused-parameter.
See issue #699.
2015-01-20 12:47:20 +00:00
Damien George 51dfcb4bb7 py: Move to guarded includes, everywhere in py/ core.
Addresses issue #1022.
2015-01-01 20:32:09 +00:00
Paul Sokolovsky e62a0fe367 objstr: Allow to convert any buffer proto object to str.
Original motivation is to support converting bytearrays, but easier to just
support buffer protocol at all.
2014-10-31 00:03:53 +02:00
Damien George cde0ca21bf py: Simplify JSON str printing (while still conforming to JSON spec).
The JSON specs are relatively flexible and allow us to use one function
to print strings, be they ascii, bytes or utf-8 encoded.
2014-09-25 17:35:56 +01:00
Damien George 612045f53f py: Add native json printing using existing print framework.
Also add start of ujson module with dumps implemented.  Enabled in unix
and stmhal ports.  Test passes on both.
2014-09-17 22:56:34 +01:00
Damien George 4abff7500f py: Change uint to mp_uint_t in runtime.h, stackctrl.h, binary.h.
Part of code cleanup, working towards resolving issue #50.
2014-08-30 14:59:21 +01:00
Damien George ecc88e949c Change some parts of the core API to use mp_uint_t instead of uint/int.
Addressing issue #50, still some way to go yet.
2014-08-30 00:35:11 +01:00
Damien George bb4c6f35c6 py: Make MP_OBJ_NEW_SMALL_INT cast arg to mp_int_t itself.
Addresses issue #724.
2014-07-31 10:49:14 +01:00
Damien George 40f3c02682 Rename machine_(u)int_t to mp_(u)int_t.
See discussion in issue #50.
2014-07-03 13:25:24 +01:00
Paul Sokolovsky 9e215fa4c2 py: Make unichar_charlen() accept/return machine_uint_t. 2014-06-28 23:15:29 +03:00
Damien George e04a44e2f6 py: Small comments, name changes, use of machine_int_t. 2014-06-28 10:27:23 +01:00
Paul Sokolovsky ea2c936c7e objstrunicode: Refactor str_index_to_ptr() following objstr. 2014-06-27 00:04:20 +03:00
Paul Sokolovsky 00c904b47a objstrunicode: Signedness issues. 2014-06-27 00:04:19 +03:00
Paul Sokolovsky 79b7fe2ee5 objstrunicode: Implement iterator. 2014-06-27 00:04:19 +03:00
Paul Sokolovsky cdc020da4b objstrunicode: Re-add buffer protocol back for now, required for io.StringIO. 2014-06-27 00:04:18 +03:00
Paul Sokolovsky e7f2b4c875 objstrunicode: Revamp len() handling for unicode, and optimize bool(). 2014-06-27 00:04:18 +03:00
Paul Sokolovsky 86d3898e70 objstrunicode: Get rid of bytes checking, it's separate type. 2014-06-27 00:04:18 +03:00
Paul Sokolovsky 9731912ccb py: Prune unneeded code from objstrunicode, reuse code in objstr. 2014-06-27 00:04:18 +03:00
Chris Angelico 64b468d873 objstrunicode: Basic implementation of unicode handling.
Squashed commit of the following:

commit 99dc21b67a
Author: Chris Angelico <rosuav@gmail.com>
Date:   Thu Jun 12 02:18:54 2014 +1000

    Optimize as per TODO (thanks Damien!)

commit 5bf0153eca
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 08:42:06 2014 +1000

    Test a default (= UTF-8) encode and decode

commit c962057ac3
Merge: e2c9782 195de32
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 05:23:03 2014 +1000

    Merge branch 'master' into unicode, resolving conflict on py/obj.h

commit e2c9782a65
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 05:05:57 2014 +1000

    More whitespace fixups

commit 086a2a0f57
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 05:04:20 2014 +1000

    Properly implement string slicing

commit 0d339a143e
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 02:24:11 2014 +1000

    Support slicing in str_index_to_ptr, and fix a bounds error

commit 24371c7267
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 02:10:22 2014 +1000

    Break out index-to-pointer calculation into a function

commit 616c24ac01
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 02:03:11 2014 +1000

    Add tests of string slicing, which currently fail

commit a24d19f676
Author: Chris Angelico <rosuav@gmail.com>
Date:   Tue Jun 10 01:56:53 2014 +1000

    Change string indexing to not precalculate the charlen, and add test for neg indexing

commit 0bcc7ab89e
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 22:09:17 2014 +1000

    Clean up constant qstr declarations now that charlen isn't needed

commit 5473e1a1db
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 07:18:42 2014 +1000

    Remove the charlen field from strings, calculating it when required

commit 5c1658ec71
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 07:11:27 2014 +1000

    Get rid of mp_obj_str_get_data_len() which was used in only one place

commit a019ba968b
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 06:58:26 2014 +1000

    Add a unichar_charlen() function to calculate length-in-characters from length-in-bytes

commit 44b0d5cff8
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 06:32:44 2014 +1000

    Use utf8_get/next_char in building up a string's repr

commit 30d1bad33f
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 06:10:45 2014 +1000

    Make utf8_get_char() and utf8_next_char() actually do what their names say

commit bc990dad9a
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sun Jun 8 02:10:59 2014 +1000

    Revert "Add PEP 393-flags to strings and stub usage."

    This reverts commit c239f50952.

commit f9bebb28ad
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 15:41:48 2014 +1000

    Whitespace fixes

commit 279de0c8eb
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 15:28:35 2014 +1000

    Formatting/layout improvements - introduce macros for UTF-8 byte detection, add braces. No functional changes.

commit f1911f53d5
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:56:02 2014 +1000

    Make chr() Unicode-aware

commit f51ad737b4
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:44:07 2014 +1000

    Make a string's repr Unicode-aware

commit 01bd686846
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:33:43 2014 +1000

    Expand the Unicode tests

commit 7bc91904f8
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:27:30 2014 +1000

    Record byte lengths for byte strings

commit bb13212071
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 11:25:06 2014 +1000

    Make ord() Unicode-aware

commit 03f0cbe905
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 10:24:35 2014 +1000

    Retain characters as UTF-8 encoded Unicode

commit e924659b85
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 08:37:27 2014 +1000

    Add support for \u and \U escapes, but not \N (with explanatory comment)

commit 231031ac5f
Author: Chris Angelico <rosuav@gmail.com>
Date:   Sat Jun 7 05:09:35 2014 +1000

    Add character length to qstr

commit 6df1b946fb
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 13:48:36 2014 +1000

    Add test of UTF-8 encoded source file resulting in properly formed string

commit 16429b81a8
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 13:44:15 2014 +1000

    Make len(s) return character length (even though creation's still buggy)

commit cd2cf6663c
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 13:15:36 2014 +1000

    HACK - When indexing a qstr, count its charlen. Stupidly inefficient but POC.

    All tests pass now, though string creation is still buggy.

commit 47c234584d
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 13:15:32 2014 +1000

    objstr: Record character length separately from byte length

    CAUTION: Buggy, may crash stuff - qstr needs equivalent functionality too

commit b0f41c72af
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 05:37:36 2014 +1000

    Beginnings of UTF-8 support - construct strings from that many UTF-8-encoded chars, and subscript bytes the same way

commit 89452be641
Author: Chris Angelico <rosuav@gmail.com>
Date:   Fri Jun 6 05:28:47 2014 +1000

    Update comments - now aiming for UTF-8 rather than PEP 393 strings

commit c239f50952
Author: Chris Angelico <rosuav@gmail.com>
Date:   Wed Jun 4 05:28:12 2014 +1000

    Add PEP 393-flags to strings and stub usage.

    The test suite all passes, but nothing has actually been changed.
2014-06-27 00:04:17 +03:00
Paul Sokolovsky 83865347db objstrunicode: Complete copy of objstr, to be patched for unicode support. 2014-06-27 00:04:17 +03:00