From 169b152f297e6c678ea10b36af889085dd27c527 Mon Sep 17 00:00:00 2001 From: Paul Sokolovsky Date: Wed, 17 Oct 2018 21:18:44 +0300 Subject: [PATCH] docs/ure: Fully describe supported syntax subset, add example. --- docs/library/ure.rst | 94 ++++++++++++++++++++++++++++++++++---------- 1 file changed, 74 insertions(+), 20 deletions(-) diff --git a/docs/library/ure.rst b/docs/library/ure.rst index 6f9094028d..d2615e37d8 100644 --- a/docs/library/ure.rst +++ b/docs/library/ure.rst @@ -10,47 +10,101 @@ This module implements regular expression operations. Regular expression syntax supported is a subset of CPython ``re`` module (and actually is a subset of POSIX extended regular expressions). -Supported operators are: +Supported operators and special sequences are: -``'.'`` +``.`` Match any character. -``'[...]'`` +``[...]`` Match set of characters. Individual characters and ranges are supported, including negated sets (e.g. ``[^a-c]``). -``'^'`` +``^`` Match the start of the string. -``'$'`` +``$`` Match the end of the string. -``'?'`` - Match zero or one of the previous entity. +``?`` + Match zero or one of the previous sub-pattern. -``'*'`` - Match zero or more of the previous entity. +``*`` + Match zero or more of the previous sub-pattern. -``'+'`` - Match one or more of the previous entity. +``+`` + Match one or more of the previous sub-pattern. -``'??'`` +``??`` + Non-greedy version of ``?``, match zero or one, with the preference + for zero. -``'*?'`` +``*?`` + Non-greedy version of ``*``, match zero or more, with the preference + for the shortest match. -``'+?'`` +``+?`` + Non-greedy version of ``+``, match one or more, with the preference + for the shortest match. -``'|'`` - Match either the LHS or the RHS of this operator. +``|`` + Match either the left-hand side or the right-hand side sub-patterns of + this operator. -``'(...)'`` +``(...)`` Grouping. Each group is capturing (a substring it captures can be accessed with `match.group()` method). -**NOT SUPPORTED**: Counted repetitions (``{m,n}``), more advanced assertions -(``\b``, ``\B``), named groups (``(?P...)``), non-capturing groups -(``(?:...)``), etc. +``\d`` + Matches digit. Equivalent to ``[0-9]``. +``\D`` + Matches non-digit. Equivalent to ``[^0-9]``. + +``\s`` + Matches whitespace. Equivalent to ``[ \t-\r]``. + +``\S`` + Matches non-whitespace. Equivalent to ``[^ \t-\r]``. + +``\w`` + Matches "word characters" (ASCII only). Equivalent to ``[A-Za-z0-9_]``. + +``\W`` + Matches non "word characters" (ASCII only). Equivalent to ``[^A-Za-z0-9_]``. + +``\`` + Escape character. Any other character following the backslash, except + for those listed above, is taken literally. For example, ``\*`` is + equivalent to literal ``*`` (not treated as the ``*`` operator). + Note that ``\r``, ``\n``, etc. are not handled specially, and will be + equivalent to literal letters ``r``, ``n``, etc. Due to this, it's + not recommended to use raw Python strings (``r""``) for regular + expressions. For example, ``r"\r\n"`` when used as the regular + expression is equivalent to ``"rn"``. To match CR character followed + by LF, use ``"\r\n"``. + +**NOT SUPPORTED**: + +* counted repetitions (``{m,n}``) +* named groups (``(?P...)``) +* non-capturing groups (``(?:...)``) +* more advanced assertions (``\b``, ``\B``) +* special character escapes like ``\r``, ``\n`` - use Python's own escaping + instead +* etc. + +Example:: + + import ure + + # As ure doesn't support escapes itself, use of r"" strings is not + # recommended. + regex = ure.compile("[\r\n]") + + regex.split("line1\rline2\nline3\r\n") + + # Result: + # ['line1', 'line2', 'line3', '', ''] Functions ---------