docs/ure: Fully describe supported syntax subset, add example.

2018-10-17 21:18:44 +03:00 · 2018-10-17 21:18:44 +03:00 · 169b152f29
parent 1db55381b6
commit 169b152f29
1 changed files with 74 additions and 20 deletions
--- a/docs/library/ure.rst
+++ b/docs/library/ure.rst
@@ -10,47 +10,101 @@ This module implements regular expression operations. Regular expression
 syntax supported is a subset of CPython ``re`` module (and actually is
 a subset of POSIX extended regular expressions).

-Supported operators are:
+Supported operators and special sequences are:

-``'.'``
+``.``
   Match any character.

-``'[...]'``
+``[...]``
   Match set of characters. Individual characters and ranges are supported,
   including negated sets (e.g. ``[^a-c]``).

-``'^'``
+``^``
   Match the start of the string.

-``'$'``
+``$``
   Match the end of the string.

-``'?'``
-   Match zero or one of the previous entity.
+``?``
+   Match zero or one of the previous sub-pattern.

-``'*'``
-   Match zero or more of the previous entity.
+``*``
+   Match zero or more of the previous sub-pattern.

-``'+'``
-   Match one or more of the previous entity.
+``+``
+   Match one or more of the previous sub-pattern.

-``'??'``
+``??``
+   Non-greedy version of ``?``, match zero or one, with the preference
+   for zero.

-``'*?'``
+``*?``
+   Non-greedy version of ``*``, match zero or more, with the preference
+   for the shortest match.

-``'+?'``
+``+?``
+   Non-greedy version of ``+``, match one or more, with the preference
+   for the shortest match.

-``'|'``
-   Match either the LHS or the RHS of this operator.
+``|``
+   Match either the left-hand side or the right-hand side sub-pattern of
+   this operator.

-``'(...)'``
+``(...)``
   Grouping. Each group is capturing (a substring it captures can be accessed
   with `match.group()` method).

-**NOT SUPPORTED**: Counted repetitions (``{m,n}``), more advanced assertions
-(``\b``, ``\B``), named groups (``(?P<name>...)``), non-capturing groups
-(``(?:...)``), etc.
+``\d``
+   Matches a digit. Equivalent to ``[0-9]``.

+``\D``
+   Matches a non-digit. Equivalent to ``[^0-9]``.
+
+``\s``
+   Matches whitespace. Equivalent to ``[ \t-\r]``.
+
+``\S``
+   Matches non-whitespace. Equivalent to ``[^ \t-\r]``.
+
+``\w``
+   Matches "word characters" (ASCII only). Equivalent to ``[A-Za-z0-9_]``.
+
+``\W``
+   Matches non "word characters" (ASCII only). Equivalent to ``[^A-Za-z0-9_]``.
+
+``\``
+   Escape character. Any other character following the backslash, except
+   for those listed above, is taken literally. For example, ``\*`` is
+   equivalent to literal ``*`` (not treated as the ``*`` operator).
+   Note that ``\r``, ``\n``, etc. are not handled specially, and will be
+   equivalent to literal letters ``r``, ``n``, etc. Due to this, it's
+   not recommended to use raw Python strings (``r""``) for regular
+   expressions. For example, ``r"\r\n"`` when used as the regular
+   expression is equivalent to ``"rn"``. To match a CR character followed
+   by an LF, use ``"\r\n"``.
+
+**NOT SUPPORTED**:
+
+* counted repetitions (``{m,n}``)
+* named groups (``(?P<name>...)``)
+* non-capturing groups (``(?:...)``)
+* more advanced assertions (``\b``, ``\B``)
+* special character escapes like ``\r``, ``\n`` - use Python's own escaping
+  instead
+* etc.
+
+Example::
+
+    import ure
+
+    # As ure doesn't support escapes itself, use of r"" strings is not
+    # recommended.
+    regex = ure.compile("[\r\n]")
+
+    regex.split("line1\rline2\nline3\r\n")
+
+    # Result:
+    # ['line1', 'line2', 'line3', '', '']

 Functions
 ---------
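
A few of the rules documented in this change can be checked with a short script. The sketch below is not part of the commit. It assumes either MicroPython's ``ure`` or, as a fallback, CPython's ``re``, and it only exercises syntax that the diff lists as supported. The variable names are illustrative.

```python
# Prefer MicroPython's ure when available; otherwise fall back to
# CPython's re, which accepts the syntax subset used below.
try:
    import ure as re
except ImportError:
    import re

# Greedy vs. non-greedy: "*" prefers the longest match, "*?" the shortest.
greedy = re.match("<.*>", "<a><b>").group(0)
lazy = re.match("<.*?>", "<a><b>").group(0)

# "[ \t-\r]" is a space plus the range TAB (0x09) through CR (0x0d).
# Python's own string escaping turns \t and \r into real characters
# before the regex engine sees the pattern, so no raw string is used.
ws = re.compile("[ \t-\r]")
ws_chars = " \t\n\x0b\x0c\r"
all_ws_match = all(ws.match(c) for c in ws_chars)

# Counted repetition such as \d{2,3} is listed as unsupported, so the
# repetition is spelled out by hand. The backslash is doubled so that
# Python passes a literal "\d" through to the regex engine.
two_or_three_digits = re.compile("^\\d\\d\\d?$")
counted = [bool(two_or_three_digits.match(s))
           for s in ("7", "42", "123", "1234")]

print(greedy, lazy, all_ws_match, counted)
```

Doubling the backslash (``"\\d"``) is one way to keep ``\d`` intact without a raw string, so that ``"\r"`` and ``"\n"`` elsewhere in the same pattern are still converted by Python.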