docs/ure: Fully describe supported syntax subset, add example.

Paul Sokolovsky 2018-10-17 21:18:44 +03:00 committed by Damien George
parent 1db55381b6
commit 169b152f29
1 changed files with 74 additions and 20 deletions


This module implements regular expression operations. The regular expression
syntax supported is a subset of the CPython ``re`` module (and in fact a
subset of POSIX extended regular expressions).

Supported operators and special sequences are:

``.``
   Match any character.
``[...]``
   Match a set of characters. Individual characters and ranges are supported,
   including negated sets (e.g. ``[^a-c]``).
``^``
   Match the start of the string.
``$``
   Match the end of the string.
``?``
   Match zero or one of the previous sub-pattern.
``*``
   Match zero or more of the previous sub-pattern.
``+``
   Match one or more of the previous sub-pattern.
``??``
   Non-greedy version of ``?``: match zero or one, preferring zero.
``*?``
   Non-greedy version of ``*``: match zero or more, preferring the shortest
   match.
``+?``
   Non-greedy version of ``+``: match one or more, preferring the shortest
   match.
``|``
   Match either the left-hand side or the right-hand side sub-pattern of
   this operator.
``(...)``
   Grouping. Each group is capturing (the substring it captures can be
   accessed with the ``match.group()`` method).
``\d``
   Match a digit. Equivalent to ``[0-9]``.
``\D``
   Match a non-digit. Equivalent to ``[^0-9]``.
``\s``
   Match a whitespace character. Equivalent to ``[ \t-\r]`` (space, or any
   character from TAB to CR).
``\S``
   Match a non-whitespace character. Equivalent to ``[^ \t-\r]``.
``\w``
   Match a "word character" (ASCII only). Equivalent to ``[A-Za-z0-9_]``.
``\W``
   Match a non-"word character" (ASCII only). Equivalent to
   ``[^A-Za-z0-9_]``.
``\``
   Escape character. Any other character following the backslash, except
   for those listed above, is taken literally. For example, ``\*`` is
   equivalent to a literal ``*`` (not treated as the ``*`` operator).
   Note that ``\r``, ``\n``, etc. are not handled specially and are
   equivalent to the literal letters ``r``, ``n``, etc. Because of this,
   using raw Python strings (``r""``) for regular expressions is not
   recommended. For example, ``r"\r\n"`` used as a regular expression is
   equivalent to ``"rn"``. To match a CR character followed by LF, use
   ``"\r\n"``.

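The following is a minimal sketch that exercises several of the operators and special sequences above. It assumes a MicroPython port built with ``ure``. The ``try``/``except ImportError`` fallback to CPython's ``re`` is only there so the same subset can be tried on a desktop interpreter, where these particular patterns are expected to behave the same. Backslashes are doubled (``"\\d"``) so that the regular expression engine, not the Python parser, sees ``\d``.

```python
# Assumes MicroPython with ure; falls back to CPython's re on a desktop.
try:
    import ure as re
except ImportError:
    import re

# [...] sets, ranges and negated sets, anchored with ^ and $
assert re.match("^[a-c]+$", "abcab") is not None
assert re.match("^[^a-c]+$", "xyz") is not None

# ? makes the previous sub-pattern optional
assert re.match("colou?r", "color") is not None
assert re.match("colou?r", "colour") is not None

# Greedy .* takes as much as possible, non-greedy .*? as little as possible
greedy = re.match("<.*>", "<a><b>")
lazy = re.match("<.*?>", "<a><b>")
assert greedy.group(0) == "<a><b>"
assert lazy.group(0) == "<a>"

# | picks between alternatives; (...) captures for match.group()
animal = re.match("(cat|dog)s?", "dogs")
assert animal.group(0) == "dogs"
assert animal.group(1) == "dog"

# \w, \s, \d special sequences; backslashes are doubled so that the
# regex engine, not the Python parser, sees them
pair = re.match("(\\w+)\\s+(\\d+)", "item   42")
assert pair.group(1) == "item"
assert pair.group(2) == "42"

# \* is a literal asterisk, not the * operator
assert re.match("2\\*3", "2*3") is not None
assert re.match("2\\*3", "23") is None
```

Each ``assert`` states the expected outcome, so the snippet doubles as a quick self-check on a board.
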
**NOT SUPPORTED**:

* counted repetitions (``{m,n}``)
* named groups (``(?P<name>...)``)
* non-capturing groups (``(?:...)``)
* more advanced assertions (``\b``, ``\B``)
* special character escapes like ``\r``, ``\n`` (use Python's own escaping
  instead)
* etc.

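Some of the missing features can be worked around within the supported subset. The sketch below writes counted repetitions out by hand and reads plain numbered groups in place of named ones. ``parse_date``, ``DATE`` and ``UP_TO_3_DIGITS`` are hypothetical names used only for illustration. As above, it assumes ``ure`` and falls back to CPython's ``re``.

```python
# Assumes MicroPython with ure; falls back to CPython's re on a desktop.
try:
    import ure as re
except ImportError:
    import re

# {4} and {2} are unsupported, so each digit is written out explicitly.
# Named groups are unsupported, so plain groups are read back by number.
DATE = re.compile("^(\\d\\d\\d\\d)-(\\d\\d)-(\\d\\d)$")

# A bounded range such as \d{1,3}: one required digit, then optional ones.
UP_TO_3_DIGITS = re.compile("^\\d\\d?\\d?$")


def parse_date(text):
    """Hypothetical helper: return (year, month, day) as ints, or None."""
    m = DATE.match(text)
    if m is None:
        return None
    # group(1), group(2), group(3) stand in for named groups
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


print(parse_date("2018-10-17"))  # prints (2018, 10, 17)
```

Spelling out repetitions gets verbose quickly. For anything beyond small fixed counts, it may be simpler to match a looser pattern and check lengths in Python.
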
Example::

    import ure

    # As ure doesn't support escapes itself, use of r"" strings is not
    # recommended.
    regex = ure.compile("[\r\n]")
    regex.split("line1\rline2\nline3\r\n")

    # Result:
    # ['line1', 'line2', 'line3', '', '']

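In the example, one of the two trailing empty strings comes from the CR LF pair being split at both characters. The following is a minimal sketch that treats CR LF as a single line break by using alternation instead. It assumes that ``ure``, like CPython's ``re``, tries alternatives left to right, so ``"\r\n"`` has to be listed first.

```python
# Assumes MicroPython with ure; falls back to CPython's re on a desktop.
try:
    import ure as re
except ImportError:
    import re

# "\r\n" is listed first so a CR LF pair is consumed as one separator;
# Python's own escaping turns "\r" and "\n" into CR and LF characters.
newline = re.compile("\r\n|\r|\n")
parts = newline.split("line1\rline2\nline3\r\n")
# Result:
# ['line1', 'line2', 'line3', '']
```

The one remaining empty string is the (empty) text after the final line break.
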
Functions
---------