docs/ure: Fully describe supported syntax subset, add example.
parent
1db55381b6
commit
169b152f29
|
@@ -10,47 +10,101 @@ This module implements regular expression operations. Regular expression
|
|||
syntax supported is a subset of the CPython ``re`` module (and actually is
|
||||
a subset of POSIX extended regular expressions).
|
||||
|
||||
Supported operators are:
|
||||
Supported operators and special sequences are:
|
||||
|
||||
``'.'``
|
||||
``.``
|
||||
Match any character.
|
||||
|
||||
``'[...]'``
|
||||
``[...]``
|
||||
Match a set of characters. Individual characters and ranges are supported,
|
||||
including negated sets (e.g. ``[^a-c]``).
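As a rough illustration of character sets, the sketch below matches a plain range and its negated form. The fallback to ``re`` is an assumption added so the snippet also runs on CPython; it is not part of this commit.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumed fallback for CPython, not part of this commit

# [a-c] matches any of 'a', 'b', 'c'; [^a-c] matches any other character.
in_set = re.match("[a-c]+", "abcxyz").group(0)
not_in_set = re.match("[^a-c]+", "xyzabc").group(0)
print(in_set, not_in_set)
```

Here ``in_set`` stops at the first character outside the range, and ``not_in_set`` stops at the first character inside it.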
|
||||
|
||||
``'^'``
|
||||
``^``
|
||||
Match the start of the string.
|
||||
|
||||
``'$'``
|
||||
``$``
|
||||
Match the end of the string.
|
||||
|
||||
``'?'``
|
||||
Match zero or one of the previous entity.
|
||||
``?``
|
||||
Match zero or one of the previous sub-pattern.
|
||||
|
||||
``'*'``
|
||||
Match zero or more of the previous entity.
|
||||
``*``
|
||||
Match zero or more of the previous sub-pattern.
|
||||
|
||||
``'+'``
|
||||
Match one or more of the previous entity.
|
||||
``+``
|
||||
Match one or more of the previous sub-pattern.
|
||||
|
||||
``'??'``
|
||||
``??``
|
||||
Non-greedy version of ``?``, match zero or one, with the preference
|
||||
for zero.
|
||||
|
||||
``'*?'``
|
||||
``*?``
|
||||
Non-greedy version of ``*``, match zero or more, with the preference
|
||||
for the shortest match.
|
||||
|
||||
``'+?'``
|
||||
``+?``
|
||||
Non-greedy version of ``+``, match one or more, with the preference
|
||||
for the shortest match.
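The difference between the greedy and non-greedy operators can be sketched as follows. The input string is made up for illustration, and the ``re`` fallback is an assumption so the snippet also runs on CPython.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumed fallback for CPython, not part of this commit

text = "<a><b>"  # hypothetical input

# Greedy: '.*' consumes as much as possible before the final '>'.
greedy = re.match("<.*>", text).group(0)

# Non-greedy: '.*?' stops at the first '>' that lets the match succeed.
lazy = re.match("<.*?>", text).group(0)

print(greedy, lazy)
```

The greedy pattern covers the whole string, while the non-greedy one covers only the first bracketed item.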
|
||||
|
||||
``'|'``
|
||||
Match either the LHS or the RHS of this operator.
|
||||
``|``
|
||||
Match either the left-hand side or the right-hand side sub-patterns of
|
||||
this operator.
|
||||
|
||||
``'(...)'``
|
||||
``(...)``
|
||||
Grouping. Each group is capturing (a substring it captures can be accessed
|
||||
with the ``match.group()`` method).
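A minimal sketch of alternation combined with a capturing group is shown below. The pattern and input are invented for illustration, and the ``re`` fallback is an assumption so the snippet also runs on CPython. The group index is passed explicitly, on the assumption that not every port accepts ``group()`` without an argument.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumed fallback for CPython, not part of this commit

# The group captures whichever alternative matched; 's?' is outside the group.
m = re.match("(cat|dog)s?", "dogs")

whole = m.group(0)   # the entire match
animal = m.group(1)  # the text captured by the first group

print(whole, animal)
```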
|
||||
|
||||
**NOT SUPPORTED**: Counted repetitions (``{m,n}``), more advanced assertions
|
||||
(``\b``, ``\B``), named groups (``(?P<name>...)``), non-capturing groups
|
||||
(``(?:...)``), etc.
|
||||
``\d``
|
||||
Matches a digit. Equivalent to ``[0-9]``.
|
||||
|
||||
``\D``
|
||||
Matches a non-digit. Equivalent to ``[^0-9]``.
|
||||
|
||||
``\s``
|
||||
Matches whitespace. Equivalent to ``[ \t-\r]``.
|
||||
|
||||
``\S``
|
||||
Matches non-whitespace. Equivalent to ``[^ \t-\r]``.
|
||||
|
||||
``\w``
|
||||
Matches "word characters" (ASCII only). Equivalent to ``[A-Za-z0-9_]``.
|
||||
|
||||
``\W``
|
||||
Matches non-"word characters" (ASCII only). Equivalent to ``[^A-Za-z0-9_]``.
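The special sequences above might be used as in this sketch. Backslashes are doubled because the patterns are ordinary (non-raw) Python strings; the sample inputs are hypothetical, and the ``re`` fallback is an assumption so the snippet also runs on CPython.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumed fallback for CPython, not part of this commit

# "\\d" in Python source is the two-character pattern \d.
digits = re.match("\\d+", "2024-01").group(0)
same_digits = re.match("[0-9]+", "2024-01").group(0)  # equivalent set form

# \w covers ASCII letters, digits and underscore.
word = re.match("\\w+", "foo_bar baz").group(0)

# \s covers space, tab, newline and the other characters in [ \t-\r].
parts = re.compile("\\s+").split("a \t b\nc")

print(digits, same_digits, word, parts)
```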
|
||||
|
||||
``\``
|
||||
Escape character. Any other character following the backslash, except
|
||||
for those listed above, is taken literally. For example, ``\*`` is
|
||||
equivalent to literal ``*`` (not treated as the ``*`` operator).
|
||||
Note that ``\r``, ``\n``, etc. are not handled specially, and will be
|
||||
equivalent to literal letters ``r``, ``n``, etc. Due to this, it's
|
||||
not recommended to use raw Python strings (``r""``) for regular
|
||||
expressions. For example, ``r"\r\n"`` when used as the regular
|
||||
expression is equivalent to ``"rn"``. To match a CR character followed
|
||||
by LF, use ``"\r\n"``.
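The point about raw strings can be checked at the Python level, before the pattern reaches the regex engine. This is a sketch with made-up input; the ``re`` fallback is an assumption so it also runs on CPython. The raw pattern is deliberately not compiled here, because CPython's ``re`` interprets ``\r`` and ``\n`` itself and so would behave differently from ``ure``.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumed fallback for CPython, not part of this commit

plain = "\r\n"   # Python converts the escapes: two characters, CR and LF
raw = r"\r\n"    # four characters: backslash, 'r', backslash, 'n'

print(len(plain), len(raw))

# The engine receives real CR and LF characters, so no regex-level
# escape handling is needed to match a CRLF line ending.
crlf = re.compile(plain)
found = crlf.search("line1\r\nline2") is not None
print(found)
```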
|
||||
|
||||
**NOT SUPPORTED**:
|
||||
|
||||
* counted repetitions (``{m,n}``)
|
||||
* named groups (``(?P<name>...)``)
|
||||
* non-capturing groups (``(?:...)``)
|
||||
* more advanced assertions (``\b``, ``\B``)
|
||||
* special character escapes like ``\r``, ``\n`` - use Python's own escaping
|
||||
instead
|
||||
* etc.
|
||||
|
||||
Example::
|
||||
|
||||
import ure
|
||||
|
||||
# As ure doesn't support escapes itself, use of r"" strings is not
|
||||
# recommended.
|
||||
regex = ure.compile("[\r\n]")
|
||||
|
||||
regex.split("line1\rline2\nline3\r\n")
|
||||
|
||||
# Result:
|
||||
# ['line1', 'line2', 'line3', '', '']
|
||||
|
||||
Functions
|
||||
---------
|
||||
|
|