:mod:`ure` -- simple regular expressions
========================================

.. module:: ure
   :synopsis: regular expressions

|see_cpython_module| :mod:`python:re`.

This module implements regular expression operations. The supported regular
expression syntax is a subset of that of the CPython ``re`` module (and is in
fact a subset of POSIX extended regular expressions).

Supported operators and special sequences are:

``.``
   Match any character.

``[...]``
   Match a set of characters. Individual characters and ranges are supported,
   including negated sets (e.g. ``[^a-c]``).

``^``
   Match the start of the string.

``$``
   Match the end of the string.

``?``
   Match zero or one of the previous sub-pattern.

``*``
   Match zero or more of the previous sub-pattern.

``+``
   Match one or more of the previous sub-pattern.

``??``
   Non-greedy version of ``?``: match zero or one, with a preference
   for zero.

``*?``
   Non-greedy version of ``*``: match zero or more, with a preference
   for the shortest match.

``+?``
   Non-greedy version of ``+``: match one or more, with a preference
   for the shortest match.

``|``
   Match either the left-hand side or the right-hand side sub-pattern of
   this operator.

``(...)``
   Grouping. Every group is capturing (the substring it captures can be
   accessed with the `match.group()` method).

``\d``
   Matches a digit. Equivalent to ``[0-9]``.

``\D``
   Matches a non-digit. Equivalent to ``[^0-9]``.

``\s``
   Matches whitespace. Equivalent to ``[ \t-\r]``.

``\S``
   Matches non-whitespace. Equivalent to ``[^ \t-\r]``.

``\w``
   Matches "word characters" (ASCII only). Equivalent to ``[A-Za-z0-9_]``.

``\W``
   Matches non "word characters" (ASCII only). Equivalent to ``[^A-Za-z0-9_]``.

``\``
   Escape character. Any other character following the backslash, except
   for those listed above, is taken literally. For example, ``\*`` is
   equivalent to a literal ``*`` (not treated as the ``*`` operator).
   Note that ``\r``, ``\n``, etc. are not handled specially, and are
   equivalent to the literal letters ``r``, ``n``, etc. Because of this,
   using raw Python strings (``r""``) for regular expressions is not
   recommended. For example, ``r"\r\n"`` used as a regular expression is
   equivalent to ``"rn"``. To match a CR character followed by LF, use
   ``"\r\n"``.

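The operators listed above can be tried out with a short sketch. The fallback to CPython's ``re`` is an assumption made only so the snippet also runs off-device; the sample strings are illustrative, and only syntax from the list above is used.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumption: CPython fallback so the sketch runs on a desktop

text = "<a><b>"

# Greedy "+" takes as much as it can; non-greedy "+?" prefers the shortest match.
greedy = re.match("<.+>", text).group(0)
lazy = re.match("<.+?>", text).group(0)
print(greedy)  # <a><b>
print(lazy)    # <a>

# Negated set combined with the "$" anchor: the trailing run of non-digits.
tail = re.search("[^0-9]+$", "room101abc").group(0)
print(tail)    # abc

# An escaped operator is matched literally. The backslash is doubled because
# this is an ordinary (non-raw) Python string.
star = re.search("2\\*3", "6=2*3").group(0)
print(star)    # 2*3
```
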
**NOT SUPPORTED**:

* counted repetitions (``{m,n}``)
* named groups (``(?P<name>...)``)
* non-capturing groups (``(?:...)``)
* more advanced assertions (``\b``, ``\B``)
* special character escapes like ``\r``, ``\n`` - use Python's own escaping
  instead
* etc.

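One possible workaround for the missing counted repetitions is to expand them by hand using only ``?`` and plain concatenation. The sketch below uses a hypothetical helper named ``repeat`` (not part of ``ure``) and assumes *sub* is a single atom such as a character, a set, or a group. The CPython ``re`` fallback is likewise an assumption so the snippet runs off-device.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumption: CPython fallback so the sketch runs on a desktop


def repeat(sub, m, n):
    """Hypothetical helper: emulate sub{m,n} with m required and n-m optional copies."""
    return sub * m + (sub + "?") * (n - m)


# Emulates "^[0-9]{2,4}$" without counted repetition.
pattern = "^" + repeat("[0-9]", 2, 4) + "$"
print(pattern)  # ^[0-9][0-9][0-9]?[0-9]?$

ok = re.match(pattern, "123") is not None
too_short = re.match(pattern, "1") is not None
too_long = re.match(pattern, "12345") is not None
print(ok, too_short, too_long)  # True False False
```

The expanded pattern grows linearly with *n*, so this approach is only practical for small repetition counts.
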
Example::

    import ure

    # As ure doesn't support escapes itself, use of r"" strings is not
    # recommended.
    regex = ure.compile("[\r\n]")

    regex.split("line1\rline2\nline3\r\n")

    # Result:
    # ['line1', 'line2', 'line3', '', '']

Functions
---------

.. function:: compile(regex_str, [flags])

   Compile a regular expression and return a `regex <regex>` object.

.. function:: match(regex_str, string)

   Compile *regex_str* and match it against *string*. Matching always
   starts at the beginning of the string.

.. function:: search(regex_str, string)

   Compile *regex_str* and search for it in *string*. Unlike `match`, this
   searches the string for the first position at which the regex matches
   (which may still be 0 if the regex is anchored).

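The difference between ``match`` and ``search`` described above can be seen in a minimal sketch. The input string is illustrative, and the CPython ``re`` fallback is an assumption so the snippet runs off-device.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumption: CPython fallback so the sketch runs on a desktop

s = "key=42"

# match() only tries position 0, where "k" is not a digit.
at_start = re.match("[0-9]+", s)
print(at_start)  # None

# search() scans forward until the pattern matches somewhere.
anywhere = re.search("[0-9]+", s)
print(anywhere.group(0))  # 42

# An anchored pattern can only succeed at position 0, even with search().
anchored_digits = re.search("^[0-9]+", s)
anchored_key = re.search("^key", s)
print(anchored_digits)        # None
print(anchored_key.group(0))  # key
```
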
.. function:: sub(regex_str, replace, string, count=0, flags=0)

   Compile *regex_str* and search for it in *string*, replacing all matches
   with *replace*, and returning the new string.

   *replace* can be a string or a function. If it is a string then escape
   sequences of the form ``\<number>`` and ``\g<number>`` can be used to
   expand to the corresponding group (or an empty string for unmatched groups).
   If *replace* is a function then it must take a single argument (the match)
   and should return a replacement string.

   If *count* is specified and non-zero then substitution will stop after
   this many substitutions are made. The *flags* argument is ignored.

   Note: availability of this function depends on `MicroPython port`.

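Both forms of *replace* can be illustrated with a short sketch. It assumes a port where ``sub`` is available; the CPython ``re`` fallback is an assumption so the snippet runs off-device, and ``double`` is a hypothetical callback name.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumption: CPython fallback so the sketch runs on a desktop

# String replacement: \1 and \2 expand to the captured groups. The
# backslashes are doubled because this is an ordinary Python string.
swapped = re.sub("([a-z]+)=([0-9]+)", "\\2=\\1", "a=1, b=2")
print(swapped)  # 1=a, 2=b


def double(m):
    """Hypothetical callback: receives the match, returns the replacement text."""
    return str(int(m.group(0)) * 2)


# Function replacement: called once per match.
doubled = re.sub("[0-9]+", double, "a=1, b=21")
print(doubled)  # a=2, b=42
```
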
.. data:: DEBUG

   Flag value; display debug information about the compiled expression.
   (Availability depends on `MicroPython port`.)

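Because ``DEBUG`` may be absent on some ports, one defensive way to use it is to look it up with a default of 0. This is only a sketch: the format of the debug output is port-specific and not shown here, and the CPython ``re`` fallback is an assumption so the snippet runs off-device.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumption: CPython fallback so the sketch runs on a desktop

# Fall back to 0 (no flags) when the port does not provide DEBUG.
flags = getattr(re, "DEBUG", 0)

# With DEBUG set, compile() may print a dump of the compiled expression.
regex = re.compile("a+b", flags)
matched = regex.match("aaab").group(0)
print(matched)  # aaab
```
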
.. _regex:

Regex objects
-------------

Compiled regular expression. Instances of this class are created using
`ure.compile()`.

.. method:: regex.match(string)
            regex.search(string)
            regex.sub(replace, string, count=0, flags=0)

   Similar to the module-level functions :meth:`match`, :meth:`search`
   and :meth:`sub`.
   Using methods is (much) more efficient if the same regex is applied to
   multiple strings.

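The efficiency note above suggests compiling once and reusing the resulting object. A minimal sketch, with illustrative input lines and the CPython ``re`` fallback assumed so the snippet runs off-device:

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumption: CPython fallback so the sketch runs on a desktop

# Compile once, outside the loop.
regex = re.compile("^([a-z]+)=([0-9]+)$")

lines = ["temp=21", "bad line", "hum=40"]
parsed = []
for line in lines:
    m = regex.match(line)
    if m:  # match() returns None for lines that do not fit the pattern
        parsed.append((m.group(1), int(m.group(2))))

print(parsed)  # [('temp', 21), ('hum', 40)]
```
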
.. method:: regex.split(string, max_split=-1)
|
|
|
|
|
2017-07-02 00:01:47 +01:00
|
|
|
Split a *string* using regex. If *max_split* is given, it specifies
|
|
|
|
maximum number of splits to perform. Returns list of strings (there
|
|
|
|
may be up to *max_split+1* elements if it's specified).
|
2014-12-03 22:06:44 +00:00
|
|
|
|
|
|
|
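The effect of *max_split* can be sketched as follows. The limit is passed positionally because CPython names the parameter ``maxsplit``, so a keyword would not be portable; the CPython ``re`` fallback is an assumption so the snippet runs off-device.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumption: CPython fallback so the sketch runs on a desktop

regex = re.compile(", *")

# No limit: split at every separator.
all_parts = regex.split("a, b,c")
print(all_parts)  # ['a', 'b', 'c']

# At most one split: the remainder is returned unsplit as the last element.
first_only = regex.split("a, b,c", 1)
print(first_only)  # ['a', 'b,c']
```
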
Match objects
-------------

Match objects are returned by the `match()` and `search()` methods, and are
passed to the replacement function in `sub()`.

.. method:: match.group(index)

   Return the matching (sub)string. *index* is 0 for the entire match, and
   1 and above for each capturing group. Only numeric groups are supported.

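Group numbering can be illustrated with a small sketch. The sample text is illustrative and the CPython ``re`` fallback is an assumption so the snippet runs off-device.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumption: CPython fallback so the sketch runs on a desktop

m = re.search("([a-z]+)@([a-z]+)", "mail bob@example now")

whole = m.group(0)  # the entire match
user = m.group(1)   # first capturing group
host = m.group(2)   # second capturing group
print(whole)  # bob@example
print(user)   # bob
print(host)   # example
```
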
.. method:: match.groups()

   Return a tuple containing all the substrings of the groups of the match.

   Note: availability of this method depends on `MicroPython port`.

.. method:: match.start([index])
            match.end([index])

   Return the index in the original string of the start or end of the
   substring group that was matched. *index* defaults to the entire
   match; otherwise it selects a group.

   Note: availability of these methods depends on `MicroPython port`.

.. method:: match.span([index])

   Returns the 2-tuple ``(match.start(index), match.end(index))``.

   Note: availability of this method depends on `MicroPython port`.

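Since these methods may be missing on some ports, a cautious sketch checks for them with ``hasattr`` before use. The sample text is illustrative and the CPython ``re`` fallback is an assumption so the snippet runs off-device.

```python
try:
    import ure as re  # MicroPython
except ImportError:
    import re  # assumption: CPython fallback so the sketch runs on a desktop

m = re.search("([0-9]+)-([0-9]+)", "range 10-25 ok")

# Guard each optional method, as it may not exist on every port.
if hasattr(m, "groups"):
    print(m.groups())  # ('10', '25')

if hasattr(m, "span"):
    print(m.span())    # (6, 11), position of the entire match

if hasattr(m, "start") and hasattr(m, "end"):
    print(m.start(2), m.end(2))  # 9 11, position of the second group
```
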