Update README.md

julienmalard 2020-06-26 11:47:00 -04:00
parent e22536fc9b
commit c319ace48d
1 changed file with 21 additions and 0 deletions

@@ -176,6 +176,27 @@ You can use the output as a regular python module:
0.38981434460254655
```
### Using Unicode character classes with `regex`
Python's built-in `re` module has a few persistent known bugs and does not support
some advanced regex features, such as Unicode character classes (for example `\p{Lu}`).
With `pip install lark-parser[regex]`, the `regex` module will be installed alongside `lark`
and can act as a drop-in replacement for `re`.
Any instance of `Lark` instantiated with `regex=True` will use the `regex` module
instead of `re`. For example, we can now use Unicode character classes to match PEP 3131-compliant Python identifiers.
```python
>>> from lark import Lark
>>> g = Lark(r"""
?start: NAME
NAME: ID_START ID_CONTINUE*
ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/
ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}·]+/
""", regex=True)
>>> g.parse('வணக்கம்')
'வணக்கம்'
```
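To see why `regex=True` matters for the grammar above, here is a minimal sketch that checks whether a given regex module can compile a `\p{...}` property class. The helper name `supports_unicode_properties` is hypothetical and not part of `lark`. The `regex` check only runs if that package happens to be installed.

```python
import re


def supports_unicode_properties(regex_module):
    """Return True if the module can compile a Unicode property class."""
    try:
        # \p{Lu} means "any uppercase letter" in engines that support it.
        regex_module.compile(r"\p{Lu}")
    except re.error:
        return False
    return True


# The standard library's `re` rejects \p{...} as a bad escape.
re_ok = supports_unicode_properties(re)
print("re:", re_ok)

# `regex` is a third-party package, so treat it as optional here.
try:
    import regex
except ImportError:
    regex = None

if regex is not None:
    print("regex:", supports_unicode_properties(regex))
```

If the first check prints `False`, any grammar using `\p{...}` terminals needs the `regex` extra.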
## License