mirror of https://github.com/lark-parser/lark.git
Update README.md
parent e22536fc9b
commit c319ace48d
README.md (21 lines added)
@@ -176,6 +176,27 @@ You can use the output as a regular python module:

```
0.38981434460254655
```

### Using Unicode character classes with `regex`

Python's built-in `re` module has a few persistent known bugs and does not support
advanced regex features such as Unicode character classes (for example, `\p{Lu}`).
With `pip install lark-parser[regex]`, the `regex` module is installed alongside `lark`
and can act as a drop-in replacement for `re`.
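
To make the limitation concrete, here is a minimal sketch that uses only the standard library. The helper name `compiles_with_re` is hypothetical and not part of `lark` or `regex`; it simply reports whether `re` accepts a pattern.

```python
import re


def compiles_with_re(pattern: str) -> bool:
    """Return True if the standard-library `re` module accepts `pattern`."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# An ASCII-only identifier pattern compiles fine with `re`.
print(compiles_with_re(r"[A-Za-z_]\w*"))

# A Unicode property class such as \p{Lu} is rejected by `re` as a bad escape.
print(compiles_with_re(r"\p{Lu}"))
```

The second pattern is the kind of syntax that the `regex` module is expected to accept, which is why the grammar in this section relies on it.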

Any instance of `Lark` instantiated with `regex=True` will use the `regex` module
instead of `re`. For example, we can now use Unicode character classes to match PEP-3131 compliant Python identifiers:

```python
>>> from lark import Lark
>>> g = Lark(r"""
...     ?start: NAME
...     NAME: ID_START ID_CONTINUE*
...     ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/
...     ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}·]+/
... """, regex=True)

>>> g.parse('வணக்கம்')
'வணக்கம்'
```
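
To see why the grammar needs the mark categories, the following sketch approximates the `NAME` rule with the standard-library `unicodedata` module instead of `regex`. The helper `looks_like_identifier` and the two category sets are hypothetical names introduced for illustration; the sketch ignores the NFKC normalization that PEP 3131 also specifies, so treat it as an approximation rather than a reference implementation.

```python
import unicodedata

# Unicode general categories named in the ID_START and ID_CONTINUE terminals.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATEGORIES = ID_START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def looks_like_identifier(text: str) -> bool:
    """Approximate the NAME rule: one start character, then continue characters."""
    if not text:
        return False
    first, rest = text[0], text[1:]
    if first != "_" and unicodedata.category(first) not in ID_START_CATEGORIES:
        return False
    # "_" and "·" are listed literally in the grammar's character sets.
    return all(
        ch in "_·" or unicodedata.category(ch) in ID_CONTINUE_CATEGORIES
        for ch in rest
    )


word = "வணக்கம்"

# Show the general category of every code point in the word.
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)}")

print(looks_like_identifier(word))    # letters plus combining marks
print(looks_like_identifier("1abc"))  # starts with a digit
```

The Tamil word mixes letters with nonspacing marks, so a pattern limited to letter categories would stop at the first mark. Including `Mn` and `Mc` in `ID_CONTINUE` is what lets the whole word match as a single `NAME`.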
## License