Update README.md

2020-06-26 11:47:00 -04:00 · 2020-06-26 11:47:00 -04:00 · c319ace48d
parent e22536fc9b
commit c319ace48d
1 changed files with 21 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -176,6 +176,27 @@ You can use the output as a regular python module:
 0.38981434460254655
 ```

+### Using Unicode character classes with `regex`
+Python's built-in `re` module has a few persistent known bugs, and it cannot parse
+advanced regex features such as Unicode character classes (e.g. `\p{Lu}`).
+Running `pip install lark-parser[regex]` installs the `regex` module alongside `lark`;
+it can act as a drop-in replacement for `re`.
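+
+The difference is easy to check directly. The sketch below assumes Python 3.7 or newer (where `re` rejects unknown escapes such as `\p`); `compiles_with` and `PATTERN` are illustrative names, not part of `lark` or `regex`:
+
+```python
+import re
+
+# Unicode character classes such as \p{Lu} (uppercase letter) are a `regex`
+# feature; on Python 3.7+ the stdlib `re` rejects "\p" as a bad escape.
+PATTERN = r"\p{Lu}\p{Ll}+"  # an uppercase letter followed by lowercase letters
+
+
+def compiles_with(module, pattern):
+    """Hypothetical helper: True if `module.compile(pattern)` succeeds."""
+    try:
+        module.compile(pattern)
+        return True
+    except module.error:
+        return False
+
+
+print("re:", compiles_with(re, PATTERN))          # re: False
+print("re, [A-Z]:", compiles_with(re, r"[A-Z]"))  # re, [A-Z]: True
+
+try:
+    import regex  # installed by `pip install lark-parser[regex]`
+except ImportError:
+    regex = None
+
+if regex is not None:
+    print("regex:", compiles_with(regex, PATTERN))
+    print(bool(regex.fullmatch(PATTERN, "Ärger")))
+```
+
+Plain bracket classes like `[A-Z]` work in both modules; only the `\p{...}` property escapes need `regex`. With `regex` installed, the last two lines should both print `True`.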
+
+Any `Lark` instance created with `regex=True` will use the `regex` module
+instead of `re`. For example, we can now use Unicode character classes to match PEP 3131-style Python identifiers:
+```python
+>>> from lark import Lark
+>>> g = Lark(r"""
+...     ?start: NAME
+...     NAME: ID_START ID_CONTINUE*
+...     ID_START: /[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_]+/
+...     ID_CONTINUE: ID_START | /[\p{Mn}\p{Mc}\p{Nd}\p{Pc}·]+/
+... """, regex=True)
+
+>>> g.parse('வணக்கம்')
+'வணக்கம்'
+
+```

 ## License