# Lark - a modern parsing library for Python

Parse any context-free grammar, FAST and EASY!

**Beginners**: Forget everything you knew about parsers. Lark's algorithm can quickly parse any grammar you throw at it, no matter how complicated. It also constructs a parse-tree for you, without additional code on your part.

**Experts**: Lark lets you choose between Earley and LALR(1), to trade off power and speed. It also contains experimental features such as a contextual lexer.

Lark can:

- Parse all context-free grammars, and handle all ambiguity
- Build a parse-tree automagically, no construction code required
- Outperform all other Python libraries when using LALR(1) (yes, including PLY)
- Run on every Python interpreter (it's pure Python)

And many more features. Read ahead and find out.

Most importantly, Lark will save you time and prevent you from getting parsing headaches.

### Quick links
- [Documentation wiki](https://github.com/erezsh/lark/wiki)
- [Tutorial](/docs/json_tutorial.md) for writing a JSON parser.
- Blog post: [How to write a DSL with Lark](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/)

### Hello World
Here is a little program to parse "Hello, World!" (Or any other similar phrase):
```python
from lark import Lark

# The grammar: two words separated by a comma and followed by "!".
# WORD comes from Lark's common terminal library; spaces are ignored.
parser = Lark('''start: WORD "," WORD "!"
            %import common.WORD
            %ignore " "
         ''')

print(parser.parse("Hello, World!"))
```
And the output is:
```python
Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'World')])
```
Notice punctuation doesn't appear in the resulting tree. It's automatically filtered away by Lark.
### Fruit Flies Like Bananas
Lark is very good at handling ambiguity. Here's how it parses the phrase "fruit flies like bananas":

![fruitflies.png](examples/fruitflies.png)

See more [examples in the wiki](https://github.com/erezsh/lark/wiki/Examples)
### Install Lark

    $ pip install lark-parser

Lark has no dependencies.

### Projects using Lark
- [mappyfile](https://github.com/geographika/mappyfile) - a MapFile parser for working with MapServer configuration
- [pytreeview](https://gitlab.com/parmenti/pytreeview) - a lightweight tree-based grammar explorer

Using Lark? Send me a message and I'll add your project!

### How to use Nearley grammars in Lark
Lark comes with a tool to convert grammars from [Nearley](https://github.com/Hardmath123/nearley), a popular Earley library for JavaScript. It uses [Js2Py](https://github.com/PiotrDabkowski/Js2Py) to convert and run the JavaScript postprocessing code segments.

Here's an example:
```bash
git clone https://github.com/Hardmath123/nearley
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py
```
You can use the output as a regular Python module:
```python
>>> import ncalc
>>> ncalc.parse('sin(pi/4) ^ e')
0.38981434460254655
```
## List of Features

- Builds a parse-tree (AST) automagically, based on the structure of the grammar
- **Earley** parser
    - Can parse *ALL* context-free grammars
    - Full support for ambiguity in grammar
- **LALR(1)** parser
    - Competitive with PLY
- **EBNF** grammar
- **Unicode** fully supported
- **Python 2 & 3** compatible
- Automatic line & column tracking
- Standard library of terminals (strings, numbers, names, etc.)
- Import grammars from Nearley.js
- Extensive test suite

[![codecov](https://codecov.io/gh/erezsh/lark/branch/master/graph/badge.svg)](https://codecov.io/gh/erezsh/lark)
[![Build Status](https://travis-ci.org/erezsh/lark.svg?branch=master)](https://travis-ci.org/erezsh/lark)

See the full list of [features in the wiki](https://github.com/erezsh/lark/wiki/Features)
## Comparison to other parsers
### Lark does things a little differently
1. *Separates code from grammar*: Parsers written this way are cleaner and easier to read & work with.
2. *Automatically builds a parse tree (AST)*: Trees are always simpler to work with than state-machines. (But if you want to provide a callback for efficiency reasons, Lark lets you do that too.)
3. *Follows Python's idioms*: Beautiful is better than ugly. Readability counts.
### Lark is easier to use
- You can work with parse-trees instead of state-machines
- The grammar is simple to read and write
- There are no restrictions on grammar structure. Any grammar you write can be parsed.
    - Some structures are faster than others. If you care about speed, you can learn them gradually while the parser is already working
    - A well-written grammar is very fast
    - Note: Nondeterministic grammars will run a little slower
    - Note: Ambiguous grammars (grammars that can be parsed in more than one way) are supported, but may cause significant slowdown if the ambiguity is too big
- You don't have to worry about terminals (regexps) or rules colliding
- You can repeat expressions without losing efficiency (turns out that's a thing)
### Performance comparison
| Library | CPython Time | PyPy Time | CPython Mem | PyPy Mem |
|:-----|:-------------|:------------|:----------|:---------|
| **Lark - LALR(1)** | 4.7s | 1.2s | 70M | 134M |
| PyParsing | 32s | 3.5s | 443M | 225M |
| funcparserlib | 8.5s | 1.3s | 483M | 293M |
| Parsimonious | | 5.7s | | 1545M |

Check out the [JSON tutorial](/docs/json_tutorial.md#conclusion) for more details on how the comparison was made.

### Feature comparison
| Library | Algorithm | Grammar | Builds tree? | Supports ambiguity? | Can handle every CFG? |
|:--------|:----------|:----|:--------|:------------|:------------|
| **Lark** | Earley/LALR(1) | EBNF+ | Yes! | Yes! | Yes! |
| [PLY](http://www.dabeaz.com/ply/) | LALR(1) | Yacc-like BNF | No | No | No |
| [PyParsing](http://pyparsing.wikispaces.com/) | PEG | Parser combinators | No | No | No\* |
| [Parsley](https://pypi.python.org/pypi/Parsley) | PEG | EBNF-like | No | No | No\* |
| [funcparserlib](https://github.com/vlasovskikh/funcparserlib) | Recursive-Descent | Parser combinators | No | No | No |
| [Parsimonious](https://github.com/erikrose/parsimonious) | PEG | EBNF | Yes | No | No\* |

(\* *According to Wikipedia, it remains unanswered whether PEGs can really parse all deterministic CFGs*)

## License
Lark uses the [MIT license](LICENSE).

## Contribute
Lark is currently accepting pull requests.

There are many ways you can help the project:

* Improve the performance of Lark's parsing algorithm
* Implement macros for grammars (important for grammar composition)
* Write new grammars for Lark's library
* Write & improve the documentation
* Write a blog post introducing Lark to your audience

If you're interested in taking one of these on, let me know and I will provide more details and assist you in the process.

## Contact
If you have any questions or want my assistance, you can email me at erezshin at gmail com.

I'm also available for contract work.