# Lark - a modern parsing library for Python
Parse any context-free grammar, FAST and EASY!

**Beginners**: Lark is not just another parser. It can parse any grammar you throw at it, no matter how complicated or ambiguous, and do so efficiently. It also constructs a parse-tree for you, without additional code on your part.

**Experts**: Lark implements both Earley(SPPF) and LALR(1), and several different lexers, so you can trade off power and speed according to your requirements. It also provides a variety of sophisticated features and utilities.

Lark can:

- Parse all context-free grammars, and handle any ambiguity
- Build a parse-tree automagically, no construction code required
- Outperform all other Python libraries when using LALR(1) (Yes, including PLY)
- Run on every Python interpreter (it's pure-python)
- Generate a stand-alone parser (for LALR(1) grammars)

And many more features. Read ahead and find out.

Most importantly, Lark will save you time and prevent you from getting parsing headaches.
### Quick links
- [Documentation @readthedocs](https://lark-parser.readthedocs.io/)
- [Cheatsheet (PDF)](/docs/lark_cheatsheet.pdf)
- [Tutorial](/docs/json_tutorial.md) for writing a JSON parser.
- Blog post: [How to write a DSL with Lark](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/)
- [Gitter chat](https://gitter.im/lark-parser/Lobby)
### Install Lark

    $ pip install lark-parser

Lark has no dependencies.

[![Build Status](https://travis-ci.org/lark-parser/lark.svg?branch=master)](https://travis-ci.org/lark-parser/lark)
### Syntax Highlighting
Lark provides syntax highlighting for its grammar files (\*.lark):
- [Sublime Text & TextMate](https://github.com/lark-parser/lark_syntax)
- [vscode](https://github.com/lark-parser/vscode-lark)
### Clones
- [Lerche (Julia)](https://github.com/jamesrhester/Lerche.jl) - an unofficial clone, written entirely in Julia.
### Hello World
Here is a little program to parse "Hello, World!" (Or any other similar phrase):
```python
from lark import Lark
l = Lark('''start: WORD "," WORD "!"
%import common.WORD // imports from terminal library
%ignore " " // Disregard spaces in text
''')
print( l.parse("Hello, World!") )
```
And the output is:
```python
Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'World')])
```
Notice punctuation doesn't appear in the resulting tree. It's automatically filtered away by Lark.
### Fruit flies like bananas
Lark is great at handling ambiguity. Let's parse the phrase "fruit flies like bananas":

![fruitflies.png](examples/fruitflies.png)

See more [examples here](https://github.com/lark-parser/lark/tree/master/examples)
## List of main features
- Builds a parse-tree (AST) automagically, based on the structure of the grammar
- **Earley** parser
    - Can parse all context-free grammars
    - Full support for ambiguous grammars
- **LALR(1)** parser
    - Fast and light, competitive with PLY
    - Can generate a stand-alone parser
- **CYK** parser, for highly ambiguous grammars (NEW! Courtesy of [ehudt](https://github.com/ehudt))
- **EBNF** grammar
- **Unicode** fully supported
- **Python 2 & 3** compatible
- Automatic line & column tracking
- Standard library of terminals (strings, numbers, names, etc.)
- Import grammars from Nearley.js
- Extensive test suite [![codecov](https://codecov.io/gh/erezsh/lark/branch/master/graph/badge.svg)](https://codecov.io/gh/erezsh/lark)
- And much more!

See the full list of [features here](https://lark-parser.readthedocs.io/en/latest/features/)
### Comparison to other libraries
#### Performance comparison
In the benchmarks below, Lark is the fastest and lightest of the libraries tested (lower is better):

![Run-time Comparison](docs/comparison_runtime.png)

![Memory Usage Comparison](docs/comparison_memory.png)

Check out the [JSON tutorial](/docs/json_tutorial.md#conclusion) for more details on how the comparison was made.

*Note: I really wanted to add PLY to the benchmark, but I couldn't find a working JSON parser anywhere written in PLY. If anyone can point me to one that actually works, I would be happy to add it!*

*Note 2: The parsimonious code has been optimized for this specific test, unlike the other benchmarks (Lark included). Its "real-world" performance may not be as good.*
#### Feature comparison
| Library | Algorithm | Grammar | Builds tree? | Supports ambiguity? | Can handle every CFG? | Line/Column tracking | Generates Stand-alone |
|:--------|:----------|:----|:--------|:------------|:------------|:----------|:----------|
| **Lark** | Earley/LALR(1) | EBNF | Yes! | Yes! | Yes! | Yes! | Yes! (LALR only) |
| [PLY](http://www.dabeaz.com/ply/) | LALR(1) | BNF | No | No | No | No | No |
| [PyParsing](http://pyparsing.wikispaces.com/) | PEG | Combinators | No | No | No\* | No | No |
| [Parsley](https://pypi.python.org/pypi/Parsley) | PEG | EBNF | No | No | No\* | No | No |
| [Parsimonious](https://github.com/erikrose/parsimonious) | PEG | EBNF | Yes | No | No\* | No | No |
| [ANTLR](https://github.com/antlr/antlr4) | LL(*) | EBNF | Yes | No | Yes? | Yes | No |

(\* *PEGs cannot handle non-deterministic grammars. Also, according to Wikipedia, it remains unanswered whether PEGs can really parse all deterministic CFGs*)
### Projects using Lark
- [storyscript](https://github.com/storyscript/storyscript) - The programming language for Application Storytelling
- [tartiflette](https://github.com/dailymotion/tartiflette) - a GraphQL engine by Dailymotion. Lark is used to parse the GraphQL schema definitions.
- [Hypothesis](https://github.com/HypothesisWorks/hypothesis) - Library for property-based testing
- [mappyfile](https://github.com/geographika/mappyfile) - a MapFile parser for working with MapServer configuration
- [synapse](https://github.com/vertexproject/synapse) - an intelligence analysis platform
- [Command-Block-Assembly](https://github.com/simon816/Command-Block-Assembly) - An assembly language, and C compiler, for Minecraft commands
- [SPFlow](https://github.com/SPFlow/SPFlow) - Library for Sum-Product Networks
- [Torchani](https://github.com/aiqm/torchani) - Accurate Neural Network Potential on PyTorch
- [required](https://github.com/shezadkhan137/required) - multi-field validation using docstrings
- [miniwdl](https://github.com/chanzuckerberg/miniwdl) - A static analysis toolkit for the Workflow Description Language
- [pytreeview](https://gitlab.com/parmenti/pytreeview) - a lightweight tree-based grammar explorer

Using Lark? Send me a message and I'll add your project!
### How to use Nearley grammars in Lark
Lark comes with a tool to convert grammars from [Nearley](https://github.com/Hardmath123/nearley), a popular Earley library for JavaScript. It uses [Js2Py](https://github.com/PiotrDabkowski/Js2Py) to convert and run the JavaScript postprocessing code segments.

Here's an example:
```bash
git clone https://github.com/Hardmath123/nearley
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py
```
You can use the output as a regular Python module:
```python
>>> import ncalc
>>> ncalc.parse('sin(pi/4) ^ e')
0.38981434460254655
```
## License
Lark uses the [MIT license](LICENSE).

(The standalone tool is under GPL2)
## Contribute
Lark is currently accepting pull requests. See [How to develop Lark](/docs/how_to_develop.md).
## Donate
If you like Lark and feel like donating, you can do so at my [Patreon page](https://www.patreon.com/erezsh).

If you wish for a specific feature to get a higher priority, you can request it in a follow-up email, and I'll consider it favorably.
## Contact
If you have any questions or want my assistance, you can email me at erezshin at gmail com.

I'm also available for contract work.

-- [Erez](https://github.com/erezsh)