
# Lark - a modern parsing library for Python

Parse any context-free grammar, FAST and EASY!

**Beginners**: Forget everything you knew about parsers. Lark's algorithm can quickly parse any grammar you throw at it, no matter how complicated. It also constructs a parse-tree for you, without additional code on your part.

**Experts**: Lark lets you choose between Earley and LALR(1), to trade off power and speed. It also contains experimental features such as a contextual lexer.

Lark can:

 - Parse all context-free grammars, and handle all ambiguity
 - Build a parse-tree automagically, no construction code required
 - Outperform all other Python libraries when using LALR(1) (Yes, including PLY)
 - Run on every Python interpreter (it's pure-python)

And many more features. Read ahead and find out.

Most importantly, Lark will save you time and prevent you from getting parsing headaches.

### Quick links

- [Documentation wiki](https://github.com/erezsh/lark/wiki)
- [Tutorial](/docs/json_tutorial.md) for writing a JSON parser.
- Blog post: [How to write a DSL with Lark](http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/)

### Hello World

Here is a little program to parse "Hello, World!" (or any other similar phrase):

```python
from lark import Lark
l = Lark('''start: WORD "," WORD "!"
            %import common.WORD
            %ignore " "
         ''')
print( l.parse("Hello, World!") )
```

And the output is:

```python
Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'World')])
```

Notice punctuation doesn't appear in the resulting tree. It's automatically filtered away by Lark.

### Fruit Flies Like Bananas

Lark is very good at handling ambiguity. Here's how it parses the phrase "fruit flies like bananas":

![fruitflies.png](examples/fruitflies.png)

See more [examples in the wiki](https://github.com/erezsh/lark/wiki/Examples)


### Install Lark

    $ pip install lark-parser

Lark has no dependencies.

### Projects using Lark

 - [mappyfile](https://github.com/geographika/mappyfile) - a MapFile parser for working with MapServer configuration
 - [pytreeview](https://gitlab.com/parmenti/pytreeview) - a lightweight tree-based grammar explorer

Using Lark? Send me a message and I'll add your project!

### How to use Nearley grammars in Lark

Lark comes with a tool to convert grammars from [Nearley](https://github.com/Hardmath123/nearley), a popular Earley library for JavaScript. It uses [Js2Py](https://github.com/PiotrDabkowski/Js2Py) to convert and run the JavaScript postprocessing code segments.

Here's an example:
```bash
git clone https://github.com/Hardmath123/nearley
python -m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py
```

You can use the output as a regular Python module:

```python
>>> import ncalc
>>> ncalc.parse('sin(pi/4) ^ e')
0.38981434460254655
```

## List of Features

 - Builds a parse-tree (AST) automagically, based on the structure of the grammar
 - **Earley** parser
    - Can parse *ALL* context-free grammars
    - Full support for ambiguity in grammar
 - **LALR(1)** parser
    - Competitive with PLY
 - **EBNF** grammar
 - **Unicode** fully supported
 - **Python 2 & 3** compatible
 - Automatic line & column tracking
 - Standard library of terminals (strings, numbers, names, etc.)
 - Import grammars from Nearley.js
 - Extensive test suite

[![codecov](https://codecov.io/gh/erezsh/lark/branch/master/graph/badge.svg)](https://codecov.io/gh/erezsh/lark)
[![Build Status](https://travis-ci.org/erezsh/lark.svg?branch=master)](https://travis-ci.org/erezsh/lark)

See the full list of [features in the wiki](https://github.com/erezsh/lark/wiki/Features)

## Comparison to other parsers

### Lark does things a little differently

1. *Separates code from grammar*: Parsers written this way are cleaner and easier to read & work with.

2. *Automatically builds a parse tree (AST)*: Trees are always simpler to work with than state-machines. (But if you want to provide a callback for efficiency reasons, Lark lets you do that too)

3. *Follows Python's Idioms*: Beautiful is better than ugly. Readability counts.


### Lark is easier to use

- You can work with parse-trees instead of state-machines
- The grammar is simple to read and write
- There are no restrictions on grammar structure. Any grammar you write can be parsed.
    - Some structures are faster than others. If you care about speed, you can learn them gradually while the parser is already working
    - A well-written grammar is very fast
    - Note: Nondeterministic grammars will run a little slower
    - Note: Ambiguous grammars (grammars that can be parsed in more than one way) are supported, but may cause significant slowdown if the ambiguity is too big
- You don't have to worry about terminals (regexps) or rules colliding
- You can repeat expressions without losing efficiency (turns out that's a thing)

### Performance comparison

| Code | CPython Time | PyPy Time | CPython Mem | PyPy Mem |
|:-----|:-------------|:----------|:------------|:---------|
| **Lark - LALR(1)** | 4.7s | 1.2s | 70M | 134M |
| PyParsing | 32s | 3.5s | 443M | 225M |
| funcparserlib | 8.5s | 1.3s | 483M | 293M |
| Parsimonious | | 5.7s | | 1545M |

Check out the [JSON tutorial](/docs/json_tutorial.md#conclusion) for more details on how the comparison was made.


### Feature comparison

| Library | Algorithm | Grammar | Builds tree? | Supports ambiguity? | Can handle every CFG? |
|:--------|:----------|:--------|:-------------|:--------------------|:----------------------|
| **Lark** | Earley/LALR(1) | EBNF+ | Yes! | Yes! | Yes! |
| [PLY](http://www.dabeaz.com/ply/) | LALR(1) | Yacc-like BNF | No | No | No |
| [PyParsing](http://pyparsing.wikispaces.com/) | PEG | Parser combinators | No | No | No\* |
| [Parsley](https://pypi.python.org/pypi/Parsley) | PEG | EBNF-like | No | No | No\* |
| [funcparserlib](https://github.com/vlasovskikh/funcparserlib) | Recursive-Descent | Parser combinators | No | No | No |
| [Parsimonious](https://github.com/erikrose/parsimonious) | PEG | EBNF | Yes | No | No\* |


(\* *According to Wikipedia, it remains unanswered whether PEGs can really parse all deterministic CFGs*)

## License

Lark uses the [MIT license](LICENSE).

## Contribute

Lark is currently accepting pull-requests.

There are many ways you can help the project:

* Improve the performance of Lark's parsing algorithm
* Implement macros for grammars (important for grammar composition)
* Write new grammars for Lark's library
* Write & improve the documentation
* Write a blog post introducing Lark to your audience

If you're interested in taking one of these on, let me know and I will provide more details and assist you in the process.

## Contact

If you have any questions or want my assistance, you can email me at erezshin at gmail com.

I'm also available for contract work.