Philosophy
Erez Shinan edited this page 2017-11-04 21:10:46 +02:00

The philosophy behind Lark is simple: parsers are innately difficult to write and understand. Even experts can be baffled by the nuances of these complicated state machines. Lark's main goal is therefore to make the parser-writing process simple and abstract. It pursues this goal by following these main principles:

Design Principles

  1. Keep the grammar clean and simple

  2. Don't force the user to decide on things that the parser can figure out on its own

  3. It's okay to be opinionated

  4. Readability is more important than writability

  5. Usability is more important than performance*

* It's possible to achieve excellent performance with Lark, but that usually comes with a few extra steps.

Below is a list of the design choices I made, in accordance with these principles:

Separation of code from grammar

A grammar is the de facto reference for your language and for the structure of your parse tree. For any non-trivial language, mixing code into the grammar turns out convoluted and difficult to read.

Lark grammars are written in EBNF, so they are especially easy to read and work with.

Always build a parse-tree (unless told not to)

Trees are always simpler to work with than state-machines.

  1. Trees allow you to see the "state-machine" visually

  2. Trees allow your computation to be aware of previous and future states

  3. Trees allow you to process the parse in steps, instead of forcing you to do it all at once.

See this answer in more detail here.

Since Lark can create trees automagically, according to the structure of your grammar, it always builds a tree. You can shape that tree in the grammar (to a degree), or in post-processing.

With LALR(1), you can skip the creation of the tree by providing a callback (see the JSON example).

Earley is the default

Although Earley is the slower alternative, it has the huge benefit of accepting any context-free grammar (i.e. it can parse any grammar you can write in EBNF).

That means you can use Lark to toy around with your language, and worry about performance later. Don't forget: "Premature optimization is the root of all evil!"

Notes:

  • Some approaches and grammar structures are faster than others. If you care about speed, you can learn them gradually while your code is already working.
  • Nondeterministic grammars and ambiguous grammars will run a little slower.

Other design features

  • Automatically resolves terminal collisions (unless both are regular expressions)