Tree Construction
Erez Shinan edited this page 2018-04-21 17:13:57 +03:00

Lark builds a tree automatically, based on the structure of the grammar, according to the following rules:

  • Each rule is a branch (node) in the tree, and its children are its matches, in the order of matching.

  • Rules can be expanded (inlined). See "Shaping the tree" below.

  • Inside rules, using item+ or item* adds every matched item directly to the rule's children, as a flat list.

  • Terminals (tokens) are always values in the tree, never branches.

  • Terminals that won't appear in the tree are:

    • Unnamed literals (like "keyword" or "+")
    • Terminals whose name starts with an underscore (like _DIGIT)
  • Terminals that will appear in the tree are:

    • Unnamed regular expressions (like /[0-9]/)
    • Named terminals whose name starts with a letter (like DIGIT)

The resulting parse-tree (when unshaped) is a direct equivalent of a classical parse-tree. Applying a Transformer to it is equivalent to providing a callback to the parser.

Example:

    expr: "(" expr ")"
        | NAME+

    NAME: /\w+/

    %ignore " "

Lark will parse "((hello world))" as:

expr
    expr
        expr
            "hello"
            "world"

The brackets do not appear in the tree by design. The words appear because they are matched by a named terminal.

However, it's possible to keep all the tokens of a rule, by prefixing it with !:

    !expr: "(" expr ")"
         | NAME+
    NAME: /\w+/
    %ignore " "

Lark will parse "((hello world))" as:

expr
  (
  expr
    (
    expr
      hello
      world
    )
  )

Shaping the tree

  1. Rules whose name begins with an underscore will be inlined into their containing rule.

Example:

    start: "(" _greet ")"
    _greet: /\w+/ /\w+/

    %ignore " "

Lark will parse "(hello world)" as:

start
    "hello"
    "world"

  2. Rules that receive a question mark (?) at the beginning of their definition will be inlined if they have a single child.

Example:

    start: greet greet
    ?greet: "(" /\w+/ ")"
          | /\w+/ /\w+/

    %ignore " "

Lark will parse "hello world (planet)" as:

start
    greet
        "hello"
        "world"
    "planet"

  3. Rules that begin with an exclamation mark (!) will keep all their terminals (they won't get filtered).

  4. Aliases: each option in a rule can receive an alias, which is then used as the branch name for that option instead of the rule's name.

Example:

    start: greet greet
    greet: "hello" -> hello
         | "world"

    %ignore " "

Lark will parse "hello world" as:

start
    hello
    greet