Lark builds a tree automatically based on the structure of the grammar, according to the following rules:
- Each rule is a branch (node) in the tree, and its children are its matches, in the order of matching.
- Rules can be expanded (inlined). See "Shaping the tree" below.
- Inside rules, using item+ or item* will result in a list of items.
- Terminals (tokens) are always values in the tree, never branches.
- Terminals that won't appear in the tree are:
  - Unnamed literals (like "keyword" or "+")
  - Terminals whose name starts with an underscore (like _DIGIT)
- Terminals that will appear in the tree are:
  - Unnamed regular expressions (like /[0-9]/)
  - Named terminals whose name starts with a letter (like DIGIT)
The resulting parse-tree (when unshaped) is a direct equivalent of a classical parse-tree. Applying a Transformer to it is equivalent to providing a callback to the parser.
Example:
    expr: "(" expr ")"
        | NAME+
    NAME: /\w+/
    %ignore " "
Lark will parse "((hello world))" as:
    expr
        expr
            expr
                "hello"
                "world"
The parentheses do not appear in the tree by design. The words appear because they are matched by a named terminal.
However, it's possible to keep all the tokens of a rule by prefixing it with an exclamation mark (!):
    !expr: "(" expr ")"
         | NAME+
    NAME: /\w+/
    %ignore " "
Lark will parse "((hello world))" as:
    expr
        (
        expr
            (
            expr
                hello
                world
            )
        )
Shaping the tree
- Rules whose name begins with an underscore will be inlined into their containing rule.
Example:
    start: "(" _greet ")"
    _greet: /\w+/ /\w+/
Lark will parse "(hello world)" as:
    start
        "hello"
        "world"
- Rules that receive a question mark (?) at the beginning of their definition will be inlined if they have a single child.
Example:
    start: greet greet
    ?greet: "(" /\w+/ ")"
          | /\w+/ /\w+/
Lark will parse "hello world (planet)" as:
    start
        greet
            "hello"
            "world"
        "planet"
- Rules that begin with an exclamation mark will keep all their terminals (they won't get filtered).
- Aliases: options in a rule can receive an alias, which is then used as the branch name for that option.
Example:
    start: greet greet
    greet: "hello" -> hello
         | "world"
Lark will parse "hello world" as:
    start
        hello
        greet