mirror of https://github.com/lark-parser/lark.git
129 lines
2.9 KiB
Markdown
129 lines
2.9 KiB
Markdown
|
# Automatic Tree Construction - Reference
|
||
|
|
||
|
|
||
|
Lark builds a tree automatically based on the structure of the grammar, where each rule that is matched becomes a branch (node) in the tree, and its children are its matches, in the order of matching.
|
||
|
|
||
|
For example, the rule `node: child1 child2` will create a tree node with two children. If it is matched as part of another rule (i.e. if it isn't the root), the new rule's tree node will become its parent.
|
||
|
|
||
|
Using `item+` or `item*` will result in a list of items, equivalent to writing `item item item ..`.
|
||
|
|
||
|
### Terminals
|
||
|
|
||
|
Terminals are always values in the tree, never branches.
|
||
|
|
||
|
Lark filters out certain types of terminals by default, considering them punctuation:
|
||
|
|
||
|
- Terminals that won't appear in the tree are:
|
||
|
|
||
|
- Unnamed literals (like `"keyword"` or `"+"`)
|
||
|
- Terminals whose name starts with an underscore (like `_DIGIT`)
|
||
|
|
||
|
- Terminals that *will* appear in the tree are:
|
||
|
|
||
|
- Unnamed regular expressions (like `/[0-9]/`)
|
||
|
- Named terminals whose name starts with a letter (like `DIGIT`)
|
||
|
|
||
|
Rules prefixed with `!` will retain all their literals regardless.
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
**Example:**
|
||
|
|
||
|
```perl
|
||
|
expr: "(" expr ")"
|
||
|
| NAME+
|
||
|
|
||
|
NAME: /\w+/
|
||
|
|
||
|
%ignore " "
|
||
|
```
|
||
|
|
||
|
Lark will parse "((hello world))" as:
|
||
|
|
||
|
expr
|
||
|
expr
|
||
|
expr
|
||
|
"hello"
|
||
|
"world"
|
||
|
|
||
|
The brackets do not appear in the tree by design. The words appear because they are matched by a named terminal.
|
||
|
|
||
|
|
||
|
# Shaping the tree
|
||
|
|
||
|
Users can alter the automatic construction of the tree using a collection of grammar features.
|
||
|
|
||
|
|
||
|
* Rules whose name begins with an underscore will be inlined into their containing rule.
|
||
|
|
||
|
**Example:**
|
||
|
|
||
|
```perl
|
||
|
start: "(" _greet ")"
|
||
|
_greet: /\w+/ /\w+/
|
||
|
```
|
||
|
|
||
|
Lark will parse "(hello world)" as:
|
||
|
|
||
|
start
|
||
|
"hello"
|
||
|
"world"
|
||
|
|
||
|
|
||
|
* Rules that receive a question mark (?) at the beginning of their definition, will be inlined if they have a single child, after filtering.
|
||
|
|
||
|
**Example:**
|
||
|
|
||
|
```ruby
|
||
|
start: greet greet
|
||
|
?greet: "(" /\w+/ ")"
|
||
|
| /\w+/ /\w+/
|
||
|
```
|
||
|
|
||
|
Lark will parse "hello world (planet)" as:
|
||
|
|
||
|
start
|
||
|
greet
|
||
|
"hello"
|
||
|
"world"
|
||
|
"planet"
|
||
|
|
||
|
* Rules that begin with an exclamation mark will keep all their terminals (they won't get filtered).
|
||
|
|
||
|
```perl
|
||
|
!expr: "(" expr ")"
|
||
|
| NAME+
|
||
|
NAME: /\w+/
|
||
|
%ignore " "
|
||
|
```
|
||
|
|
||
|
Will parse "((hello world))" as:
|
||
|
|
||
|
expr
|
||
|
(
|
||
|
expr
|
||
|
(
|
||
|
expr
|
||
|
hello
|
||
|
world
|
||
|
)
|
||
|
)
|
||
|
|
||
|
Using the `!` prefix is usually a "code smell", and may point to a flaw in your grammar design.
|
||
|
|
||
|
* Aliases - options in a rule can receive an alias. It will be then used as the branch name for the option, instead of the rule name.
|
||
|
|
||
|
**Example:**
|
||
|
|
||
|
```ruby
|
||
|
start: greet greet
|
||
|
greet: "hello"
|
||
|
| "world" -> planet
|
||
|
```
|
||
|
|
||
|
Lark will parse "hello world" as:
|
||
|
|
||
|
start
|
||
|
greet
|
||
|
planet
|