Regular tree grammar

In computer science, a regular tree grammar (RTG)^[1] is a formal grammar that describes a set of directed trees.

Definition

A regular tree grammar $G$ is defined by the tuple

$G=(N,\Sigma ,Z,P)$ ,

where

$N$ is a set of nonterminals,
$\Sigma$ is a ranked alphabet (i.e., an alphabet whose symbols have an associated arity) disjoint from $N$ ,
$Z$ is the starting nonterminal, with $Z\in N$ , and
$P$ is a set of productions of the form $A\rightarrow t$ , where $A\in N$ , and $t\in T_{\Sigma }(N)$ , where $T_{\Sigma }(N)$ is the associated term algebra, i.e. the set of all trees composed from symbols in $\Sigma \cup N$ according to their arities, where nonterminals are considered nullary.
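
The tuple above can be sketched as a small data structure. This is a minimal illustration, not a reference implementation: the class name `RegularTreeGrammar`, its field names, the nested-tuple encoding of trees, and the tiny `Nat` grammar are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumed encoding: a tree is a nested tuple (label, child_1, ..., child_k).
# Leaves, including nonterminals, are 1-tuples such as ("zero",) or ("Nat",).

@dataclass(frozen=True)
class RegularTreeGrammar:
    nonterminals: frozenset   # N
    alphabet: dict            # Sigma, as a map from symbol to arity
    start: str                # Z
    productions: tuple        # P, as pairs (A, t)

    def is_well_formed_tree(self, t):
        """Check t in T_Sigma(N): arities respected, nonterminals nullary."""
        label, children = t[0], t[1:]
        if label in self.nonterminals:
            return len(children) == 0
        if label not in self.alphabet:
            return False
        return (len(children) == self.alphabet[label]
                and all(self.is_well_formed_tree(c) for c in children))

    def is_well_formed(self):
        """Check the side conditions stated in the definition."""
        return (self.start in self.nonterminals
                and self.nonterminals.isdisjoint(self.alphabet)
                and all(a in self.nonterminals and self.is_well_formed_tree(t)
                        for a, t in self.productions))

# Hypothetical grammar for unary natural numbers: Nat -> zero | succ(Nat).
nat = RegularTreeGrammar(
    nonterminals=frozenset({"Nat"}),
    alphabet={"zero": 0, "succ": 1},
    start="Nat",
    productions=(("Nat", ("zero",)), ("Nat", ("succ", ("Nat",)))),
)
print(nat.is_well_formed())
```

Storing the arity alongside each symbol makes the "according to their arities" condition a direct length check on the children of each node.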

Derivation of trees

The grammar $G$ implicitly defines a set of trees: any tree that can be derived from $Z$ using the rule set $P$ is said to be described by $G$ . This set of trees is known as the language of $G$ . To express this more formally, we define the relation $\Rightarrow _{G}$ on the set $T_{\Sigma }(N)$ as follows:

We say that $t_{1}\in T_{\Sigma }(N)$ can be derived in a single step into a tree $t_{2}\in T_{\Sigma }(N)$ (in short: $t_{1}\Rightarrow _{G}t_{2}$ ), if there is a context $S$ and a production $(A\rightarrow t)\in P$ such that:

$t_{1}=S[A]$ , and
$t_{2}=S[t]$ .

Here, a context means a tree with exactly one hole in it; if $S$ is such a context, we denote by $S[t]$ the result of filling the tree $t$ into the hole of $S$ .

The tree language generated by $G$ is the language $L(G)=\{t\in T_{\Sigma }\mid Z\Rightarrow _{G}^{*}t\}$ .

Here, $T_{\Sigma }$ denotes the set of all trees composed from symbols of $\Sigma$ , while $\Rightarrow _{G}^{*}$ denotes zero or more successive applications of $\Rightarrow _{G}$ , that is, its reflexive transitive closure.

A language generated by some regular tree grammar is called a regular tree language.

Examples

Figure: Example derivation from $G_{1}$

Let $G_{1}=(N_{1},\Sigma _{1},Z_{1},P_{1})$ , where

$N_{1}=\{Bool,BList\}$ is our set of nonterminals,
$\Sigma _{1}=\{true,false,nil,cons(.,.)\}$ is our ranked alphabet, arities indicated by dummy arguments (i.e. the symbol $cons$ has arity 2),
$Z_{1}=BList$ is our starting nonterminal, and
the set $P_{1}$ consists of the following productions:
- $Bool\rightarrow false$
- $Bool\rightarrow true$
- $BList\rightarrow nil$
- $BList\rightarrow cons(Bool,BList)$

An example derivation of the term $cons(false,cons(true,nil))$ from the grammar $G_{1}$ is shown in the image.
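
Written out step by step, one possible derivation of this term is the following, in which the leftmost nonterminal is rewritten at each step (rewriting the nonterminals in a different order reaches the same term):

$BList\Rightarrow cons(Bool,BList)\Rightarrow cons(false,BList)\Rightarrow cons(false,cons(Bool,BList))\Rightarrow cons(false,cons(true,BList))\Rightarrow cons(false,cons(true,nil))$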

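The tree language generated by $G_{1}$ is the set of all finite lists of boolean values. Under the definition of $T_{\Sigma }$ given above, $L(G_{1})$ is a proper subset of $T_{\Sigma _{1}}$ , since the latter also contains terms that are not lists, such as $true$ or $cons(nil,true)$ . The grammar $G_{1}$ corresponds to the algebraic data type declarations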

  datatype Bool
    = false
    | true
  datatype BList
    = nil
    | cons of Bool * BList

in the Standard ML programming language: every member of $L(G_{1})$ corresponds to a Standard-ML value of type BList.

For another example, let $G_{2}=(N_{1}\cup \{BList1\},\Sigma _{1},BList1,P_{1}\cup P_{2})$ , using the alphabet from above, adding the nonterminal $BList1$ to the nonterminal set, and extending the production set by $P_{2}$ , consisting of the following productions:

$BList1\rightarrow cons(true,BList)$
$BList1\rightarrow cons(false,BList1)$

The language $L(G_{2})$ is the set of all finite lists of boolean values that contain $true$ at least once; it is a proper subset of $L(G_{1})$ . The set $L(G_{2})$ has no datatype counterpart in Standard ML, nor in any other functional language. The above example term is in $L(G_{2})$ , too, as the following derivation shows: $BList1\Rightarrow cons(false,BList1)\Rightarrow cons(false,cons(true,BList))\Rightarrow cons(false,cons(true,nil))$ .

Language properties

If $L_{1}$ and $L_{2}$ are both regular tree languages, then the tree sets $L_{1}\cap L_{2}$ , $L_{1}\cup L_{2}$ , and $L_{1}\setminus L_{2}$ are also regular tree languages, and it is decidable whether $L_{1}\subseteq L_{2}$ , and whether $L_{1}=L_{2}$ .

Alternative characterizations and relation to other formal languages

As shown by Rajeev Alur and Parthasarathy Madhusudan,^[2]^[3] the class of regular tree languages corresponds to the regular languages of nested words and to the visibly pushdown languages.

The regular tree languages are also exactly the languages recognized by bottom-up tree automata and by nondeterministic top-down tree automata.^[4]

Regular tree grammars are a generalization of regular word grammars.
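
As a small illustration of this correspondence (the grammar below and the end marker $e$ are assumptions chosen for this sketch), a word can be viewed as a tree in which every letter is a unary symbol and a nullary symbol marks the end of the word. The right-linear word grammar with productions $A\rightarrow aA$ and $A\rightarrow b$ then corresponds to the regular tree grammar with the productions

$A\rightarrow a(A)$
$A\rightarrow b(e)$

over the ranked alphabet $\{a(.),b(.),e\}$ , and the word $aab$ corresponds to the tree $a(a(b(e)))$ .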

References

^ "Regular tree grammars as a formalism for scope underspecification". CiteSeerX 10.1.1.164.5484.
^ Alur, R.; Madhusudan, P. (2004). doi:10.1145/1007352.1007390.
^ Alur, R.; Madhusudan, P. (2009). doi:10.1145/1516512.1516518.
^ Comon et al., Tree Automata Techniques and Applications, 1997.

External links

Tree Automata Techniques and Applications
