A Kleene Theorem with Atoms

There are many existing regular expressions for infinite alphabets (here, here). Below is another one.

One way of stating the Kleene theorem is that the finite automata recognize exactly the languages in the smallest class of languages that contains all finite languages, and is closed under language concatenation and Kleene star. Below is a similar theorem, but with an atomic version of the Kleene star. The theorem is formulated for equivariant languages.

Definition (Atomic star). A atomic star consists of a directed graph, with a distinguished source and target vertex, and a labelling $\lambda$ of edges by languages, subject to the following conditions:

the vertices, edges, and $\lambda$ are equivariant;
the vertices have one orbit.

A stronger requirement would be to require the edges to have one orbit. This stronger requirement will be discussed in a separate post.

The semantics of a generalized star is defined to be

$[G] = \bigcup_{v_1 \cdots v_n} \lambda(v_1) \cdots \lambda(v_n)$

with the union ranging over paths $v_1 \cdots v_n$ in the graph that begin in the source and end in the target vertex. Given these, $f^G \subseteq A^*$ is defined to be the union of languages

$f(v_1) \cdots f(v_n) \subseteq A^*$

ranging over source-to-target paths $v_1 \cdots v_n$ in $G$ .

An atomic star corresponds essentially to an automaton which has a one-orbit state space, a one initial and one accepting state, and where transitions are labelled by languages. Without atoms, this corresponds to the standard Kleene star.

Example 1. Consider the equality atoms. The input alphabet is the atoms, and the language is all sequences of atoms. This language is the orbit finite union

$\bigcup_{a \in \atoms} L_{a}$

where $L_{a}$ is the sequences of atoms that begin in $a$ . The set $L_{a}$ is defined by a single instance of the atomic star: the vertices are $\atoms$ , the edges are $\atoms \times \atoms$ , the source is $a$ , the target is $a$ (any other atom would do), and the labelling is $(c,d) \mapsto c$ .

Example 2. In the previous example, we used atomic stars where the set of edges had two orbits, namely $\atoms \times \atoms$ . For the particular language $\atoms^*$ , at the cost of making the expression more complicated, it suffices to use atomic stars where the edges have one orbit. For an atom $a \in \atoms^*$ , the language $a^*$ is generated by taking the vertices to be $\atoms$ , the edges to be the equality, the source being $a$ , the target being $a$ , and the labeling being $(b,b) \mapsto b$ . The set of all sequences in $\atoms^*$ that begin in $a$ is generated by taking the vertices to be $\atoms$ , the edges to be disequality, the source being $a$ , the target being $a$ , and the labelling being $(b,c) \mapsto b^*$ .

Definition (Atomic regular expression). An atomic regular expression is an expression (where the syntax tree is orbit-finitely branching) which uses: single letters, the empty word, binary concatenation, orbit-finite union, and atomic star with the labelling using already defined atomic regular expressions.

Theorem. Assume that the atoms are oligomorphic. Then an equivariant language is definable by a regular expression if and only if it is recognized by a nondeterministic orbit-finite automaton.

Proof. The same as usual. Let us do the more difficult direction from automata to expressions. If a language is equivariant and recognized by a nondeterministic automaton, then it is recognized by an equivariant nondeterministic automaton. Suppose that the automaton has states $Q$ . For an equivariant subset of states $P \subseteq Q$ and states $p,q \in Q$ , define define $L_{p,P,q}$ to be the set of words that are accepted by a run that begins in state $p$ , ends in state $q$ and uses states from $P$ in all other positions. By induction on the number of equivariant orbits in $P$ , we prove that $L_{p,P,q}$ is defined by a regular expression.

The main observation is that when $p$ and $q$ are in the same orbit (which is the atmoic version of $p=q$ ), then $L_{p,P,q}$ can be obtained by applying the atomic star to languages where instead of $P$ , we use $P$ minus the orbit of $p,q$ . More formally, suppose that $P_0 \subseteq P$ is an equivariant orbit. Given states $p,q \in P_0$ , consider the generalized star $G_{p,q}$ defined as follows:

the vertices are $P_0$ ;
the edges are $P_0 \times P_0$ ;
the source vertex is $p$ and the target vertex is $q$ ;
the labelling is $(r,s) \mapsto L_{r,P-P_0,s}$ .

The labels of the edges in $G_{p,q}$ are definable by regular expressions thanks to the induction assumption. It is also easy to see that $G_{p,q}$ describes exactly the language $L_{p,P-P_0,q}$ . Finally, for any states $r,s \in P$ , the language $L_{r,P,s}$ can be obtained, using orbit-finite union and concatenation, from languages of the form $G_{p,q}$ and $L_{p,P-P_0,q}$ . The assumption on oligomorphism is used to ensure that a set of pairs of states is orbit-finite. $\Box$

Atompress

A Kleene Theorem with Atoms

Leave a Reply Cancel reply

Computation with atoms