A Kleene Theorem with Atoms

There are many existing regular expressions for infinite alphabets (here, here). Below is another one.

One way of stating the Kleene theorem is that the finite automata recognize exactly the languages in the  smallest class of languages that contains all finite languages, and is closed under language concatenation and Kleene star. Below is a similar theorem, but with an atomic version of the Kleene star.  The theorem is formulated for equivariant languages.

Definition (Atomic star). A atomic star consists of a directed graph, with a distinguished source and target vertex, and a labelling \lambda of edges by languages, subject to the following conditions:

  • the vertices, edges, and \lambda are equivariant;
  • the vertices have one orbit.

A stronger requirement would be to require the edges to have one orbit. This stronger requirement will be discussed in a separate post.

The semantics of a generalized star is defined to be 

    \[[G] = \bigcup_{v_1 \cdots v_n} \lambda(v_1) \cdots \lambda(v_n)\]

with the union ranging over  paths v_1 \cdots v_n in the graph that begin in the source and end in the target vertex. Given these,  f^G \subseteq A^* is defined to be the union of languages  

    \[f(v_1) \cdots f(v_n) \subseteq A^*\]

ranging over source-to-target paths v_1 \cdots v_n in G

An atomic star corresponds essentially to an automaton which has a one-orbit state space, a one initial and one accepting state, and where transitions are labelled by languages. Without atoms, this corresponds to the  standard Kleene star. 

Example 1. Consider the equality atoms. The input alphabet is the atoms, and the language is all sequences of atoms. This language is the orbit finite union

    \[\bigcup_{a \in \atoms} L_{a}\]

where L_{a} is the sequences of atoms that begin in a. The set L_{a} is defined by a single instance of the atomic star:  the vertices are \atoms, the edges are \atoms \times \atoms, the source is a, the target is a (any other atom would do),  and the labelling is (c,d) \mapsto c.

Example 2. In the previous example, we used atomic stars where the set of edges had two orbits, namely \atoms \times \atoms. For the particular language \atoms^*, at the cost of making the expression more complicated, it suffices to use atomic stars where the edges have one orbit.  For an atom a \in \atoms^*, the language a^* is generated by taking the vertices to be \atoms, the edges to be the equality, the source being a, the target being a, and the labeling being (b,b) \mapsto b.  The set of all sequences  in \atoms^* that begin in a  is generated by taking the vertices to be \atoms, the edges to be disequality, the source being a, the target being a, and the labelling being (b,c) \mapsto b^*.  

 

Definition (Atomic regular expression).  An atomic regular expression is an expression (where the syntax tree is orbit-finitely branching) which uses: single letters, the empty word, binary concatenation, orbit-finite union, and atomic star with the labelling using already defined atomic regular expressions.

Theorem. Assume that the atoms are oligomorphic. Then an equivariant language is definable by a regular expression if and only if it is recognized by a nondeterministic orbit-finite automaton.

Proof. The same as usual. Let us do the more difficult direction from automata to expressions. If a language is equivariant and recognized by a nondeterministic automaton, then it is recognized by an equivariant nondeterministic automaton. Suppose that the automaton has states Q.   For an equivariant subset of states P \subseteq Q and states p,q \in Q, define  define L_{p,P,q} to be the set of words that are accepted by a run that  begins in state p, ends in state q and uses states from P in all other positions.  By induction on the number of equivariant orbits in P, we prove that L_{p,P,q} is defined by a regular expression.

The main observation is that when p and q are in the same orbit (which is the atmoic version of  p=q), then L_{p,P,q} can be obtained by applying the atomic star to languages where instead of P, we use P minus the orbit of p,q. More formally, suppose that  P_0 \subseteq P is an  equivariant orbit.  Given states p,q \in P_0, consider the generalized star G_{p,q} defined as follows:

  • the vertices are P_0;
  • the edges are P_0 \times P_0;
  • the source vertex is p and the target vertex is q;
  • the labelling is (r,s) \mapsto L_{r,P-P_0,s}.

The labels of the edges in G_{p,q} are definable by regular expressions thanks to the induction assumption. It is also easy to see that G_{p,q} describes exactly the language L_{p,P-P_0,q}.  Finally, for any states  r,s \in P, the language L_{r,P,s} can be obtained, using orbit-finite union and concatenation, from languages of the form G_{p,q} and L_{p,P-P_0,q}.  The assumption on oligomorphism is used to ensure that a set of pairs of states is orbit-finite. \Box

Leave a Reply

Your email address will not be published. Required fields are marked *