There are many existing regular expressions for infinite alphabets (here, here). Below is another one.
One way of stating the Kleene theorem is that the finite automata recognize exactly the languages in the smallest class of languages that contains all finite languages, and is closed under language concatenation and Kleene star. Below is a similar theorem, but with an atomic version of the Kleene star. The theorem is formulated for equivariant languages.
Definition (Atomic star). A atomic star consists of a directed graph, with a distinguished source and target vertex, and a labelling of edges by languages, subject to the following conditions:
- the vertices, edges, and
are equivariant;
- the vertices have one orbit.
A stronger requirement would be to require the edges to have one orbit. This stronger requirement will be discussed in a separate post.
The semantics of a generalized star is defined to be
with the union ranging over paths in the graph that begin in the source and end in the target vertex. Given these,
is defined to be the union of languages
ranging over source-to-target paths in
.
An atomic star corresponds essentially to an automaton which has a one-orbit state space, a one initial and one accepting state, and where transitions are labelled by languages. Without atoms, this corresponds to the standard Kleene star.
Example 1. Consider the equality atoms. The input alphabet is the atoms, and the language is all sequences of atoms. This language is the orbit finite union
where is the sequences of atoms that begin in
. The set
is defined by a single instance of the atomic star: the vertices are
, the edges are
, the source is
, the target is
(any other atom would do), and the labelling is
.
Example 2. In the previous example, we used atomic stars where the set of edges had two orbits, namely . For the particular language
, at the cost of making the expression more complicated, it suffices to use atomic stars where the edges have one orbit. For an atom
, the language
is generated by taking the vertices to be
, the edges to be the equality, the source being
, the target being
, and the labeling being
. The set of all sequences in
that begin in
is generated by taking the vertices to be
, the edges to be disequality, the source being
, the target being
, and the labelling being
.
Definition (Atomic regular expression). An atomic regular expression is an expression (where the syntax tree is orbit-finitely branching) which uses: single letters, the empty word, binary concatenation, orbit-finite union, and atomic star with the labelling using already defined atomic regular expressions.
Theorem. Assume that the atoms are oligomorphic. Then an equivariant language is definable by a regular expression if and only if it is recognized by a nondeterministic orbit-finite automaton.
Proof. The same as usual. Let us do the more difficult direction from automata to expressions. If a language is equivariant and recognized by a nondeterministic automaton, then it is recognized by an equivariant nondeterministic automaton. Suppose that the automaton has states . For an equivariant subset of states
and states
, define define
to be the set of words that are accepted by a run that begins in state
, ends in state
and uses states from
in all other positions. By induction on the number of equivariant orbits in
, we prove that
is defined by a regular expression.
The main observation is that when and
are in the same orbit (which is the atmoic version of
), then
can be obtained by applying the atomic star to languages where instead of
, we use
minus the orbit of
. More formally, suppose that
is an equivariant orbit. Given states
, consider the generalized star
defined as follows:
- the vertices are
;
- the edges are
;
- the source vertex is
and the target vertex is
;
- the labelling is
.
The labels of the edges in are definable by regular expressions thanks to the induction assumption. It is also easy to see that
describes exactly the language
. Finally, for any states
, the language
can be obtained, using orbit-finite union and concatenation, from languages of the form
and
. The assumption on oligomorphism is used to ensure that a set of pairs of states is orbit-finite.