Regular Languages and Finite Automata
Language
An alphabet \(\Sigma\) is a finite set of symbols, for example \(\Sigma=\{0,1\}\). A string is a finite sequence of symbols from \(\Sigma\). We denote the empty string by \(\epsilon\). The set of all strings over \(\Sigma\) is \(\Sigma^\star\), using the Kleene star operator.
A language \(L\) is a subset of strings, that is, \(L \subseteq \Sigma^\star\). Basic operations over languages include:
- union \(L_1 \cup L_2\),
- concatenation \(L_1 L_2\), and
- closure (Kleene-star) \(L^\star\).
A decision problem is a function from strings to boolean values \(f\colon \Sigma^\star \to \{0,1\}\). A function problem is a function from strings to strings \(f\colon \Sigma^\star \to \Sigma^\star \).
Languages and decision problems are equivalent formulations. A decision problem \(f\) defines a language \(L =\{x\mid f(x)=1 \}\), and a language \(L\) associates with a function \(f\) where \(f(x) = 1\) if \(x\in L\) and \(f(x)=0\) if \(x\notin L\).
Finite Automata: Deterministic and Nondeterministic
A deterministic finite automaton (DFA) is a tuple \(M = (Q, \Sigma, \delta, q_0, F)\), where
- \(Q\) is a finite set of states,
- \(\Sigma\) is a finite alphabet,
- \(\delta\colon Q\times \Sigma \to Q\) is the transition function,
- \(q_0 \in Q\) is the start state, and
- \(F\subseteq Q\) is a set of final states.
Let \(w=w_1w_2\cdots w_n\) be a string in \(\Sigma^*\) where each \(w_i \in \Sigma\). Then \(w\) determines a sequence of states \(r_0, r_1, \ldots, r_n\) where \(r_0 = q_0\) and \(r_i = \delta(r_{i-1}, w_i)\) for \(i=1, \ldots,n\). We say \(M\) accepts \(w\) if \(r_n\) is a final state, i.e., \(r_n \in F\). We call the set of all strings accepted by \(M\) the language of \(M\), denoted by \(L(M)\).
Let \(\epsilon\) be the empty string and write \(\Sigma_\epsilon = \Sigma \cup \{\epsilon\}\).
Let \({\cal P}(Q)\) be the power set of \(Q\).
A {\em nondeterministic finite automaton (NFA)} is a tuple \(N = (Q, \Sigma, \delta, q_0, F)\)
where
- \(Q\) is a finite set of states,
- \(\Sigma\) is a finite alphabet,
- \(\delta\colon Q\times \Sigma_\epsilon \to {\cal P}(Q)\) is the transition function,
- \(q_0 \in Q\) is the start state, and
- \(F\subseteq Q\) is a set of final states.
For a string \(w \in \Sigma^*\), we say \(N\) accepts \(w\) if we can write \(w=y_1y_2\cdots y_n\) where \(y_i \in \Sigma_\epsilon\) and there exists a sequence of states \(r_0, r_1, \ldots, r_n\) such that \(r_0 = q_0\), \(r_i \in \delta(r_{i-1}, y_i)\) for \(i=1,\ldots,n\), and \(r_n \in F\). We call the set of all strings accepted by \(N\) the language of \(N\), denoted by \(L(N)\).
Regular Expressions and Regular Languages
Let \(\Sigma\) be the alphabet. The following are called regular expressions:
- \(\emptyset\), and \(a\), where \(a \in \Sigma_\epsilon\),
- \(R_1 + R_2\), \(R_1 \circ R_2\), and \(R_1^\star\), where \(R_1\) and \(R_2\) are regular expressions.
A regular expression defines a language, where \(\emptyset\) representing the empty set, \(a\) represents the singleton set \(\{a\}\), and \(+, \circ, \star\) represents union, concatenation, and Kleene star operations. Such languages are called regular languages.
Note that \(\emptyset = \{\}\) and \(\{\epsilon\}\) are two distinct languages.
Equivalence of Regular Languages, DFA and NFA
There are three equivalent ways of defining regular languages: regular expression, DFA and NFA.
(Equivalence of RE, DFA an NFA)
If \(L\) is accepted by some DFA (or NFA), then there exists a regular expression \(R\) such that \(L(R) = L\).