# Regular Languages and Finite Automata

### Language

An alphabet \(\Sigma\) is a finite set of symbols, for example \(\Sigma=\{0,1\}\). A string is a finite sequence of symbols from \(\Sigma\). We denote the empty string by \(\epsilon\). The set of all strings over \(\Sigma\) is \(\Sigma^\star\), using the Kleene star operator.

A language \(L\) is a subset of strings, that is, \(L \subseteq \Sigma^\star\). Basic operations over languages include:

- union \(L_1 \cup L_2\),
- concatenation \(L_1 L_2\), and
- closure (Kleene-star) \(L^\star\).

A decision problem is a function from strings to boolean values \(f\colon \Sigma^\star \to \{0,1\}\). A function problem is a function from strings to strings \(f\colon \Sigma^\star \to \Sigma^\star \).

Languages and decision problems are equivalent formulations. A decision problem \(f\) defines a language \(L =\{x\mid f(x)=1 \}\), and a language \(L\) associates with a function \(f\) where \(f(x) = 1\) if \(x\in L\) and \(f(x)=0\) if \(x\notin L\).

### Finite Automata: Deterministic and Nondeterministic

A *deterministic finite automaton (DFA)* is a tuple \(M = (Q, \Sigma, \delta, q_0, F)\), where

- \(Q\) is a finite set of states,
- \(\Sigma\) is a finite alphabet,
- \(\delta\colon Q\times \Sigma \to Q\) is the transition function,
- \(q_0 \in Q\) is the start state, and
- \(F\subseteq Q\) is a set of final states.

Let \(w=w_1w_2\cdots w_n\) be a string in \(\Sigma^*\) where each \(w_i \in \Sigma\). Then \(w\) determines a sequence of states \(r_0, r_1, \ldots, r_n\) where \(r_0 = q_0\) and \(r_i = \delta(r_{i-1}, w_i)\) for \(i=1, \ldots,n\). We say \(M\) *accepts* \(w\) if \(r_n\) is a final state, i.e., \(r_n \in F\). We call the set of all strings accepted by \(M\) the *language of \(M\)*, denoted by \(L(M)\).

Let \(\epsilon\) be the empty string and write \(\Sigma_\epsilon = \Sigma \cup \{\epsilon\}\).

Let \({\cal P}(Q)\) be the power set of \(Q\).

A {\em nondeterministic finite automaton (NFA)} is a tuple \(N = (Q, \Sigma, \delta, q_0, F)\)

where

- \(Q\) is a finite set of states,
- \(\Sigma\) is a finite alphabet,
- \(\delta\colon Q\times \Sigma_\epsilon \to {\cal P}(Q)\) is the transition function,
- \(q_0 \in Q\) is the start state, and
- \(F\subseteq Q\) is a set of final states.

For a string \(w \in \Sigma^*\), we say \(N\) *accepts* \(w\) if we can write \(w=y_1y_2\cdots y_n\) where \(y_i \in \Sigma_\epsilon\) and there exists a sequence of states \(r_0, r_1, \ldots, r_n\) such that \(r_0 = q_0\), \(r_i \in \delta(r_{i-1}, y_i)\) for \(i=1,\ldots,n\), and \(r_n \in F\). We call the set of all strings accepted by \(N\) the * language of \(N\)*, denoted by \(L(N)\).

### Regular Expressions and Regular Languages

Let \(\Sigma\) be the alphabet. The following are called *regular expressions*:

- \(\emptyset\), and \(a\), where \(a \in \Sigma_\epsilon\),
- \(R_1 + R_2\), \(R_1 \circ R_2\), and \(R_1^\star\), where \(R_1\) and \(R_2\) are regular expressions.

A regular expression defines a language, where \(\emptyset\) representing the empty set, \(a\) represents the singleton set \(\{a\}\), and \(+, \circ, \star\) represents union, concatenation, and Kleene star operations. Such languages are called *regular languages*.

Note that \(\emptyset = \{\}\) and \(\{\epsilon\}\) are two distinct languages.

### Equivalence of Regular Languages, DFA and NFA

There are three equivalent ways of defining regular languages: regular expression, DFA and NFA.

**(Equivalence of RE, DFA an NFA) **

If \(L\) is accepted by some DFA (or NFA), then there exists a regular expression \(R\) such that \(L(R) = L\).