# Regular Languages and Finite Automata

### Language

An alphabet $$\Sigma$$ is a finite set of symbols, for example $$\Sigma=\{0,1\}$$. A string is a finite sequence of symbols from $$\Sigma$$. We denote the empty string by $$\epsilon$$. The set of all strings over $$\Sigma$$ is $$\Sigma^\star$$, using the Kleene star operator.

A language $$L$$ is a subset of strings, that is, $$L \subseteq \Sigma^\star$$. Basic operations over languages include:

• union $$L_1 \cup L_2$$,
• concatenation $$L_1 L_2$$, and
• closure (Kleene-star) $$L^\star$$.

A decision problem is a function from strings to boolean values $$f\colon \Sigma^\star \to \{0,1\}$$. A function problem is a function from strings to strings $$f\colon \Sigma^\star \to \Sigma^\star$$.

Languages and decision problems are equivalent formulations. A decision problem $$f$$ defines a language $$L =\{x\mid f(x)=1 \}$$, and a language $$L$$ associates with a function $$f$$ where $$f(x) = 1$$ if $$x\in L$$ and $$f(x)=0$$ if $$x\notin L$$.

### Finite Automata: Deterministic and Nondeterministic

A deterministic finite automaton (DFA) is a tuple $$M = (Q, \Sigma, \delta, q_0, F)$$, where

• $$Q$$ is a finite set of states,
• $$\Sigma$$ is a finite alphabet,
• $$\delta\colon Q\times \Sigma \to Q$$ is the transition function,
• $$q_0 \in Q$$ is the start state, and
• $$F\subseteq Q$$ is a set of final states.

Let $$w=w_1w_2\cdots w_n$$ be a string in $$\Sigma^*$$ where each $$w_i \in \Sigma$$. Then $$w$$ determines a sequence of states $$r_0, r_1, \ldots, r_n$$ where $$r_0 = q_0$$ and $$r_i = \delta(r_{i-1}, w_i)$$ for $$i=1, \ldots,n$$. We say $$M$$ accepts $$w$$ if $$r_n$$ is a final state, i.e., $$r_n \in F$$. We call the set of all strings accepted by $$M$$ the language of $$M$$, denoted by $$L(M)$$.

Let $$\epsilon$$ be the empty string and write $$\Sigma_\epsilon = \Sigma \cup \{\epsilon\}$$.
Let $${\cal P}(Q)$$ be the power set of $$Q$$.

A {\em nondeterministic finite automaton (NFA)} is a tuple $$N = (Q, \Sigma, \delta, q_0, F)$$
where

• $$Q$$ is a finite set of states,
• $$\Sigma$$ is a finite alphabet,
• $$\delta\colon Q\times \Sigma_\epsilon \to {\cal P}(Q)$$ is the transition function,
• $$q_0 \in Q$$ is the start state, and
• $$F\subseteq Q$$ is a set of final states.

For a string $$w \in \Sigma^*$$, we say $$N$$ accepts $$w$$ if we can write $$w=y_1y_2\cdots y_n$$ where $$y_i \in \Sigma_\epsilon$$ and there exists a sequence of states $$r_0, r_1, \ldots, r_n$$ such that $$r_0 = q_0$$, $$r_i \in \delta(r_{i-1}, y_i)$$ for $$i=1,\ldots,n$$, and $$r_n \in F$$. We call the set of all strings accepted by $$N$$ the language of $$N$$, denoted by $$L(N)$$.

### Regular Expressions and Regular Languages

Let $$\Sigma$$ be the alphabet. The following are called regular expressions:

• $$\emptyset$$, and $$a$$, where $$a \in \Sigma_\epsilon$$,
• $$R_1 + R_2$$, $$R_1 \circ R_2$$, and $$R_1^\star$$, where $$R_1$$ and $$R_2$$ are regular expressions.

A regular expression defines a language, where $$\emptyset$$ representing the empty set, $$a$$ represents the singleton set $$\{a\}$$, and $$+, \circ, \star$$ represents union, concatenation, and Kleene star operations. Such languages are called regular languages.

Note that $$\emptyset = \{\}$$ and $$\{\epsilon\}$$ are two distinct languages.

### Equivalence of Regular Languages, DFA and NFA

There are three equivalent ways of defining regular languages: regular expression, DFA and NFA.

(Equivalence of RE, DFA an NFA)
If $$L$$ is accepted by some DFA (or NFA), then there exists a regular expression $$R$$ such that $$L(R) = L$$.