Regular Languages and Finite Automata

Language

An alphabet $\Sigma$ is a finite set of symbols, for example $\Sigma=\{0,1\}$ . A string is a finite sequence of symbols from $\Sigma$ . We denote the empty string by $\epsilon$ . The set of all strings over $\Sigma$ is $\Sigma^\star$ , using the Kleene star operator.

A language $L$ is a subset of strings, that is, $L \subseteq \Sigma^\star$ . Basic operations over languages include:

union $L_1 \cup L_2$ ,
concatenation $L_1 L_2$ , and
closure (Kleene-star) $L^\star$ .

A decision problem is a function from strings to boolean values $f\colon \Sigma^\star \to \{0,1\}$ . A function problem is a function from strings to strings $f\colon \Sigma^\star \to \Sigma^\star$ .

Languages and decision problems are equivalent formulations. A decision problem $f$ defines a language $L =\{x\mid f(x)=1 \}$ , and a language $L$ associates with a function $f$ where $f(x) = 1$ if $x\in L$ and $f(x)=0$ if $x\notin L$ .

Finite Automata: Deterministic and Nondeterministic

A deterministic finite automaton (DFA) is a tuple $M = (Q, \Sigma, \delta, q_0, F)$ , where

$Q$ is a finite set of states,
$\Sigma$ is a finite alphabet,
$\delta\colon Q\times \Sigma \to Q$ is the transition function,
$q_0 \in Q$ is the start state, and
$F\subseteq Q$ is a set of final states.

Let $w=w_1w_2\cdots w_n$ be a string in $\Sigma^*$ where each $w_i \in \Sigma$ . Then $w$ determines a sequence of states $r_0, r_1, \ldots, r_n$ where $r_0 = q_0$ and $r_i = \delta(r_{i-1}, w_i)$ for $i=1, \ldots,n$ . We say $M$ accepts $w$ if $r_n$ is a final state, i.e., $r_n \in F$ . We call the set of all strings accepted by $M$ the language of $M$ , denoted by $L(M)$ .

Let $\epsilon$ be the empty string and write $\Sigma_\epsilon = \Sigma \cup \{\epsilon\}$ .
Let ${\cal P}(Q)$ be the power set of $Q$ .

A {\em nondeterministic finite automaton (NFA)} is a tuple $N = (Q, \Sigma, \delta, q_0, F)$
where

$Q$ is a finite set of states,
$\Sigma$ is a finite alphabet,
$\delta\colon Q\times \Sigma_\epsilon \to {\cal P}(Q)$ is the transition function,
$q_0 \in Q$ is the start state, and
$F\subseteq Q$ is a set of final states.

For a string $w \in \Sigma^*$ , we say $N$ accepts $w$ if we can write $w=y_1y_2\cdots y_n$ where $y_i \in \Sigma_\epsilon$ and there exists a sequence of states $r_0, r_1, \ldots, r_n$ such that $r_0 = q_0$ , $r_i \in \delta(r_{i-1}, y_i)$ for $i=1,\ldots,n$ , and $r_n \in F$ . We call the set of all strings accepted by $N$ the language of $N$ , denoted by $L(N)$ .

Regular Expressions and Regular Languages

Let $\Sigma$ be the alphabet. The following are called regular expressions:

$\emptyset$ , and $a$ , where $a \in \Sigma_\epsilon$ ,
$R_1 + R_2$ , $R_1 \circ R_2$ , and $R_1^\star$ , where $R_1$ and $R_2$ are regular expressions.

A regular expression defines a language, where $\emptyset$ representing the empty set, $a$ represents the singleton set $\{a\}$ , and $+, \circ, \star$ represents union, concatenation, and Kleene star operations. Such languages are called regular languages.

Note that $\emptyset = \{\}$ and $\{\epsilon\}$ are two distinct languages.

Equivalence of Regular Languages, DFA and NFA

There are three equivalent ways of defining regular languages: regular expression, DFA and NFA.

(Equivalence of RE, DFA an NFA)
If $L$ is accepted by some DFA (or NFA), then there exists a regular expression $R$ such that $L(R) = L$ .

Comments

comments