Regular Languages and Finite Automata
Language
An alphabet \Sigma is a finite set of symbols, for example \Sigma=\{0,1\}. A string is a finite sequence of symbols from \Sigma. We denote the empty string by \epsilon. The set of all strings over \Sigma is \Sigma^\star, using the Kleene star operator.
A language L is a subset of strings, that is, L \subseteq \Sigma^\star. Basic operations over languages include:
- union L_1 \cup L_2,
- concatenation L_1 L_2, and
- closure (Kleene-star) L^\star.
A decision problem is a function from strings to boolean values f\colon \Sigma^\star \to \{0,1\}. A function problem is a function from strings to strings f\colon \Sigma^\star \to \Sigma^\star .
Languages and decision problems are equivalent formulations. A decision problem f defines a language L =\{x\mid f(x)=1 \}, and a language L associates with a function f where f(x) = 1 if x\in L and f(x)=0 if x\notin L.
Finite Automata: Deterministic and Nondeterministic
A deterministic finite automaton (DFA) is a tuple M = (Q, \Sigma, \delta, q_0, F), where
- Q is a finite set of states,
- \Sigma is a finite alphabet,
- \delta\colon Q\times \Sigma \to Q is the transition function,
- q_0 \in Q is the start state, and
- F\subseteq Q is a set of final states.
Let w=w_1w_2\cdots w_n be a string in \Sigma^* where each w_i \in \Sigma. Then w determines a sequence of states r_0, r_1, \ldots, r_n where r_0 = q_0 and r_i = \delta(r_{i-1}, w_i) for i=1, \ldots,n. We say M accepts w if r_n is a final state, i.e., r_n \in F. We call the set of all strings accepted by M the language of M, denoted by L(M).
Let \epsilon be the empty string and write \Sigma_\epsilon = \Sigma \cup \{\epsilon\}.
Let {\cal P}(Q) be the power set of Q.
A {\em nondeterministic finite automaton (NFA)} is a tuple N = (Q, \Sigma, \delta, q_0, F)
where
- Q is a finite set of states,
- \Sigma is a finite alphabet,
- \delta\colon Q\times \Sigma_\epsilon \to {\cal P}(Q) is the transition function,
- q_0 \in Q is the start state, and
- F\subseteq Q is a set of final states.
For a string w \in \Sigma^*, we say N accepts w if we can write w=y_1y_2\cdots y_n where y_i \in \Sigma_\epsilon and there exists a sequence of states r_0, r_1, \ldots, r_n such that r_0 = q_0, r_i \in \delta(r_{i-1}, y_i) for i=1,\ldots,n, and r_n \in F. We call the set of all strings accepted by N the language of N, denoted by L(N).
Regular Expressions and Regular Languages
Let \Sigma be the alphabet. The following are called regular expressions:
- \emptyset, and a, where a \in \Sigma_\epsilon,
- R_1 + R_2, R_1 \circ R_2, and R_1^\star, where R_1 and R_2 are regular expressions.
A regular expression defines a language, where \emptyset representing the empty set, a represents the singleton set \{a\}, and +, \circ, \star represents union, concatenation, and Kleene star operations. Such languages are called regular languages.
Note that \emptyset = \{\} and \{\epsilon\} are two distinct languages.
Equivalence of Regular Languages, DFA and NFA
There are three equivalent ways of defining regular languages: regular expression, DFA and NFA.
(Equivalence of RE, DFA an NFA)
If L is accepted by some DFA (or NFA), then there exists a regular expression R such that L(R) = L.