Linear and Binary Search

Searching is to find a target in a collection of elements, or determine the target does not exist. Here we consider data in arrays stored in the memory; while in real problems data may be stored in disk files, databases, or even distributed over the Internet.

Linear Search

Linear search is to check each element one by one in sequence. The following method linearSearch() searches a target in an array and returns the index of the target; if not found, it returns -1, which indicates an invalid index.

public static int linearSearch(int[] arr, int target)
{
    for (int i = 0; i < arr.length; i++)
    {
        if (arr[i] == target)
            return i;
    }
    return -1;
}

Linear search loops through each element in the array; each loop body takes constant time. Therefore, it runs in linear time O(n).

Binary Search

For sorted arrays, binary search is more efficient than linear search. The process starts from the middle of the input array:

  • If the target equals the element in the middle, return its index.
  • If the target is larger than the element in the middle, search the right half.
  • If the target is smaller, search the left half.

In the following binarySearch() method, the two index variables first and last indicates the searching boundary at each round.

public static int binarySearch(int[] arr, int target)
{
    int first = 0, last = arr.length - 1;

    while (first <= last)
    {
        int mid = (first + last) / 2; 
        if (target == arr[mid]) 
            return mid; 
        if (target > arr[mid])
            first = mid + 1;
        else
            last = mid - 1;
    }
    return -1;
}
arr: {3, 9, 10, 27, 38, 43, 82}

target: 10
first: 0, last: 6, mid: 3, arr[mid]: 27   --  go left
first: 0, last: 2, mid: 1, arr[mid]: 9    --  go right
first: 2, last: 2, mid: 2, arr[mid]: 10   --  found

target: 40
first: 0, last: 6, mid: 3, arr[mid]: 27   --  go right
first: 4, last: 6, mid: 5, arr[mid]: 43   --  go left
first: 4, last: 4, mid: 4, arr[mid]: 38   --  go right
first: 5, last: 4                         --  not found

Binary search divides the array in the middle at each round of the loop. Suppose the array has length n and the loop runs in t rounds, then we have n * (1/2)^t = 1 since at each round the array length is divided by 2. Thus t = log(n). At each round, the loop body takes constant time. Therefore, binary search runs in logarithmic time O(log n).

The following code implements binary search using recursion. To call the method, we need provide with the boundary indexes, for example,
binarySearch(arr, 0, arr.length - 1, target);

public static int 
binarySearch(int[] arr, int first, int last, int target)
{
    if (first > last)
        return -1;

    int mid = (first + last) / 2;

    if (target == arr[mid]) 
        return mid;
    if (target > arr[mid])
        return binarySearch(arr, mid + 1, last, target);
    // target < arr[mid]
    return binarySearch(arr, first, mid - 1, target);
}

Application: Prefix Search

The task here is to find all words with a given prefix, for example, all words starting with “bina”. The following link is a text file containing 20,068 words, one word per line: http://introcs.cs.princeton.edu/java/data/words.txt

The program below contains two methods: readWords() loads all words in an array and sort it in order; searchWords() finds and prints all words with a given prefix. The searching algorithm is “binary search”.

Since all words with the same prefix are stored consecutively, the binary search algorithm identifies the first word with the prefix, and then scans the array until the last word with the given prefix. If the array contains the prefix exactly, the while loop breaks out and first points to the prefix; otherwise, the loop ends with first pointing to the first word “larger than” the prefix by the lexicographical order.

import java.io.IOException;
import java.net.URL;
import java.util.Arrays;
import java.util.Scanner;


public class WordSearch
{
    // Read all words into an array and sort
    public static String[] readWords() throws IOException
    {
        String urlString = 
         "http://introcs.cs.princeton.edu/java/data/words.txt";
        URL url = new URL(urlString);
        Scanner scan = new Scanner(url.openStream());

        String[] arr = new String[20068];
        int    i = 0;
        while (scan.hasNext())
            arr[i++] = scan.nextLine();
        scan.close();
        
        System.out.println("Read " + i + " words");
        Arrays.sort(arr);
        return arr;
    }
    
    // print all words with a given prefix
    public static void searchWords(String[] words, String prefix)
    {
        int first = 0, last = words.length - 1;
        int mid = 0;
        
        while (first <= last)
        {
            mid = (first + last) / 2;
            int c = prefix.compareTo(words[mid]);
            if (c == 0)
            {
            	first = mid;    // first indicates the beginning
            	break;
            }
            if (c > 0)
                first = mid + 1;
            else
                last = mid - 1;
        }
        
        int i;
        for (i = first; i < words.length; i++)
        {
            if (words[i].startsWith(prefix))
                System.out.println(i + words[i]);
            else
            	break;
        }

        System.out.println("Found " + (i-first) + " words");
        System.out.println("---------------------------");
    }

    
    public static void main(String[] args) throws IOException 
    {
        String[] words = readWords();
        Scanner scan = new Scanner(System.in);

        while (true)
        {
            System.out.println("Type a prefix (empty to exit):");
            String prefix = scan.nextLine().toLowerCase();
            if (prefix.length() == 0)
                break;
            
            searchWords(words, prefix);
        }
    }
}

Related Posts

Comments

comments

Tagged with: , , ,
Posted in Algorithm, Java