Read from URL in Java

Read Text Contents from a URL

The URL class (java.net.URL) represents a Uniform Resource Locator (URL), a pointer to a “resource” on the Internet. For example, the following is a URL:

  • “http” stands for HyperText Transfer Protocol
  • “” is the host machine name
  • “index.html” is the file we are looking for
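
The URL class exposes each of these components through getter methods. The sketch below is a minimal illustration: the address http://www.example.com/index.html is a hypothetical stand-in (example.com is reserved for documentation), and the class name URLParts and the method parts() are made up for this example.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class URLParts {
    // Splits a URL string into its protocol, host and file components
    // using the standard getters of java.net.URL.
    public static String[] parts(String spec) throws MalformedURLException {
        URL url = new URL(spec);
        return new String[] { url.getProtocol(), url.getHost(), url.getFile() };
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical example address, not the one from the original text
        String[] p = parts("http://www.example.com/index.html");
        System.out.println("protocol: " + p[0]); // http
        System.out.println("host:     " + p[1]); // www.example.com
        System.out.println("file:     " + p[2]); // /index.html
    }
}
```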

The following code creates a URL object:

URL url = new URL("");
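
Constructing a URL can fail: the URL(String) constructor throws the checked MalformedURLException when, for example, the string has no protocol. Since JDK 20 that constructor is deprecated, and the JDK documentation suggests going through java.net.URI instead. The following is a small sketch of a hypothetical helper, SafeURL.parse(), that returns null instead of throwing; returning null is a design choice made only for this example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class SafeURL {
    // Returns a URL for the given string, or null if it is not a usable absolute URL.
    public static URL parse(String spec) {
        try {
            // URI.create(...).toURL() avoids the URL(String) constructor deprecated in JDK 20
            return URI.create(spec).toURL();
        } catch (IllegalArgumentException | MalformedURLException e) {
            // IllegalArgumentException: bad URI syntax, or the URI is not absolute
            // MalformedURLException: no protocol handler exists for the scheme
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("http://www.example.com/index.html")); // http://www.example.com/index.html
        System.out.println(parse(""));                                  // null
    }
}
```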

The URL class provides the method openStream(), which opens a connection to the URL and returns an InputStream for reading from that connection. It is shorthand for openConnection().getInputStream().

public final InputStream openStream() throws IOException
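
Because openStream() is shorthand for openConnection().getInputStream(), you can write out the longer form when the connection needs configuring first, for instance to set timeouts so the program does not wait indefinitely on an unresponsive server. This is a minimal sketch; the class OpenWithTimeouts, its open() method and the 5000 ms value are hypothetical choices.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class OpenWithTimeouts {
    // Same result as url.openStream(), but with timeouts set before connecting.
    public static InputStream open(URL url, int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMillis); // max time to establish the connection
        conn.setReadTimeout(timeoutMillis);    // max time to wait for data on a read
        return conn.getInputStream();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java OpenWithTimeouts <url>
        URL url = URI.create(args[0]).toURL();
        try (InputStream in = open(url, 5000)) { // 5000 ms is an arbitrary example value
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```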

Wrapping this input stream in a java.util.Scanner object lets us read the text content of the URL:

Scanner scan = new Scanner( url.openStream() );
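
new Scanner(InputStream) decodes the bytes with the JVM's default charset, which before JDK 18 depended on the platform. The Scanner constructor that also takes a Charset (Java 10 and later) makes the choice explicit. The sketch below also uses the common idiom of setting the delimiter to "\A" (start of input) to read the whole content as one string. Assuming UTF-8 is a simplification, since a real page may declare another encoding; the class ReadAll is a hypothetical name.

```java
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ReadAll {
    // Reads the whole text content at the URL into one String, decoded as UTF-8
    // (an assumption). "\\A" matches only at the start of the input, so next()
    // returns everything up to the end as a single token.
    public static String readAll(URL url) throws IOException {
        try (Scanner scan = new Scanner(url.openStream(), StandardCharsets.UTF_8)) {
            scan.useDelimiter("\\A");
            return scan.hasNext() ? scan.next() : ""; // empty input has no token
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ReadAll <url>
        System.out.print(readAll(new URL(args[0])));
    }
}
```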

The following code reads the text content from a URL and prints it line by line, prefixing each line with its number.

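URL url = new URL("");
InputStream in = url.openStream();
Scanner scan = new Scanner(in);

int line = 1;
// hasNextLine() pairs with nextLine(); hasNext() looks for a token and would stop before trailing blank lines
while (scan.hasNextLine()) {
    String str = scan.nextLine();
    System.out.println((line++) + ": " + str);
}
scan.close();   // also closes the underlying InputStream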
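
The snippet above is a fragment. A self-contained version might look like the following sketch. The class NumberedLines and the method numberedLines() are hypothetical, and the method returns the numbered lines instead of printing them so the reading logic can be reused and checked. It pairs hasNextLine() with nextLine() and uses try-with-resources so the stream is closed even if reading fails.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class NumberedLines {
    // Reads the text at the URL and returns each line prefixed with its number,
    // in the same "n: text" format as the loop above.
    public static List<String> numberedLines(URL url) throws IOException {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(url.openStream(), StandardCharsets.UTF_8)) {
            int line = 1;
            while (scan.hasNextLine()) {
                result.add((line++) + ": " + scan.nextLine());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NumberedLines <url>
        URL url = URI.create(args[0]).toURL();
        for (String s : numberedLines(url)) {
            System.out.println(s);
        }
    }
}
```

To try it, run java NumberedLines with the address to read as its only argument.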

Example: Finding the Title in HTML

We wish to design a program that (1) asks the user for a URL, (2) retrieves the HTML content from the URL, and (3) extracts the title from the HTML. In an HTML document, the title is enclosed between the tags <title> and </title>. The data flow of this program is: URL → HTML content → Title.

In the following ReadURLTitle class, we define a method readURLContent() to retrieve the HTML content as a string and a method findTitle() to extract the title from it.

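import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

public class ReadURLTitle {
    // Read from a URL and return the content in a String
    public static String readURLContent(String urlString)
                                    throws IOException {
        URL url = new URL(urlString);
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the Scanner and the underlying stream
        try (Scanner scan = new Scanner(url.openStream())) {
            while (scan.hasNextLine())
                content.append(scan.nextLine()).append('\n');
        }
        return content.toString();
    }

    // Find title within the HTML content; return null if there is no title
    public static String findTitle(String str) {
        String tagOpen = "<title>";
        String tagClose = "</title>";
        int start = str.indexOf(tagOpen);
        if (start < 0)
            return null;                    // no opening tag
        int begin = start + tagOpen.length();
        int end = str.indexOf(tagClose, begin);
        if (end < 0)
            return null;                    // no closing tag
        return str.substring(begin, end);
    }

    public static void main(String[] args) throws IOException {
        Scanner scan = new Scanner(System.in);
        System.out.println("Please type in a URL:");
        String urlString = scan.nextLine();
        if (urlString.length() == 0) {
            System.out.println("No URL entered.");
            return;
        }
        String content = readURLContent(urlString);
        String title = findTitle(content);
        if (title == null)
            System.out.println("No <title> tag found.");
        else
            System.out.println("Title: " + title);
    }
}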
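
findTitle() searches only for the exact lowercase tag <title>, so a page that writes <TITLE> or <title lang="en"> would be missed. One way to relax this, swapping the indexOf() search for a regular expression, is sketched below. It is still a heuristic rather than an HTML parser: for example, it does not decode entities such as &amp;. The class name TitleRegex is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleRegex {
    // <title\b[^>]*> accepts attributes but not tags like <titles>;
    // CASE_INSENSITIVE accepts <TITLE>, DOTALL lets the title span lines,
    // and the reluctant (.*?) stops at the first closing tag.
    private static final Pattern TITLE = Pattern.compile(
        "<title\\b[^>]*>(.*?)</title\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or null if no title element is found.
    public static String findTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<HTML><Title lang=\"en\">\n  Hello\n</TITLE></HTML>";
        System.out.println(findTitle(html)); // Hello
    }
}
```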

Posted in Java