Read from URL in Java
Read Text Contents from a URL
The class java.net.URL represents a URL (Uniform Resource Locator), a pointer to a “resource” on the Internet. For example, the following is a URL:
http://www.example.com/index.html
- “http” stands for HyperText Transfer Protocol
- “www.example.com” is the host machine name
- “index.html” is the file we are looking for
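The three parts listed above can also be read back from a URL object. The following is a minimal sketch; the class name UrlParts and the helper describe() are hypothetical names chosen for illustration, while getProtocol(), getHost() and getPath() are methods of java.net.URL. Constructing the object only parses the string; no connection is opened.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlParts {
    // Returns "protocol|host|path" for the given URL string (hypothetical helper).
    static String describe(String urlString) throws MalformedURLException {
        URL url = new URL(urlString); // parses the string only; nothing is downloaded
        return url.getProtocol() + "|" + url.getHost() + "|" + url.getPath();
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints: http|www.example.com|/index.html
        System.out.println(describe("http://www.example.com/index.html"));
    }
}
```

Note that getPath() returns the path with its leading slash, so the file part appears as "/index.html".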
The following code creates a URL object:
URL url = new URL("http://www.example.com/");
The java.net.URL class has a method openStream() that opens a connection to the URL and returns an InputStream for reading from that connection. This method is shorthand for openConnection().getInputStream().
public final InputStream openStream() throws IOException
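To illustrate that the two forms are interchangeable, the sketch below reads the same resource both ways and compares the results. It is only an illustration under a few assumptions: the class and method names (OpenStreamDemo, viaOpenStream, viaOpenConnection) are made up for this example, a temporary local file addressed through a file: URL stands in for a web page so that the code runs without network access, and readAllBytes() requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OpenStreamDemo {
    // Reads everything through the shorthand url.openStream().
    static String viaOpenStream(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Reads everything through the long form url.openConnection().getInputStream().
    static String viaOpenConnection(URL url) throws IOException {
        try (InputStream in = url.openConnection().getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary local file plays the role of the remote resource.
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, "hello from a URL".getBytes(StandardCharsets.UTF_8));
        URL url = tmp.toUri().toURL();

        // Both forms return the same content, so this prints: true
        System.out.println(viaOpenStream(url).equals(viaOpenConnection(url)));
        Files.delete(tmp);
    }
}
```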
Using the input stream, we can create a java.util.Scanner object for reading text contents from the URL.
Scanner scan = new Scanner( url.openStream() );
The following code reads text contents from a URL and prints them line by line, prefixing each line with its line number.
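URL url = new URL("http://www.example.com/");
InputStream in = url.openStream();
Scanner scan = new Scanner(in);
int line = 1;
// Print each line of the response, prefixed with its line number
while (scan.hasNextLine()) {
    String str = scan.nextLine();
    System.out.println((line++) + ": " + str);
}
scan.close();  // closing the scanner also closes the underlying input stream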
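In the code above, the scanner is closed only if the loop finishes without an exception, and the text is decoded with the platform's default character set. A possible variant, sketched below, uses try-with-resources so the stream is closed in every case and names the encoding explicitly. The class name ReadLines and the method readLines() are hypothetical, UTF-8 is an assumption about the content being read, and a temporary local file again stands in for a web page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ReadLines {
    // Returns the text behind the URL as a list of lines (assumes UTF-8 content).
    static List<String> readLines(URL url) throws IOException {
        List<String> lines = new ArrayList<>();
        // Both resources are closed automatically, even if reading fails.
        try (InputStream in = url.openStream();
             Scanner scan = new Scanner(in, "UTF-8")) {
            while (scan.hasNextLine()) {
                lines.add(scan.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary local file plays the role of the remote page.
        Path tmp = Files.createTempFile("page", ".txt");
        Files.write(tmp, "first\nsecond\n".getBytes(StandardCharsets.UTF_8));

        int line = 1;
        for (String str : readLines(tmp.toUri().toURL())) {
            System.out.println((line++) + ": " + str); // prints "1: first", then "2: second"
        }
        Files.delete(tmp);
    }
}
```

Returning the lines as a list keeps the reading step separate from the printing step, which makes the helper easier to reuse.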
Example: Finding the Title in HTML
We wish to design a program that (1) asks the user for a URL, (2) retrieves the HTML contents from the URL, and (3) finds the “title” in the HTML. The title of an HTML document is delimited by the tags <title> and </title>. The data flow of this program is: URL → HTML content → Title.
In the following ReadURLTitle class, we define a method readURLContent() to retrieve HTML contents as a string, and a method findTitle() to find the title in HTML.
import java.io.IOException;
import java.net.URL;
import java.util.Scanner;

public class ReadURLTitle {

    // Read from a URL and return the content in a String
    public static String readURLContent(String urlString) throws IOException {
        URL url = new URL(urlString);
        Scanner scan = new Scanner(url.openStream());
        StringBuilder content = new StringBuilder();
        while (scan.hasNextLine())
            content.append(scan.nextLine());
        scan.close();
        return content.toString();
    }

    // Find title within the HTML content; returns "" if no title is found
    public static String findTitle(String str) {
        String tagOpen = "<title>";
        String tagClose = "</title>";
        int open = str.indexOf(tagOpen);
        if (open < 0)
            return "";
        int begin = open + tagOpen.length();
        int end = str.indexOf(tagClose, begin);
        if (end < 0)
            return "";
        return str.substring(begin, end);
    }

    public static void main(String[] args) throws IOException {
        Scanner scan = new Scanner(System.in);
        // Keep asking for URLs until the user enters an empty line
        while (true) {
            System.out.println("Please type in a URL (empty line to quit):");
            String urlString = scan.nextLine();
            if (urlString.length() == 0)
                break;
            String content = readURLContent(urlString);
            String title = findTitle(content);
            System.out.println(title);
        }
        scan.close();
    }
}
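The findTitle() method above matches only the exact lowercase tag <title>, while real pages may write <TITLE> or add attributes to the tag. One possible way to be more forgiving is a regular expression, sketched below. The class name TitleFinder and the Optional return type are choices made for this illustration and are not part of the program above. A regular expression is not a full HTML parser, so this should be read as a rough approximation.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleFinder {
    // Matches <title ...>text</title>, ignoring case and allowing line breaks in the text.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or an empty Optional if no title element is found.
    static Optional<String> findTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String withTitle = "<html><head><TITLE> Example Domain </TITLE></head></html>";
        String withoutTitle = "<p>no title here</p>";

        System.out.println(findTitle(withTitle).orElse("(no title)"));    // prints: Example Domain
        System.out.println(findTitle(withoutTitle).orElse("(no title)")); // prints: (no title)
    }
}
```

Returning an Optional makes the "no title" case explicit for the caller, who can choose a fallback with orElse().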