Unix Commands for Large Files
This tutorial shows how to handle large files with Linux/Unix commands: viewing the first few lines, counting the number of lines or bytes, splitting a file into pieces, and so on.
wc — word, line, character, and byte count
The wc utility displays the number of lines, words, and bytes in the file. It has the following options.
- -c: the number of bytes.
- -l: the number of lines.
- -w: the number of words.
The following are some examples:
- wc -c myfile.txt: counts the number of bytes.
- wc -l myfile.csv: counts the number of lines.
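The counts above can also be captured in shell variables, which is handy in scripts. The following is a minimal sketch, assuming a small sample file created in a temporary directory; the file contents and variable names are illustrative only. Reading the file from standard input makes wc print just the number, without the filename.

```shell
# Create a small sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'a,b\n1,2\n3,4\n' > "$tmpdir/myfile.csv"

# Redirecting the file into wc suppresses the filename in the output.
# tr removes the padding spaces that some wc implementations print.
line_count=$(wc -l < "$tmpdir/myfile.csv" | tr -d ' ')
byte_count=$(wc -c < "$tmpdir/myfile.csv" | tr -d ' ')
word_count=$(wc -w < "$tmpdir/myfile.csv" | tr -d ' ')

echo "lines=$line_count bytes=$byte_count words=$word_count"
```

Note that wc -l counts newline characters, so a file whose last line lacks a trailing newline reports one line fewer than expected.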
head — display first lines of a file
The head utility displays the first few lines or bytes of a file (10 lines by default). The similar tail command displays the last lines. head has the following options.
- -n count: display the first count lines.
- -c count: display the first count bytes.
The following are some examples:
- head -n 10 myfile.csv: displays the first 10 lines.
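As a short illustration of head and tail together, the sketch below builds a hypothetical 100-line file (numbers.txt is an assumed name) and then peeks at both ends of it without reading the whole file into an editor.

```shell
# Build a sample file with the numbers 1 to 100, one per line (hypothetical data).
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' > "$tmpdir/numbers.txt"

# First three lines of the file.
first_three=$(head -n 3 "$tmpdir/numbers.txt")

# Last two lines of the file.
last_two=$(tail -n 2 "$tmpdir/numbers.txt")

# First four bytes of the file (here: "1", newline, "2", newline).
first_bytes=$(head -c 4 "$tmpdir/numbers.txt")

echo "$first_three"
echo "$last_two"
```

Both commands stop as soon as they have what they need, so they stay fast even on very large files.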
split — split a file into pieces
The split utility breaks a file into smaller pieces; by default, each piece has 1000 lines. It has the following options.
- -l count: split into files of count lines each.
- -b count[k|m]: split into files of count [kilo|mega] bytes.
By default, the file is split into lexically ordered files named with the prefix “x”. The following are some examples:
- split -l 10000 myfile.csv: splits the file into pieces of 10000 lines each, named ‘xaa’, ‘xab’, etc.
- split -l 100 myfile.csv newfile: splits the file into pieces of 100 lines each, named ‘newfileaa’, ‘newfileab’, etc.
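To see the naming scheme in practice, here is a minimal sketch that splits a hypothetical 250-line file into 100-line pieces and then checks that concatenating the pieces reproduces the original. The part_ prefix and the sample data are assumptions made for the example.

```shell
# Build a 250-line sample file (hypothetical data).
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 250; i++) print "row," i }' > "$tmpdir/myfile.csv"

# Split into pieces of 100 lines each: part_aa, part_ab, part_ac.
split -l 100 "$tmpdir/myfile.csv" "$tmpdir/part_"

# Count the pieces and the lines in the last (shorter) piece.
piece_count=$(ls "$tmpdir"/part_* | wc -l | tr -d ' ')
last_lines=$(wc -l < "$tmpdir/part_ac" | tr -d ' ')
echo "pieces=$piece_count lines_in_last=$last_lines"

# The shell expands part_* in lexical order, so cat restores the original.
cat "$tmpdir"/part_* | cmp -s - "$tmpdir/myfile.csv" && echo "pieces match the original"
```

Because the suffixes are lexically ordered, a plain glob is enough to reassemble the file; no explicit sorting is needed.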