This is the FAQ for the article entitled Huge Text File, Need to Extract Specific Lines? Here’s How. Working examples and more detailed information for each of these commands are available through that link. All of these commands work in the Linux terminal; some of them will also work in the Windows command line.
When using sed, add the “-i” switch and do not specify a destination file if you want the changes to be applied directly to the source file without producing a backup. For example:
sed -i 's/StringToReplace/ReplacementString/g' source.txt
This edits source.txt in place without creating a backup.
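A minimal sketch of in-place editing, run in a throwaway directory so that nothing real is overwritten. The file contents are made up for illustration, and the syntax shown is the GNU sed one found on Linux; BSD/macOS sed expects an explicit (possibly empty) suffix argument after -i.

```shell
# Work in a temporary directory with a small sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'red apple\ngreen apple\n' > source.txt

# Edit source.txt in place; no backup copy is kept.
sed -i 's/apple/pear/g' source.txt

# To keep a backup, attach a suffix to the switch instead,
# e.g. sed -i.bak '...' source.txt leaves the original in source.txt.bak
cat source.txt
```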
How do I split a large file into small chunks?
split -l [number of lines] [filename]
Remember to remove the square brackets, i.e. [ and ].
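As a sketch of what split produces, the example below cuts a made-up 10-line file into pieces of 4 lines. The chunk_ prefix is an optional extra argument chosen here for illustration; without it, split names the pieces xaa, xab and so on.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 10 > numbers.txt

# Pieces of 4 lines each: chunk_aa (4 lines), chunk_ab (4 lines), chunk_ac (2 lines).
split -l 4 numbers.txt chunk_

# Show the line count of every piece.
wc -l chunk_*
```

Concatenating the pieces in name order (cat chunk_* > rebuilt.txt) gives back the original file.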
How do I shuffle the data in a text file?
shuf input.txt > output.txt
How do I sort the data in a text file?
sort input.txt > output.txt
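A small illustration of both commands on a made-up file, numbers.txt. It also uses the -n switch of sort, which is not covered above: plain sort compares lines as text, so 10 is placed before 2 unless a numeric sort is requested.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '10\n2\n1\n' > numbers.txt

shuf numbers.txt > shuffled.txt         # random order, differs from run to run
sort numbers.txt > sorted_text.txt      # text order: 1, 10, 2
sort -n numbers.txt > sorted_num.txt    # numeric order: 1, 2, 10
```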
How do I delete specific characters or text from a file?
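sed 's/TextToDelete//g' sourcefile.txt > destinationfile.txt
To delete every line that contains the text instead, use grep with the -v switch:
grep -v "criteria" sourcefile.txt > destinationfile.txt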
How do I replace specific characters or text in a file?
sed 's/StringToReplace/ReplacementString/g' source.txt > destination.txt
If either string contains forward slashes, use a different delimiter such as #, e.g.
sed 's#StringToReplace#ReplacementString#g' source.txt > destination.txt
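File paths are the typical case where the # delimiter helps, because the strings themselves are full of forward slashes. A short sketch with made-up paths:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '/usr/local/bin/tool\n' > source.txt

# With # as the delimiter, the slashes in the paths need no escaping.
sed 's#/usr/local#/opt#g' source.txt > destination.txt

cat destination.txt
```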
How do I delete the Nth character within every line of a file?
sed 's/^\(.\{#\}\).\(.*\)/\1\2/' sourcefile > outputfile
Replace # with the character position. Counting starts at 0, so the first character of the line is position 0.
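A worked example may make the position counting clearer. The sketch below uses made-up data and writes the pattern with the backslashes that sed's basic regular expressions need around the groups and the repeat count. Position 2 is the third character of each line.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'abcdef\n123456\n' > sourcefile

# Keep the first 2 characters (group 1), drop the next one,
# then keep the rest of the line (group 2).
sed 's/^\(.\{2\}\).\(.*\)/\1\2/' sourcefile > outputfile

cat outputfile
```

Here "abcdef" becomes "abdef" and "123456" becomes "12456".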
How do I delete the first N characters of every line within a file?
sed 's/^.\{#\}//' source.txt > destination.txt
Replace # with the number of characters to be removed.
How do I delete the last N characters of every line within a file?
sed 's/.\{#\}$//' source.txt > destination.txt
Replace # with the number of characters to be removed.
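Both deletions, from the start and from the end of each line, can be tried on a small made-up file. The patterns are written with the backslashes that sed's basic regular expressions need around the { } repeat count; the output file names are chosen only for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'ID:alpha;;\nID:beta;;\n' > source.txt

# Remove the first 3 characters ("ID:") of every line.
sed 's/^.\{3\}//' source.txt > no_prefix.txt

# Remove the last 2 characters (";;") of every line.
sed 's/.\{2\}$//' source.txt > no_suffix.txt

cat no_prefix.txt no_suffix.txt
```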
How do I delete everything after a specific character in every line within a file?
sed 's/[character].*/[character]/g' source.txt > destination.txt
Replace [character] with the demarcation character or characters (do not include the square brackets “[]”). The match starts at the first occurrence of the character on each line.
This can also be used to replace everything after the character with different characters (the second [character] designates the replacement). Leave out the second [character] to delete the character as well.
How do I delete everything before a specific character in every line within a file?
sed 's/.*[character]/[character]/g' source.txt > destination.txt
Replace [character] with the demarcation character or characters (do not include the square brackets “[]”). Because .* matches as much as it can, everything up to the last occurrence of the character on each line is affected.
This can also be used to replace everything before the character with different characters (the second [character] designates the replacement). Leave out the second [character] to delete the character as well.
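The two commands behave differently when the demarcation character appears more than once on a line. A short sketch with a made-up file and a colon as the character, leaving out the second [character] so that the colon is deleted too:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a:b:c\nuser:secret\n' > source.txt

# Delete from the FIRST colon to the end of the line.
sed 's/:.*//' source.txt > before_colon.txt

# Delete from the start of the line through the LAST colon.
sed 's/.*://' source.txt > after_colon.txt

cat before_colon.txt after_colon.txt
```

For the line "a:b:c" the first command leaves "a" and the second leaves "c".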
How do I add characters to the END of every line within a file?
sed 's/$/text to add/g' source.txt > destination.txt
Replace “text to add” with the characters to be added to the end of each data line.
How do I add characters to the BEGINNING of every line within a file?
sed 's/^/text to add/g' source.txt > destination.txt
Replace “text to add” with the characters to be added to the beginning of each data line.
How do I remove duplicate lines of data within a file?
uniq source.txt > destination.txt
The above command only removes repeated lines that are directly next to each other, so the data must be sorted first. Alternatively, and better, use the awk command below, which does not require the data to be presorted and keeps the first occurrence of each line in its original order:
awk '!x[$0]++' source.txt > destination.txt
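The difference between the approaches shows up on unsorted input. The sketch below runs them on a made-up file; the output file names are chosen only for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'b\na\nb\na\na\n' > source.txt

# uniq alone only collapses adjacent repeats: b, a, b, a
uniq source.txt > uniq_only.txt

# Sorting first removes all duplicates but changes the order: a, b
sort source.txt | uniq > sorted_uniq.txt

# awk keeps the first occurrence of each line in the original order: b, a
awk '!x[$0]++' source.txt > awk_dedup.txt
```

The awk version works because x[$0] counts how often each line has been seen, and a line is printed only while that count is still zero.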
How do I extract lines within a file that contain specific data?
grep "specific data" source.txt > destination.txt
Replace specific data with the text that the lines to be extracted contain. The quotation marks are essential whenever that text contains spaces or characters that are special to the shell, so it is safest to always include them.
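A quick sketch on a made-up log file. The search text contains a space, which is why the quotation marks matter here. The second command uses the -v switch, which inverts the match and keeps only the lines that do not contain the text.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'error: disk full\ninfo: started\nerror: timeout\n' > source.txt

# Lines that contain the text "error: "
grep "error: " source.txt > matches.txt

# Lines that do NOT contain it
grep -v "error: " source.txt > others.txt

cat matches.txt others.txt
```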
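How do I merge n files into one file as separate columns?
paste -d 'delimiter' file1 file2 > newfile
Replace delimiter with the column separation character, e.g. a comma (,) or a space ( ). If more than one character is given, paste does not treat them as a single multi-character separator; it uses them one at a time in rotation between successive columns.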
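As an illustration, the sketch below merges two made-up three-line files into comma-separated columns. Line 1 of file1 is joined with line 1 of file2, and so on.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\n2\n3\n' > file1
printf 'a\nb\nc\n' > file2

# Join the files line by line with a comma between the columns.
paste -d ',' file1 file2 > newfile

cat newfile
```

The result is the three lines "1,a", "2,b" and "3,c".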