This is the FAQ for the article “Huge Text File, Need to Extract Specific Lines? Here’s How”. Working examples and more detailed information for each of these commands are available in that article. All these commands work in the Linux terminal, and some of them will also work in the Windows command line.

How do I split a large file into small chunks?

split -l [number of lines] [filename]

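Replace [number of lines] with the number of lines each chunk should hold and [filename] with the file to split, leaving out the square brackets [ and ]. The chunks are written to new files named xaa, xab, xac and so on.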
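A quick worked example, assuming GNU coreutils (standard on most Linux systems). The 10-line file bigfile.txt is hypothetical and is created in a throwaway directory just for the demonstration. The optional last argument, part_ here, is a prefix that replaces the default x in the chunk names.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
# Hypothetical sample: a 10-line file holding the numbers 1 to 10.
seq 1 10 > bigfile.txt
# Split into chunks of at most 4 lines each.
split -l 4 bigfile.txt
# The chunks are named xaa, xab, xac; the last one gets the 2 leftover lines.
ls x*
# Same split with a custom prefix, giving part_aa, part_ab, part_ac.
split -l 4 bigfile.txt part_
ls part_*
```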

How do I shuffle the data in a text file?

shuf input.txt > output.txt
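A small sketch using a hypothetical five-line input.txt. Because the order is random, the useful check is that the output still holds exactly the same lines. The -n option picks only that many random lines, which is handy for sampling a huge file. shuf is part of GNU coreutils, so it may not be installed by default on macOS or BSD.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
# Hypothetical input file with five lines.
printf 'alpha\nbravo\ncharlie\ndelta\necho\n' > input.txt
# Shuffle every line into a random order.
shuf input.txt > output.txt
# Pick just 2 random lines as a sample.
shuf -n 2 input.txt > sample.txt
cat output.txt sample.txt
```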

How do I sort the data in a text file?

sort input.txt > output.txt
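By default sort compares lines as text, which surprises people with numeric data. This sketch, on a hypothetical four-number input.txt, shows the default order next to sort -n (numeric) and sort -rn (numeric, reversed).

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf '10\n9\n100\n2\n' > input.txt
# Text order: "10" sorts before "9" because "1" comes before "9".
sort input.txt > output.txt
# -n compares the lines as numbers.
sort -n input.txt > numeric.txt
# -r reverses the order.
sort -rn input.txt > reversed.txt
cat output.txt numeric.txt reversed.txt
```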

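How do I delete specific characters or text from a file?

grep -v "criteria" sourcefile.txt > destinationfile.txt

The -v option keeps only the lines that do not contain "criteria", so every line containing it is deleted. To delete just the text itself and keep the rest of each line, use sed 's/criteria//g' sourcefile.txt > destinationfile.txt instead.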
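A worked example on a hypothetical three-line file, showing three ways to delete things: grep -v drops whole lines that contain the text, sed 's/text//g' removes only the text, and tr -d removes individual characters wherever they occur.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
# Hypothetical three-line sample file.
printf 'keep this line\nremove: criteria here\nkeep this too\n' > sourcefile.txt
# Keep only the lines that do NOT contain "criteria".
grep -v "criteria" sourcefile.txt > destinationfile.txt
# Delete just the text "criteria " and keep the rest of every line.
sed 's/criteria //g' sourcefile.txt > stripped.txt
# Delete every colon, wherever it appears.
tr -d ':' < sourcefile.txt > nocolons.txt
cat destinationfile.txt stripped.txt nocolons.txt
```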

How do I replace specific characters or text in a file?

sed 's/StringToReplace/ReplacementString/g' source.txt > destination.txt

If either string contains a forward slash, use a different delimiter such as # so that sed does not mistake the slash for the end of the pattern, e.g.

sed 's#StringToReplace#ReplacementString#g' source.txt > destination.txt
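A short example on a hypothetical source.txt. The first command is a plain word swap; the second needs to match /usr/local, which contains slashes, so it uses # as the delimiter. (Escaping each slash as \/ also works but is harder to read.)

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'colour=red\ncolour=blue\npath=/usr/local/bin\n' > source.txt
# Replace every "colour" with "color".
sed 's/colour/color/g' source.txt > destination.txt
# The search text contains slashes, so use # as the delimiter.
sed 's#/usr/local#/opt#g' source.txt > paths.txt
cat destination.txt paths.txt
```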

How do I delete the Nth character within every line of a file?

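sed -E 's/^(.{#}).(.*)/\1\2/' sourcefile > outputfile

Replace # with the position of the character to delete, counting from 0 (zero): 0 deletes the first character, 1 the second, and so on. The -E option switches on extended regular expressions, and \1\2 write back the text before and after the deleted character.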
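A worked example of deleting the Nth character, using the -E (extended regular expression) form of sed so that the parentheses and braces need no backslashes. The sample file is hypothetical; with # set to 2 the third character is removed. A line too short to match, like "xy" here, is left unchanged.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'abcdef\nxy\n' > sourcefile
# Capture the first 2 characters, skip the next one, capture the rest,
# then write back only the two captured groups.
sed -E 's/^(.{2}).(.*)/\1\2/' sourcefile > outputfile
cat outputfile
```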

How do I delete the first N characters of every line within a file?

sed -E 's/^.{#}//' source.txt > destination.txt

Replace # with the number of characters to be removed.
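A sketch with # set to 3 on a hypothetical two-line file. For comparison it also uses cut -c 4-, which keeps character 4 onwards. The two differ on short lines: sed leaves a line shorter than # untouched, while cut outputs an empty line.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'abcdef\nab\n' > source.txt
# Delete the first 3 characters of every line.
sed -E 's/^.{3}//' source.txt > destination.txt
# Alternative: keep everything from character 4 to the end of the line.
cut -c 4- source.txt > cut.txt
cat destination.txt cut.txt
```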

How do I delete the last N characters of every line within a file?

sed -E 's/.{#}$//' source.txt > destination.txt

Replace # with the number of characters to be removed.
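A small example with # set to 4, which strips a four-character extension such as .txt from a hypothetical list of file names. The $ anchors the match to the end of the line.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'report.txt\nnotes.txt\n' > source.txt
# Delete the last 4 characters of each line.
sed -E 's/.{4}$//' source.txt > destination.txt
cat destination.txt
```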

How do I delete everything after a specific character in every line within a file?

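sed 's/[character].*/[character]/g' source.txt > destination.txt

Replace [character] with the demarcation character or characters (do not include the square brackets “[]”). The cut happens at the first occurrence of the character on each line. Characters with a special meaning to sed, such as . * [ \ and /, must be escaped with a backslash.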

This can also be used to replace everything after the character with different characters (the second [character] designates the replacements). Leave out the second [character] to delete the [character] too.
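A worked example on a hypothetical two-line file, using = as the demarcation character. It shows keeping the =, dropping it, and cutting at a literal dot, which has to be written as \. because a bare . matches any character.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'key=value=more\narchive.tar.gz\n' > source.txt
# Keep everything up to and including the first "=".
sed 's/=.*/=/' source.txt > kept.txt
# Leave out the second "=" to drop the "=" as well.
sed 's/=.*//' source.txt > dropped.txt
# Escape the dot to cut at the first literal "." on each line.
sed 's/\..*//' source.txt > nodot.txt
cat kept.txt dropped.txt nodot.txt
```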

How do I delete everything before a specific character in every line within a file?

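sed 's/.*[character]/[character]/g' source.txt > destination.txt

Replace [character] with the demarcation character or characters (do not include the square brackets “[]”). Because .* matches as much as it can, this deletes everything before the last occurrence of the character on each line. Special characters such as . * [ \ and / must be escaped with a backslash.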

This can also be used to replace everything before the character with different characters (the second [character] designates the replacements). Leave out the second [character] to delete the [character] too.
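A sketch on a hypothetical one-line file that contains = twice. The greedy .* form cuts at the last =; to cut at the first one instead, [^=]* (any run of characters that are not =) stops at the first match.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'key=value=more\n' > source.txt
# Greedy: remove everything up to and including the LAST "=".
sed 's/.*=//' source.txt > last.txt        # more
# [^=]* cannot cross an "=", so this stops at the FIRST one.
sed 's/^[^=]*=//' source.txt > first.txt   # value=more
cat last.txt first.txt
```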

How do I add characters to the END of every line within a file?

sed 's/$/text to add/g' source.txt > destination.txt

Replace “text to add” with the characters to be added to the end of each data line.
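A small example on a hypothetical two-line file. Appending a comma is straightforward; the second command shows that & has a special meaning in sed's replacement text (it stands for the matched text), so a literal & must be written as \&.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'apples\npears\n' > source.txt
# Append a comma to the end of every line.
sed 's/$/,/' source.txt > destination.txt
# Append " & more"; the & is escaped so it is added literally.
sed 's/$/ \& more/' source.txt > escaped.txt
cat destination.txt escaped.txt
```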

How do I add characters to the BEGINNING of every line within a file?

sed 's/^/text to add/g' source.txt > destination.txt

Replace “text to add” with the characters to be added to the beginning of each data line.
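A sketch on a hypothetical two-line file: the first command prefixes each line with "# " (for example to comment lines out), and the second combines ^ and $ in one sed call, separated by a semicolon, to wrap each line in double quotes.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'first\nsecond\n' > source.txt
# Add "# " to the beginning of every line.
sed 's/^/# /' source.txt > destination.txt
# Add a double quote to both the beginning and the end of every line.
sed 's/^/"/; s/$/"/' source.txt > quoted.txt
cat destination.txt quoted.txt
```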

How do I remove duplicate lines of data within a file?

uniq source.txt > destination.txt

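The above command only removes duplicate lines that sit next to each other, so the data must be sorted first (for example sort source.txt | uniq > destination.txt, or simply sort -u source.txt > destination.txt). Alternatively, and often better, use the awk command below, which does not need presorted data and keeps the lines in their original order: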

awk '!x[$0]++' source.txt > destination.txt
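A comparison on a hypothetical unsorted file where the duplicates are not adjacent. uniq removes nothing, sort -u removes the duplicates but reorders the lines, and the awk one-liner removes them while keeping first-seen order: x[$0]++ is 0 (false) the first time a line appears, so !x[$0]++ is true and the line is printed only then.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'b\na\nb\na\nc\n' > source.txt
# uniq only drops ADJACENT repeats, so this file comes out unchanged.
uniq source.txt > uniq.txt
# Sorting removes every duplicate but loses the original order.
sort -u source.txt > sorted.txt
# awk prints each distinct line the first time it is seen.
awk '!x[$0]++' source.txt > destination.txt
cat uniq.txt sorted.txt destination.txt
```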

How do I extract lines within a file that contain specific data?

grep "specific data" source.txt > destination.txt

Replace specific data with the text that the lines to be extracted contain. Keep the quotation marks: they are required whenever the text contains spaces or characters the shell treats specially, so it is safest to always include them.
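A worked example on a hypothetical three-line file. grep treats the search text as a regular expression, so "1.5" also matches "125" (the dot matches any character); -F searches for the text literally, and -i ignores upper and lower case.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'version 1.5\nbuild 125\nVersion 2.0\n' > source.txt
# Regular expression: matches "version 1.5" and "build 125".
grep "1.5" source.txt > regex.txt
# Fixed text: matches only the literal "1.5".
grep -F "1.5" source.txt > fixed.txt
# Case-insensitive: matches "version" and "Version".
grep -i "version" source.txt > destination.txt
cat regex.txt fixed.txt destination.txt
```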

How do I merge several files into one file, with each file in a separate column?

paste -d 'delimiter' file1 file2 > newfile

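Replace delimiter with the column separation character, e.g. a comma (,) or a space ( ). If you give several characters, such as ,; paste uses them one after another in rotation rather than as a single multi-character separator. Without -d, the columns are separated by tabs.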
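A sketch with three hypothetical two-line files. It shows a comma delimiter, the default tab, and what happens with two delimiter characters: the first goes between columns 1 and 2, the second between columns 2 and 3.

```shell
# Work in a scratch directory so the demo files are easy to throw away.
cd "$(mktemp -d)" || exit 1
printf 'a\nb\n' > file1
printf '1\n2\n' > file2
printf 'x\ny\n' > file3
# Join the files side by side with a comma between the columns.
paste -d ',' file1 file2 > newfile
# With no -d, the columns are separated by a tab.
paste file1 file2 > tabbed
# Delimiter list ",;" is used in turn: "," first, then ";".
paste -d ',;' file1 file2 file3 > mixed
cat newfile tabbed mixed
```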

Text Manipulation FAQ Copyright secured by Digiprove © 2010


Related posts:

  1. Huge Text File, Need to Extract Specific Lines? Here’s How
  2. Stop < pre > Text Overruns
  3. WordPress Tips, Tricks and Advice FAQ
  4. Navigating The Linux Command Line
  5. Hardware Discovery and Fault Diagnostics

© 2010 JournalXtra