Use multiple CPU Cores with your Linux commands: awk, sed, bzip2, grep, wc, etc.

Rather than this:

    cat bigfile.bin | bzip2 --best > compressedfile.bz2

Do this:

    cat bigfile.bin | parallel --pipe --recend '' -k bzip2 --best > compressedfile.bz2

Especially with bzip2, GNU parallel is dramatically faster on multi-core machines. Give it a whirl and you will be sold.

GREP

If you have an enormous text file, rather than this:

    grep pattern bigfile.txt

do this:

    cat bigfile.txt | parallel --block 10M --pipe grep 'pattern'

This command uses --block to hand each grep job about 10 MB of data from your file; you might play with this parameter to find out how many input record lines you want per CPU core. I gave a previous example of how to use grep with a large number of files, rather than just a single large file.

AWK

Here's an example of using awk to add up the numbers in a very large file. Rather than this:

    cat rands20M.txt | awk '{s+=$1} END {print s}'

do this!

    cat rands20M.
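As a minimal sketch of how a sum like this can be spread across cores with GNU parallel (not necessarily the author's exact command): each awk job adds up one block of input, and a final awk adds the partial sums. The sample file, the `numbers` and `total` variable names, and the serial fallback are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: sum the first column of a file, using GNU parallel when present.
# "numbers" is a hypothetical sample file standing in for a large input.
numbers=$(mktemp)
seq 1 1000 > "$numbers"

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # --pipe splits stdin into blocks on line boundaries; each awk job prints
    # the partial sum for its block, and the last awk adds the partial sums.
    total=$(parallel --pipe --block 10M "awk '{s+=\$1} END {print s}'" < "$numbers" \
        | awk '{s+=$1} END {print s}')
else
    # Serial fallback when GNU parallel is not installed.
    total=$(awk '{s+=$1} END {print s}' "$numbers")
fi

rm -f "$numbers"
echo "$total"
```

The two-stage shape matters because a sum can be merged from partial results. A command whose output cannot be combined this way (a median, for example) would need a different merge step.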
Read full article from Use multiple CPU Cores with your Linux commands -- awk, sed, bzip2, grep, wc, etc. | RankFocus - Systems and Data