Use multiple CPU Cores with your Linux commands -- awk, sed, bzip2, grep, wc, etc. | RankFocus - Systems and Data



Use multiple CPU Cores with your Linux commands -- awk, sed, bzip2, grep, wc, etc. | RankFocus - Systems and Data

Search for: Use multiple CPU Cores with your Linux commands — awk, sed, bzip2, grep, wc, etc. cat bigfile.bin | bzip2 --best > compressedfile.bz2 Do this: cat bigfile.bin | parallel --pipe --recend '' -k bzip2 --best > compressedfile.bz2 Especially with bzip2, GNU parallel is dramatically faster on multiple core machines. Give it a whirl and you will be sold.     GREP If you have an enormous text file, rather than this: grep pattern bigfile.txt or this: cat bigfile.txt | parallel --block 10M --pipe grep 'pattern' These second command shows you using –block with 10 MB of data from your file — you might play with this parameter to find our how many input record lines you want per CPU core. I gave a previous example of how to use grep with a large number of files , rather than just a single large file. AWK Here's an example of using awk to add up the numbers in a very large file. Rather than this: cat rands20M.txt | awk '{s+=$1} END {print s}' do this! cat rands20M.

Read full article from Use multiple CPU Cores with your Linux commands -- awk, sed, bzip2, grep, wc, etc. | RankFocus - Systems and Data


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts