- Use Boyer-Moore (and unroll its inner loop a few times). - Roll your own unbuffered input using raw system calls. Avoid copying the input bytes before searching them. (Do, however, use buffered *output*. The normal grep scenario is that the amount of output is small compared to the amount of input, so the overhead of output buffer copying is small, while savings due to avoiding many small unbuffered writes can be large.) - Don't look for newlines in the input until after you've found a match. - Try to set things up (page-aligned buffers, page-sized read chunks, optionally use mmap) so the kernel can ALSO avoid copying the bytes. The key to making programs fast is to make them do practically nothing. ;-)
Read full article from why GNU grep is fast
No comments:
Post a Comment