Searching the Linux Source Tree with FindInFiles
To use FindInFiles I configured it to search the directory containing the Linux code, entered my search, and selected Find All. In this running example the user is searching for encryption algorithms, specifically those related to AES, and thus they use the regular expression query "encrypt*aes". Executing this search caused FindInFiles to run its regular expression matching algorithm against every line of every file in that directory, recursively. As you can see in "Starting the Search", this utilized about 50% of the CPU on an eight core machine for a considerable amount of time.
After about one minute and forty seconds the search completed, having searched 47,407 files. Unfortunately, no lines matched this particular search (see "Finishing the Search"). As often happens with a regular expression based search, the word ordering in the query did not match the word ordering in the code. In this situation the user would likely have to run another search with re-ordered search terms (e.g., "aes*encrypt") to find relevant code.
Searching the Linux Source Tree with Sando
Next we searched the same Linux source tree using Sando. Unlike FindInFiles, which is based on regular expression matching, Sando is built upon information retrieval technology (think Google). It leverages Lucene.NET to pre-index source code and provide ranked results almost instantly. Typing in the same query as before minus the regular expression syntax (i.e., "encrypt aes") you can see below that results are returned almost instantly. Just as importantly, the most relevant results are returned first with less relevant results toward the bottom. Additionally, in Sando's UI, selecting a result in the list provides a preview of the program element with matching terms in bold.
Read full article from Searching the Linux Source Tree in 0.5 Seconds - David C. Shepherd
No comments:
Post a Comment