Lucene and Hyphens | Entelligentsia Blog
I have used Apache Lucene to provide intelligent, intuitive search in my products many times over. Well, that was until yesterday! No, I did not ditch Lucene. What threw a monkey wrench in the works was the ‘intuitive’ nature of the product. Let me explain. Say I have a string ‘abc-def’ to be indexed. The way Lucene works (well, actually the way StandardAnalyzer works, since it delegates tokenizing to StandardTokenizer) is by splitting ‘abc-def’ into ‘abc’ and ‘def’ and indexing the two words separately.
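To make the splitting concrete, here is a minimal, dependency-free Java sketch. It does not use Lucene. The class `HyphenSplitDemo` and its `tokenize` method are hypothetical stand-ins that only approximate the ‘abc-def’ → ‘abc’, ‘def’ behaviour by treating every run of letters or digits as a token, so a hyphen ends one word and starts the next. The real StandardTokenizer grammar has many more rules (older versions, for example, treat hyphenated tokens that contain digits differently), so treat this purely as an illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in. This is NOT Lucene's StandardTokenizer:
// every run of letters/digits becomes a token, and any other character
// (including '-') acts as a separator.
class HyphenSplitDemo {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);               // still inside a word
            } else if (current.length() > 0) {
                tokens.add(current.toString());  // a separator ends the current word
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());      // flush the final word
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc-def")); // prints [abc, def]
    }
}
```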
Under the hood, StandardAnalyzer delegates most of the heavy lifting to StandardTokenizer, which, to my shock, turned out to be a generated Java file. Its source was a .jflex file, written specifically for a generator that was unknown to me until yesterday: JFlex. As we speak, we are the best of buddies. Alright then, out came the power chord and I got down to reading the JFlex manual (RTFM). An hour and a few test cases later, I was confident enough to perform invasive surgery on StandardTokenizerImpl.jflex. Or so I thought.
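JFlex builds a scanner from regular-expression rules, so a change like this one comes down to altering which character sequences count as a single token. The sketch below is not the author's grammar and does not use JFlex. It expresses one hypothetical rule, "a run of letters/digits, optionally followed by further runs joined by single hyphens", as a `java.util.regex` pattern. In a .jflex file a comparable rule might be a macro along the lines of `{ALPHANUM} ("-" {ALPHANUM})+`, but that macro name is an assumption, not a quote from Lucene's grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token rule, NOT taken from Lucene or from the author's .jflex file:
// letters/digits, optionally followed by more letter/digit runs joined by single hyphens.
class HyphenAwareTokenizerDemo {

    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)*");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {          // each match is one token; unmatched characters are skipped
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc-def ghi"));  // prints [abc-def, ghi]
        System.out.println(tokenize("abc--def -x-")); // prints [abc, def, x]
    }
}
```

With only this rule, ‘abc-def’ is indexed solely as ‘abc-def’, so a search for ‘abc’ on its own would no longer match it; whether that trade-off is acceptable depends on the application.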
This is my complete JFlex file: http://pastebin.com/vumxg01w. The corresponding Java file is auto-generated and not very helpful for reading, but in case it is of any use to you, here is the link: http://pastebin.com/QApDqeNU.
Read full article from Lucene and Hyphens | Entelligentsia Blog