Tharindu's Space: Apache Lucene 4.6 TokenStream contract violation error
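If code that consumes a TokenStream without following the required workflow (for example, without calling reset() before the first incrementToken()) is run with Lucene 4.6, the following exception is thrown: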
Exception in thread "main" java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:110)
at org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:921)
at org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1128)
at org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:173)
at org.apache.lucene.analysis.standard.StandardFilter.incrementToken(StandardFilter.java:49)
at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:82)

The workflow of the new TokenStream API is as follows:
- The TokenStream/TokenFilters are instantiated; they add/get attributes to/from the AttributeSource.
- The consumer calls reset().
- The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
- The consumer calls incrementToken() until it returns false, consuming the attributes after each call.
- The consumer calls end() so that any end-of-stream operations can be performed.
- The consumer calls close() to release any resources when finished using the TokenStream.
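To see why a missing reset() surfaces inside a read() call rather than in incrementToken() itself, here is a minimal, self-contained sketch of the kind of check the Tokenizer$1.read frame in the stack trace appears to point to: the tokenizer reads from a placeholder Reader that throws until reset() swaps in the real input, and close() puts the placeholder back. ToyTokenizer, its whitespace splitting, and the term() accessor are hypothetical simplifications for illustration, not Lucene classes or API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical toy tokenizer mimicking the reset() contract check; not a Lucene class. */
public class ToyTokenizer {
    // Placeholder input: any read before reset() (or after close()) fails.
    private static final Reader ILLEGAL_STATE_READER = new Reader() {
        @Override
        public int read(char[] cbuf, int off, int len) {
            throw new IllegalStateException(
                "TokenStream contract violation: reset()/close() call missing, "
                + "reset() called multiple times, or subclass does not call super.reset().");
        }

        @Override
        public void close() {
        }
    };

    private Reader input = ILLEGAL_STATE_READER; // what incrementToken() actually reads
    private Reader inputPending;                 // real input, installed by reset()
    private String term;                         // stands in for CharTermAttribute

    public ToyTokenizer(Reader reader) {
        this.inputPending = reader;
    }

    /** Installs the pending input; a second call installs the placeholder again. */
    public void reset() {
        input = inputPending;
        inputPending = ILLEGAL_STATE_READER;
    }

    /** Advances to the next whitespace-separated token; returns false at end of input. */
    public boolean incrementToken() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            if (!Character.isWhitespace(c)) {
                sb.append((char) c);
            } else if (sb.length() > 0) {
                break; // whitespace after a token ends it
            }
        }
        if (sb.length() == 0) {
            return false;
        }
        term = sb.toString();
        return true;
    }

    public String term() {
        return term;
    }

    /** End-of-stream hook; a no-op here (Lucene uses it e.g. to set the final offset). */
    public void end() {
    }

    /** Releases the real input and re-arms the placeholder. */
    public void close() throws IOException {
        input.close();
        input = ILLEGAL_STATE_READER;
    }

    public static void main(String[] args) throws IOException {
        ToyTokenizer bad = new ToyTokenizer(new StringReader("hello world"));
        try {
            bad.incrementToken(); // reset() was never called
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }

        ToyTokenizer ts = new ToyTokenizer(new StringReader("hello lucene world"));
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(ts.term());
        }
        ts.end();
        ts.close();
    }
}
```

Running main prints the caught contract-violation message, then hello, lucene and world on separate lines. The placeholder-reader design means the failure happens at the first attempt to read characters, which is why the exception in the trace above is raised from deep inside the tokenizer.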
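The following Lucene 4.6 code consumes the stream according to the workflow above:
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

// 'text' is the input String to encode (defined elsewhere)
FeatureVectorEncoder encoder = new StaticWordValueEncoder("text");
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
StringReader in = new StringReader(text);
TokenStream ts = analyzer.tokenStream("content", in);
CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
ts.reset(); // required in Lucene 4.6 before the first incrementToken()
Vector v1 = new RandomAccessSparseVector(100);
while (ts.incrementToken()) {
    // copy the current term out of the attribute's reusable buffer
    String w = new String(termAtt.buffer(), 0, termAtt.length());
    encoder.addToVector(w, 1, v1);
}
ts.end();   // end-of-stream operations (e.g. final offset)
ts.close(); // release resources held by the stream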
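Inside the loop, encoder.addToVector(w, 1, v1) folds each token into the 100-slot vector v1. Mahout's encoders are built around feature hashing (the "hashing trick"), and the sketch below illustrates that idea using only the standard library. It is not Mahout's implementation: HashingTrickSketch is hypothetical and uses String.hashCode() with a single slot per word, whereas Mahout uses its own hash function and can probe several slots.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical feature-hashing sketch; not Mahout's StaticWordValueEncoder. */
public class HashingTrickSketch {
    private final int cardinality;                              // fixed vector size, like RandomAccessSparseVector(100)
    private final Map<Integer, Double> slots = new TreeMap<>(); // sparse storage: only non-zero slots

    public HashingTrickSketch(int cardinality) {
        this.cardinality = cardinality;
    }

    /** Hashes the word to one slot and adds its weight there. */
    public void addToVector(String word, double weight) {
        int index = Math.floorMod(word.hashCode(), cardinality); // floorMod keeps the index non-negative
        slots.merge(index, weight, Double::sum);
    }

    public double get(int index) {
        return slots.getOrDefault(index, 0.0);
    }

    public int nonZeroCount() {
        return slots.size();
    }

    public static void main(String[] args) {
        HashingTrickSketch v = new HashingTrickSketch(100);
        for (String w : "the quick brown fox jumps over the lazy dog".split(" ")) {
            v.addToVector(w, 1);
        }
        // "the" occurs twice, so its slot holds at least 2.0
        System.out.println(v.slots);
    }
}
```

Collisions are possible by design: two different words may land in the same slot, which is the trade-off for a vector whose size is fixed in advance regardless of vocabulary.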