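Lucene's caches fall into two categories: the filter cache and the field cache.

The filter cache is implemented by CachingWrapperFilter, which caches the results of other Filters.

The field cache is implemented by FieldCache, which caches the values of fields used for sorting.

In short, the filter cache serves query caching and the field cache serves sorting.

Both caches live only as long as a single IndexReader instance, so the key to improving Lucene query performance is to maintain and reuse the same IndexReader (and therefore the same IndexSearcher).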

Filter Cache

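Strictly speaking, Lucene has no query data cache comparable to the one in a database server. Its filter cache is implemented by CachingWrapperFilter, which caches the bits (the set of matching documents) produced by the filter it wraps. Older Lucene releases also provided FilterManager, a singleton that caches the Filter objects themselves.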

public class CachingWrapperFilter extends Filter implements Accountable {
  private final Filter filter;
  // The map used as the cache: one DocIdSet per reader core key, with weak keys.
  private final Map<Object,DocIdSet> cache = Collections.synchronizedMap(new WeakHashMap<Object,DocIdSet>());
  int hitCount, missCount; // hit/miss counters updated below

  // (constructor and other members omitted)

  public DocIdSet getDocIdSet(AtomicReaderContext context, final Bits acceptDocs) throws IOException {
    final AtomicReader reader = context.reader();
    final Object key = reader.getCoreCacheKey();

    DocIdSet docIdSet = cache.get(key);
    if (docIdSet != null) {
      hitCount++;
    } else {
      missCount++;
      // Compute the wrapped filter's result without acceptDocs so it can be reused.
      docIdSet = docIdSetToCache(filter.getDocIdSet(context, null), reader);
      assert docIdSet.isCacheable();
      cache.put(key, docIdSet);
    }

    // acceptDocs is applied on top of the cached set at lookup time.
    return docIdSet == EMPTY ? null : BitsFilteredDocIdSet.wrap(docIdSet, acceptDocs);
  }
}
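To make the caching pattern in that excerpt easier to see in isolation, here is a minimal sketch that uses only the JDK. It is a toy model, not the Lucene class: PerReaderBitsCache, the BitSet values, and the Function standing in for the wrapped filter are all assumptions made for illustration.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Toy model of a per-reader filter cache (hypothetical, not Lucene code). */
class PerReaderBitsCache {
    // Weak keys: an entry becomes collectable once its reader key is unreachable.
    private final Map<Object, BitSet> cache =
            Collections.synchronizedMap(new WeakHashMap<Object, BitSet>());
    // Stand-in for the wrapped filter: computes the matching bits for one reader.
    private final Function<Object, BitSet> filter;
    // Plain counters, not thread-safe; useful only for observing cache behavior.
    int hitCount, missCount;

    PerReaderBitsCache(Function<Object, BitSet> filter) {
        this.filter = filter;
    }

    BitSet getBits(Object readerKey) {
        BitSet bits = cache.get(readerKey);
        if (bits != null) {
            hitCount++;
        } else {
            missCount++;
            bits = filter.apply(readerKey); // the expensive step, done once per reader key
            cache.put(readerKey, bits);
        }
        return bits;
    }
}
```

Because the map is keyed by the reader, asking for the same filter against a new reader instance always misses, which is why reusing one reader matters so much.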

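FilterManager uses Filter.hashCode() as the cache key, so a custom Filter class should override hashCode() (and equals(), to keep the two consistent).

Example: Filter filter = FilterManager.getInstance().getFilter(new CachingWrapperFilter(new MyFilter())); If an equivalent filter is already registered, FilterManager returns the cached Filter (which carries its bit cache); otherwise it returns the filter that was passed in (which has no bits cached yet).

FilterManager runs a timer thread that cleans the cache periodically, to keep it from causing out-of-memory errors.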
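As an illustration of why the hash code matters for a cache keyed this way, the sketch below uses only the JDK. RangeFilterSpec and HashKeyedCache are hypothetical stand-ins for a custom filter and for the manager's lookup; they are not Lucene classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for a custom filter defined by a field and a numeric range. */
final class RangeFilterSpec {
    private final String field;
    private final long min;
    private final long max;

    RangeFilterSpec(String field, long min, long max) {
        this.field = field;
        this.min = min;
        this.max = max;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof RangeFilterSpec)) return false;
        RangeFilterSpec other = (RangeFilterSpec) o;
        return field.equals(other.field) && min == other.min && max == other.max;
    }

    @Override
    public int hashCode() {
        // Derived from the same fields as equals(), so equal specs share a hash code.
        return Objects.hash(field, min, max);
    }
}

/** Toy cache keyed by hashCode(), mirroring the keying scheme described above. */
class HashKeyedCache {
    private final Map<Integer, RangeFilterSpec> cache = new HashMap<>();

    /** Returns the cached instance with the same hash code, or caches and returns the argument. */
    RangeFilterSpec get(RangeFilterSpec spec) {
        return cache.computeIfAbsent(spec.hashCode(), k -> spec);
    }
}
```

Without the hashCode() override, two logically identical filters would get different identity hash codes and never share a cache entry. Note also that keying by hash code alone means two different filters whose hash codes collide would share one entry.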
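A periodic cleanup of this kind can be sketched with the JDK alone. The class below is an assumption-laden illustration, not FilterManager's actual code: ExpiringCache, its idle-time policy, and the explicit `now` parameters (used so the behavior is easy to test) are all invented for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Toy cache with idle-time eviction (hypothetical names and policy). */
class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        volatile long lastAccessMillis;

        Entry(V value, long now) {
            this.value = value;
            this.lastAccessMillis = now;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    ExpiringCache(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    void put(K key, V value, long now) {
        entries.put(key, new Entry<>(value, now));
    }

    /** Returns the cached value and refreshes its last-access time, or null if absent. */
    V get(K key, long now) {
        Entry<V> e = entries.get(key);
        if (e == null) return null;
        e.lastAccessMillis = now;
        return e.value;
    }

    /** Drops every entry that has been idle longer than maxIdleMillis. */
    void evictIdle(long now) {
        entries.values().removeIf(e -> now - e.lastAccessMillis > maxIdleMillis);
    }

    int size() {
        return entries.size();
    }

    /** Starts a daemon thread that calls evictIdle at a fixed interval. */
    ScheduledExecutorService startCleaner(long periodMillis) {
        ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-cleaner");
            t.setDaemon(true); // do not keep the JVM alive just for cleanup
            return t;
        });
        cleaner.scheduleAtFixedRate(() -> evictIdle(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return cleaner;
    }
}
```

Entries that are still being used keep refreshing their timestamp and survive; filters nobody has asked for recently are dropped, which bounds memory use.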

Field Cache

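The field cache is used for sorting. Lucene reads every field that needs sorting into memory and sorts from there, so the memory consumed grows with the number of documents. People often hit out-of-memory errors when sorting with Lucene, usually because each query creates a new searcher instance. Under high concurrency, many searcher instances then load the sort fields at the same time and exhaust the heap.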
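The idea can be modeled with a small JDK-only sketch: one array of values per reader, indexed by document id, loaded once and then reused for every sort. SortFieldCache, its loader function, and the use of long values are assumptions made for illustration; this is not Lucene's FieldCache.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Toy model of a per-reader field cache used for sorting (hypothetical). */
class SortFieldCache {
    private final Map<Object, long[]> valuesByReader = new WeakHashMap<>();
    int loadCount; // how many times a full column of values was loaded into memory

    /** Returns the per-document values for one reader, loading them only on first use. */
    synchronized long[] getValues(Object readerKey, Function<Object, long[]> loader) {
        long[] values = valuesByReader.get(readerKey);
        if (values == null) {
            loadCount++;
            values = loader.apply(readerKey); // cost grows with the number of documents
            valuesByReader.put(readerKey, values);
        }
        return values;
    }

    /** Sorts matching doc ids by their cached field value, ascending. */
    static Integer[] sortHits(int[] hits, long[] values) {
        Integer[] sorted = Arrays.stream(hits).boxed().toArray(Integer[]::new);
        Arrays.sort(sorted, Comparator.comparingLong((Integer doc) -> values[doc]));
        return sorted;
    }
}
```

With a shared reader, any number of queries pay the load cost once. With a new reader per query, each one loads its own copy of the array, and concurrent queries hold all those copies in memory at the same time.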

org.apache.lucene.search.FieldCache
Expert: Maintains caches of term values.

Caching Solution

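Lucene's caches live only as long as a single IndexReader instance, so the key to improving Lucene query performance is to maintain and reuse the same IndexReader (and therefore the same IndexSearcher).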
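One way to follow this advice is to hand every query the same searcher from a small holder object. The sketch below is JDK-only and generic: SharedSearcherHolder is a hypothetical name, and the type parameter T stands in for an IndexSearcher so the example does not depend on any particular Lucene version.

```java
import java.util.function.Supplier;

/** Lazily creates one shared instance; T stands in for an IndexSearcher (hypothetical holder). */
final class SharedSearcherHolder<T> {
    private final Supplier<T> factory; // opens a reader/searcher; assumed to be expensive
    private volatile T instance;

    SharedSearcherHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it on first use (double-checked locking). */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }

    /** Swaps in a fresh instance, e.g. after the index changed; returns the old one so the caller can close it. */
    synchronized T replace() {
        T old = instance;
        instance = factory.get();
        return old;
    }
}
```

Since the caches are tied to the reader, calling replace() discards them along with the old instance, so it makes sense to reopen only when the index has actually changed rather than on every query.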

Please read the full article: lucene的缓存机制和实现方案 (Lucene's caching mechanism and implementation approach)
