lucene的缓存机制和实现方案



lucene的缓存机制和实现方案
lucene的缓存可分为两类:filter cachefield cache
filter cache的实现类为CachingWrapperFilter,用来缓存其他Filter的查询结果。
field cache的实现类是FieldCache,缓存用于排序的field的值。
简单来说,filter Cache用于查询缓存,field cache用于排序。
这两种缓存的生存周期都是在一个IndexReader实例内,因此提高Lucene查询性能的关键在于如何维护和使用同一个IndexReader(IndexSearcher)

Filter Cache

从严格意义上来说,lucene没有查询类似数据库服务器的数据高速缓存。luceneFilter缓存实现类是CachingWrapperFilter,它缓存了查出来的bits。另外lucene还提供了FilterManager,一个单例对象,用来缓存Filter本身。

public class CachingWrapperFilter extends Filter implements Accountable {
  private final Filter filter;
  private final Map<Object,DocIdSet> cache = Collections.synchronizedMap(new WeakHashMap<Object,DocIdSet>()); //这是作为缓存使用的map
  public DocIdSet getDocIdSet(AtomicReaderContext context, final Bits acceptDocs) throws IOException {
    final AtomicReader reader = context.reader();
    final Object key = reader.getCoreCacheKey();

    DocIdSet docIdSet = cache.get(key);
    if (docIdSet != null) {
      hitCount++;
    } else {
      missCount++;
      docIdSet = docIdSetToCache(filter.getDocIdSet(context, null), reader);
      assert docIdSet.isCacheable();
      cache.put(key, docIdSet);
    }

    return docIdSet == EMPTY ? null : BitsFilteredDocIdSet.wrap(docIdSet, acceptDocs);
  }


在FilterManager里,采用Filter.hashCode()作为key的,所以使用的时候应该在自定义的Filter类中重载hashCode()方法。

例子:Filter filter=FilterManager.getInstance().getFilter(new CachingWrapperFilter(new MyFilter()));如果该filter已经存在,在FilterManager返回该Filter的缓存(带有bit缓存),否则返回本身(不带bit缓存的)。

FilterManager里有个定时线程,会定期清理缓存,以防造成内存溢出错误。

field缓存

field缓存是用来排序用的。lucene会将需要排序的字段都读到内存来进行排序,所占内存大小和文档数目相关。经常有人用lucene做排序出现内存溢出的问题,一般是因为每次查询都启动新的searcher实例进行查询,当并发大的时候,造成多个Searcher实例同时装载排序字段,引起内存溢出。

org.apache.lucene.search.FieldCache
Expert: Maintains caches of term values.

缓存解决方案

Lucene缓存的生存周期都是在一个IndexReader实例内,因此提高Lucene查询性能的关键在于如何维护和使用同一个IndexReader(IndexSearcher)
Please read full article from lucene的缓存机制和实现方案

No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts