Advanced Lucene 4.3 Development: Light at the End of the Tunnel (Part 6) - IndexSearcher
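In this post I (散仙) introduce the IndexSearcher class. It is an indispensable component of searching in Lucene and can be thought of as the entry point for search: through it we obtain the set of Docs that match our search keywords, and from there we can carry out whatever business processing comes next.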
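The post also covers constructing an IndexSearcher for parallel search, and how to use multiple threads to improve search performance.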
- Directory directory=FSDirectory.open(new File("D:\\索引测试"));// open the index directory
- IndexReader reader=DirectoryReader.open(directory);// returns a composite reader (a DirectoryReader)
- // build the IndexSearcher, the entry point for searching
- IndexSearcher searcher=new IndexSearcher(reader);
- final IndexReader reader; // package private for testing!
- // NOTE: these members might change in incompatible ways
- // in the next release
- protected final IndexReaderContext readerContext;
- protected final List<AtomicReaderContext> leafContexts;
- /** used with executor - each slice holds a set of leafs executed within one thread */
- protected final LeafSlice[] leafSlices;
- // These are only used for multi-threaded search
- private final ExecutorService executor;
- // the default Similarity
- private static final Similarity defaultSimilarity = new DefaultSimilarity();
- /** The Similarity implementation used by this searcher. */
- private Similarity similarity = defaultSimilarity;
- /** Creates a searcher searching the provided index. */
- public IndexSearcher(IndexReader r) {
- // delegates to the two-argument constructor
- this(r,null);
- }
- /** Runs searches for each segment separately, using the
- * provided ExecutorService. IndexSearcher will not
- * shutdown/awaitTermination this ExecutorService on
- * close; you must do so, eventually, on your own. NOTE:
- * if you are using {@link NIOFSDirectory}, do not use
- * the shutdownNow method of ExecutorService as this uses
- * Thread.interrupt under-the-hood which can silently
- * close file descriptors (see <a
- * href="https://issues.apache.org/jira/browse/LUCENE-2239">LUCENE-2239</a>).
- *
- * @lucene.experimental */
- public IndexSearcher(IndexReader r, ExecutorService executor) {
- this(r.getContext(), executor);
- }
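The Javadoc above leaves the ExecutorService lifecycle to the caller and warns against shutdownNow(). A minimal sketch of what that could look like, using only java.util.concurrent: the class SearchExecutorLifecycle and the helper shutdownGracefully are hypothetical names introduced here, and the submitted task merely stands in for searches that a real application would run through new IndexSearcher(reader, executor).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SearchExecutorLifecycle {

    /**
     * Hypothetical helper: stop accepting new tasks and wait for running ones.
     * It deliberately never calls shutdownNow(), which interrupts worker threads.
     * Returns true if the pool terminated within the timeout.
     */
    static boolean shutdownGracefully(ExecutorService executor, long timeoutSeconds)
            throws InterruptedException {
        executor.shutdown(); // tasks already submitted are allowed to finish
        return executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        // Placeholder for real work: in an application the pool would be passed
        // to the two-argument IndexSearcher constructor and searches would run here.
        executor.submit(() -> System.out.println("search task finished"));
        boolean terminated = shutdownGracefully(executor, 10);
        System.out.println("executor terminated: " + terminated);
    }
}
```

The point of the design is that the searcher never owns the pool: one pool can serve many searchers, and it is shut down once, after the last search has completed.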
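Method | Description |
IndexSearcher(IndexReader r) | Creates a searcher over the given reader |
IndexSearcher(IndexReader r, ExecutorService executor) | Creates a searcher that searches in parallel, using threads from the supplied ExecutorService |
doc(int docID) | Returns the stored document for the given doc ID |
explain(Query query, int doc) | Returns a detailed explanation of how the document was scored against the query |
getIndexReader() | Returns the underlying IndexReader |
search(Query query, int n) | Returns the top n hits for the query |
search(Query query, Collector results) | Passes matching documents to a Collector for custom handling of the results |
search(Query query, Filter filter, Collector results) | Same as above, with a Filter applied before hits reach the Collector |
search(Query query, Filter filter, int n) | Returns the top n hits that pass the filter |
search(Query query, Filter filter, int n, Sort sort) | Returns the top n filtered hits, ordered by the given Sort |
search(Query query, Filter filter, int n, Sort sort, boolean doDocScores, boolean doMaxScore) | Sorted search; doDocScores controls whether each hit's score is computed, doMaxScore whether the maximum score is computed |
searchAfter(ScoreDoc after, Query query, int n) | Returns the top n hits ranked after the given ScoreDoc from a previous search; typically used for paging |
setSimilarity(Similarity similarity) | Sets a custom Similarity (scoring model) |
search(Weight weight, int nDocs, Sort sort, boolean doDocScores, boolean doMaxScore) | Low-level sorted search that takes an already-built Weight and returns the top nDocs hits |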
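The paging contract of searchAfter in the table can be illustrated without Lucene. The sketch below is only a model, not Lucene's implementation: Hit is a hypothetical stand-in for ScoreDoc, pageAfter is a hypothetical helper, and the ordering assumed is relevance order (higher score first, ties broken by lower doc ID). Each page is requested by passing the last hit of the previous page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SearchAfterModel {

    /** Hypothetical stand-in for a ScoreDoc: a document ID and its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Relevance order: higher score first, ties broken by lower doc ID. */
    static final Comparator<Hit> RELEVANCE = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /** Returns up to n hits that rank strictly after 'after' (null means first page). */
    static List<Hit> pageAfter(List<Hit> allHits, Hit after, int n) {
        List<Hit> page = new ArrayList<>();
        if (n <= 0) return page;
        List<Hit> sorted = new ArrayList<>(allHits);
        sorted.sort(RELEVANCE);
        for (Hit h : sorted) {
            // Skip everything that was already served on an earlier page.
            if (after != null && RELEVANCE.compare(h, after) <= 0) continue;
            page.add(h);
            if (page.size() == n) break;
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit(0, 1.5f), new Hit(1, 3.0f),
                new Hit(2, 1.5f), new Hit(3, 0.7f), new Hit(4, 2.2f));
        Hit last = null; // no previous page yet
        for (int pageNo = 1; ; pageNo++) {
            List<Hit> page = pageAfter(hits, last, 2);
            if (page.isEmpty()) break;
            for (Hit h : page) {
                System.out.println("page " + pageNo + ": doc=" + h.doc + " score=" + h.score);
            }
            last = page.get(page.size() - 1); // becomes the 'after' of the next call
        }
    }
}
```

Because each request carries only the last hit of the previous page, the caller does not need to fetch and discard all earlier pages again to reach the next one.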
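Most of the time we use the default single-threaded search, whose total time is the sum of the time spent searching every segment one after another. With parallel search, the total time is roughly that of the slowest thread, because the segments are searched concurrently (plus a small cost for merging the per-thread results).
This parallel optimization pays off most when the index is very large and has been merged down to several segments, say 5, 10, or more. In that situation parallel search has a clear advantage.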
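The sum-versus-slowest-thread argument can be checked with a small cost model. This is a sketch under stated assumptions, not a benchmark: the per-segment times are made-up numbers, ParallelSearchCostModel is a hypothetical class, and a fixed thread pool is assumed to hand each queued segment to whichever worker frees up first. Merge overhead is ignored.

```java
import java.util.PriorityQueue;

class ParallelSearchCostModel {

    /** Single-threaded search visits segments one after another: total = sum. */
    static long sequentialMillis(long[] segmentMillis) {
        long total = 0;
        for (long ms : segmentMillis) total += ms;
        return total;
    }

    /**
     * Models a fixed pool of 'threads' workers taking segments in submission order.
     * Each segment goes to the worker that becomes free first; the search ends
     * when the last worker finishes.
     */
    static long parallelMillis(long[] segmentMillis, int threads) {
        if (threads < 1) throw new IllegalArgumentException("threads must be >= 1");
        PriorityQueue<Long> workerFreeAt = new PriorityQueue<>();
        for (int i = 0; i < threads; i++) workerFreeAt.add(0L);
        long finish = 0;
        for (long ms : segmentMillis) {
            long done = workerFreeAt.poll() + ms; // earliest-free worker takes the segment
            workerFreeAt.add(done);
            finish = Math.max(finish, done);
        }
        return finish;
    }

    public static void main(String[] args) {
        long[] segmentMillis = {40, 25, 90, 10, 35}; // illustrative values only
        System.out.println("sequential: " + sequentialMillis(segmentMillis));
        System.out.println("5 threads:  " + parallelMillis(segmentMillis, 5));
        System.out.println("2 threads:  " + parallelMillis(segmentMillis, 2));
    }
}
```

With one thread per segment the model gives the cost of the slowest segment; with fewer threads than segments a worker handles several segments in turn, so the gain is smaller, and one very large segment bounds the gain regardless of pool size.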
- if (executor == null) {
- return search(leafContexts, weight, after, nDocs);
- } else {
- // a single shared queue is used to merge the result sets of all slices
- final HitQueue hq = new HitQueue(nDocs, false);
- final Lock lock = new ReentrantLock();// guards inserts into the shared queue
- final ExecutionHelper<TopDocs> runner = new ExecutionHelper<TopDocs>(executor);
- for (int i = 0; i < leafSlices.length; i++) { // search each sub
- runner.submit(new SearcherCallableNoSort(lock, this, leafSlices[i], weight, after, nDocs, hq));
- }
- int totalHits = 0;
- float maxScore = Float.NEGATIVE_INFINITY;
- for (final TopDocs topDocs : runner) {
- if(topDocs.totalHits != 0) {
- totalHits += topDocs.totalHits;
- maxScore = Math.max(maxScore, topDocs.getMaxScore());
- }
- }
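- // finally, drain the queue into the ScoreDoc array that is returned
- final ScoreDoc[] scoreDocs = new ScoreDoc[hq.size()];
- for (int i = hq.size() - 1; i >= 0; i--) // put docs in array
- scoreDocs[i] = hq.pop();
- return new TopDocs(totalHits, scoreDocs, maxScore);
- }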
- private static final class SearcherCallableNoSort implements Callable<TopDocs> {
- private final Lock lock;
- private final IndexSearcher searcher;
- private final Weight weight;
- private final ScoreDoc after;
- private final int nDocs;
- private final HitQueue hq;
- private final LeafSlice slice;
- public SearcherCallableNoSort(Lock lock, IndexSearcher searcher, LeafSlice slice, Weight weight,
- ScoreDoc after, int nDocs, HitQueue hq) {
- this.lock = lock;
- this.searcher = searcher;
- this.weight = weight;
- this.after = after;
- this.nDocs = nDocs;
- this.hq = hq;
- this.slice = slice;
- }
- @Override
- public TopDocs call() throws IOException {
- final TopDocs docs = searcher.search(Arrays.asList(slice.leaves), weight, after, nDocs);
- final ScoreDoc[] scoreDocs = docs.scoreDocs;
- //it would be so nice if we had a thread-safe insert
- lock.lock();
- try {
- for (int j = 0; j < scoreDocs.length; j++) { // merge scoreDocs into hq
- final ScoreDoc scoreDoc = scoreDocs[j];
- if (scoreDoc == hq.insertWithOverflow(scoreDoc)) {
- break;
- }
- }
- } finally {
- lock.unlock();
- }
- return docs;
- }
- }
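The merge step in the listings above can be sketched with the JDK alone. This is a simplified model, not the Lucene code: Hit, SliceMergeSketch, searchTopN and this insertWithOverflow are hypothetical stand-ins, each slice is assumed to be a list of hits already sorted best-first, and java.util.PriorityQueue plays the role of HitQueue. It shows why the loop may break on the first rejected hit: the slice is sorted, so every later hit is weaker still.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class SliceMergeSketch {

    /** Hypothetical stand-in for a ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Weakest hit first: lower score, then higher doc ID. */
    static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    /** Returns null if added to free space, the evicted hit if one was displaced, or 'hit' if rejected. */
    static Hit insertWithOverflow(PriorityQueue<Hit> heap, int maxSize, Hit hit) {
        if (heap.size() < maxSize) { heap.add(hit); return null; }
        if (!heap.isEmpty() && WEAKEST_FIRST.compare(hit, heap.peek()) >= 0) {
            Hit evicted = heap.poll();
            heap.add(hit);
            return evicted;
        }
        return hit;
    }

    /** Merges the best-first hit lists of all slices into one global top-n list. */
    static List<Hit> searchTopN(List<List<Hit>> slices, int n, ExecutorService executor)
            throws Exception {
        final PriorityQueue<Hit> shared = new PriorityQueue<>(Math.max(1, n), WEAKEST_FIRST);
        final Lock lock = new ReentrantLock();
        List<Future<?>> pending = new ArrayList<>();
        for (final List<Hit> slice : slices) {
            pending.add(executor.submit(() -> {
                lock.lock(); // the shared heap is not thread-safe
                try {
                    for (Hit hit : slice) {
                        // Rejected: all remaining hits of this slice are weaker, so stop.
                        if (insertWithOverflow(shared, n, hit) == hit) break;
                    }
                } finally {
                    lock.unlock();
                }
            }));
        }
        for (Future<?> f : pending) f.get(); // wait for every slice to finish merging
        Hit[] top = new Hit[shared.size()];
        for (int i = top.length - 1; i >= 0; i--) top[i] = shared.poll(); // weakest goes last
        return Arrays.asList(top);
    }

    public static void main(String[] args) throws Exception {
        List<List<Hit>> slices = Arrays.asList(
                Arrays.asList(new Hit(0, 0.9f), new Hit(1, 0.4f)),
                Arrays.asList(new Hit(2, 0.8f), new Hit(3, 0.7f), new Hit(4, 0.1f)),
                Arrays.asList(new Hit(5, 0.95f)));
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            for (Hit h : searchTopN(slices, 3, pool)) {
                System.out.println("doc=" + h.doc + " score=" + h.score);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

In this model only the merge runs under the lock. In the listing above the per-slice search happens before lock.lock() is called, so the expensive part stays parallel and the lock is held only for a short loop over at most nDocs hits per slice.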