Lucene 4.3 Advanced Development, the Long Road of Practice (Part 4) - IndexWriterConfig
IndexWriterConfig is not a top-level class: it extends a parent class, LiveIndexWriterConfig. Let us start with the parent. LiveIndexWriterConfig was introduced in Lucene 4.0 and did not exist in earlier releases, so what was it added for?

Here is part of the LiveIndexWriterConfig source:

private final Analyzer analyzer;
private volatile int maxBufferedDocs;
private volatile double ramBufferSizeMB;
private volatile int maxBufferedDeleteTerms;
private volatile int readerTermsIndexDivisor;
private volatile IndexReaderWarmer mergedSegmentWarmer;
private volatile int termIndexInterval; // TODO: this should be private to the codec, not settable here
// modified by IndexWriterConfig
/** {@link IndexDeletionPolicy} controlling when commit
* points are deleted. */
protected volatile IndexDeletionPolicy delPolicy;
/** {@link IndexCommit} that {@link IndexWriter} is
* opened on. */
protected volatile IndexCommit commit;
/** {@link OpenMode} that {@link IndexWriter} is opened
* with. */
protected volatile OpenMode openMode;
/** {@link Similarity} to use when encoding norms. */
protected volatile Similarity similarity;
/** {@link MergeScheduler} to use for running merges. */
protected volatile MergeScheduler mergeScheduler;
/** Timeout when trying to obtain the write lock on init. */
protected volatile long writeLockTimeout;
/** {@link IndexingChain} that determines how documents are
* indexed. */
protected volatile IndexingChain indexingChain;
/** {@link Codec} used to write new segments. */
protected volatile Codec codec;
/** {@link InfoStream} for debugging messages. */
protected volatile InfoStream infoStream;
/** {@link MergePolicy} for selecting merges. */
protected volatile MergePolicy mergePolicy;
/** {@code DocumentsWriterPerThreadPool} to control how
* threads are allocated to {@code DocumentsWriterPerThread}. */
protected volatile DocumentsWriterPerThreadPool indexerThreadPool;
/** True if readers should be pooled. */
protected volatile boolean readerPooling;
/** {@link FlushPolicy} to control when segments are
* flushed. */
protected volatile FlushPolicy flushPolicy;
/** Sets the hard upper bound on RAM usage for a single
* segment, after which the segment is forced to flush. */
protected volatile int perThreadHardLimitMB;
/** {@link Version} that {@link IndexWriter} should emulate. */
protected final Version matchVersion;
/** True if segment flushes should use compound file format */
protected volatile boolean useCompoundFile = IndexWriterConfig.DEFAULT_USE_COMPOUND_FILE_SYSTEM;

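Reading through it, we notice that apart from the analyzer and the version (matchVersion), which are final, every field is declared volatile. That is a strong hint about what the class is for. Besides holding the writer's configuration in one place, it factors out the settings it shares with IndexWriterConfig and makes them safe to change at runtime. Here "global" means visible through JVM main memory: volatile guarantees that once one thread writes a field, any thread that reads it afterwards sees the new value instead of a stale cached copy. So when a property on this object changes, any code holding the same configuration object, whether typed as LiveIndexWriterConfig or as its subclass IndexWriterConfig, picks up the current value on its next read and can react accordingly.
The important methods include setting the maximum number of buffered documents, the RAM buffer size, the deletion and merge policies, whether to use the compound file format, and a custom scoring strategy (Similarity), among others.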
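The visibility guarantee described above can be illustrated without Lucene at all. The following is only a minimal sketch: LiveSettings, analyzerName and observeChange are hypothetical names invented for this example and are not part of the Lucene API. It mimics the shape of LiveIndexWriterConfig (one final field, one volatile field with a chained setter) and lets a worker thread observe a change made by another thread.

```java
import java.util.concurrent.CountDownLatch;

public class LiveSettingsDemo {

    // Hypothetical stand-in for LiveIndexWriterConfig (not a Lucene class).
    static final class LiveSettings {
        private final String analyzerName;        // fixed at construction, like analyzer/matchVersion
        private volatile double ramBufferSizeMB;   // may change while other threads are reading it

        LiveSettings(String analyzerName, double ramBufferSizeMB) {
            this.analyzerName = analyzerName;
            this.ramBufferSizeMB = ramBufferSizeMB;
        }

        String getAnalyzerName() { return analyzerName; }

        double getRAMBufferSizeMB() { return ramBufferSizeMB; }

        // Returns this so that calls can be chained.
        LiveSettings setRAMBufferSizeMB(double mb) {
            this.ramBufferSizeMB = mb;
            return this;
        }
    }

    /**
     * Starts a worker that polls the setting until it differs from the initial
     * value, then changes the setting from the calling thread. Returns the value
     * the worker observed. newValue must differ from the current value.
     */
    static double observeChange(final LiveSettings settings, double newValue) {
        final double initial = settings.getRAMBufferSizeMB();
        final double[] seen = new double[1];
        final CountDownLatch started = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            started.countDown();
            double current;
            // The field is volatile, so this read cannot be hoisted out of the loop
            // and the worker is guaranteed to see the write made by the other thread.
            while ((current = settings.getRAMBufferSizeMB()) == initial) {
                Thread.yield();
            }
            seen[0] = current;
        });
        worker.setDaemon(true);
        worker.start();

        try {
            started.await();
            settings.setRAMBufferSizeMB(newValue); // write from the calling thread
            worker.join(5000);                     // join makes seen[0] safely readable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        LiveSettings settings = new LiveSettings("standard", 16.0);
        System.out.println("worker observed: " + observeChange(settings, 64.0));
    }
}
```

If the volatile modifier were dropped from ramBufferSizeMB, the Java memory model would no longer guarantee that the polling loop ever sees the update, which is the situation the volatile fields in LiveIndexWriterConfig are there to avoid.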

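IndexWriterConfig is the subclass of LiveIndexWriterConfig. Most of the fields it declares itself are static constants, mainly default values. Its role follows directly from its parent: it is the global configuration object that supplies IndexWriter with its initialization parameters.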

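IndexWriterConfig also contains a nested enum, OpenMode, with three values:
CREATE: each time an index is built in this mode, whatever index already exists in the directory is wiped first and a new one is created. Note that the index directory does not need to exist beforehand. This mode is mostly used for testing.

APPEND: newly added documents are appended to the existing index. One thing to watch out for: if no index exists at the given path, opening the writer fails with an exception, so make sure an index has already been created before using this mode.

CREATE_OR_APPEND: this is the default, and the safest and most general-purpose mode. If the index does not exist, a new one is created; if it already exists, added documents are simply appended to it. Nothing unexpected happens either way, so this is the recommended mode for building an index in most cases.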
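The three modes boil down to a small decision table over two inputs: the chosen mode and whether an index already exists. The sketch below restates the rules above in plain Java. It is an illustration only: the enum here merely mirrors the constant names of IndexWriterConfig.OpenMode, and shouldCreate and the IllegalStateException are hypothetical stand-ins, not Lucene's actual implementation or exception type.

```java
public class OpenModeSketch {

    // Stand-in enum mirroring the constant names of IndexWriterConfig.OpenMode.
    enum OpenMode { CREATE, APPEND, CREATE_OR_APPEND }

    /**
     * Returns true if a fresh index should be created (discarding any existing
     * one) and false if the existing index should be opened and appended to.
     * Throws if APPEND is requested but there is no index to append to.
     */
    static boolean shouldCreate(OpenMode mode, boolean indexExists) {
        switch (mode) {
            case CREATE:
                return true;                 // always start from an empty index
            case APPEND:
                if (!indexExists) {
                    throw new IllegalStateException("APPEND requires an existing index");
                }
                return false;                // keep existing content and add to it
            case CREATE_OR_APPEND:
                return !indexExists;         // create only when nothing is there yet
            default:
                throw new AssertionError("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (OpenMode mode : OpenMode.values()) {
            for (boolean exists : new boolean[] {true, false}) {
                String outcome;
                try {
                    outcome = shouldCreate(mode, exists)
                            ? "create new index"
                            : "append to existing index";
                } catch (IllegalStateException e) {
                    outcome = "error: " + e.getMessage();
                }
                System.out.println(mode + ", index exists=" + exists + " -> " + outcome);
            }
        }
    }
}
```

With Lucene on the classpath, the mode is chosen by calling setOpenMode(...) on the IndexWriterConfig before it is passed to the IndexWriter constructor.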