Lucene Indexes Fields
Conceptually, Lucene provides indexing and search over documents, but implementation-wise, all indexing and search is carried out over fields. A document is a collection of fields. Each field has three parts: name, type, and value. At search time, the supplied field name restricts the search to particular fields.
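The field model described above can be sketched in plain Java. This is a toy illustration only, not Lucene's API: the class `FieldModelSketch`, the `Type` enum with its `TEXT` and `KEYWORD` values, and the `matches` helper are all hypothetical names invented for this example. It assumes a text field is split into words while a keyword field is compared as a whole value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model only; none of these are Lucene classes. */
class FieldModelSketch {

    /** Hypothetical field types: TEXT is split into words, KEYWORD is matched whole. */
    enum Type { TEXT, KEYWORD }

    /** A field has three parts: name, type, and value. */
    static final class Field {
        final String name;
        final Type type;
        final String value;

        Field(String name, Type type, String value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }
    }

    /** A document is a collection of fields. */
    static final class Doc {
        final List<Field> fields = new ArrayList<>();

        Doc add(String name, Type type, String value) {
            fields.add(new Field(name, type, value));
            return this;
        }
    }

    /** Returns true if some field with the given name matches the term (case-insensitive). */
    static boolean matches(Doc doc, String fieldName, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        for (Field f : doc.fields) {
            if (!f.name.equals(fieldName)) {
                continue; // the field name restricts the search to this field
            }
            String value = f.value.toLowerCase(Locale.ROOT);
            if (f.type == Type.KEYWORD) {
                if (value.equals(wanted)) {
                    return true;
                }
            } else {
                for (String token : value.split("\\W+")) {
                    if (token.equals(wanted)) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Doc doc = new Doc()
            .add("title", Type.TEXT, "Gene expression in yeast")
            .add("journal", Type.KEYWORD, "Nature");
        System.out.println(matches(doc, "title", "yeast"));   // term searched in the title field
        System.out.println(matches(doc, "journal", "yeast")); // same term, restricted to the journal field
    }
}
```

The same term gives different results depending on the field name supplied, which is the sense in which search is carried out over fields rather than over whole documents.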
For example, a MEDLINE citation can be represented as a series of fields: one field for the title of the article, another for the name of the journal in which it was published, another for the authors of the article, a pub-date field for the date of publication, a field for the text of the article’s abstract, and another for the list of topic keywords drawn from Medical Subject Headings (MeSH). Each of these fields is given a different name. At search time, a client can specify that it is searching authors, titles, or both, and can restrict the results to a date range and a set of journals by constructing search terms for the appropriate fields and values.
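As a rough sketch of the kind of fielded search described above, the following plain-Java example treats a citation as a map from field name to value. It does not use Lucene. Apart from `pub-date`, the field names (`title`, `journal`, `authors`, `abstract`, `mesh`) are assumptions, the class and method names are hypothetical, and the citation itself is made up for illustration. It assumes dates are stored as ISO strings such as `2012-05-01`.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy fielded search over a citation; not Lucene's API. */
class CitationSearchSketch {

    /** True if the named field contains the term as a whole word (case-insensitive). */
    static boolean fieldHasTerm(Map<String, String> doc, String field, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        String value = doc.getOrDefault(field, "").toLowerCase(Locale.ROOT);
        for (String token : value.split("\\W+")) {
            if (token.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Matches when the term occurs in the authors or title field, the pub-date
     * lies in [from, to], and the journal is one of the allowed journals.
     */
    static boolean matches(Map<String, String> doc, String term,
                           LocalDate from, LocalDate to, Set<String> journals) {
        boolean termHit = fieldHasTerm(doc, "authors", term)
                       || fieldHasTerm(doc, "title", term);
        LocalDate date = LocalDate.parse(doc.get("pub-date"));
        boolean inRange = !date.isBefore(from) && !date.isAfter(to);
        boolean journalOk = journals.contains(doc.get("journal"));
        return termHit && inRange && journalOk;
    }

    public static void main(String[] args) {
        // Invented citation, for illustration only.
        Map<String, String> citation = Map.of(
            "title", "Yeast gene expression",
            "journal", "Example Journal",
            "authors", "Smith J; Jones A",
            "pub-date", "2012-05-01",
            "abstract", "An invented abstract.",
            "mesh", "Gene Expression; Yeasts");

        Set<String> journals = Set.of("Example Journal");
        System.out.println(matches(citation, "smith",
            LocalDate.of(2010, 1, 1), LocalDate.of(2013, 12, 31), journals));
        System.out.println(matches(citation, "smith",
            LocalDate.of(2013, 1, 1), LocalDate.of(2013, 12, 31), journals));
    }
}
```

Each restriction is expressed against a named field, so narrowing the date range or the journal set changes the result without touching the term search on authors and titles.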
Read full article from Lucene 4 Essentials for Text Search and Indexing | LingPipe Blog