All About Programming: Inverted Index

Inverted Index | Hello World

Inverted Index is used to find Documents that contains certain word. First for every document, we develop the forward index, which is basically all the words in the Document. Then we develop a inverted index. Here is an example

Given the texts

T[0] = "it is what it is"
T[1] = "what is it"
T[2] = "it is a banana"

we have the following inverted file index (where the integers in the set notation brackets refer to the indexes (or keys) of the text symbols, T[0], T[1] etc.):

"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}

We could contains more info to help us search for consecutive terms, i, e

With the same texts, we get the following full inverted index, where the pairs are document numbers and local word numbers. Like the document numbers, local word numbers also begin with zero. So, "banana": {(2, 3)} means the word "banana" is in the third document (T[2]), and it is the fourth word in that document (position 3).

"a": {(2, 2)}
"banana": {(2, 3)}
"is": {(0, 1), (0, 4), (1, 1), (2, 1)}
"it": {(0, 0), (0, 3), (1, 2), (2, 0)}
"what": {(0, 2), (1, 0)}

If we run a phrase search for "what is it" we get hits for all the words in both document 0 and 1. But the terms occur consecutively only in document 1.

Read full article from Inverted Index | Hello World

Inverted Index | Hello World

No comments:

Post a Comment

Labels

Popular Posts