A Hadoop {@code Writable} that encodes the positions of term occurrences within a document. Term occurrences are represented as an array of ints, where each int represents a term position. These objects serve as intermediate values in building document-sorted inverted indexes.
In serialized form, term positions are represented as first-order differences (i.e., position gaps or p-gaps) using Gamma encoding. As an example, let's say a term has a term frequency of 5, at token positions [3, 53, 58, 90, 101]. Such an object would be encoded as the following sequence of ints: 3, 50, 5, 32, 11, each of which is expressed using Gamma codes. Every int except the first represents the difference between the previous term position and the current term position.
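The gap computation in the example above can be sketched as follows. This is a minimal illustration and not this class's actual serialization code: the class name {@code PositionGapSketch} and its methods are hypothetical, positions are assumed to be strictly increasing and at least 1, and the Gamma code is rendered as a bit string using one common Elias gamma convention (N-1 zero bits followed by the N-bit binary form of the value), which may differ from the bit layout used on disk.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the documented class.
final class PositionGapSketch {

    /** Converts strictly increasing positions into first-order differences (p-gaps). */
    public static int[] toGaps(int[] positions) {
        int[] gaps = new int[positions.length];
        int previous = 0; // the first gap is the first position itself
        for (int i = 0; i < positions.length; i++) {
            gaps[i] = positions[i] - previous;
            previous = positions[i];
        }
        return gaps;
    }

    /** Inverse of toGaps: a running sum of the gaps restores the positions. */
    public static int[] fromGaps(int[] gaps) {
        int[] positions = new int[gaps.length];
        int sum = 0;
        for (int i = 0; i < gaps.length; i++) {
            sum += gaps[i];
            positions[i] = sum;
        }
        return positions;
    }

    /**
     * Elias gamma code of n as a bit string (assumed convention):
     * (N-1) zeros, then the N-bit binary representation of n.
     */
    public static String gamma(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("gamma codes are defined only for n >= 1");
        }
        String binary = Integer.toBinaryString(n);
        StringBuilder code = new StringBuilder();
        for (int i = 1; i < binary.length(); i++) {
            code.append('0'); // length prefix: one zero per bit after the leading 1
        }
        return code.append(binary).toString();
    }

    public static void main(String[] args) {
        int[] positions = {3, 53, 58, 90, 101};
        int[] gaps = toGaps(positions);
        System.out.println(Arrays.toString(gaps));
        for (int gap : gaps) {
            System.out.println(gap + " -> " + gamma(gap));
        }
        System.out.println(Arrays.toString(fromGaps(gaps)));
    }
}
```

Small gaps receive short codes (a gap of 5 takes 5 bits under this convention, while 50 takes 11), which is why storing differences rather than absolute positions tends to compress well.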
@author Jimmy Lin
The document and frequency are the same as for a TermDocs. The positions portion lists the ordinal positions of each occurrence of a term in a document. @see IndexReader#termPositions()