public class TFTermPruningPolicy extends TermPruningPolicy
A larger threshold value produces a smaller index.
See TermPruningPolicy for size vs. performance considerations.
This implementation uses simple term frequency thresholds: for a given term, it removes all postings from documents where the term occurs rarely (i.e. its term frequency, TF, in the document is smaller than the threshold).
Threshold values in this method are expressed as absolute term frequencies.
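
The rule described above can be illustrated with a minimal stand-alone sketch. It does not use or reproduce the Lucene classes: the class name TfThresholdSketch, its methods, and the representation of postings as a map from document number to term frequency are all assumptions made for illustration. It only shows the comparison stated in the description (a posting is dropped when its TF is smaller than the threshold) and why a larger threshold leaves fewer postings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the Lucene implementation.
class TfThresholdSketch {

    /** True when the term's in-document frequency is smaller than the threshold. */
    static boolean shouldPrune(int termFreq, int threshold) {
        return termFreq < threshold;
    }

    /** Returns the postings (doc number -> term frequency) that survive pruning. */
    static Map<Integer, Integer> prunePostings(Map<Integer, Integer> postings, int threshold) {
        Map<Integer, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
            if (!shouldPrune(e.getValue(), threshold)) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> postings = new LinkedHashMap<>();
        postings.put(0, 1); // doc 0: the term occurs once
        postings.put(1, 3); // doc 1: the term occurs three times
        postings.put(2, 2); // doc 2: the term occurs twice

        System.out.println(prunePostings(postings, 2)); // prints {1=3, 2=2}
        System.out.println(prunePostings(postings, 3)); // prints {1=3}
    }
}
```

Raising the threshold from 2 to 3 removes one more posting, which is the sense in which a larger threshold yields a smaller index.
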
| Modifier and Type | Field and Description |
|---|---|
| protected int | curThr |
| protected int | defThreshold |
| protected java.util.Map<java.lang.String,java.lang.Integer> | thresholds |

Inherited fields: fieldFlags, in, DEL_ALL, DEL_PAYLOADS, DEL_POSTINGS, DEL_STORED, DEL_VECTOR

| Constructor and Description |
|---|
| TFTermPruningPolicy(IndexReader in, java.util.Map<java.lang.String,java.lang.Integer> fieldFlags, java.util.Map<java.lang.String,java.lang.Integer> thresholds, int defThreshold) |

| Modifier and Type | Method and Description |
|---|---|
| void | initPositionsTerm(TermPositions in, Term t): Called when moving TermPositions to a new Term. |
| boolean | pruneAllPositions(TermPositions termPositions, Term t): Prune all postings per term (invoked once per term per doc). |
| int | pruneSomePositions(int docNum, int[] positions, Term curTerm): Prune some postings per term (invoked once per term per doc). |
| boolean | pruneTermEnum(TermEnum te): Pruning of all postings for a term (invoked once per term). |
| int | pruneTermVectorTerms(int docNumber, java.lang.String field, java.lang.String[] terms, int[] freqs, TermFreqVector tfv): Pruning of individual terms in term vectors. |

Inherited methods: pruneAllFieldPostings, prunePayload, pruneWholeTermVector

protected java.util.Map<java.lang.String,java.lang.Integer> thresholds
protected int defThreshold
protected int curThr
public TFTermPruningPolicy(IndexReader in, java.util.Map<java.lang.String,java.lang.Integer> fieldFlags, java.util.Map<java.lang.String,java.lang.Integer> thresholds, int defThreshold)
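
The constructor accepts both a thresholds map and a single defThreshold. This page does not say how the map keys are interpreted, so the following is only a sketch under the assumption that the map holds per-field overrides keyed by field name and that defThreshold applies to every field without an entry. The class ThresholdLookupSketch and its method thresholdFor are hypothetical names, not part of the Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; the key format of the map is an assumption.
class ThresholdLookupSketch {
    private final Map<String, Integer> thresholds;
    private final int defThreshold;

    ThresholdLookupSketch(Map<String, Integer> thresholds, int defThreshold) {
        // Treat a missing map as "no overrides" so lookups never fail.
        this.thresholds = (thresholds != null) ? thresholds : new HashMap<>();
        this.defThreshold = defThreshold;
    }

    /** Returns the override for the field if present, otherwise the default threshold. */
    int thresholdFor(String field) {
        return thresholds.getOrDefault(field, defThreshold);
    }

    public static void main(String[] args) {
        Map<String, Integer> overrides = new HashMap<>();
        overrides.put("title", 1); // assumed: keep every posting in short fields

        ThresholdLookupSketch lookup = new ThresholdLookupSketch(overrides, 3);
        System.out.println(lookup.thresholdFor("title")); // prints 1
        System.out.println(lookup.thresholdFor("body"));  // prints 3
    }
}
```

Keeping a default alongside the map means only the fields that need special treatment have to be listed.
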

public boolean pruneTermEnum(TermEnum te) throws java.io.IOException
Description copied from class: TermPruningPolicy
Pruning of all postings for a term (invoked once per term).
Specified by: pruneTermEnum in class TermPruningPolicy
Parameters: te - positioned term enum.
Throws: java.io.IOException

public void initPositionsTerm(TermPositions in, Term t) throws java.io.IOException
Description copied from class: TermPruningPolicy
Called when moving TermPositions to a new Term.
Specified by: initPositionsTerm in class TermPruningPolicy
Parameters: in - input term positions; t - current term
Throws: java.io.IOException

public boolean pruneAllPositions(TermPositions termPositions, Term t) throws java.io.IOException
Description copied from class: TermPruningPolicy
Prune all postings per term (invoked once per term per doc).
Specified by: pruneAllPositions in class TermPruningPolicy
Parameters: termPositions - positioned term positions. Implementations MUST NOT advance this by calling TermPositions methods that advance either the position pointer (next, skipTo) or term pointer (seek); t - current term
Throws: java.io.IOException

public int pruneTermVectorTerms(int docNumber, java.lang.String field, java.lang.String[] terms, int[] freqs, TermFreqVector tfv) throws java.io.IOException
Description copied from class: TermPruningPolicy
Pruning of individual terms in term vectors.
Specified by: pruneTermVectorTerms in class TermPruningPolicy
Parameters: docNumber - document number; field - field name; terms - array of terms; freqs - array of term frequencies; tfv - the original term frequency vector
Throws: java.io.IOException

public int pruneSomePositions(int docNum, int[] positions, Term curTerm)
Description copied from class: TermPruningPolicy
Prune some postings per term (invoked once per term per doc).
Specified by: pruneSomePositions in class TermPruningPolicy
Parameters: docNum - current document number; positions - original term positions in the document (and indirectly term frequency); curTerm - current term

Copyright © 2000-2022 Apache Software Foundation. All Rights Reserved.