public final class PolishAnalyzer extends StopwordAnalyzerBase
Analyzer for Polish.

Nested classes/interfaces inherited from class Analyzer: Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | DEFAULT_STEMMER_FILE: File containing default Polish stemmer table. |
| static java.lang.String | DEFAULT_STOPWORD_FILE: File containing default Polish stopwords. |

Fields inherited from class StopwordAnalyzerBase: stopwords

Fields inherited from class Analyzer: GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

| Constructor and Description |
|---|
| PolishAnalyzer(): Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE. |
| PolishAnalyzer(CharArraySet stopwords): Builds an analyzer with the given stop words. |
| PolishAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet): Builds an analyzer with the given stop words. |
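
To make the roles of the two constructor arguments concrete, here is a minimal sketch of the filtering order this page describes for createComponents: lowercase, drop stop words, protect terms in the stem exclusion set, then stem. It is a plain-Java toy model and does not call Lucene. The class PolishChainSketch, the method analyze, and especially toyStem are hypothetical; toyStem is a deliberately crude stand-in and is not the Stempel algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of the analysis order only; hypothetical names, no Lucene calls. */
public class PolishChainSketch {

    /** Hypothetical stand-in for a stemmer: drops one trailing 'a' or 'y'. NOT Stempel. */
    public static String toyStem(String term) {
        if (term.length() > 3 && (term.endsWith("a") || term.endsWith("y"))) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    public static List<String> analyze(String text,
                                       Set<String> stopwords,
                                       Set<String> stemExclusionSet) {
        List<String> out = new ArrayList<>();
        // Split on runs of non-letter, non-digit characters (far cruder than StandardTokenizer).
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element when text starts with a delimiter
            }
            String term = raw.toLowerCase(Locale.ROOT); // step 1: lowercase
            if (stopwords.contains(term)) {
                continue;                               // step 2: drop stop words
            }
            // step 3: terms in the exclusion set skip stemming; all others are stemmed
            out.add(stemExclusionSet.contains(term) ? term : toyStem(term));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("i");
        System.out.println(analyze("Warszawa i Gdynia", stop, Set.of()));           // [warszaw, gdyni]
        System.out.println(analyze("Warszawa i Gdynia", stop, Set.of("warszawa"))); // [warszawa, gdyni]
    }
}
```

In this model the exclusion set is checked against the lowercased term, which mirrors the marker filter sitting after the lowercase filter in the described chain. The stems that the real analyzer produces will differ from these toy outputs.
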

| Modifier and Type | Method and Description |
|---|---|
| protected Analyzer.TokenStreamComponents | createComponents(java.lang.String fieldName): Creates an Analyzer.TokenStreamComponents which tokenizes all the text in the provided Reader. |
| static CharArraySet | getDefaultStopSet(): Returns an unmodifiable instance of the default stop words set. |
| static Trie | getDefaultTable(): Returns an unmodifiable instance of the default stemmer table. |
| protected TokenStream | normalize(java.lang.String fieldName, TokenStream in): Wraps the given TokenStream in order to apply normalization filters. |
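
Because getDefaultStopSet() returns an unmodifiable instance, a caller who wants the defaults plus a few extra stop words has to copy the set before adding to it. The following is a minimal sketch of that copy-then-extend pattern using only java.util collections. It does not call Lucene; UnmodifiableDefaultsSketch, DEFAULT_STOP_SET, extendedStopSet and rejectsMutation are hypothetical names, and the three sample words are made-up contents rather than the real default list.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Copy-then-extend pattern for a read-only default set; hypothetical names, no Lucene calls. */
public class UnmodifiableDefaultsSketch {

    // Hypothetical stand-in for a shared, read-only default stop set.
    private static final Set<String> DEFAULT_STOP_SET =
            Collections.unmodifiableSet(new HashSet<>(Set.of("i", "w", "na")));

    public static Set<String> getDefaultStopSet() {
        return DEFAULT_STOP_SET;
    }

    /** Returns a new mutable set holding the defaults plus the extra words. */
    public static Set<String> extendedStopSet(Set<String> extra) {
        Set<String> copy = new HashSet<>(getDefaultStopSet()); // copy first, never mutate the shared set
        copy.addAll(extra);
        return copy;
    }

    /** Returns true if adding to the shared default set is refused. */
    public static boolean rejectsMutation() {
        try {
            getDefaultStopSet().add("oraz");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsMutation());                      // true
        System.out.println(extendedStopSet(Set.of("oraz")).size()); // 4
        System.out.println(getDefaultStopSet().size());             // 3
    }
}
```

With the real analyzer, the extended copy would then be passed to the PolishAnalyzer(CharArraySet stopwords) constructor; how the copy into a CharArraySet is made is left as an assumption here.
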

Methods inherited from class StopwordAnalyzerBase: getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class Analyzer: attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, getVersion, initReader, initReaderForNormalization, normalize, setVersion, tokenStream, tokenStream

public static final java.lang.String DEFAULT_STOPWORD_FILE
File containing default Polish stopwords.

public static final java.lang.String DEFAULT_STEMMER_FILE
File containing default Polish stemmer table.
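
public PolishAnalyzer()
Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.

public PolishAnalyzer(CharArraySet stopwords)
Builds an analyzer with the given stop words.
Parameters:
stopwords - a stopword set

public PolishAnalyzer(CharArraySet stopwords, CharArraySet stemExclusionSet)
Builds an analyzer with the given stop words. If a non-empty stem exclusion set is provided, this analyzer adds a SetKeywordMarkerFilter before stemming.
Parameters:
stopwords - a stopword set
stemExclusionSet - a set of terms not to be stemmed

public static CharArraySet getDefaultStopSet()
Returns an unmodifiable instance of the default stop words set.

public static Trie getDefaultTable()
Returns an unmodifiable instance of the default stemmer table.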

protected Analyzer.TokenStreamComponents createComponents(java.lang.String fieldName)
Creates an Analyzer.TokenStreamComponents which tokenizes all the text in the provided Reader.
Specified by:
createComponents in class Analyzer
Parameters:
fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
Returns:
An Analyzer.TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, StopFilter, SetKeywordMarkerFilter if a stem exclusion set is provided, and StempelFilter.

protected TokenStream normalize(java.lang.String fieldName, TokenStream in)
Description copied from class: Analyzer
Wrap the given TokenStream in order to apply normalization filters. The default implementation returns the TokenStream as-is. This is used by Analyzer.normalize(String, String).

Copyright © 2000–2025 The Apache Software Foundation. All rights reserved.