public final class CJKAnalyzer extends StopwordAnalyzerBase
Analyzer that tokenizes text with StandardTokenizer,
normalizes content with CJKWidthFilter, folds case with
LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter,
and filters stopwords with StopFilter.

Nested classes inherited from class Analyzer: Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | DEFAULT_STOPWORD_FILE: File containing default CJK stopwords. |
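
The analysis chain named in the class description (tokenize, fold width, lowercase, form CJK bigrams, drop stopwords) can be pictured with a small library-free sketch. This is only an illustration under stated assumptions, not Lucene's implementation: runs of letters and digits stand in for StandardTokenizer, NFKC normalization approximates CJKWidthFilter, a lone CJK character is emitted as a unigram, and the class name, method names, and the three-word stop set are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class CjkPipelineSketch {
    // Hypothetical stop set for illustration; not the contents of DEFAULT_STOPWORD_FILE.
    private static final Set<String> STOPWORDS = Set.of("the", "of", "and");

    // Treat Han, Hiragana, Katakana, and Hangul code points as CJK.
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
                || s == Character.UnicodeScript.HIRAGANA
                || s == Character.UnicodeScript.KATAKANA
                || s == Character.UnicodeScript.HANGUL;
    }

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (!Character.isLetterOrDigit(cps[i])) {
                i++; // anything that is not a letter or digit separates tokens
                continue;
            }
            // Tokenize: collect a run that is either all CJK or all non-CJK.
            int start = i;
            boolean cjk = isCjk(cps[i]);
            while (i < cps.length && Character.isLetterOrDigit(cps[i]) && isCjk(cps[i]) == cjk) {
                i++;
            }
            String raw = new String(cps, start, i - start);
            // Fold width variants (approximated with NFKC), then fold case.
            String token = Normalizer.normalize(raw, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
            if (cjk) {
                addBigrams(token, out);
            } else if (!STOPWORDS.contains(token)) {
                out.add(token); // stopword filtering comes last
            }
        }
        return out;
    }

    // Emit overlapping two-character terms; a single character is kept as-is.
    static void addBigrams(String run, List<String> out) {
        int[] cps = run.codePoints().toArray();
        if (cps.length == 1) {
            out.add(run);
            return;
        }
        for (int j = 0; j + 1 < cps.length; j++) {
            out.add(new String(cps, j, 2));
        }
    }

    public static void main(String[] args) {
        // Four Han characters followed by the word "Lucene" in fullwidth Latin letters.
        String text = "The history of \u4E2D\u534E\u4EBA\u6C11 \uFF2C\uFF55\uFF43\uFF45\uFF4E\uFF45";
        System.out.println(analyze(text));
    }
}
```

In this sketch the four Han characters yield three overlapping bigrams, and the fullwidth word is folded to plain lowercase ASCII before the stopword check.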

Fields inherited from class StopwordAnalyzerBase: stopwords

Fields inherited from class Analyzer: GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

| Constructor and Description |
|---|
| CJKAnalyzer(): Builds an analyzer which removes words in getDefaultStopSet(). |
| CJKAnalyzer(CharArraySet stopwords): Builds an analyzer with the given stop words. |
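
As a usage sketch for the two constructors, the following collects the terms an analyzer produces for a string. It assumes a recent Lucene release (7.0 or later, where CharArraySet lives in org.apache.lucene.analysis) with the analysis-common module on the classpath. The class name CjkAnalyzerUsage, the helper terms, the field name "body", and the one-word custom stop set are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

final class CjkAnalyzerUsage {
    // Runs the analyzer over the text and returns the emitted terms in order.
    static List<String> terms(Analyzer analyzer, String field, String text) {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset(); // required before the first incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // "Lucene" followed by four Han characters, written with escapes.
        String text = "Lucene \u4E2D\u534E\u4EBA\u6C11";
        // Hypothetical custom stop set; the second argument requests case-insensitive matching.
        CharArraySet custom = new CharArraySet(Arrays.asList("lucene"), true);
        try (Analyzer withDefaults = new CJKAnalyzer();
             Analyzer withCustom = new CJKAnalyzer(custom)) {
            System.out.println(terms(withDefaults, "body", text));
            System.out.println(terms(withCustom, "body", text));
        }
    }
}
```

Because stopword filtering is the last step of the chain, entries in a custom stop set are compared against the lowercased, bigrammed terms rather than against the raw input.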

| Modifier and Type | Method and Description |
|---|---|
| protected Analyzer.TokenStreamComponents | createComponents(java.lang.String fieldName): Creates a new Analyzer.TokenStreamComponents instance for this analyzer. |
| static CharArraySet | getDefaultStopSet(): Returns an unmodifiable instance of the default stop-words set. |
| protected TokenStream | normalize(java.lang.String fieldName, TokenStream in): Wrap the given TokenStream in order to apply normalization filters. |
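
Methods inherited from class StopwordAnalyzerBase: getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class Analyzer: attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, getVersion, initReader, initReaderForNormalization, normalize, setVersion, tokenStream, tokenStream

public static final java.lang.String DEFAULT_STOPWORD_FILE
File containing default CJK stopwords. Currently it contains some common English words that are not usually useful for searching and some double-byte punctuation marks.

public CJKAnalyzer()
Builds an analyzer which removes words in getDefaultStopSet().

public CJKAnalyzer(CharArraySet stopwords)
Builds an analyzer with the given stop words.
Parameters: stopwords - a stopword set

public static CharArraySet getDefaultStopSet()
Returns an unmodifiable instance of the default stop-words set.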
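
The "unmodifiable default set" contract can be illustrated with a plain-Java sketch. This is not Lucene's code: java.util.Set stands in for CharArraySet, the set contents are made up, and the names DefaultStopSetSketch, Holder, and customizedCopy are hypothetical. It shows one common way to expose a shared default that callers cannot mutate, and how a caller would copy it to build a custom list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class DefaultStopSetSketch {
    // Initialization-on-demand holder: the set is built once, on first access.
    private static final class Holder {
        static final Set<String> DEFAULT = Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList("a", "and", "the"))); // hypothetical contents
    }

    // Returns the shared default; attempts to modify it throw UnsupportedOperationException.
    static Set<String> getDefaultStopSet() {
        return Holder.DEFAULT;
    }

    // Callers who want extra stopwords copy the default and extend the copy.
    static Set<String> customizedCopy(String extra) {
        Set<String> copy = new HashSet<>(getDefaultStopSet());
        copy.add(extra);
        return copy;
    }

    public static void main(String[] args) {
        Set<String> custom = customizedCopy("lucene");
        System.out.println(custom.contains("lucene"));
        System.out.println(getDefaultStopSet().contains("lucene"));
    }
}
```

Handing out an unmodifiable view lets every analyzer instance share one default set safely, and a customized copy is the kind of set a caller would pass to a constructor that accepts stop words.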

protected Analyzer.TokenStreamComponents createComponents(java.lang.String fieldName)
Description copied from class: Analyzer
Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
Specified by: createComponents in class Analyzer
Parameters: fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
Returns: the Analyzer.TokenStreamComponents for this analyzer.

protected TokenStream normalize(java.lang.String fieldName, TokenStream in)
Description copied from class: Analyzer
Wrap the given TokenStream in order to apply normalization filters. The default implementation returns the TokenStream as-is. This is used by Analyzer.normalize(String, String).

Copyright © 2000–2025 The Apache Software Foundation. All rights reserved.