public final class ICUTokenizer
extends org.apache.lucene.analysis.Tokenizer
Words are broken across script boundaries, then segmented according to
the BreakIterator and typing provided by the ICUTokenizerConfig.
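
The two-stage idea described above (split at script boundaries first, then run a word BreakIterator over each run) can be illustrated with a minimal sketch. It uses only JDK classes (`Character.UnicodeScript` and `java.text.BreakIterator`) rather than ICU, so it is an approximation for illustration and not how ICUTokenizer is implemented. The class and method names (`ScriptAwareSplitSketch`, `scriptRuns`, `words`) are hypothetical, and treating COMMON, INHERITED, and UNKNOWN characters as part of the current run is an assumption of this sketch.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene or ICU.
class ScriptAwareSplitSketch {

    /** Stage 1: split text into maximal runs that share one Unicode script. */
    static List<String> scriptRuns(String text) {
        List<String> runs = new ArrayList<>();
        Character.UnicodeScript current = null;
        int start = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // Assumption: script-neutral characters (spaces, digits, combining
            // marks) stay attached to whatever run is currently open.
            boolean neutral = script == Character.UnicodeScript.COMMON
                    || script == Character.UnicodeScript.INHERITED
                    || script == Character.UnicodeScript.UNKNOWN;
            if (!neutral) {
                if (current != null && script != current) {
                    runs.add(text.substring(start, i)); // close the previous run
                    start = i;
                }
                current = script;
            }
            i += Character.charCount(cp);
        }
        if (start < text.length()) {
            runs.add(text.substring(start));
        }
        return runs;
    }

    /** Stage 2: segment each run with the JDK word BreakIterator. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String run : scriptRuns(text)) {
            BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
            it.setText(run);
            int from = it.first();
            for (int to = it.next(); to != BreakIterator.DONE; from = to, to = it.next()) {
                String piece = run.substring(from, to);
                // Keep only pieces containing a letter or digit (drops spaces, punctuation).
                if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    out.add(piece);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "\u043c\u0438\u0440" is a Cyrillic word; escapes avoid source-encoding issues.
        System.out.println(scriptRuns("abc\u043c\u0438\u0440"));
        System.out.println(words("Hello \u043c\u0438\u0440"));
    }
}
```

In this sketch the Latin and Cyrillic letters in `abc\u043c\u0438\u0440` end up in separate runs even though no space separates them, which is the effect the script-boundary stage is meant to show. The real tokenizer additionally assigns token types through its ICUTokenizerConfig, which this sketch does not attempt.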

See Also: ICUTokenizerConfig

| Constructor | Description |
|---|---|
| ICUTokenizer(java.io.Reader input) | Construct a new ICUTokenizer that breaks text into words from the given Reader. |
| ICUTokenizer(java.io.Reader input, ICUTokenizerConfig config) | Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration. |

| Modifier and Type | Method and Description |
|---|---|
| void | end() |
| boolean | incrementToken() |
| void | reset() |
| void | reset(java.io.Reader input) |

Inherited methods:
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

public ICUTokenizer(java.io.Reader input)
The default script-specific handling is used.
Parameters:
input - Reader containing text to tokenize.
See Also: DefaultICUTokenizerConfig

public ICUTokenizer(java.io.Reader input,
                    ICUTokenizerConfig config)
Parameters:
input - Reader containing text to tokenize.
config - Tailored BreakIterator configuration.

public boolean incrementToken()
                       throws java.io.IOException
Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: java.io.IOException

public void reset()
           throws java.io.IOException
Overrides: reset in class org.apache.lucene.analysis.TokenStream
Throws: java.io.IOException

public void reset(java.io.Reader input)
           throws java.io.IOException
Overrides: reset in class org.apache.lucene.analysis.Tokenizer
Throws: java.io.IOException

public void end()
         throws java.io.IOException
Overrides: end in class org.apache.lucene.analysis.TokenStream
Throws: java.io.IOException

Copyright © 2000-2022 Apache Software Foundation. All Rights Reserved.