public class TrecContentSource extends ContentSource
A ContentSource over the TREC collection.
Supports the following configuration parameters (on top of ContentSource):
- TrecDocParser class to use for parsing the TREC documents content (default=TrecGov2Parser).
- HTMLParser class to use for parsing the HTML parts of the TREC documents content (default=DemoHTMLParser).
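
The two parser settings above behave like ordinary "value or default" lookups. The following is a minimal, self-contained sketch of that idea using `java.util.Properties` rather than the benchmark `Config` class. The property keys `trec.doc.parser` and `html.parser` are assumptions (this page does not show the key names), `MyHtmlParser` is a hypothetical class name, and a real configuration may expect fully qualified class names instead of the simple names listed above.

```java
import java.util.Properties;

// Sketch only: illustrates default resolution, not the benchmark Config API.
class TrecConfigSketch {
    static final String DOC_PARSER_KEY = "trec.doc.parser"; // assumed key name
    static final String HTML_PARSER_KEY = "html.parser";    // assumed key name

    // Returns the configured TrecDocParser class name, or the documented default.
    static String docParser(Properties props) {
        return props.getProperty(DOC_PARSER_KEY, "TrecGov2Parser");
    }

    // Returns the configured HTMLParser class name, or the documented default.
    static String htmlParser(Properties props) {
        return props.getProperty(HTML_PARSER_KEY, "DemoHTMLParser");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(HTML_PARSER_KEY, "MyHtmlParser"); // hypothetical override
        System.out.println(docParser(props));  // falls back to the default
        System.out.println(htmlParser(props)); // uses the override
    }
}
```

Leaving a key unset selects the default parser, so only the setting that differs from the defaults needs to be specified.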
| Modifier and Type | Field and Description |
|---|---|
| static java.lang.String | DOC |
| static java.lang.String | DOCNO |
| static java.lang.String | NEW_LINE: separator between lines in the buffer |
| static java.lang.String | TERMINATING_DOC |
| static java.lang.String | TERMINATING_DOCNO |

Fields inherited from class ContentItemsSource: encoding, forever, logStep, verbose

| Constructor and Description |
|---|
| TrecContentSource() |

| Modifier and Type | Method and Description |
|---|---|
| void | close(): Called when reading from this content source is no longer required. |
| DocData | getNextDocData(DocData docData): Returns the next DocData from the content source. |
| java.util.Date | parseDate(java.lang.String dateStr) |
| void | resetInputs(): Resets the input for this content source, so that the test would behave as if it was just started, input-wise. |
| void | setConfig(Config config): Sets the Config for this content source. |

Methods inherited from class ContentItemsSource: addBytes, addItem, collectFiles, getBytesCount, getConfig, getItemsCount, getTotalBytesCount, getTotalItemsCount, printStatistics, shouldLog

public static final java.lang.String DOCNO
public static final java.lang.String TERMINATING_DOCNO
public static final java.lang.String DOC
public static final java.lang.String TERMINATING_DOC
public static final java.lang.String NEW_LINE
public java.util.Date parseDate(java.lang.String dateStr)
public void close()
throws java.io.IOException
Description copied from class: ContentItemsSource
Called when reading from this content source is no longer required.
close in interface java.io.Closeable
close in interface java.lang.AutoCloseable
close in class ContentItemsSource
Throws: java.io.IOException

public DocData getNextDocData(DocData docData) throws NoMoreDataException, java.io.IOException
Description copied from class: ContentSource
Returns the next DocData from the content source. Implementations must account for multi-threading, as multiple threads can call this method simultaneously.
getNextDocData in class ContentSource
Throws: NoMoreDataException, java.io.IOException

public void resetInputs()
throws java.io.IOException
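Description copied from class: ContentItemsSource
Resets the input for this content source, so that the test would behave as if it was just started, input-wise.
NOTE: the default implementation resets the number of bytes and items generated since the last reset, so it's important to call super.resetInputs in case you override this method.
resetInputs in class ContentItemsSource
Throws: java.io.IOException

public void setConfig(Config config)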
Description copied from class: ContentItemsSource
Sets the Config for this content source. If you override this method, you must call super.setConfig.
setConfig in class ContentItemsSource

Copyright © 2000-2022 Apache Software Foundation. All Rights Reserved.
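
The method details above carry two requirements: getNextDocData must tolerate concurrent callers, and an overriding resetInputs must call super.resetInputs so the base class counters are reset. The following self-contained sketch illustrates both under simplifying assumptions. `Base` and `ListSource` are hypothetical stand-ins, not the Lucene classes; documents are plain strings rather than DocData; and `NoSuchElementException` stands in for NoMoreDataException. Plain `synchronized` methods are one simple way to meet the threading requirement and are not necessarily what TrecContentSource itself does.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch only: Base and ListSource are hypothetical stand-ins, not Lucene classes.
class SourceSketch {

    // Plays the role of a base class that tracks how many items were produced.
    static abstract class Base {
        private long itemsCount;

        protected synchronized void addItem() { itemsCount++; }

        public synchronized long getItemsCount() { return itemsCount; }

        // Default behaviour: forget the statistics gathered since the last reset.
        public synchronized void resetInputs() { itemsCount = 0; }

        public abstract String getNext();
    }

    static class ListSource extends Base {
        private final List<String> docs;
        private Iterator<String> cursor;

        ListSource(List<String> docs) {
            this.docs = docs;
            this.cursor = docs.iterator();
        }

        // synchronized so concurrent callers never receive the same document twice.
        @Override
        public synchronized String getNext() {
            if (!cursor.hasNext()) {
                throw new NoSuchElementException("no more data");
            }
            addItem();
            return cursor.next();
        }

        @Override
        public synchronized void resetInputs() {
            super.resetInputs();      // keep the base class counters consistent
            cursor = docs.iterator(); // then rewind this source's own input
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ListSource source = new ListSource(Arrays.asList("d1", "d2", "d3", "d4"));
        Runnable reader = () -> { source.getNext(); source.getNext(); };
        Thread a = new Thread(reader);
        Thread b = new Thread(reader);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(source.getItemsCount()); // 4
        source.resetInputs();
        System.out.println(source.getItemsCount()); // 0
        System.out.println(source.getNext());       // d1
    }
}
```

If the override skipped the call to super.resetInputs, the item counter would keep its old value after a reset, which is the situation the NOTE on resetInputs warns about.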