Package org.eclipse.rdf4j.sail.lucene
Class LuceneSail
- java.lang.Object
  - org.eclipse.rdf4j.sail.helpers.SailWrapper
    - org.eclipse.rdf4j.sail.helpers.NotifyingSailWrapper
      - org.eclipse.rdf4j.sail.lucene.LuceneSail
All Implemented Interfaces:
FederatedServiceResolverClient, NotifyingSail, Sail, StackableSail
 
public class LuceneSail extends NotifyingSailWrapper

A LuceneSail wraps an arbitrary existing Sail and extends it with support for full-text search on all Literals.

Setting up a LuceneSail

LuceneSail works in two modes: storing its data in a directory on the hard disk, or in a RAMDirectory in RAM (which is discarded when the program ends).

Example with storage in a folder:

    import org.eclipse.rdf4j.repository.sail.SailRepository;
    import org.eclipse.rdf4j.sail.lucene.LuceneSail;
    import org.eclipse.rdf4j.sail.memory.MemoryStore;

    // create a Sesame memory sail
    MemoryStore memoryStore = new MemoryStore();
    // create a LuceneSail to wrap the MemoryStore
    LuceneSail lucenesail = new LuceneSail();
    // set this parameter to store the Lucene index on disk
    lucenesail.setParameter(LuceneSail.LUCENE_DIR_KEY, "./data/mydirectory");
    // wrap the MemoryStore in the LuceneSail
    lucenesail.setBaseSail(memoryStore);
    // create a Repository to access the sails
    SailRepository repository = new SailRepository(lucenesail);
    repository.initialize();

Example with storage in a RAM directory (same imports as above):

    // create a Sesame memory sail
    MemoryStore memoryStore = new MemoryStore();
    // create a LuceneSail to wrap the MemoryStore
    LuceneSail lucenesail = new LuceneSail();
    // set this parameter to let the Lucene index store its data in RAM
    lucenesail.setParameter(LuceneSail.LUCENE_RAMDIR_KEY, "true");
    // wrap the MemoryStore in the LuceneSail
    lucenesail.setBaseSail(memoryStore);
    // create a Repository to access the sails
    SailRepository repository = new SailRepository(lucenesail);
    repository.initialize();

Asking full-text queries

Text queries are expressed using the virtual properties of the LuceneSail. In SPARQL:
    SELECT ?subject ?score ?snippet ?resource
    WHERE {
      ?subject <http://www.openrdf.org/contrib/lucenesail#matches> [
        a <http://www.openrdf.org/contrib/lucenesail#LuceneQuery> ;
        <http://www.openrdf.org/contrib/lucenesail#query> "my Lucene query" ;
        <http://www.openrdf.org/contrib/lucenesail#score> ?score ;
        <http://www.openrdf.org/contrib/lucenesail#snippet> ?snippet ;
        <http://www.openrdf.org/contrib/lucenesail#resource> ?resource ]
    }

When defining queries, the type and query properties are mandatory, and so is the matches relation. If any of these is missing, the query will not be executed as expected. The failure behavior is configurable: setting the Sail property "incompletequeryfail" to true throws a SailException when such patterns are found. This is the default, to help find inaccurate queries; set it to false to have warnings logged instead. Multiple queries can be issued to the sail, and their results are integrated. Note that you cannot use the same variable for multiple text queries; if you want to combine text searches, use Lucene's query syntax.

Fields are stored/indexed

All fields are stored and indexed. The "text" fields (gathering all literals) have to be stored because, when a new literal is added to a document, the previous texts need to be copied from the existing document to the new document; this does not work when they are only "indexed". Fields that are not stored cannot be retrieved using full-text querying.

Deleting a Lucene index

At the moment, the Lucene index can be deleted in two ways:
- Delete the folder where the data is stored while the application is not running.
- Call the repository's clear() method (RepositoryConnection.clear(org.eclipse.rdf4j.model.Resource[])). This deletes the index.
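The requirement under "Fields are stored/indexed" that the "text" fields be stored can be illustrated with a minimal, library-free sketch. It assumes a document can be modelled as a map from field name to its stored values and that the aggregated field is called "text"; StoredTextSketch and appendLiteral are hypothetical names, not part of LuceneSail or Lucene. Only stored values can be copied into the replacement document, which is why an index-only field would lose its earlier literals.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a "document" maps field names to the values that were STORED for it.
// Values that were only indexed (not stored) cannot be read back, so they would be absent here.
class StoredTextSketch {
    static final String TEXT_FIELD = "text"; // assumption: name of the aggregated text field

    // Builds the replacement document when a new literal arrives for the same resource:
    // previous stored values are copied over, then the new literal is appended.
    static Map<String, List<String>> appendLiteral(Map<String, List<String>> oldDoc, String literal) {
        Map<String, List<String>> newDoc = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : oldDoc.entrySet()) {
            newDoc.put(e.getKey(), new ArrayList<>(e.getValue())); // copy, do not share
        }
        newDoc.computeIfAbsent(TEXT_FIELD, k -> new ArrayList<>()).add(literal);
        return newDoc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put(TEXT_FIELD, new ArrayList<>(List.of("first label")));
        Map<String, List<String>> updated = appendLiteral(doc, "second label");
        System.out.println(updated.get(TEXT_FIELD)); // [first label, second label]
    }
}
```

If the old document had no stored "text" values, the rebuilt document would contain only the new literal, which is the data loss the description warns about.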
Handling of Contexts

Each Lucene document contains a field for every context ID that contributed to the document. NULL contexts are marked using the String SearchFields.CONTEXT_NULL ("null") and stored in the Lucene field SearchFields.CONTEXT_FIELD_NAME ("context"). This means that when adding or appending to a document, all additional context URIs are added to the document. When deleting individual triples, the context is ignored. In clear(Resource ...) we query all Lucene documents that were possibly created by the given context(s). Given a document D that contexts C(1-n) contributed to, let D' be the new document after clear():
- If there is only one C, D can be safely removed and there is no D'. This is hopefully the standard case, as in ontologies, where all triples about a resource are in one document.
- If there are multiple C, remember the URI of D, delete D, and after committing the operation query (s, p, o, ?) from the underlying store. This returns the literals of D', which is then added as a new document.

This should be both fast in the common case and capable enough in the multiple-C case.

Defining the indexed Fields

The property INDEXEDFIELDS configures which fields to index and how to project one property onto another.
Syntax:

    # only index label and comment
    index.1=http://www.w3.org/2000/01/rdf-schema#label
    index.2=http://www.w3.org/2000/01/rdf-schema#comment
    # project http://xmlns.com/foaf/0.1/name to rdfs:label
    http\://xmlns.com/foaf/0.1/name=http\://www.w3.org/2000/01/rdf-schema#label

Set and select Lucene sail by id

The property INDEX_ID configures the ID of the index. When it is set, every request that does not carry the search:indexid predicate is filtered out, so a request looks like:

    ?subj search:matches [
        search:indexid my:lucene_index_id;
        search:query "search terms...";
        search:property my:property;
        search:score ?score;
        search:snippet ?snippet ] .

If a LuceneSail uses another LuceneSail as its base sail, the evaluation mode should be set to TupleFunctionEvaluationMode.NATIVE.

Defining the indexed Types/Languages

The properties INDEXEDTYPES and INDEXEDLANG configure which fields to index by their type or language.

INDEXEDTYPES syntax:

    # only index objects of rdf:type ex:mytype1, rdf:type ex:mytype2 or ex:mytypedef ex:mytype3
    http\://www.w3.org/1999/02/22-rdf-syntax-ns#type=http://example.org/mytype1 http://example.org/mytype2
    http\://example.org/mytypedef=http://example.org/mytype3

INDEXEDLANG syntax:

    # syntax to index only French (fr) and English (en) literals
    fr en

Datatypes

Datatypes are ignored in the LuceneSail.
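The clear(Resource ...) strategy described under "Handling of Contexts" can be sketched as a pure decision function. This is an illustrative model under stated assumptions, not LuceneSail's code: contexts are plain strings (with "null" standing for the null context, as SearchFields.CONTEXT_NULL does), and ClearContextSketch, Action and decide are hypothetical names.

```java
import java.util.Set;

// Hypothetical sketch of the per-document decision made during clear(Resource...).
// Not LuceneSail code: documents and contexts are modelled as plain strings.
class ClearContextSketch {
    enum Action { KEEP, DELETE_ONLY, DELETE_AND_REBUILD }

    /**
     * @param docContexts     contexts that contributed to document D (its "context" field values)
     * @param clearedContexts contexts passed to clear(...)
     */
    static Action decide(Set<String> docContexts, Set<String> clearedContexts) {
        boolean touched = docContexts.stream().anyMatch(clearedContexts::contains);
        if (!touched) {
            return Action.KEEP;              // D would not be returned by the context query
        }
        if (docContexts.size() == 1) {
            return Action.DELETE_ONLY;       // a single contributing context: no D' exists
        }
        // Several contexts: delete D, then after commit re-query (s, p, o, ?) to rebuild D'.
        // If that query returns no literals, no D' is added.
        return Action.DELETE_AND_REBUILD;
    }

    public static void main(String[] args) {
        System.out.println(decide(Set.of("urn:g1"), Set.of("urn:g1")));         // DELETE_ONLY
        System.out.println(decide(Set.of("urn:g1", "null"), Set.of("urn:g1"))); // DELETE_AND_REBUILD
        System.out.println(decide(Set.of("urn:g2"), Set.of("urn:g1")));         // KEEP
    }
}
```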
Field Summary
Fields (modifier and type, field, description):
- static String ANALYZER_CLASS_KEY: Set this key as sail parameter to configure the Lucene analyzer class implementation to use for text analysis.
- static String DEFAULT_INDEX_CLASS
- static String DEFAULT_LUCENE_DIR: Set the default directory of the Lucene index files.
- static String EVALUATION_MODE_KEY
- static String FUZZY_PREFIX_LENGTH_KEY: Set this key as sail parameter to influence the fuzzy prefix length.
- static String INCOMPLETE_QUERY_FAIL_KEY: Set this key as sail parameter to influence whether incomplete queries are treated as failures (malformed queries) or whether they are ignored.
- static String INDEX_CLASS_KEY: Set this key to configure the SearchIndex class implementation.
- static String INDEX_ID: Set this key to configure the filtering of queries. If this parameter is set, the match object should contain the search:indexid parameter; see the syntax above.
- static String INDEX_TYPE_BACKTRACE_MODE
- static String INDEXEDFIELDS: Set the parameter "indexedfields=..." to configure a selection of fields to index, and projections of properties.
- static String INDEXEDLANG: Set the parameter "indexedlang=..." to configure a selection of field languages to index.
- static String INDEXEDTYPES: Set the parameter "indexedtypes=..." to configure a selection of field types to index.
- static String LUCENE_DIR_KEY: Set the key "lucenedir=<path>" as sail parameter to configure the Lucene Directory on the filesystem where the Lucene index is stored.
- static String LUCENE_RAMDIR_KEY: Set the key "useramdir=true" as sail parameter to let the LuceneSail store its Lucene index in RAM.
- static String MAX_DOCUMENTS_KEY: Set the key "maxDocuments=<n>" as sail parameter to limit the maximum number of documents to return from a search query.
- protected Properties parameters
- static String QUERY_ANALYZER_CLASS_KEY: Set this key as sail parameter to configure the Lucene analyzer class implementation used for query analysis.
- static String REINDEX_QUERY_KEY: Set the parameter "reindexQuery=" to configure the statements to index over.
- static String SIMILARITY_CLASS_KEY: Set this key as sail parameter to configure the Similarity class implementation to use for text analysis.
- static String WKT_FIELDS: Set this key to configure which fields contain WKT and should be spatially indexed.

Constructor Summary
- LuceneSail()

Method Summary
Methods (modifier and type, method, description):
- protected boolean acceptStatementToIndex(Statement s)
- protected static SearchIndex createSearchIndex(Properties parameters): Deprecated.
- NotifyingSailConnection getConnection(): Opens a connection on the Sail which can be used to query and update data.
- TupleFunctionEvaluationMode getEvaluationMode(): See EVALUATION_MODE_KEY parameter.
- FederatedServiceResolver getFederatedServiceResolver(): Gets the FederatedServiceResolver used by this client.
- TypeBacktraceMode getIndexBacktraceMode(): See INDEX_TYPE_BACKTRACE_MODE parameter.
- SearchIndex getLuceneIndex()
- String getParameter(String key)
- Set<String> getParameterNames()
- String getReindexQuery(): See REINDEX_QUERY_KEY parameter.
- protected Collection<SearchQueryInterpreter> getSearchQueryInterpreters()
- TupleFunctionRegistry getTupleFunctionRegistry()
- void init(): Initializes the Sail.
- protected void initializeLuceneIndex()
- boolean isIncompleteQueryFails(): When this is true, incomplete queries will trigger a SailException.
- Statement mapStatement(Statement statement)
- void registerStatementFilter(IndexableStatementFilter filter): Sets a filter which determines whether a statement should be considered for indexing when performing complete reindexing.
- void reindex(): Starts a reindexing process of the whole sail.
- void setDataDir(File dataDir): Sets the data directory for the Sail.
- void setEvaluationMode(TupleFunctionEvaluationMode mode): See EVALUATION_MODE_KEY parameter.
- void setFederatedServiceResolver(FederatedServiceResolver resolver): Sets the FederatedServiceResolver to use for this client.
- void setFuzzyPrefixLength(int fuzzyPrefixLength)
- void setIncompleteQueryFails(boolean incompleteQueryFails): Set this to true so that incomplete queries will trigger a SailException.
- void setIndexBacktraceMode(TypeBacktraceMode mode): See INDEX_TYPE_BACKTRACE_MODE parameter.
- void setLuceneIndex(SearchIndex luceneIndex)
- void setParameter(String key, String value)
- void setReindexQuery(String query): See REINDEX_QUERY_KEY parameter.
- void setTupleFunctionRegistry(TupleFunctionRegistry registry)
- void shutDown(): Shuts down the Sail, giving it the opportunity to synchronize any stale data.
- Methods inherited from class org.eclipse.rdf4j.sail.helpers.NotifyingSailWrapper: addSailChangedListener, getBaseSail, removeSailChangedListener, setBaseSail
- Methods inherited from class org.eclipse.rdf4j.sail.helpers.SailWrapper: getDataDir, getDefaultIsolationLevel, getSupportedIsolationLevels, getValueFactory, isWritable, verifyBaseSailSet
- Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
- Methods inherited from interface org.eclipse.rdf4j.sail.Sail: getDataDir, getDefaultIsolationLevel, getSupportedIsolationLevels, getValueFactory, isWritable

Field Detail

REINDEX_QUERY_KEY
public static final String REINDEX_QUERY_KEY
Set the parameter "reindexQuery=" to configure the statements to index over. Default value is "SELECT ?s ?p ?o ?c WHERE {{?s ?p ?o} UNION {GRAPH ?c {?s ?p ?o.}}} ORDER BY ?s". NB: the query must contain the bindings ?s, ?p, ?o and ?c and must be ordered by ?s.
- See Also: Constant Field Values
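The reindex query must be ordered by ?s. A plausible reason, consistent with one Lucene document per resource, is that rows can then be streamed and folded into a new document whenever the subject changes; the sketch below assumes that model. ReindexGroupingSketch and its Row type are hypothetical and use plain strings instead of RDF4J values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: why a reindex query ordered by ?s is convenient.
// Rows arrive as (s, p, o, c); consecutive rows sharing ?s are folded into one group (document).
class ReindexGroupingSketch {
    static final class Row {
        final String s, p, o, c;
        Row(String s, String p, String o, String c) { this.s = s; this.p = p; this.o = o; this.c = c; }
    }

    // Single pass: a new group starts whenever the subject changes.
    // If rows were NOT sorted by subject, one subject could end up in several groups.
    static List<List<Row>> groupConsecutiveBySubject(List<Row> rows) {
        List<List<Row>> groups = new ArrayList<>();
        List<Row> current = null;
        String currentSubject = null;
        for (Row r : rows) {
            if (current == null || !r.s.equals(currentSubject)) {
                current = new ArrayList<>();
                groups.add(current);
                currentSubject = r.s;
            }
            current.add(r);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Row> sorted = List.of(
                new Row("ex:a", "rdfs:label", "A", null),
                new Row("ex:a", "rdfs:comment", "about A", "ex:g1"),
                new Row("ex:b", "rdfs:label", "B", null));
        System.out.println(groupConsecutiveBySubject(sorted).size()); // 2
        List<Row> unsorted = List.of(sorted.get(0), sorted.get(2), sorted.get(1));
        System.out.println(groupConsecutiveBySubject(unsorted).size()); // 3: ex:a is split
    }
}
```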
INDEXEDFIELDS
public static final String INDEXEDFIELDS
Set the parameter "indexedfields=..." to configure a selection of fields to index, and projections of properties. Only the configured fields will be indexed. A property P projected to Q will cause the index to contain Q instead of P when triples with P are indexed. For the syntax of indexedfields, see above.
- See Also: Constant Field Values
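A minimal sketch of reading the indexedfields syntax with java.util.Properties, which already handles # comments and the \: escapes in keys. It assumes that index.N entries select a property under its own name and that any other P=Q entry indexes P under Q even when P is not also listed as index.N; IndexedFieldsSketch is a hypothetical helper, not LuceneSail's implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of interpreting the "indexedfields" syntax.
// "index.N=P" selects P (indexed under its own name); any other "P=Q" projects P onto Q.
class IndexedFieldsSketch {
    static Map<String, String> parse(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config)); // handles '#' comments and "\:" escapes
        } catch (IOException e) {
            throw new UncheckedIOException(e);     // cannot really happen with a StringReader
        }
        Map<String, String> fieldFor = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key).trim();
            if (key.startsWith("index.")) {
                fieldFor.put(value, value);        // selected field, kept as-is
            } else {
                fieldFor.put(key, value);          // projection P -> Q (assumed also indexed)
            }
        }
        return fieldFor;
    }

    // Field name a predicate is indexed under, or null when it is not configured (not indexed).
    static String mapPredicate(Map<String, String> fieldFor, String predicate) {
        return fieldFor.get(predicate);
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "# only index label and comment",
                "index.1=http://www.w3.org/2000/01/rdf-schema#label",
                "index.2=http://www.w3.org/2000/01/rdf-schema#comment",
                "# project http://xmlns.com/foaf/0.1/name to rdfs:label",
                "http\\://xmlns.com/foaf/0.1/name=http\\://www.w3.org/2000/01/rdf-schema#label");
        Map<String, String> fields = parse(config);
        // prints http://www.w3.org/2000/01/rdf-schema#label
        System.out.println(mapPredicate(fields, "http://xmlns.com/foaf/0.1/name"));
        // prints null: not configured, so not indexed
        System.out.println(mapPredicate(fields, "http://purl.org/dc/terms/title"));
    }
}
```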
INDEXEDTYPES
public static final String INDEXEDTYPES
Set the parameter "indexedtypes=..." to configure a selection of field types to index. Only the fields with the specific types will be indexed. For the syntax of indexedtypes, see above.
- See Also: Constant Field Values
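The indexedtypes syntax maps a predicate to a whitespace-separated list of accepted objects. The sketch below assumes a resource qualifies when at least one of its (predicate, object) pairs matches an entry; IndexedTypesSketch, parse and resourceMatches are hypothetical names, and statements are modelled as plain strings rather than RDF4J values.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch of the "indexedtypes" syntax: key = predicate,
// value = whitespace-separated list of accepted objects (types).
class IndexedTypesSketch {
    static Map<String, Set<String>> parse(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Set<String>> types = new HashMap<>();
        for (String predicate : props.stringPropertyNames()) {
            String[] objects = props.getProperty(predicate).trim().split("\\s+");
            types.put(predicate, new HashSet<>(Arrays.asList(objects)));
        }
        return types;
    }

    // statements: predicate -> objects known for ONE subject.
    // Assumption: the subject is indexed if any (predicate, object) pair is configured.
    static boolean resourceMatches(Map<String, Set<String>> types, Map<String, Set<String>> statements) {
        for (Map.Entry<String, Set<String>> e : statements.entrySet()) {
            Set<String> accepted = types.get(e.getKey());
            if (accepted != null && e.getValue().stream().anyMatch(accepted::contains)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String rdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
        Map<String, Set<String>> types = parse(String.join("\n",
                "http\\://www.w3.org/1999/02/22-rdf-syntax-ns#type=http://example.org/mytype1 http://example.org/mytype2",
                "http\\://example.org/mytypedef=http://example.org/mytype3"));
        System.out.println(resourceMatches(types, Map.of(rdfType, Set.of("http://example.org/mytype2")))); // true
        System.out.println(resourceMatches(types, Map.of(rdfType, Set.of("http://example.org/other"))));   // false
    }
}
```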
INDEXEDLANG
public static final String INDEXEDLANG
Set the parameter "indexedlang=..." to configure a selection of field languages to index. Only the fields with the specific languages will be indexed. For the syntax of indexedlang, see above.
- See Also: Constant Field Values
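The indexedlang value is a whitespace-separated list of language tags, such as "fr en". A minimal sketch of the resulting filter follows; it assumes case-insensitive tag comparison and that literals without a language tag are skipped (consistent with "only the fields with the specific languages will be indexed"). IndexedLangSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the "indexedlang" filter.
class IndexedLangSketch {
    static Set<String> parse(String value) {
        return Arrays.stream(value.trim().split("\\s+"))
                .filter(tag -> !tag.isEmpty())
                .map(tag -> tag.toLowerCase(Locale.ROOT)) // assumption: case-insensitive tags
                .collect(Collectors.toSet());
    }

    // A literal is indexed only if it carries one of the configured language tags;
    // untagged literals (lang == null) are skipped in this sketch.
    static boolean accept(Set<String> languages, String lang) {
        return lang != null && languages.contains(lang.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> langs = parse("fr en");
        System.out.println(accept(langs, "fr")); // true
        System.out.println(accept(langs, "de")); // false
        System.out.println(accept(langs, null)); // false
    }
}
```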
INDEX_TYPE_BACKTRACE_MODE
public static final String INDEX_TYPE_BACKTRACE_MODE
- See Also: Constant Field Values

LUCENE_DIR_KEY
public static final String LUCENE_DIR_KEY
Set the key "lucenedir=<path>" as sail parameter to configure the Lucene Directory on the filesystem where the Lucene index is stored.
- See Also: Constant Field Values

DEFAULT_LUCENE_DIR
public static final String DEFAULT_LUCENE_DIR
Set the default directory of the Lucene index files. The value is always relative to the dataDir location as its parent directory.
- See Also: Constant Field Values

LUCENE_RAMDIR_KEY
public static final String LUCENE_RAMDIR_KEY
Set the key "useramdir=true" as sail parameter to let the LuceneSail store its Lucene index in RAM. This is not intended for production environments.
- See Also: Constant Field Values
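The directory settings above (lucenedir, useramdir and the default directory under dataDir) can be combined into one resolution step. The sketch is hypothetical: the precedence it applies (useramdir=true first, then lucenedir, then a directory under dataDir) and the placeholder DEFAULT_DIR_NAME are assumptions, because this page shows neither the actual precedence nor the value of DEFAULT_LUCENE_DIR. Only the keys "lucenedir" and "useramdir" come from the documentation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch of resolving where the index lives from the sail parameters.
// The precedence below is an assumption made for illustration.
class IndexLocationSketch {
    // Placeholder: the real value of LuceneSail.DEFAULT_LUCENE_DIR is not shown on this page.
    static final String DEFAULT_DIR_NAME = "lucene-index";

    /** Returns empty for an in-memory index, otherwise the directory to open. */
    static Optional<Path> resolve(Properties params, Path dataDir) {
        if (Boolean.parseBoolean(params.getProperty("useramdir"))) {
            return Optional.empty();                             // RAM: nothing on disk
        }
        String dir = params.getProperty("lucenedir");
        if (dir != null) {
            return Optional.of(Paths.get(dir));                  // explicit filesystem location
        }
        if (dataDir == null) {
            throw new IllegalStateException("no index location configured");
        }
        return Optional.of(dataDir.resolve(DEFAULT_DIR_NAME));   // relative to dataDir
    }

    public static void main(String[] args) {
        Properties onDisk = new Properties();
        onDisk.setProperty("lucenedir", "./data/mydirectory");
        System.out.println(resolve(onDisk, null).isPresent());   // true

        Properties inRam = new Properties();
        inRam.setProperty("useramdir", "true");
        System.out.println(resolve(inRam, null).isPresent());    // false
    }
}
```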
MAX_DOCUMENTS_KEY
public static final String MAX_DOCUMENTS_KEY
Set the key "maxDocuments=<n>" as sail parameter to limit the maximum number of documents to return from a search query. The default is to return all documents. NB: this may involve extra cost for some SearchIndex implementations, as they may have to determine this number.
- See Also: Constant Field Values
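A minimal sketch of applying "maxDocuments=<n>" to a list of hits, with "no key" meaning "return all documents" as described above. MaxDocumentsSketch is a hypothetical helper; rejecting negative values is this sketch's own validation choice, not documented LuceneSail behavior.

```java
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical sketch of the "maxDocuments" limit.
class MaxDocumentsSketch {
    static int maxDocuments(Properties params) {
        String raw = params.getProperty("maxDocuments");
        if (raw == null) {
            return Integer.MAX_VALUE;  // default: return all documents
        }
        int n = Integer.parseInt(raw.trim());
        if (n < 0) {                   // this sketch's own validation choice
            throw new IllegalArgumentException("maxDocuments must not be negative: " + n);
        }
        return n;
    }

    // Keeps at most 'max' hits, preserving their (score) order.
    static <T> List<T> limit(List<T> hits, int max) {
        return hits.stream().limit(max).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Properties params = new Properties();
        params.setProperty("maxDocuments", "2");
        System.out.println(limit(List.of("doc1", "doc2", "doc3"), maxDocuments(params))); // [doc1, doc2]
    }
}
```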
WKT_FIELDS
public static final String WKT_FIELDS
Set this key to configure which fields contain WKT and should be spatially indexed. The value should be a space-separated list of URIs. Default is http://www.opengis.net/ont/geosparql#asWKT.
- See Also: Constant Field Values
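Reading the WKT_FIELDS value amounts to splitting a space-separated URI list and falling back to GeoSPARQL's asWKT when nothing is configured. The sketch works on the value only, since the key string itself is not shown on this page; WktFieldsSketch and isSpatial are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of interpreting the WKT_FIELDS value.
class WktFieldsSketch {
    static final String DEFAULT_WKT_FIELD = "http://www.opengis.net/ont/geosparql#asWKT";

    static Set<String> parse(String value) {
        Set<String> fields = new LinkedHashSet<>();
        if (value != null) {
            for (String uri : value.trim().split("\\s+")) {
                if (!uri.isEmpty()) {
                    fields.add(uri);
                }
            }
        }
        if (fields.isEmpty()) {
            fields.add(DEFAULT_WKT_FIELD); // documented default
        }
        return fields;
    }

    // A literal is treated as WKT only when its predicate is one of the configured fields.
    static boolean isSpatial(Set<String> wktFields, String predicate) {
        return wktFields.contains(predicate);
    }

    public static void main(String[] args) {
        System.out.println(parse(null)); // [http://www.opengis.net/ont/geosparql#asWKT]
        System.out.println(parse("http://example.org/geom http://example.org/area").size()); // 2
    }
}
```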
INDEX_CLASS_KEY
public static final String INDEX_CLASS_KEY
Set this key to configure the SearchIndex class implementation. Default is org.eclipse.rdf4j.sail.lucene.LuceneIndex.
- See Also: Constant Field Values

INDEX_ID
public static final String INDEX_ID
Set this key to configure the filtering of queries. If this parameter is set, the match object should contain the search:indexid parameter; see the syntax above.
- See Also: Constant Field Values
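The INDEX_ID rule can be modelled as a predicate deciding whether a sail handles a given search pattern, which is what lets two stacked LuceneSails answer different patterns. This is a hypothetical sketch: exact string equality of the ids, and accepting every pattern when INDEX_ID is unset, are assumptions.

```java
// Hypothetical sketch of the INDEX_ID filtering rule.
class IndexIdFilterSketch {
    /**
     * @param configuredId value of the INDEX_ID parameter, or null when unset
     * @param queryIndexId object of search:indexid in the query pattern, or null when absent
     */
    static boolean handles(String configuredId, String queryIndexId) {
        if (configuredId == null) {
            return true; // assumption: without INDEX_ID, patterns are not filtered by id
        }
        return configuredId.equals(queryIndexId); // pattern must name this sail's index
    }

    public static void main(String[] args) {
        String id = "http://example.org/lucene_index_id";
        System.out.println(handles(id, id));   // true
        System.out.println(handles(id, null)); // false: no search:indexid in the pattern
    }
}
```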
DEFAULT_INDEX_CLASS
public static final String DEFAULT_INDEX_CLASS
- See Also: Constant Field Values

ANALYZER_CLASS_KEY
public static final String ANALYZER_CLASS_KEY
Set this key as sail parameter to configure the Lucene analyzer class implementation to use for text analysis.
- See Also: Constant Field Values

QUERY_ANALYZER_CLASS_KEY
public static final String QUERY_ANALYZER_CLASS_KEY
Set this key as sail parameter to configure the Lucene analyzer class implementation used for query analysis. In most cases this should be set to the same value as ANALYZER_CLASS_KEY.
- See Also: Constant Field Values

SIMILARITY_CLASS_KEY
public static final String SIMILARITY_CLASS_KEY
Set this key as sail parameter to configure the Similarity class implementation to use for text analysis.
- See Also: Constant Field Values

INCOMPLETE_QUERY_FAIL_KEY
public static final String INCOMPLETE_QUERY_FAIL_KEY
Set this key as sail parameter to influence whether incomplete queries are treated as failures (malformed queries) or whether they are ignored. Set it to either "true" or "false". When omitted from the properties, true is the default (failure on incomplete queries). See isIncompleteQueryFails().
- See Also: Constant Field Values
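The "incompletequeryfail" behavior (default true; a text query needs the matches relation plus the type and query properties) can be sketched as below. IncompleteQuerySketch is hypothetical: it represents a query pattern as the set of parts it contains and throws IllegalStateException as a stand-in for SailException, using java.util.logging for the WARN case.

```java
import java.util.Properties;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch of the "incompletequeryfail" behavior.
class IncompleteQuerySketch {
    private static final Logger LOG = Logger.getLogger(IncompleteQuerySketch.class.getName());

    // Documented default: true (fail) when the parameter is omitted.
    static boolean incompleteQueryFails(Properties params) {
        return Boolean.parseBoolean(params.getProperty("incompletequeryfail", "true"));
    }

    /** Returns true if the pattern is complete; otherwise throws or logs, depending on the flag. */
    static boolean check(Set<String> presentParts, boolean failOnIncomplete) {
        for (String required : new String[] {"matches", "type", "query"}) {
            if (!presentParts.contains(required)) {
                String msg = "incomplete text query: missing " + required;
                if (failOnIncomplete) {
                    throw new IllegalStateException(msg); // stands in for SailException
                }
                LOG.warning(msg);                          // lenient mode: log and skip
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean fail = incompleteQueryFails(new Properties());
        System.out.println(fail); // true
        System.out.println(check(Set.of("matches", "type", "query"), fail)); // true
    }
}
```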
EVALUATION_MODE_KEY
public static final String EVALUATION_MODE_KEY
- See Also: Constant Field Values

FUZZY_PREFIX_LENGTH_KEY
public static final String FUZZY_PREFIX_LENGTH_KEY
Set this key as sail parameter to influence the fuzzy prefix length.
- See Also: Constant Field Values
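The fuzzy prefix length is presumably passed on to Lucene's fuzzy matching, where it is the number of leading characters that must match exactly before edits are allowed. The sketch below is a simplified, library-free model of that idea: it uses plain Levenshtein distance, whereas Lucene's FuzzyQuery also counts transpositions by default, so it approximates rather than reproduces Lucene. FuzzyPrefixSketch is a hypothetical name.

```java
// Hypothetical, simplified model of a fuzzy prefix length: the first prefixLength
// characters must be identical, and only the remainder may differ by at most maxEdits
// (plain Levenshtein distance; Lucene also counts transpositions by default).
class FuzzyPrefixSketch {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                                  // distance from "" to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;    // prev now holds row i
        }
        return prev[b.length()];
    }

    static boolean fuzzyMatches(String term, String candidate, int maxEdits, int prefixLength) {
        int p = Math.min(prefixLength, term.length());
        if (candidate.length() < p || !candidate.regionMatches(0, term, 0, p)) {
            return false;                                 // the prefix must match exactly
        }
        return levenshtein(term.substring(p), candidate.substring(p)) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("lucene", "jucene", 1, 0)); // true: one edit allowed anywhere
        System.out.println(fuzzyMatches("lucene", "jucene", 1, 1)); // false: first char is fixed
        System.out.println(fuzzyMatches("lucene", "lucena", 1, 2)); // true
    }
}
```

A longer prefix prunes more candidate terms, which usually makes fuzzy queries cheaper at the cost of missing typos near the start of a word.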
parameters
protected final Properties parameters

Method Detail

setLuceneIndex
public void setLuceneIndex(SearchIndex luceneIndex)

getLuceneIndex
public SearchIndex getLuceneIndex()

getConnection
public NotifyingSailConnection getConnection() throws SailException
Description copied from interface: Sail
Opens a connection on the Sail which can be used to query and update data. Depending on how the implementation handles concurrent access, a call to this method might block when there is another open connection on this Sail.
- Specified by: getConnection in interface NotifyingSail
- Specified by: getConnection in interface Sail
- Overrides: getConnection in class NotifyingSailWrapper
- Throws: SailException: if no transaction could be started, for example because the Sail is not writable.

shutDown
public void shutDown() throws SailException
Description copied from interface: Sail
Shuts down the Sail, giving it the opportunity to synchronize any stale data. Care should be taken that all initialized Sails are shut down before an application exits to avoid potential loss of data. Once shut down, a Sail can no longer be used until it is re-initialized.
- Specified by: shutDown in interface Sail
- Overrides: shutDown in class SailWrapper
- Throws: SailException: if the Sail object encountered an error or unexpected situation internally.

setDataDir
public void setDataDir(File dataDir)
Description copied from interface: Sail
Sets the data directory for the Sail. The Sail can use this directory for storage of data, parameters, etc. This directory must be set before the Sail is initialized.
- Specified by: setDataDir in interface Sail
- Overrides: setDataDir in class SailWrapper

init
public void init() throws SailException
Description copied from interface: Sail
Initializes the Sail. Care should be taken that required initialization parameters have been set before this method is called. Please consult the specific Sail implementation for information about the relevant parameters.
- Specified by: init in interface Sail
- Overrides: init in class SailWrapper
- Throws: SailException: if the Sail could not be initialized.

createSearchIndex
@Deprecated
protected static SearchIndex createSearchIndex(Properties parameters) throws Exception
Deprecated. The method is relocated to SearchIndexUtils.createSearchIndex(java.util.Properties).
- Parameters: parameters
- Returns: search index
- Throws: Exception

getReindexQuery
public String getReindexQuery()
See REINDEX_QUERY_KEY parameter.

setReindexQuery
public void setReindexQuery(String query)
See REINDEX_QUERY_KEY parameter.

isIncompleteQueryFails
public boolean isIncompleteQueryFails()
When this is true, incomplete queries will trigger a SailException. You can set this value either using setIncompleteQueryFails(boolean) or using the parameter "incompletequeryfail".
- Returns: the incompleteQueryFails value.

setIncompleteQueryFails
public void setIncompleteQueryFails(boolean incompleteQueryFails)
Set this to true so that incomplete queries trigger a SailException. Otherwise, incomplete queries are logged with level WARN. Default is true. You can also set this value using the parameter "incompletequeryfail".
- Parameters: incompleteQueryFails: true or false

getEvaluationMode
public TupleFunctionEvaluationMode getEvaluationMode()
See EVALUATION_MODE_KEY parameter.

setEvaluationMode
public void setEvaluationMode(TupleFunctionEvaluationMode mode)
See EVALUATION_MODE_KEY parameter.

getIndexBacktraceMode
public TypeBacktraceMode getIndexBacktraceMode()
See INDEX_TYPE_BACKTRACE_MODE parameter.

setIndexBacktraceMode
public void setIndexBacktraceMode(TypeBacktraceMode mode)
See INDEX_TYPE_BACKTRACE_MODE parameter.

setFuzzyPrefixLength
public void setFuzzyPrefixLength(int fuzzyPrefixLength)

getTupleFunctionRegistry
public TupleFunctionRegistry getTupleFunctionRegistry()

setTupleFunctionRegistry
public void setTupleFunctionRegistry(TupleFunctionRegistry registry)

getFederatedServiceResolver
public FederatedServiceResolver getFederatedServiceResolver()
Description copied from interface: FederatedServiceResolverClient
Gets the FederatedServiceResolver used by this client.
- Specified by: getFederatedServiceResolver in interface FederatedServiceResolverClient
- Overrides: getFederatedServiceResolver in class SailWrapper

setFederatedServiceResolver
public void setFederatedServiceResolver(FederatedServiceResolver resolver)
Description copied from interface: FederatedServiceResolverClient
Sets the FederatedServiceResolver to use for this client.
- Specified by: setFederatedServiceResolver in interface FederatedServiceResolverClient
- Overrides: setFederatedServiceResolver in class SailWrapper
- Parameters: resolver: the resolver to use.

reindex
public void reindex() throws SailException
Starts a reindexing process of the whole sail. Basically, this deletes and re-adds all data, which is a long-running process.
- Throws: SailException: if the Sail could not be reindexed.

registerStatementFilter
public void registerStatementFilter(IndexableStatementFilter filter)
Sets a filter which determines whether a statement should be considered for indexing when performing complete reindexing.

acceptStatementToIndex
protected boolean acceptStatementToIndex(Statement s)

getSearchQueryInterpreters
protected Collection<SearchQueryInterpreter> getSearchQueryInterpreters()