Class SolrConnector
- java.lang.Object
  - org.apache.manifoldcf.core.connector.BaseConnector
    - org.apache.manifoldcf.agents.output.BaseOutputConnector
      - org.apache.manifoldcf.agents.output.solr.SolrConnector
- All Implemented Interfaces:
org.apache.manifoldcf.agents.interfaces.IOutputConnector, org.apache.manifoldcf.agents.interfaces.IPipelineConnector, org.apache.manifoldcf.core.interfaces.IConnector
public class SolrConnector extends org.apache.manifoldcf.agents.output.BaseOutputConnector
This is the output connector for SOLR. Currently, no frills.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class - Description
protected class SolrConnector.SpecPacker - This class handles Solr connector version string packing/unpacking/interpretation.
-
Field Summary
Fields
Modifier and Type, Field - Description
static java.lang.String _rcsid
protected java.lang.String allowAttributeName - The allow attribute name
protected static boolean allowCompression - Allow compression? Currently static
protected java.lang.String collectionName - Collection name (non-empty only if SolrCloud)
protected java.lang.String contentAttributeName
protected java.lang.String createdDateAttributeName
protected java.lang.String denyAttributeName - The deny attribute name
protected boolean doCommits - Whether or not to commit
protected java.util.Set<java.lang.String> excludedMimeTypes - Excluded mime types
protected java.lang.String excludedMimeTypesString - Excluded mime types string
protected static long EXPIRATION_INTERVAL - Idle connection expiration interval
protected long expirationTime - Expiration
protected java.lang.String fileNameAttributeName
protected java.lang.String idAttributeName
protected java.util.Set<java.lang.String> includedMimeTypes - Included mime types
protected java.lang.String includedMimeTypesString - Included mime types string
protected java.lang.String indexedDateAttributeName
static java.lang.String INGEST_ACTIVITY - Ingestion activity
protected java.lang.Long maxDocumentLength - The maximum document length
protected java.lang.String mimeTypeAttributeName
protected java.lang.String modifiedDateAttributeName
protected java.lang.String originalSizeAttributeName
protected HttpPoster poster - Local connection
static java.lang.String REMOVE_ACTIVITY - Document removal activity
protected boolean useExtractUpdateHandler - Use extracting update handler?
-
Constructor Summary
Constructors
Constructor - Description
SolrConnector() - Constructor.
-
Method Summary
Methods
Modifier and Type, Method - Description
int addOrReplaceDocumentWithException(java.lang.String documentURI, org.apache.manifoldcf.core.interfaces.VersionContext pipelineDescription, org.apache.manifoldcf.agents.interfaces.RepositoryDocument document, java.lang.String authorityNameString, org.apache.manifoldcf.agents.interfaces.IOutputAddActivity activities) - Add (or replace) a document in the output data store using the connector.
java.lang.String check() - Test the connection.
boolean checkLengthIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, long length, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) - Pre-determine whether a document's length is indexable by this connector.
boolean checkMimeTypeIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String mimeType, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) - Detect if a mime type is indexable or not.
void connect(org.apache.manifoldcf.core.interfaces.ConfigParams configParameters) - Connect.
void disconnect() - Close the connection.
java.lang.String[] getActivitiesList() - Return the list of activities that this connector supports (i.e. writes into the log).
org.apache.manifoldcf.core.interfaces.VersionContext getPipelineDescription(org.apache.manifoldcf.core.interfaces.Specification spec) - Get an output version string, given an output specification.
protected void getSession() - Set up a session
boolean isConnected() - This method is called to assess whether this connector instance should actually be counted as being connected.
void noteJobComplete(org.apache.manifoldcf.agents.interfaces.IOutputNotifyActivity activities) - Notify the connector of a completed job.
void outputConfigurationBody(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.lang.String tabName) - Output the configuration body section.
void outputConfigurationHeader(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.util.List<java.lang.String> tabsArray) - Output the configuration header section.
protected static java.util.Set<java.lang.String> parseMimeTypes(java.lang.String mimeTypes) - Parse a mime type field into individual mime types in a hash
void poll() - This method is periodically called for all connectors that are connected but not in active use.
java.lang.String processConfigurationPost(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters) - Process a configuration post.
void removeDocument(java.lang.String documentURI, java.lang.String outputDescription, org.apache.manifoldcf.agents.interfaces.IOutputRemoveActivity activities) - Remove a document using the connector.
void viewConfiguration(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters) - View configuration.
-
Methods inherited from class org.apache.manifoldcf.agents.output.BaseOutputConnector
checkDateIndexable, checkDocumentIndexable, checkURLIndexable, getFormCheckJavascriptMethodName, getFormPresaveCheckJavascriptMethodName, noteAllRecordsRemoved, outputSpecificationBody, outputSpecificationHeader, processSpecificationPost, requestInfo, viewSpecification
-
Methods inherited from class org.apache.manifoldcf.core.connector.BaseConnector
clearThreadContext, deinstall, getConfiguration, install, outputConfigurationBody, outputConfigurationHeader, outputConfigurationHeader, pack, packFixedList, packList, packList, processConfigurationPost, setThreadContext, unpack, unpackFixedList, unpackList, viewConfiguration
-
-
-
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
- See Also:
- Constant Field Values
-
INGEST_ACTIVITY
public static final java.lang.String INGEST_ACTIVITY
Ingestion activity
- See Also:
- Constant Field Values
-
REMOVE_ACTIVITY
public static final java.lang.String REMOVE_ACTIVITY
Document removal activity
- See Also:
- Constant Field Values
-
poster
protected HttpPoster poster
Local connection
-
expirationTime
protected long expirationTime
Expiration
-
allowAttributeName
protected java.lang.String allowAttributeName
The allow attribute name
-
denyAttributeName
protected java.lang.String denyAttributeName
The deny attribute name
-
maxDocumentLength
protected java.lang.Long maxDocumentLength
The maximum document length
-
includedMimeTypesString
protected java.lang.String includedMimeTypesString
Included mime types string
-
includedMimeTypes
protected java.util.Set<java.lang.String> includedMimeTypes
Included mime types
-
excludedMimeTypesString
protected java.lang.String excludedMimeTypesString
Excluded mime types string
-
excludedMimeTypes
protected java.util.Set<java.lang.String> excludedMimeTypes
Excluded mime types
-
idAttributeName
protected java.lang.String idAttributeName
-
originalSizeAttributeName
protected java.lang.String originalSizeAttributeName
-
modifiedDateAttributeName
protected java.lang.String modifiedDateAttributeName
-
createdDateAttributeName
protected java.lang.String createdDateAttributeName
-
indexedDateAttributeName
protected java.lang.String indexedDateAttributeName
-
fileNameAttributeName
protected java.lang.String fileNameAttributeName
-
mimeTypeAttributeName
protected java.lang.String mimeTypeAttributeName
-
contentAttributeName
protected java.lang.String contentAttributeName
-
useExtractUpdateHandler
protected boolean useExtractUpdateHandler
Use extracting update handler?
-
allowCompression
protected static final boolean allowCompression
Allow compression? Currently static
- See Also:
- Constant Field Values
-
doCommits
protected boolean doCommits
Whether or not to commit
-
collectionName
protected java.lang.String collectionName
Collection name (non-empty only if SolrCloud)
-
EXPIRATION_INTERVAL
protected static final long EXPIRATION_INTERVAL
Idle connection expiration interval
- See Also:
- Constant Field Values
-
-
Method Detail
-
getActivitiesList
public java.lang.String[] getActivitiesList()
Return the list of activities that this connector supports (i.e. writes into the log).
- Specified by:
getActivitiesList in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
- Overrides:
getActivitiesList in class org.apache.manifoldcf.agents.output.BaseOutputConnector
- Returns:
- the list.
-
connect
public void connect(org.apache.manifoldcf.core.interfaces.ConfigParams configParameters)
Connect.
- Specified by:
connect in interface org.apache.manifoldcf.core.interfaces.IConnector
- Overrides:
connect in class org.apache.manifoldcf.core.connector.BaseConnector
- Parameters:
configParameters - is the set of configuration parameters, which in this case describe the target appliance, basic auth configuration, etc. (This formerly came out of the ini file.)
-
poll
public void poll() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
This method is periodically called for all connectors that are connected but not in active use.
- Specified by:
poll in interface org.apache.manifoldcf.core.interfaces.IConnector
- Overrides:
poll in class org.apache.manifoldcf.core.connector.BaseConnector
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
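The expirationTime field and the EXPIRATION_INTERVAL constant suggest that poll() is where an idle connection gets released once its deadline has passed. The following is a minimal sketch of that pattern only, not the connector's source: the class name IdleSessionSketch, the explicit clock arguments, and the interval value are all assumptions made for illustration.

```java
// Hypothetical sketch of idle-connection expiration; not the SolrConnector source.
final class IdleSessionSketch {
  // Assumed value for illustration; the real constant's value is not shown on this page.
  static final long EXPIRATION_INTERVAL = 300_000L;

  private boolean sessionOpen = false;
  private long expirationTime = -1L;

  // Open (or refresh) the session and push the idle deadline forward.
  void getSession(long nowMillis) {
    sessionOpen = true;
    expirationTime = nowMillis + EXPIRATION_INTERVAL;
  }

  // Called periodically while idle: drop the session once the deadline has passed.
  void poll(long nowMillis) {
    if (sessionOpen && nowMillis >= expirationTime) {
      sessionOpen = false;
      expirationTime = -1L;
    }
  }

  // Reports whether a session is currently held.
  boolean isConnected() {
    return sessionOpen;
  }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic; a real connector would more likely read System.currentTimeMillis() itself.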
-
isConnected
public boolean isConnected()
This method is called to assess whether this connector instance should actually be counted as being connected.
- Specified by:
isConnected in interface org.apache.manifoldcf.core.interfaces.IConnector
- Overrides:
isConnected in class org.apache.manifoldcf.core.connector.BaseConnector
- Returns:
- true if the connector instance is actually connected.
-
disconnect
public void disconnect() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Close the connection. Call this before discarding the connection.
- Specified by:
disconnect in interface org.apache.manifoldcf.core.interfaces.IConnector
- Overrides:
disconnect in class org.apache.manifoldcf.core.connector.BaseConnector
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
getSession
protected void getSession() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Set up a session.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
parseMimeTypes
protected static java.util.Set<java.lang.String> parseMimeTypes(java.lang.String mimeTypes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Parse a mime type field into individual mime types, returned as a hash set.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
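To illustrate what parsing a mime type field into a set could look like, here is a minimal sketch. It assumes the field holds one mime type per line and that entries are trimmed and lowercased; neither the delimiter nor the normalization is documented on this page, and the class name MimeTypeFieldSketch is hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the delimiter and normalization rules are assumptions.
final class MimeTypeFieldSketch {
  // Split a newline-separated field into a set of trimmed, lowercase mime types.
  static Set<String> parseMimeTypes(String mimeTypes) {
    Set<String> result = new HashSet<>();
    if (mimeTypes == null) {
      return result; // no field value means no entries
    }
    for (String line : mimeTypes.split("\\r?\\n")) {
      String trimmed = line.trim().toLowerCase(Locale.ROOT);
      if (!trimmed.isEmpty()) {
        result.add(trimmed); // blank lines are skipped, duplicates collapse
      }
    }
    return result;
  }
}
```

Returning a set makes later membership checks cheap and removes duplicate entries from the field.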
-
check
public java.lang.String check() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Test the connection. Returns a string describing the connection integrity.
- Specified by:
check in interface org.apache.manifoldcf.core.interfaces.IConnector
- Overrides:
check in class org.apache.manifoldcf.core.connector.BaseConnector
- Returns:
- the connection's status as a displayable string.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
getPipelineDescription
public org.apache.manifoldcf.core.interfaces.VersionContext getPipelineDescription(org.apache.manifoldcf.core.interfaces.Specification spec) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get an output version string, given an output specification. The output version string is used to uniquely describe the pertinent details of the output specification and the configuration, to allow the Connector Framework to determine whether a document will need to be output again. Note that the contents of the document cannot be considered by this method, and that a different version string (defined in IRepositoryConnector) is used to describe the version of the actual document. This method presumes that the connector object has been configured, and it is thus able to communicate with the output data store should that be necessary.
- Specified by:
getPipelineDescription in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
getPipelineDescription in class org.apache.manifoldcf.agents.output.BaseOutputConnector
- Parameters:
spec - is the current output specification for the job that is doing the crawling.
- Returns:
- a string, of unlimited length, which uniquely describes output configuration and specification in such a way that if two such strings are equal, the document will not need to be sent again to the output data store.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
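The contract described above (equal version strings mean the document need not be sent again) can be sketched as follows. This is an illustration under stated assumptions, not the format used by SolrConnector.SpecPacker or the inherited pack helpers: the class VersionStringSketch, the '+' delimiter, and the escaping scheme are all hypothetical.

```java
import java.util.List;

// Hypothetical sketch of a version string; the real packing format is not shown here.
final class VersionStringSketch {
  // Join fields with '+', escaping '\' and '+' so distinct field lists never collide.
  static String pack(List<String> fields) {
    StringBuilder sb = new StringBuilder();
    for (String field : fields) {
      sb.append(field.replace("\\", "\\\\").replace("+", "\\+")).append('+');
    }
    return sb.toString();
  }

  // A document needs re-sending only when the packed description has changed.
  static boolean needsReindex(String previousVersion, String currentVersion) {
    return !currentVersion.equals(previousVersion);
  }
}
```

The escaping matters because the comparison is a plain string equality: without it, the single field "a+b" and the two fields "a" and "b" would pack to the same string and a configuration change could go unnoticed.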
-
checkMimeTypeIndexable
public boolean checkMimeTypeIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, java.lang.String mimeType, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Detect if a mime type is indexable or not. This method is used by participating repository connectors to pre-filter the number of unusable documents that will be passed to this output connector.
- Specified by:
checkMimeTypeIndexable in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
checkMimeTypeIndexable in class org.apache.manifoldcf.agents.output.BaseOutputConnector
- Parameters:
outputDescription - is the document's output version.
mimeType - is the mime type of the document.
- Returns:
- true if the mime type is indexable by this connector.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
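One plausible way to combine the includedMimeTypes and excludedMimeTypes sets into such a pre-filter is sketched below. The precedence rules are assumptions, not documented behavior: here an exclusion always wins, a null include set means "no restriction", and an unknown (null) mime type is rejected. The class MimeTypeFilterSketch is hypothetical.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; the precedence between the two sets is an assumption.
final class MimeTypeFilterSketch {
  // Returns true if the mime type passes both the exclude and the include filter.
  static boolean isIndexable(String mimeType, Set<String> included, Set<String> excluded) {
    if (mimeType == null) {
      return false; // assumption: documents without a mime type are not indexed
    }
    String key = mimeType.toLowerCase(Locale.ROOT);
    if (excluded != null && excluded.contains(key)) {
      return false; // an explicit exclusion always wins
    }
    // A null include set is treated as "everything is allowed".
    return included == null || included.contains(key);
  }
}
```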
-
checkLengthIndexable
public boolean checkLengthIndexable(org.apache.manifoldcf.core.interfaces.VersionContext outputDescription, long length, org.apache.manifoldcf.agents.interfaces.IOutputCheckActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Pre-determine whether a document's length is indexable by this connector. This method is used by participating repository connectors to help filter out documents that are too long to be indexable.
- Specified by:
checkLengthIndexable in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
checkLengthIndexable in class org.apache.manifoldcf.agents.output.BaseOutputConnector
- Parameters:
outputDescription - is the document's output version.
length - is the length of the document.
- Returns:
- true if the file is indexable.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
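Because maxDocumentLength is declared as a java.lang.Long rather than a primitive, a length check presumably has to handle the "no limit configured" case. A minimal sketch, assuming that null means unlimited and that the limit itself is inclusive (both are assumptions; LengthFilterSketch is a hypothetical name):

```java
// Hypothetical sketch; null-as-unlimited and the inclusive bound are assumptions.
final class LengthFilterSketch {
  // Returns true if the document length is within the configured maximum, if any.
  static boolean isLengthIndexable(long length, Long maxDocumentLength) {
    return maxDocumentLength == null || length <= maxDocumentLength.longValue();
  }
}
```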
-
addOrReplaceDocumentWithException
public int addOrReplaceDocumentWithException(java.lang.String documentURI, org.apache.manifoldcf.core.interfaces.VersionContext pipelineDescription, org.apache.manifoldcf.agents.interfaces.RepositoryDocument document, java.lang.String authorityNameString, org.apache.manifoldcf.agents.interfaces.IOutputAddActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption, java.io.IOException
Add (or replace) a document in the output data store using the connector. This method presumes that the connector object has been configured, and it is thus able to communicate with the output data store should that be necessary.
- Specified by:
addOrReplaceDocumentWithException in interface org.apache.manifoldcf.agents.interfaces.IPipelineConnector
- Overrides:
addOrReplaceDocumentWithException in class org.apache.manifoldcf.agents.output.BaseOutputConnector
- Parameters:
documentURI - is the URI of the document. The URI is presumed to be the unique identifier which the output data store will use to process and serve the document. This URI is constructed by the repository connector which fetches the document, and is thus universal across all output connectors.
pipelineDescription - includes the description string that was constructed for this document by the getPipelineDescription() method.
document - is the document data to be processed (handed to the output data store).
authorityNameString - is the name of the authority responsible for authorizing any access tokens passed in with the repository document. May be null.
activities - is the handle to an object that the implementer of a pipeline connector may use to perform operations, such as logging processing activity, or sending a modified document to the next stage in the pipeline.
- Returns:
- the document status (accepted or permanently rejected).
- Throws:
java.io.IOException - only if there's a stream error reading the document data.
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
removeDocument
public void removeDocument(java.lang.String documentURI, java.lang.String outputDescription, org.apache.manifoldcf.agents.interfaces.IOutputRemoveActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Remove a document using the connector. Note that the last outputDescription is included, since it may be necessary for the connector to use such information to know how to properly remove the document.
- Specified by:
removeDocument in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
- Overrides:
removeDocument in class org.apache.manifoldcf.agents.output.BaseOutputConnector
- Parameters:
documentURI - is the URI of the document. The URI is presumed to be the unique identifier which the output data store will use to process and serve the document. This URI is constructed by the repository connector which fetches the document, and is thus universal across all output connectors.
outputDescription - is the last description string that was constructed for this document by the getPipelineDescription() method above.
activities - is the handle to an object that the implementer of an output connector may use to perform operations, such as logging processing activity.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
noteJobComplete
public void noteJobComplete(org.apache.manifoldcf.agents.interfaces.IOutputNotifyActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Notify the connector of a completed job. This is meant to allow the connector to flush any internal data structures it has been keeping around, or to tell the output repository that this is a good time to synchronize things. It is called whenever a job is either completed or aborted.
- Specified by:
noteJobComplete in interface org.apache.manifoldcf.agents.interfaces.IOutputConnector
- Overrides:
noteJobComplete in class org.apache.manifoldcf.agents.output.BaseOutputConnector
- Parameters:
activities - is the handle to an object that the implementer of an output connector may use to perform operations, such as logging processing activity.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
outputConfigurationHeader
public void outputConfigurationHeader(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.util.List<java.lang.String> tabsArray) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, java.io.IOException
Output the configuration header section. This method is called in the head section of the connector's configuration page. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the configuration editing HTML.
- Specified by:
outputConfigurationHeader in interface org.apache.manifoldcf.core.interfaces.IConnector
- Overrides:
outputConfigurationHeader in class org.apache.manifoldcf.core.connector.BaseConnector
- Parameters:
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
tabsArray - is an array of tab names. Add to this array any tab names that are specific to the connector.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
-
outputConfigurationBody
public void outputConfigurationBody(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters, java.lang.String tabName) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, java.io.IOException
Output the configuration body section. This method is called in the body section of the connector's configuration page. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is "editconnection".
- Specified by:
outputConfigurationBody in interface org.apache.manifoldcf.core.interfaces.IConnector
- Overrides:
outputConfigurationBody in class org.apache.manifoldcf.core.connector.BaseConnector
- Parameters:
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
tabName - is the current tab name.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
-
processConfigurationPost
public java.lang.String processConfigurationPost(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IPostParameters variableContext, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Process a configuration post. This method is called at the start of the connector's configuration page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the configuration parameters accordingly. The name of the posted form is "editconnection".
- Specified by:
processConfigurationPost in interface org.apache.manifoldcf.core.interfaces.IConnector
- Overrides:
processConfigurationPost in class org.apache.manifoldcf.core.connector.BaseConnector
- Parameters:
threadContext - is the local thread context.
variableContext - is the set of variables available from the post, including binary file post information.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
- Returns:
- null if all is well, or a string error message if there is an error that should prevent saving of the connection (and cause a redirection to an error page).
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
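The return contract above (null when all is well, an error message when the connection should not be saved) can be sketched with plain maps standing in for the posted variables and the configuration parameters. Everything named here is hypothetical: the class ConfigPostSketch, the form field name "maxdocumentlength", and the validation rules are illustrative assumptions, not the connector's actual form handling.

```java
import java.util.Map;

// Hypothetical sketch of the "null on success, message on error" contract.
final class ConfigPostSketch {
  // Assumed form field name, for illustration only.
  static final String MAX_LENGTH_PARAM = "maxdocumentlength";

  // Copies a validated posted value into the configuration, or reports why it cannot.
  static String processPost(Map<String, String> posted, Map<String, String> config) {
    String value = posted.get(MAX_LENGTH_PARAM);
    if (value == null) {
      return null; // field was not posted, so there is nothing to change
    }
    value = value.trim();
    if (!value.isEmpty()) {
      try {
        if (Long.parseLong(value) < 0L) {
          return "Maximum document length must not be negative";
        }
      } catch (NumberFormatException e) {
        return "Maximum document length must be an integer";
      }
    }
    config.put(MAX_LENGTH_PARAM, value); // only reached when validation passed
    return null;
  }
}
```

Validating before writing keeps the existing configuration untouched when an error message is returned.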
-
viewConfiguration
public void viewConfiguration(org.apache.manifoldcf.core.interfaces.IThreadContext threadContext, org.apache.manifoldcf.core.interfaces.IHTTPOutput out, java.util.Locale locale, org.apache.manifoldcf.core.interfaces.ConfigParams parameters) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, java.io.IOException
View configuration. This method is called in the body section of the connector's view configuration page. Its purpose is to present the connection information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags.
- Specified by:
viewConfiguration in interface org.apache.manifoldcf.core.interfaces.IConnector
- Overrides:
viewConfiguration in class org.apache.manifoldcf.core.connector.BaseConnector
- Parameters:
threadContext - is the local thread context.
out - is the output to which any HTML should be sent.
parameters - are the configuration parameters, as they currently exist, for this connection being configured.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
java.io.IOException
-
-