Class ThrottledFetcher.ThrottledConnection
- java.lang.Object
-
- org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher.ThrottledConnection
-
- All Implemented Interfaces:
IThrottledConnection
- Enclosing class:
- ThrottledFetcher
protected static class ThrottledFetcher.ThrottledConnection extends java.lang.Object implements IThrottledConnection
Throttled connections. Each instance of a connection describes the bins to which it belongs, along with the actual open connection itself, and the last time the connection was used.
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected AbortChecker | abortCheck | Abort checker
protected PageCredentials | authentication | Authentication
protected int | connectionTimeoutMilliseconds | Connection timeout milliseconds
protected org.apache.http.conn.HttpClientConnectionManager | connManager | The http connection manager.
protected long | expireTime | This is when the connection will expire.
protected long | fetchCounter | The current bytes in the current fetch
protected org.apache.http.client.methods.HttpRequestBase | fetchMethod | The method object
protected org.apache.manifoldcf.connectorcommon.interfaces.IFetchThrottler | fetchThrottler | Fetch throttler
protected java.lang.String | fetchType | The kind of fetch we are doing
protected org.apache.http.client.HttpClient | httpClient | The http client object.
protected javax.net.ssl.SSLSocketFactory | httpsSocketFactory | Https protocol
protected LoginCookies | lastFetchCookies | The cookies from the last fetch
protected ThrottledFetcher.ExecuteMethodThread | methodThread | The thread that is actually doing the work
protected ThrottledFetcher.ConnectionPool | myPool | Connection pool
protected java.lang.String | myUrl | The current URL being fetched
protected int | port | Port
protected java.lang.String | protocol | Protocol
protected java.lang.String | proxyAuthDomain | Proxy auth domain
protected java.lang.String | proxyAuthPassword | Proxy auth password
protected java.lang.String | proxyAuthUsername | Proxy auth user name
protected java.lang.String | proxyHost | Proxy host
protected int | proxyPort | Proxy port
protected java.lang.String | server | Server
protected int | socketTimeoutMilliseconds | Socket timeout milliseconds
protected long | startFetchTime | The start of the current fetch
protected int | statusCode | The status code fetched, if any
protected boolean | threadStarted | Set if thread has been started
protected java.lang.Throwable | throwable | The error trace, if any
-
Fields inherited from interface org.apache.manifoldcf.crawler.connectors.webcrawler.IThrottledConnection
_rcsid, FETCH_BAD_URI, FETCH_CIRCULAR_REDIRECT, FETCH_INTERRUPTED, FETCH_IO_ERROR, FETCH_NOT_TRIED, FETCH_SEQUENCE_ERROR, FETCH_UNKNOWN_ERROR
-
-
Constructor Summary
Constructors
Constructor | Description
ThrottledConnection(ThrottledFetcher.ConnectionPool myPool, org.apache.manifoldcf.connectorcommon.interfaces.IFetchThrottler fetchThrottler, java.lang.String protocol, java.lang.String server, int port, PageCredentials authentication, javax.net.ssl.SSLSocketFactory httpsSocketFactory, java.lang.String proxyHost, int proxyPort, java.lang.String proxyAuthDomain, java.lang.String proxyAuthUsername, java.lang.String proxyAuthPassword, int socketTimeoutMilliseconds, int connectionTimeoutMilliseconds) | Constructor.
-
Method Summary
Modifier and Type | Method | Description
void | beginFetch(java.lang.String fetchType) | Begin the fetch process.
void | close() | Close the connection.
void | destroy() | Destroy the connection forever
void | doneFetch(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities) | Done with the fetch.
void | executeFetch(java.lang.String urlPath, java.lang.String userAgent, java.lang.String from, boolean redirectOK, java.lang.String host, FormData formData, LoginCookies loginCookies) | Execute the fetch and get the return code.
LoginCookies | getLastFetchCookies() | Get the last fetch cookies.
java.lang.String | getLimitedResponseBody(int maxSize, java.lang.String encoding) | Get limited response as a string.
java.io.InputStream | getResponseBodyStream() | Get the response input stream.
int | getResponseCode() | Get the http response code.
java.lang.String | getResponseHeader(java.lang.String headerName) | Get a specified response header, if it exists.
java.util.Map<java.lang.String,java.util.List<java.lang.String>> | getResponseHeaders() | Get response headers
protected void | handleHTTPException(org.apache.http.HttpException e, java.lang.String activity) |
protected void | handleIOException(java.io.IOException e, java.lang.String activity) |
boolean | hasExpired(long currentTime) | Check whether the connection has expired.
void | logFetchCount(int count) | Log the fetch of a number of bytes, from within a stream.
void | noteInterrupted(java.lang.Throwable e) | Note that the connection fetch was interrupted by something.
void | setAbortChecker(AbortChecker abortCheck) | Set the abort checker.
-
-
-
Field Detail
-
myPool
protected final ThrottledFetcher.ConnectionPool myPool
Connection pool
-
fetchThrottler
protected final org.apache.manifoldcf.connectorcommon.interfaces.IFetchThrottler fetchThrottler
Fetch throttler
-
protocol
protected final java.lang.String protocol
Protocol
-
server
protected final java.lang.String server
Server
-
port
protected final int port
Port
-
authentication
protected final PageCredentials authentication
Authentication
-
expireTime
protected long expireTime
This is when the connection will expire. Only valid if connection is in the pool.
-
connManager
protected org.apache.http.conn.HttpClientConnectionManager connManager
The http connection manager. The pool is of size 1.
-
httpClient
protected org.apache.http.client.HttpClient httpClient
The http client object.
-
fetchMethod
protected org.apache.http.client.methods.HttpRequestBase fetchMethod
The method object
-
throwable
protected java.lang.Throwable throwable
The error trace, if any
-
myUrl
protected java.lang.String myUrl
The current URL being fetched
-
statusCode
protected int statusCode
The status code fetched, if any
-
fetchType
protected java.lang.String fetchType
The kind of fetch we are doing
-
fetchCounter
protected long fetchCounter
The current bytes in the current fetch
-
startFetchTime
protected long startFetchTime
The start of the current fetch
-
lastFetchCookies
protected LoginCookies lastFetchCookies
The cookies from the last fetch
-
proxyHost
protected final java.lang.String proxyHost
Proxy host
-
proxyPort
protected final int proxyPort
Proxy port
-
proxyAuthDomain
protected final java.lang.String proxyAuthDomain
Proxy auth domain
-
proxyAuthUsername
protected final java.lang.String proxyAuthUsername
Proxy auth user name
-
proxyAuthPassword
protected final java.lang.String proxyAuthPassword
Proxy auth password
-
httpsSocketFactory
protected final javax.net.ssl.SSLSocketFactory httpsSocketFactory
Https protocol
-
socketTimeoutMilliseconds
protected final int socketTimeoutMilliseconds
Socket timeout milliseconds
-
connectionTimeoutMilliseconds
protected final int connectionTimeoutMilliseconds
Connection timeout milliseconds
-
methodThread
protected ThrottledFetcher.ExecuteMethodThread methodThread
The thread that is actually doing the work
-
threadStarted
protected boolean threadStarted
Set if thread has been started
-
abortCheck
protected AbortChecker abortCheck
Abort checker
-
-
Constructor Detail
-
ThrottledConnection
public ThrottledConnection(ThrottledFetcher.ConnectionPool myPool, org.apache.manifoldcf.connectorcommon.interfaces.IFetchThrottler fetchThrottler, java.lang.String protocol, java.lang.String server, int port, PageCredentials authentication, javax.net.ssl.SSLSocketFactory httpsSocketFactory, java.lang.String proxyHost, int proxyPort, java.lang.String proxyAuthDomain, java.lang.String proxyAuthUsername, java.lang.String proxyAuthPassword, int socketTimeoutMilliseconds, int connectionTimeoutMilliseconds)
Constructor. Create a connection with a specific server and port, and register it as active against all bins.
-
-
Method Detail
-
setAbortChecker
public void setAbortChecker(AbortChecker abortCheck)
Set the abort checker. This must be done before the connection is actually used.
- Specified by:
setAbortChecker in interface IThrottledConnection
-
hasExpired
public boolean hasExpired(long currentTime)
Check whether the connection has expired.
- Specified by:
hasExpired in interface IThrottledConnection
- Parameters:
currentTime - is the current time to use to judge if a connection has expired.
- Returns:
- true if the connection has expired, and should be closed.
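A minimal sketch of how an expiry check of this kind might be used to sweep idle connections out of a pool. The `Expirable` interface, `IdleConnection` class, and `PoolSweeper.sweep` helper are hypothetical stand-ins and are not part of ManifoldCF. The comparison `currentTime >= expireTime` is an assumption, since this page documents only the method's contract.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the part of IThrottledConnection used here.
interface Expirable {
  boolean hasExpired(long currentTime);
  void destroy();
}

// Hypothetical pooled connection; the >= comparison is an assumption.
final class IdleConnection implements Expirable {
  private final long expireTime;
  boolean destroyed = false;

  IdleConnection(long expireTime) {
    this.expireTime = expireTime;
  }

  @Override public boolean hasExpired(long currentTime) {
    return currentTime >= expireTime;
  }

  @Override public void destroy() {
    destroyed = true;
  }
}

final class PoolSweeper {
  // Removes and destroys every connection that reports itself expired.
  // Returns the number of connections removed.
  static int sweep(List<? extends Expirable> idle, long currentTime) {
    int removed = 0;
    Iterator<? extends Expirable> it = idle.iterator();
    while (it.hasNext()) {
      Expirable c = it.next();
      if (c.hasExpired(currentTime)) {
        it.remove();
        c.destroy();
        removed++;
      }
    }
    return removed;
  }
}
```

Passing the current time in as a parameter, as `hasExpired(long currentTime)` does, lets one timestamp be reused for the whole sweep and makes the check easy to test.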
-
logFetchCount
public void logFetchCount(int count)
Log the fetch of a number of bytes, from within a stream.
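One way a byte count can be reported "from within a stream" is a wrapper stream that notifies a callback after each successful read. The sketch below illustrates that idea only; `CountingInputStream` and its `onBytes` callback are hypothetical names, and nothing on this page says ManifoldCF implements it this way.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.IntConsumer;

// Hypothetical wrapper: reports each successful read to a callback,
// in the way a response stream could call logFetchCount(int).
final class CountingInputStream extends FilterInputStream {
  private final IntConsumer onBytes;

  CountingInputStream(InputStream in, IntConsumer onBytes) {
    super(in);
    this.onBytes = onBytes;
  }

  @Override public int read() throws IOException {
    int b = super.read();
    if (b != -1) {
      onBytes.accept(1); // one byte was read
    }
    return b;
  }

  @Override public int read(byte[] buf, int off, int len) throws IOException {
    int n = super.read(buf, off, len);
    if (n > 0) {
      onBytes.accept(n); // n bytes were read in this call
    }
    return n;
  }
}
```

A caller could pass a method reference such as `connection::logFetchCount` as the callback, assuming a `connection` variable of a type that exposes that method.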
-
destroy
public void destroy()
Destroy the connection forever.
- Specified by:
destroy in interface IThrottledConnection
-
beginFetch
public void beginFetch(java.lang.String fetchType) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Begin the fetch process.
- Specified by:
beginFetch in interface IThrottledConnection
- Parameters:
fetchType - is a short descriptive string describing the kind of fetch being requested. This is used solely for logging purposes.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
executeFetch
public void executeFetch(java.lang.String urlPath, java.lang.String userAgent, java.lang.String from, boolean redirectOK, java.lang.String host, FormData formData, LoginCookies loginCookies) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Execute the fetch and get the return code. This method uses the standard logging mechanism to keep track of the fetch attempt. It signals errors as follows: it throws ServiceInterruption if a dynamic error occurs, throws ManifoldCFException if a fatal error occurs, and throws nothing if a standard protocol error occurs. Note that, for proxies etc., the idea is for this fetch request to handle whatever redirections are needed to support proxies.
- Specified by:
executeFetch in interface IThrottledConnection
- Parameters:
urlPath - is the path part of the url, e.g. "/robots.txt"
userAgent - is the value of the userAgent header to use.
from - is the value of the from header to use.
redirectOK - should be set to true if you want redirects to be automatically followed.
host - is the value to use as the "Host" header, or null to use the default.
formData - describes additional form arguments and how to fetch the page.
loginCookies - describes the cookies that should be in effect for this page fetch.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
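The method descriptions on this page suggest a call order of beginFetch, executeFetch, getResponseCode, doneFetch, and finally close. The sketch below shows one way to structure that order with nested try/finally blocks. `FetchConnection` is a hypothetical, simplified stand-in for IThrottledConnection (fewer parameters, plain `Exception` instead of the ManifoldCF exception types), and `RecordingConnection` is a fake that only records calls. The nesting itself is an assumption inferred from the descriptions, not a documented requirement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for IThrottledConnection.
// The real methods take more parameters and throw ManifoldCF-specific exceptions.
interface FetchConnection {
  void beginFetch(String fetchType) throws Exception;
  void executeFetch(String urlPath, String userAgent, boolean redirectOK) throws Exception;
  int getResponseCode() throws Exception;
  void doneFetch();
  void close();
}

// Recording fake so the call order can be observed without a network.
final class RecordingConnection implements FetchConnection {
  final List<String> calls = new ArrayList<>();

  @Override public void beginFetch(String fetchType) { calls.add("beginFetch"); }
  @Override public void executeFetch(String urlPath, String userAgent, boolean redirectOK) { calls.add("executeFetch"); }
  @Override public int getResponseCode() { calls.add("getResponseCode"); return 200; }
  @Override public void doneFetch() { calls.add("doneFetch"); }
  @Override public void close() { calls.add("close"); }
}

final class FetchLifecycle {
  // Runs one fetch. close() is always called; doneFetch() is called
  // whenever beginFetch() returned normally.
  static int fetchOnce(FetchConnection c, String urlPath) throws Exception {
    try {
      c.beginFetch("fetch");
      try {
        c.executeFetch(urlPath, "ExampleAgent/1.0", true);
        return c.getResponseCode();
      } finally {
        c.doneFetch(); // generate the log entry for this fetch
      }
    } finally {
      c.close(); // return the connection to its pool
    }
  }
}
```

With the real class, setAbortChecker would also need to be called before the connection is used, as its description states.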
-
getResponseCode
public int getResponseCode() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the http response code.
- Specified by:
getResponseCode in interface IThrottledConnection
- Returns:
- the response code. This is either an HTTP response code, or one of the FETCH_* codes inherited from IThrottledConnection.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
getLastFetchCookies
public LoginCookies getLastFetchCookies() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the last fetch cookies.
- Specified by:
getLastFetchCookies in interface IThrottledConnection
- Returns:
- the cookies now in effect from the last fetch.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
getResponseHeaders
public java.util.Map<java.lang.String,java.util.List<java.lang.String>> getResponseHeaders() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get response headers
- Specified by:
getResponseHeaders in interface IThrottledConnection
- Returns:
- a map keyed by header name containing a list of values.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
-
getResponseHeader
public java.lang.String getResponseHeader(java.lang.String headerName) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get a specified response header, if it exists.
- Specified by:
getResponseHeader in interface IThrottledConnection
- Parameters:
headerName - is the name of the header.
- Returns:
- the header value, or null if it doesn't exist.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
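To illustrate how the two header accessors relate, the sketch below looks up a single value in a map shaped like the one getResponseHeaders returns (header name to list of values) and returns null when the header is absent. `HeaderLookup.firstValue` is a hypothetical helper. The case-insensitive match follows the HTTP rule that header field names are case-insensitive; whether the real map normalizes key case is not stated on this page, so that part is an assumption.

```java
import java.util.List;
import java.util.Map;

final class HeaderLookup {
  // Returns the first value recorded for the named header, or null if the
  // header is absent. Names are compared case-insensitively (assumption
  // about the map's keys; HTTP header names themselves are case-insensitive).
  static String firstValue(Map<String, List<String>> headers, String headerName) {
    for (Map.Entry<String, List<String>> e : headers.entrySet()) {
      String key = e.getKey();
      List<String> values = e.getValue();
      if (key != null && key.equalsIgnoreCase(headerName)
          && values != null && !values.isEmpty()) {
        return values.get(0);
      }
    }
    return null;
  }
}
```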
-
getResponseBodyStream
public java.io.InputStream getResponseBodyStream() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get the response input stream. It is the responsibility of the caller to close this stream when done.
- Specified by:
getResponseBodyStream in interface IThrottledConnection
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
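Because the caller owns the returned stream, a try-with-resources block is a convenient way to guarantee it is closed. The sketch below drains any `InputStream` and closes it even if reading fails. `BodyReader.readAndClose` is a hypothetical helper, not part of ManifoldCF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BodyReader {
  // Drains the stream into memory and closes it, even if reading fails.
  static byte[] readAndClose(InputStream body) throws IOException {
    try (InputStream in = body) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    }
  }
}
```

A caller might write `byte[] body = BodyReader.readAndClose(connection.getResponseBodyStream());`, assuming a `connection` variable of the documented type. For large responses, processing the stream incrementally is likely preferable to buffering it all in memory.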
-
getLimitedResponseBody
public java.lang.String getLimitedResponseBody(int maxSize, java.lang.String encoding) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException, org.apache.manifoldcf.agents.interfaces.ServiceInterruption
Get limited response as a string.
- Specified by:
getLimitedResponseBody in interface IThrottledConnection
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
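This page does not say whether maxSize counts bytes or characters. The sketch below shows one plausible reading, in which at most maxSize decoded characters are returned. `LimitedBody.readLimited` is a hypothetical helper, and the character-based limit is an assumption, not the documented behavior.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

final class LimitedBody {
  // Decodes the stream with the named encoding and returns at most
  // maxSize characters (assumption: the limit is in characters, not bytes).
  // The stream is closed before returning.
  static String readLimited(InputStream body, int maxSize, String encoding) throws IOException {
    try (Reader r = new InputStreamReader(body, encoding)) {
      char[] buf = new char[maxSize];
      int total = 0;
      while (total < maxSize) {
        int n = r.read(buf, total, maxSize - total);
        if (n == -1) {
          break; // end of stream reached before the limit
        }
        total += n;
      }
      return new String(buf, 0, total);
    }
  }
}
```

Capping the read this way bounds memory use when only the start of a page is needed.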
-
noteInterrupted
public void noteInterrupted(java.lang.Throwable e)
Note that the connection fetch was interrupted by something.
- Specified by:
noteInterrupted in interface IThrottledConnection
-
doneFetch
public void doneFetch(org.apache.manifoldcf.crawler.interfaces.IProcessActivity activities) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Done with the fetch. Call this when the fetch has been completed. A log entry will be generated describing what was done.
- Specified by:
doneFetch in interface IThrottledConnection
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
close
public void close()
Close the connection. Call this to return the connection to its pool.
- Specified by:
close in interface IThrottledConnection
-
handleHTTPException
protected void handleHTTPException(org.apache.http.HttpException e, java.lang.String activity) throws org.apache.manifoldcf.agents.interfaces.ServiceInterruption, org.apache.manifoldcf.core.interfaces.ManifoldCFException
- Throws:
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
handleIOException
protected void handleIOException(java.io.IOException e, java.lang.String activity) throws org.apache.manifoldcf.agents.interfaces.ServiceInterruption, org.apache.manifoldcf.core.interfaces.ManifoldCFException
- Throws:
org.apache.manifoldcf.agents.interfaces.ServiceInterruption
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
-