Class RSSConnector.Filter
- java.lang.Object
-
- org.apache.manifoldcf.crawler.connectors.rss.RSSConnector.Filter
-
- Enclosing class:
- RSSConnector
protected static class RSSConnector.Filter extends java.lang.Object
Class that handles parsing and interpretation of the document specification. Note that I believe it to be faster to do this once, gathering all the data, than to scan the document specification multiple times. Therefore, this class contains the *entire* interpreted set of data from a document specification.
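The "interpret once, query many times" design described above can be illustrated with a small sketch. This is not the connector's code: `SinglePassSketch`, `Node`, and the node type names `"seed"`, `"acl"`, and `"exclude"` are hypothetical stand-ins for the real specification structure, which this page does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class SinglePassSketch {
    // Hypothetical stand-in for one node of a document specification.
    static final class Node {
        final String type;
        final String value;
        Node(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    final Set<String> seeds = new HashSet<>();
    final Set<String> acls = new HashSet<>();
    final List<Pattern> excludePatterns = new ArrayList<>();

    // A single pass over the nodes fills every field, so later
    // lookups never need to walk the specification again.
    SinglePassSketch(List<Node> nodes) {
        for (Node n : nodes) {
            switch (n.type) {
                case "seed":
                    seeds.add(n.value);
                    break;
                case "acl":
                    acls.add(n.value);
                    break;
                case "exclude":
                    excludePatterns.add(Pattern.compile(n.value));
                    break;
                default:
                    break; // unknown node types are ignored in this sketch
            }
        }
    }

    public static void main(String[] args) {
        SinglePassSketch filter = new SinglePassSketch(Arrays.asList(
            new Node("seed", "http://example.com/feed.xml"),
            new Node("acl", "staff"),
            new Node("exclude", "/ads/")));
        System.out.println(filter.seeds);
        System.out.println(filter.acls);
        System.out.println(filter.excludePatterns.size());
    }
}
```

The trade-off is memory for time: the interpreted data is held for the life of the object, and each getter becomes a cheap field read.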
-
-
Field Summary
Fields
protected java.util.Set<java.lang.String> acls
protected java.lang.Integer badFeedRescanInterval
protected RSSConnector.CanonicalizationPolicies canonicalizationPolicies
protected int chromedContentMode
protected int dechromedContentMode
protected java.lang.Integer defaultRescanInterval
protected java.util.List<java.util.regex.Pattern> excludePatterns
    The arraylist of exclude patterns
protected int feedTimeoutValue
protected RSSConnector.MappingRules mappings
protected java.lang.Integer minimumRescanInterval
protected java.util.Set<java.lang.String> seeds
-
Constructor Summary
Constructors
Filter(org.apache.manifoldcf.core.interfaces.Specification spec, boolean warnOnBadSeed)
    Constructor.
-
Method Summary
Methods
java.lang.String[] getAcls()
    Get the acls
java.lang.Long getBadFeedRescanTime(long currentTime)
    Get the next time a "bad feed" should be rescanned
RSSConnector.CanonicalizationPolicies getCanonicalizationPolicies()
    Get canonicalization policies
int getChromedContentMode()
    Get the chromed content mode
int getDechromedContentMode()
    Get the dechromed content mode
java.lang.Long getDefaultRescanTime(long currentTime)
    Get the next time (by default) a feed should be scanned
int getFeedTimeoutValue()
    Get the feed timeout value
java.lang.Long getMinimumRescanTime(long currentTime)
    Get the minimum next time a feed should be scanned
java.util.Iterator<java.lang.String> getSeeds()
    Iterate over all canonicalized seeds
boolean isLegalURL(java.lang.String url)
    Check for legality of a url.
boolean isSeed(java.lang.String canonicalUrl)
    Check if document is a seed
java.lang.String mapDocumentURL(java.lang.String url)
    Scan patterns and return the one that matches first.
-
-
-
Field Detail
-
mappings
protected final RSSConnector.MappingRules mappings
-
seeds
protected final java.util.Set<java.lang.String> seeds
-
defaultRescanInterval
protected java.lang.Integer defaultRescanInterval
-
minimumRescanInterval
protected java.lang.Integer minimumRescanInterval
-
badFeedRescanInterval
protected java.lang.Integer badFeedRescanInterval
-
dechromedContentMode
protected int dechromedContentMode
-
chromedContentMode
protected int chromedContentMode
-
feedTimeoutValue
protected int feedTimeoutValue
-
acls
protected final java.util.Set<java.lang.String> acls
-
canonicalizationPolicies
protected final RSSConnector.CanonicalizationPolicies canonicalizationPolicies
-
excludePatterns
protected final java.util.List<java.util.regex.Pattern> excludePatterns
The arraylist of exclude patterns
-
-
Method Detail
-
isSeed
public boolean isSeed(java.lang.String canonicalUrl)
Check if document is a seed
-
getSeeds
public java.util.Iterator<java.lang.String> getSeeds()
Iterate over all canonicalized seeds
-
getAcls
public java.lang.String[] getAcls()
Get the acls
-
getFeedTimeoutValue
public int getFeedTimeoutValue()
Get the feed timeout value
-
getDechromedContentMode
public int getDechromedContentMode()
Get the dechromed content mode
-
getChromedContentMode
public int getChromedContentMode()
Get the chromed content mode
-
getDefaultRescanTime
public java.lang.Long getDefaultRescanTime(long currentTime)
Get the next time (by default) a feed should be scanned
-
getMinimumRescanTime
public java.lang.Long getMinimumRescanTime(long currentTime)
Get the minimum next time a feed should be scanned
-
getBadFeedRescanTime
public java.lang.Long getBadFeedRescanTime(long currentTime)
Get the next time a "bad feed" should be rescanned
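The three rescan-time getters share one shape: each takes the current time and returns a boxed `java.lang.Long`, paired with an `Integer` interval field. A minimal sketch of how such a getter might compute its result follows. It assumes the interval is expressed in minutes, that times are in milliseconds, and that an unset (null) interval yields a null result; none of these is confirmed by this page. `RescanTimeSketch` and `nextTime` are hypothetical names.

```java
class RescanTimeSketch {
    // Assumption: one interval unit is a minute, times are in milliseconds.
    static final long MILLIS_PER_MINUTE = 60_000L;

    // Hypothetical helper: returns null when no interval is configured,
    // otherwise the current time plus the interval.
    static Long nextTime(Integer intervalMinutes, long currentTime) {
        if (intervalMinutes == null) {
            return null;
        }
        // Multiply as long so large intervals do not overflow int arithmetic.
        return currentTime + intervalMinutes.longValue() * MILLIS_PER_MINUTE;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(nextTime(5, now));    // five minutes from now
        System.out.println(nextTime(null, now)); // no interval configured
    }
}
```

Returning a boxed `Long` rather than a primitive lets the caller distinguish "no time configured" from a real timestamp, which may be why the getters are declared that way.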
-
isLegalURL
public boolean isLegalURL(java.lang.String url)
Check for legality of a url.
- Returns:
- true if the passed-in url is either a seed, or a legal url, according to this filter.
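As a rough illustration of the check described above, the sketch below accepts a URL if it is a seed and otherwise rejects it when any exclude pattern is found in it. This is an assumption about how the `seeds` and `excludePatterns` fields could combine. The real method may apply further rules, such as the mapping rules, that this page does not describe. `LegalUrlSketch`, `isLegal`, and the demo constants are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class LegalUrlSketch {
    // Demo data, invented for this example.
    static final Set<String> DEMO_SEEDS =
        new HashSet<>(Arrays.asList("http://example.com/ads/feed.xml"));
    static final List<Pattern> DEMO_EXCLUDES =
        Arrays.asList(Pattern.compile("/ads/"));

    // Hypothetical stand-in for the seed and exclude-pattern checks.
    static boolean isLegal(String url, Set<String> seeds, List<Pattern> excludePatterns) {
        // Seeds are always legal, even if an exclude pattern would hit them.
        if (seeds.contains(url)) {
            return true;
        }
        // find() looks for the pattern anywhere in the URL.
        for (Pattern p : excludePatterns) {
            if (p.matcher(url).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal("http://example.com/ads/feed.xml", DEMO_SEEDS, DEMO_EXCLUDES));
        System.out.println(isLegal("http://example.com/ads/banner.html", DEMO_SEEDS, DEMO_EXCLUDES));
        System.out.println(isLegal("http://example.com/news/1.html", DEMO_SEEDS, DEMO_EXCLUDES));
    }
}
```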
-
mapDocumentURL
public java.lang.String mapDocumentURL(java.lang.String url) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Scan patterns and return the one that matches first.
- Returns:
- null if the url doesn't match or should not be ingested, or the new string if it does.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
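The "first match wins, null otherwise" contract can be sketched with an ordered list of regex rules. The interface of `RSSConnector.MappingRules` is not shown on this page, so `FirstMatchSketch`, `Rule`, `mapUrl`, and the demo rules below are hypothetical, and the replacement syntax is plain `java.util.regex` rather than whatever the connector uses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMatchSketch {
    // Hypothetical rule: a regex plus a java.util.regex replacement template.
    static final class Rule {
        final Pattern pattern;
        final String replacement;
        Rule(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    // Demo rules, invented for this example; order matters.
    static final List<Rule> DEMO_RULES = Arrays.asList(
        new Rule("^http://example\\.com/news/(.*)$", "http://example.com/articles/$1"),
        new Rule("^http://example\\.com/(.*)$", "http://example.com/other/$1"));

    // Returns the rewritten URL from the first rule that matches,
    // or null when no rule matches.
    static String mapUrl(String url, List<Rule> rules) {
        for (Rule rule : rules) {
            Matcher m = rule.pattern.matcher(url);
            if (m.find()) {
                // replaceFirst resets the matcher and rewrites that same first match.
                return m.replaceFirst(rule.replacement);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(mapUrl("http://example.com/news/1", DEMO_RULES));
        System.out.println(mapUrl("http://example.com/x", DEMO_RULES));
        System.out.println(mapUrl("http://elsewhere.test/x", DEMO_RULES));
    }
}
```

Because the scan stops at the first hit, more specific rules need to come before general ones; in the demo list the `/news/` rule would never fire if the two rules were swapped.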
-
getCanonicalizationPolicies
public RSSConnector.CanonicalizationPolicies getCanonicalizationPolicies()
Get canonicalization policies
-
-