A

AbstractSiteMap - Class in crawlercommons.sitemaps
SiteMap or SiteMapIndex
AbstractSiteMap() - Constructor for class crawlercommons.sitemaps.AbstractSiteMap
 
AbstractSiteMap.SitemapType - Enum in crawlercommons.sitemaps
Various Sitemap types
acceptedNamespaces - Variable in class crawlercommons.sitemaps.SiteMapParser
Set of namespaces (if SiteMapParser.strictNamespace) accepted by the parser.
addAcceptedNamespace(String) - Method in class crawlercommons.sitemaps.SiteMapParser
Add namespace URI to set of accepted namespaces.
addAcceptedNamespace(String[]) - Method in class crawlercommons.sitemaps.SiteMapParser
Add namespace URIs to set of accepted namespaces.
addAttributesForExtension(Extension, ExtensionMetadata[]) - Method in class crawlercommons.sitemaps.SiteMapURL
Add attributes of a specific sitemap extension
addChild(char, V) - Method in class crawlercommons.domains.SuffixTrie.Node
 
addPrice(VideoAttributes.VideoPrice) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
addRule(String, boolean) - Method in class crawlercommons.robots.SimpleRobotRules
 
addSitemap(AbstractSiteMap) - Method in class crawlercommons.sitemaps.SiteMapIndex
Add this Sitemap to the list of Sitemaps.
addSitemap(String) - Method in class crawlercommons.robots.BaseRobotRules
Add sitemap URL to rules if not a duplicate
addSiteMapUrl(SiteMapURL) - Method in class crawlercommons.sitemaps.SiteMap
 
addTag(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
ALLOW_ALL - crawlercommons.robots.SimpleRobotRules.RobotRulesMode
 
ALLOW_NONE - crawlercommons.robots.SimpleRobotRules.RobotRulesMode
 
ALLOW_SOME - crawlercommons.robots.SimpleRobotRules.RobotRulesMode
 
ALLOWED_COUNTRIES - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
ALLOWED_PLATFORMS - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
ALWAYS - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
 
appendCharacterBuffer(char[], int, int) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
appendCharacterBuffer(String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
asMap() - Method in class crawlercommons.sitemaps.extension.ExtensionMetadata
 
asMap() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
asMap() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
 
asMap() - Method in class crawlercommons.sitemaps.extension.MobileAttributes
 
asMap() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
asMap() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
ATOM - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
 
ATOM_0_3 - Static variable in class crawlercommons.sitemaps.Namespace
 
ATOM_1_0 - Static variable in class crawlercommons.sitemaps.Namespace
 
attributes - Variable in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 

B

BaseRobotRules - Class in crawlercommons.robots
Result from parsing a single robots.txt file - which means we get a set of rules, and a crawl-delay.
BaseRobotRules() - Constructor for class crawlercommons.robots.BaseRobotRules
 
BaseRobotsParser - Class in crawlercommons.robots
 
BaseRobotsParser() - Constructor for class crawlercommons.robots.BaseRobotsParser
 
BasicURLNormalizer - Class in crawlercommons.filters.basic
Code borrowed from Apache Nutch.
BasicURLNormalizer() - Constructor for class crawlercommons.filters.basic.BasicURLNormalizer
 
BasicURLNormalizer(BasicURLNormalizer.Builder) - Constructor for class crawlercommons.filters.basic.BasicURLNormalizer
 
BasicURLNormalizer.Builder - Class in crawlercommons.filters.basic
A builder class for the BasicURLNormalizer.
BasicURLNormalizer.IdnNormalization - Enum in crawlercommons.filters.basic
 
beforeRead(int) - Method in class crawlercommons.sitemaps.SkipLeadingWhiteSpaceInputStream
 
Blog - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
 
build() - Method in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
Constructs the custom URL normalizer instance.

C

CAPTION - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
 
CATEGORY - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
 
characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
 
characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
 
clearRules() - Method in class crawlercommons.robots.SimpleRobotRules
 
commaSeparated - Static variable in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 
COMMENT - Static variable in class crawlercommons.domains.EffectiveTldFinder
 
compareTo(SimpleRobotRules.RobotRule) - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
 
contains(String) - Method in class crawlercommons.domains.SuffixTrie
Checks whether trie contains a suffix string.
CONTENT_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
convertToDate(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
convertToZonedDateTime(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
Converts the given date string (in an acceptable date format); returns null if the date is not in the correct format.
crawlercommons - package crawlercommons
 
CrawlerCommons - Class in crawlercommons
 
CrawlerCommons() - Constructor for class crawlercommons.CrawlerCommons
 
crawlercommons.domains - package crawlercommons.domains
Classes contained within the domains package relate to the definition of Top Level Domains, various domain registrars and the effective handling of such domains.
crawlercommons.filters - package crawlercommons.filters
The filters package contains code and resources for URL filtering.
crawlercommons.filters.basic - package crawlercommons.filters.basic
 
crawlercommons.mimetypes - package crawlercommons.mimetypes
 
crawlercommons.robots - package crawlercommons.robots
The robots package contains all of the robots.txt rule inference, parsing and utilities contained within Crawler Commons.
crawlercommons.sitemaps - package crawlercommons.sitemaps
The sitemaps package provides all classes relevant to focused sitemap parsing, URL definition and processing.
crawlercommons.sitemaps.extension - package crawlercommons.sitemaps.extension
 
crawlercommons.sitemaps.sax - package crawlercommons.sitemaps.sax
 
crawlercommons.sitemaps.sax.extension - package crawlercommons.sitemaps.sax.extension
 
crawlercommons.utils - package crawlercommons.utils
 
create(Extension) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 
currentElement() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
currentElementParent() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 

D

DAILY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
 
DEFAULT_MAX_CRAWL_DELAY - Static variable in class crawlercommons.robots.SimpleRobotRulesParser
Default max Crawl-Delay in milliseconds, see SimpleRobotRulesParser.setMaxCrawlDelay(long)
DEFAULT_MAX_WARNINGS - Static variable in class crawlercommons.robots.SimpleRobotRulesParser
Default max number of warnings logged during parse of any one robots.txt file, see SimpleRobotRulesParser.setMaxWarnings(int)
DEFAULT_PRIORITY - Static variable in class crawlercommons.sitemaps.SiteMapURL
 
DelegatorHandler - Class in crawlercommons.sitemaps.sax
Provides a base SAX handler for parsing of XML documents representing sub-classes of AbstractSiteMap.
DelegatorHandler(URL, boolean) - Constructor for class crawlercommons.sitemaps.sax.DelegatorHandler
 
DelegatorHandler(LinkedList<String>, boolean) - Constructor for class crawlercommons.sitemaps.sax.DelegatorHandler
 
DESCRIPTION - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
detect(byte[]) - Method in class crawlercommons.mimetypes.MimeTypeDetector
 
detect(byte[], int) - Method in class crawlercommons.mimetypes.MimeTypeDetector
 
detect(InputStream) - Method in class crawlercommons.mimetypes.MimeTypeDetector
 
DOT - Static variable in class crawlercommons.domains.EffectiveTldFinder
 
DOT_REGEX - Static variable in class crawlercommons.domains.EffectiveTldFinder
 
DURATION - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 

E

EffectiveTLD(String, boolean) - Constructor for class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
Parse one non-empty, non-comment line in the public suffix list and hold the public suffix and its properties in the created object.
EffectiveTldFinder - Class in crawlercommons.domains
To determine the actual domain name of a host name or URL requires knowledge of the various domain registrars and their assignment policies.
EffectiveTldFinder.EffectiveTLD - Class in crawlercommons.domains
EffectiveTLD objects hold one line of the public suffix list: the suffix (com, co.uk, etc.); for IDN suffixes, both the ASCII and IDN variants (xn--p1ai and рф); and the properties required to parse host/domain names given in the public suffix list (wildcard suffix, exception, in private domain section).
EMPTY - Static variable in class crawlercommons.sitemaps.Namespace
In contradiction to the protocol specification ("The Sitemap must ...
enableExtension(Extension) - Method in class crawlercommons.sitemaps.SiteMapParser
Enable support for a sitemap extension in the parser.
enableExtensions() - Method in class crawlercommons.sitemaps.SiteMapParser
Enable all supported sitemap extensions in the parser.
endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
 
endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
 
endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
 
equals(Object) - Method in class crawlercommons.robots.BaseRobotRules
 
equals(Object) - Method in class crawlercommons.robots.SimpleRobotRules
 
equals(Object) - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
 
equals(Object) - Method in class crawlercommons.sitemaps.extension.ExtensionMetadata
 
equals(Object) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
equals(Object) - Method in class crawlercommons.sitemaps.extension.LinkAttributes
 
equals(Object) - Method in class crawlercommons.sitemaps.extension.MobileAttributes
 
equals(Object) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
equals(Object) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
equals(Object) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
equals(Object) - Method in class crawlercommons.sitemaps.SiteMapURL
 
error(SAXParseException) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
ETLD_DATA - Static variable in class crawlercommons.domains.EffectiveTldFinder
 
EXCEPTION - Static variable in class crawlercommons.domains.EffectiveTldFinder
 
EXPIRATION_DATE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
Extension - Enum in crawlercommons.sitemaps.extension
Sitemap extensions supported by the parser.
ExtensionHandler - Class in crawlercommons.sitemaps.sax.extension
Handler to be called for elements in the namespace of a sitemap extension.
ExtensionHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 
ExtensionMetadata - Class in crawlercommons.sitemaps.extension
Container for attributes of a SiteMapURL defined by a sitemap extension.
ExtensionMetadata() - Constructor for class crawlercommons.sitemaps.extension.ExtensionMetadata
 
extensionNamespaces - Variable in class crawlercommons.sitemaps.sax.DelegatorHandler
 
extensionNamespaces - Variable in class crawlercommons.sitemaps.SiteMapParser
Map of sitemap extension namespaces required to find the right extension handler.
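
The EffectiveTLD entries above describe holding one public suffix list line together with its properties. A minimal sketch of that idea follows, using only the JDK. The class name SuffixLineSketch and its fields are hypothetical stand-ins, not the library's EffectiveTLD class, and the marker handling assumes the public suffix list conventions of a leading "*." for wildcard rules and a leading "!" for exception rules.

```java
import java.net.IDN;

/** Hypothetical stand-in for one parsed public suffix list line (not the library class). */
final class SuffixLineSketch {
    final String suffix;          // suffix as written, without "!" or "*." markers
    final String asciiSuffix;     // punycoded ASCII variant of the suffix
    final boolean wildcard;       // line started with "*."
    final boolean exception;      // line started with "!"
    final boolean privateSection; // line came from the private domain section

    SuffixLineSketch(String line, boolean privateSection) {
        String s = line.trim();
        // An exception rule is marked by a leading "!"
        this.exception = s.startsWith("!");
        if (this.exception) {
            s = s.substring(1);
        }
        // A wildcard rule is marked by a leading "*."
        this.wildcard = s.startsWith("*.");
        if (this.wildcard) {
            s = s.substring(2);
        }
        this.suffix = s;
        // java.net.IDN converts an internationalized name to its ASCII form
        this.asciiSuffix = IDN.toASCII(s);
        this.privateSection = privateSection;
    }
}
```

Under these assumptions, the line "*.ck" yields the suffix "ck" with the wildcard flag set, and an IDN suffix keeps both its Unicode and its ASCII form.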

F

failedFetch(int) - Method in class crawlercommons.robots.BaseRobotsParser
The fetch of robots.txt failed, so return rules appropriate for the given HTTP status code.
failedFetch(int) - Method in class crawlercommons.robots.SimpleRobotRulesParser
 
FAMILY_FRIENDLY - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
fatalError(SAXParseException) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
filter(String) - Method in class crawlercommons.filters.basic.BasicURLNormalizer
 
filter(String) - Method in class crawlercommons.filters.URLFilter
Returns a modified version of the input URL or null if the URL should be removed
formatQueryParameters(List<BasicURLNormalizer.NameValuePair>) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
Formats a list of query parameter name-value pairs into a query parameter string.
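
As an illustration of the formatQueryParameters entry above, the following sketch joins name-value pairs into a query string. It is not the library's implementation: Map.Entry stands in for BasicURLNormalizer.NameValuePair, the class name QueryFormatSketch is hypothetical, and the sketch assumes the names and values are already encoded, so it performs no escaping.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch; Map.Entry stands in for the library's NameValuePair. */
final class QueryFormatSketch {
    /** Joins pairs as name=value, separated by '&', in list order. */
    static String formatQueryParameters(List<Map.Entry<String, String>> params) {
        return params.stream()
                .map(p -> p.getKey() + "=" + p.getValue())
                .collect(Collectors.joining("&"));
    }
}
```

An empty list produces an empty string, and the order of the input list is preserved in the output.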

G

GALLERY_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
GALLERY_TITLE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
GENRES - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
 
GEO_LOCATION - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
 
get(String) - Method in class crawlercommons.domains.SuffixTrie
Get value associated with suffix string in trie.
getAllowedCountries() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getAllowedPlatforms() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getAndResetCharacterBuffer() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
getAssignedDomain(String) - Static method in class crawlercommons.domains.EffectiveTldFinder
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
getAssignedDomain(String, boolean) - Static method in class crawlercommons.domains.EffectiveTldFinder
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
getAssignedDomain(String, boolean, boolean) - Static method in class crawlercommons.domains.EffectiveTldFinder
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name.
getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 
getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
 
getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.MobileHandler
 
getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
 
getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
 
getAttributes() - Method in class crawlercommons.sitemaps.SiteMapURL
Get attributes of sitemap extensions (news, images, videos, etc.)
getAttributesForExtension(Extension) - Method in class crawlercommons.sitemaps.SiteMapURL
Get attributes of a specific sitemap extension
getBaseUrl() - Method in class crawlercommons.sitemaps.SiteMap
 
getCaption() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
getCategory() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getChangeFrequency() - Method in class crawlercommons.sitemaps.SiteMapURL
Return the URL's change frequency
getChild(char) - Method in class crawlercommons.domains.SuffixTrie.Node
 
getContentLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getCrawlDelay() - Method in class crawlercommons.robots.BaseRobotRules
 
getCurrency() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
getDateValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 
getDescription() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getDomain() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
 
getDuration() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getEffectiveTLD(String) - Static method in class crawlercommons.domains.EffectiveTldFinder
Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
getEffectiveTLD(String, boolean) - Static method in class crawlercommons.domains.EffectiveTldFinder
Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
getEffectiveTLDs() - Static method in class crawlercommons.domains.EffectiveTldFinder
 
getException() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
getExpirationDate() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getExpirationDateTime() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getFamilyFriendly() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getFloatValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 
getGalleryLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getGalleryTitle() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getGenres() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
getGeoLocation() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
getHref() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
 
getInstance() - Static method in class crawlercommons.domains.EffectiveTldFinder
Get singleton instance of EffectiveTldFinder with default configuration.
getIntegerValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 
getKeywords() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
getLanguage() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
getLastModified() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
getLastModified() - Method in class crawlercommons.sitemaps.SiteMapURL
Return when this URL was last modified.
getLicense() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
getLive() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getLoc() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
getLongestSuffix(String) - Method in class crawlercommons.domains.SuffixTrie
Match the longest suffix of a string contained in trie.
getMaxCrawlDelay() - Method in class crawlercommons.robots.SimpleRobotRulesParser
Get configured max crawl delay.
getMaxWarnings() - Method in class crawlercommons.robots.SimpleRobotRulesParser
Get max number of logged warnings per robots.txt
getName() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
getNameVariants() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
Generate name variants caused by Internationalized Domain Names: every IDN part of a eTLD can be replaced by its punycoded ASCII variant.
getNumWarnings() - Method in class crawlercommons.robots.SimpleRobotRulesParser
Get the number of warnings due to invalid rules/lines in the latest processed robots.txt file (see SimpleRobotRulesParser.parseContent(String, byte[], String, String)).
getParams() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
 
getPlayerLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getPLD(String) - Static method in class crawlercommons.domains.PaidLevelDomain
Extract the PLD (paid-level domain) from the hostname.
getPLD(URL) - Static method in class crawlercommons.domains.PaidLevelDomain
Extract the PLD (paid-level domain) from the URL.
getPrefix() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
 
getPrice() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
getPrices() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getPriority() - Method in class crawlercommons.sitemaps.SiteMapURL
Return this URL's priority (a value between [0.0 - 1.0]).
getPublicationDate() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
getPublicationDate() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getPublicationDateTime() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
getPublicationDateTime() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getRating() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getRequiresSubscription() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getResolution() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
getRestrictedCountries() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getRestrictedPlatforms() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getRobotRules() - Method in class crawlercommons.robots.SimpleRobotRules
 
getSitemap(URL) - Method in class crawlercommons.sitemaps.SiteMapIndex
Returns the Sitemap that has the given URL.
getSiteMap() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
getSitemaps() - Method in class crawlercommons.robots.BaseRobotRules
Get URLs of sitemap links found in robots.txt
getSitemaps() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
getSitemaps(boolean) - Method in class crawlercommons.sitemaps.SiteMapIndex
 
getSiteMapUrls() - Method in class crawlercommons.sitemaps.SiteMap
 
getStockTickers() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
getSuffixes(String) - Method in class crawlercommons.domains.SuffixTrie
Match all suffixes of a string contained in trie.
getTags() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getThumbnailLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getTitle() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
getTitle() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
getTitle() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getType() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
getType() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
getUploader() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getUploaderInfo() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getUrl() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
getUrl() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
getUrl() - Method in class crawlercommons.sitemaps.SiteMapURL
Return the URL.
getURLValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 
getVersion() - Static method in class crawlercommons.CrawlerCommons
 
getViewCount() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
getYesNoBooleanValue(String, String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 

H

hashCode() - Method in class crawlercommons.robots.BaseRobotRules
 
hashCode() - Method in class crawlercommons.robots.SimpleRobotRules
 
hashCode() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
 
hashCode() - Method in class crawlercommons.sitemaps.SiteMapURL
 
hasUnprocessedSitemap() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
HD - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
 
help() - Static method in class crawlercommons.domains.EffectiveTldFinder
 
HOURLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
 
HREF - Static variable in class crawlercommons.sitemaps.extension.LinkAttributes
 

I

idnNormalization - Variable in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
 
idnNormalization(BasicURLNormalizer.IdnNormalization) - Method in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
Configures whether internationalized domain names (IDNs) should be converted to ASCII/Punycode or Unicode.
IMAGE - crawlercommons.sitemaps.extension.Extension
Google Image sitemaps, see https://support.google.com/webmasters/answer/178636
IMAGE - Static variable in class crawlercommons.sitemaps.Namespace
 
ImageAttributes - Class in crawlercommons.sitemaps.extension
Data model for Google's extension to the sitemap protocol regarding image indexing, as per http://www.google.com/schemas/sitemap-image/1.1
ImageAttributes() - Constructor for class crawlercommons.sitemaps.extension.ImageAttributes
 
ImageAttributes(URL) - Constructor for class crawlercommons.sitemaps.extension.ImageAttributes
 
ImageHandler - Class in crawlercommons.sitemaps.sax.extension
Handle SAX events in the Google Image sitemap extension namespace.
ImageHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.ImageHandler
 
INDEX - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
 
initialize(InputStream) - Method in class crawlercommons.domains.EffectiveTldFinder
(Re)initialize EffectiveTldFinder with custom public suffix list.
IS_LIVE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
isAcceptedNamespace(String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
isAllBlank(CharSequence) - Static method in class crawlercommons.sitemaps.sax.DelegatorHandler
Return true if character sequence contains only white space including Unicode whitespace, cf.
isAllow() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
 
isAllowAll() - Method in class crawlercommons.robots.BaseRobotRules
 
isAllowAll() - Method in class crawlercommons.robots.SimpleRobotRules
Is our ruleset set up to allow all access?
isAllowed(String) - Method in class crawlercommons.robots.BaseRobotRules
 
isAllowed(String) - Method in class crawlercommons.robots.SimpleRobotRules
 
isAllowNone() - Method in class crawlercommons.robots.BaseRobotRules
 
isAllowNone() - Method in class crawlercommons.robots.SimpleRobotRules
Is our ruleset set up to disallow all access?
isBlank(String) - Static method in class crawlercommons.utils.Strings
 
isConfigured() - Method in class crawlercommons.domains.EffectiveTldFinder
 
isDeferVisits() - Method in class crawlercommons.robots.BaseRobotRules
 
isException() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
 
isExtensionNamespace(String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
isGzip(String) - Method in class crawlercommons.mimetypes.MimeTypeDetector
 
isIndex() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
isIndex() - Method in class crawlercommons.sitemaps.SiteMap
 
isIndex() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
isProcessed() - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
isStrict() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
isStrict() - Method in class crawlercommons.sitemaps.SiteMapParser
 
isStrictNamespace() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
isStrictNamespace() - Method in class crawlercommons.sitemaps.SiteMapParser
 
isSupported(String) - Static method in class crawlercommons.sitemaps.Namespace
 
isText(String) - Method in class crawlercommons.mimetypes.MimeTypeDetector
 
isValid() - Method in class crawlercommons.sitemaps.extension.ExtensionMetadata
 
isValid() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
isValid() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
isValid() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
isValid() - Method in class crawlercommons.sitemaps.SiteMapURL
Is the SiteMapURL under the base URL?
isWhitespace(char) - Static method in class crawlercommons.sitemaps.sax.DelegatorHandler
Check whether character is any Unicode whitespace, including the space characters not covered by Character.isWhitespace(char)
isWild() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
 
isXml(String) - Method in class crawlercommons.mimetypes.MimeTypeDetector
 

K

KEYWORDS - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
 

L

LANGUAGE - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
 
LICENSE - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
 
LinkAttributes - Class in crawlercommons.sitemaps.extension
Data model for Google extension to the sitemap protocol regarding alternate links indexing.
LinkAttributes() - Constructor for class crawlercommons.sitemaps.extension.LinkAttributes
 
LinkAttributes(URL) - Constructor for class crawlercommons.sitemaps.extension.LinkAttributes
 
LINKS - crawlercommons.sitemaps.extension.Extension
Usage of <xhtml:links> in sitemaps to include localized page versions/variants, see https://support.google.com/webmasters/answer/189077
LINKS - Static variable in class crawlercommons.sitemaps.Namespace
 
LinksHandler - Class in crawlercommons.sitemaps.sax.extension
Handle SAX events in the XHTML links sitemap extension namespace.
LinksHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.LinksHandler
 
LOC - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
 
LOG - Static variable in class crawlercommons.filters.basic.BasicURLNormalizer
 
LOG - Static variable in class crawlercommons.sitemaps.SiteMapParser
 
LookupResult(int, V) - Constructor for class crawlercommons.domains.SuffixTrie.LookupResult
 

M

main(String[]) - Static method in class crawlercommons.domains.EffectiveTldFinder
 
main(String[]) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
 
main(String[]) - Static method in class crawlercommons.robots.SimpleRobotRulesParser
 
main(String[]) - Static method in class crawlercommons.sitemaps.SiteMapTester
 
MAX_BYTES_ALLOWED - Static variable in class crawlercommons.sitemaps.SiteMapParser
Sitemaps (including sitemap index files) "must be no larger than 50MB (52,428,800 bytes)" as specified in the Sitemaps XML format (before Nov.
MAX_DOMAIN_LENGTH_PART - Static variable in class crawlercommons.domains.EffectiveTldFinder
Max.
MimeTypeDetector - Class in crawlercommons.mimetypes
 
MimeTypeDetector() - Constructor for class crawlercommons.mimetypes.MimeTypeDetector
 
MOBILE - crawlercommons.sitemaps.extension.Extension
Mobile sitemaps just contain an empty "mobile" tag to identify a URL as having mobile content, cf.
MOBILE - Static variable in class crawlercommons.sitemaps.Namespace
 
MobileAttributes - Class in crawlercommons.sitemaps.extension
Google mobile sitemap attributes, see http://www.google.de/schemas/sitemap-mobile/1.0/ and https://www.google.com/schemas/sitemap-mobile/1.0/sitemap-mobile.xsd: Mobile sitemaps just contain an empty "mobile" tag to identify a URL as having mobile content.
MobileAttributes() - Constructor for class crawlercommons.sitemaps.extension.MobileAttributes
 
MobileHandler - Class in crawlercommons.sitemaps.sax.extension
Handle SAX events in the Google Mobile sitemap extension namespace.
MobileHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.MobileHandler
 
MONTHLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
 

N

NAME - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
 
Namespace - Class in crawlercommons.sitemaps
Supported sitemap formats: https://www.sitemaps.org/protocol.html#otherformats
Namespace() - Constructor for class crawlercommons.sitemaps.Namespace
 
NEVER - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
 
newBuilder() - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
Create a new builder object for creating a customized BasicURLNormalizer object.
NEWS - crawlercommons.sitemaps.extension.Extension
Google News sitemaps, see https://support.google.com/news/publisher-center/answer/74288
NEWS - Static variable in class crawlercommons.sitemaps.Namespace
 
NewsAttributes - Class in crawlercommons.sitemaps.extension
Data model for Google's extension to the sitemap protocol regarding news indexing, as per http://www.google.com/schemas/sitemap-news/0.9.
NewsAttributes() - Constructor for class crawlercommons.sitemaps.extension.NewsAttributes
 
NewsAttributes(String, String, ZonedDateTime, String) - Constructor for class crawlercommons.sitemaps.extension.NewsAttributes
 
NewsAttributes.NewsGenre - Enum in crawlercommons.sitemaps.extension
 
NewsHandler - Class in crawlercommons.sitemaps.sax.extension
Handle SAX events in the Google News sitemap extension namespace.
NewsHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.NewsHandler
 
nextUnprocessedSitemap() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
Node() - Constructor for class crawlercommons.domains.SuffixTrie.Node
 
NONE - crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
 
normalize(String, byte[]) - Method in class crawlercommons.mimetypes.MimeTypeDetector
 
normalizeRSSTimestamp(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
Converts pubDate of RSS to the ISO-8601 instant format, e.g., '2017-01-05T12:34:54Z' in UTC / GMT time zone, see DateTimeFormatter.ISO_INSTANT.
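A minimal sketch of such a conversion using only java.time, assuming the pubDate follows the RFC 1123 style that RSS 2.0 feeds commonly use. The class name RssTimestampSketch, the method toIsoInstant, and the choice to return null for unparsable input are illustrative assumptions, not the library's behavior.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of crawler-commons.
class RssTimestampSketch {

    /** Converts an RFC 1123 style date to the ISO-8601 instant format (UTC), or returns null. */
    static String toIsoInstant(String pubDate) {
        if (pubDate == null) {
            return null;
        }
        try {
            ZonedDateTime parsed = ZonedDateTime.parse(pubDate.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            // ISO_INSTANT always formats in UTC with a trailing 'Z'
            return DateTimeFormatter.ISO_INSTANT.format(parsed.toInstant());
        } catch (DateTimeParseException e) {
            return null; // assumption: signal failure with null
        }
    }

    public static void main(String[] args) {
        System.out.println(toIsoInstant("Thu, 05 Jan 2017 12:34:54 GMT"));
    }
}
```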

O

OpEd - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
 
Opinion - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
 
own - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
 

P

PaidLevelDomain - Class in crawlercommons.domains
Routines to extract the PLD (paid-level domain, as per the IRLbot paper) from a hostname or URL.
PaidLevelDomain() - Constructor for class crawlercommons.domains.PaidLevelDomain
 
parseContent(String, byte[], String, String) - Method in class crawlercommons.robots.BaseRobotsParser
Parse the robots.txt file in content, and return rules appropriate for processing paths by userAgent.
parseContent(String, byte[], String, String) - Method in class crawlercommons.robots.SimpleRobotRulesParser
 
parseQueryParameters(String, int, Set<String>) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
Receives the URL query string and parses it into a list of name-value pairs.
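As a rough illustration of splitting a query string into name-value pairs, the sketch below uses plain string operations. It does not reproduce the int and Set<String> parameters of the indexed method, whose meaning is not stated here; QueryParamsSketch and parse are hypothetical names.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of crawler-commons.
class QueryParamsSketch {

    /** Splits "a=1&b=2" into ordered name-value pairs; a missing value becomes "". */
    static List<Map.Entry<String, String>> parse(String query) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        if (query == null || query.isEmpty()) {
            return pairs;
        }
        for (String part : query.split("&")) {
            if (part.isEmpty()) {
                continue; // skip empty segments such as in "a=1&&b=2"
            }
            int eq = part.indexOf('='); // only the first '=' separates name and value
            String name = eq < 0 ? part : part.substring(0, eq);
            String value = eq < 0 ? "" : part.substring(eq + 1);
            pairs.add(new SimpleEntry<>(name, value));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse("a=1&b=&c&&d=x=y"));
    }
}
```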
parseRSSTimestamp(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
Parse pubDate of RSS feeds.
parseSiteMap(byte[], URL) - Method in class crawlercommons.sitemaps.SiteMapParser
Parse a sitemap, given the content bytes and the URL.
parseSiteMap(String, byte[], AbstractSiteMap) - Method in class crawlercommons.sitemaps.SiteMapParser
Returns a processed copy of an unprocessed sitemap object, i.e.
parseSiteMap(String, byte[], URL) - Method in class crawlercommons.sitemaps.SiteMapParser
Parse a sitemap, given the MIME type, the content bytes, and the URL.
parseSiteMap(URL) - Method in class crawlercommons.sitemaps.SiteMapParser
Returns a SiteMap or SiteMapIndex given an online sitemap URL. Note that this method goes online, fetches the sitemap, and then parses it. It is a convenience method for a user who has a sitemap URL and wants a "keep it simple" way to parse it.
PLAYER_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
PressRelease - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
 
PRICES - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
processGzippedXML(URL, byte[]) - Method in class crawlercommons.sitemaps.SiteMapParser
Decompress the gzipped content and process the resulting XML Sitemap.
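The decompression step can be sketched with the standard java.util.zip classes. This is a stand-alone illustration, not the parser's code: GzipSitemapSketch, gunzip, and gzip are hypothetical names, and the XML processing that would follow is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of crawler-commons.
class GzipSitemapSketch {

    /** Decompresses gzipped bytes fully into memory. */
    static byte[] gunzip(byte[] gzipped) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses bytes, used here only to build sample input. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] xml = "<urlset/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(gunzip(gzip(xml)), StandardCharsets.UTF_8));
    }
}
```

A production version would likely cap the number of decompressed bytes it accepts, since a small gzipped payload can expand to a very large document.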
processText(URL, byte[]) - Method in class crawlercommons.sitemaps.SiteMapParser
Process a text-based Sitemap.
processText(URL, InputStream) - Method in class crawlercommons.sitemaps.SiteMapParser
Process a text-based Sitemap.
processXml(URL, byte[]) - Method in class crawlercommons.sitemaps.SiteMapParser
Parse the given XML content.
processXml(URL, InputSource) - Method in class crawlercommons.sitemaps.SiteMapParser
Parse the given XML content.
PUBLICATION_DATE - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
 
PUBLICATION_DATE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
PUNYCODE - crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
 
put(String, V) - Method in class crawlercommons.domains.SuffixTrie
Insert a string and an associated value into the trie.
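One common way to build a trie for suffix lookups is to walk each inserted string from its last character to its first. The sketch below follows that idea; whether the library's SuffixTrie is organized this way is an assumption, and SuffixTrieSketch and longestSuffixValue are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified trie; not the crawler-commons implementation.
class SuffixTrieSketch<V> {

    private static final class Node<V> {
        final Map<Character, Node<V>> children = new HashMap<>();
        V value; // non-null only where an inserted string ends
    }

    private final Node<V> root = new Node<>();

    /** Inserts a string and its value, descending from the last character to the first. */
    void put(String suffix, V value) {
        Node<V> node = root;
        for (int i = suffix.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(suffix.charAt(i), c -> new Node<>());
        }
        node.value = value;
    }

    /** Returns the value of the longest inserted string that is a suffix of s, or null. */
    V longestSuffixValue(String s) {
        Node<V> node = root;
        V best = null;
        for (int i = s.length() - 1; i >= 0; i--) {
            node = node.children.get(s.charAt(i));
            if (node == null) {
                break;
            }
            if (node.value != null) {
                best = node.value;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SuffixTrieSketch<String> trie = new SuffixTrieSketch<>();
        trie.put("uk", "uk");
        trie.put("co.uk", "co.uk");
        System.out.println(trie.longestSuffixValue("example.co.uk"));
    }
}
```

Note that this sketch matches raw character suffixes and does not check domain label boundaries, which a real public-suffix lookup would need to do.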

Q

queryParamsToRemove(Collection<String>) - Method in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
A collection of names of query parameters that should be removed from the URL query.

R

RATING - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
rent - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
 
REQUIRES_SUBSCRIPTION - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
reset() - Method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
 
reset() - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
 
reset() - Method in class crawlercommons.sitemaps.sax.extension.MobileHandler
 
reset() - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
 
reset() - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
 
resetCharacterBuffer() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
RESTRICTED_COUNTRIES - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
RESTRICTED_PLATFORMS - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
RobotRule(String, boolean) - Constructor for class crawlercommons.robots.SimpleRobotRules.RobotRule
 
root - Variable in class crawlercommons.domains.SuffixTrie
 
RSS - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
 
RSS_2_0 - Static variable in class crawlercommons.sitemaps.Namespace
RSS and Atom sitemap formats do not have a strict definition.

S

Satire - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
 
SD - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
 
setAcceptedNamespaces(Set<String>) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
setAllowDocTypeDefinitions(boolean) - Method in class crawlercommons.sitemaps.SiteMapParser
Sets if the parser allows a DTD in sitemaps or feeds.
setAllowedCountries(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setAllowedPlatforms(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setCaption(String) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
setCategory(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setChangeFrequency(SiteMapURL.ChangeFrequency) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL's change frequency
setChangeFrequency(String) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL's change frequency. In case of a bad ChangeFrequency, the current frequency in this instance will be set to null.
setContentLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setCrawlDelay(long) - Method in class crawlercommons.robots.BaseRobotRules
 
setDeferVisits(boolean) - Method in class crawlercommons.robots.BaseRobotRules
 
setDescription(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setDuration(Integer) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setException(UnknownFormatException) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
setExpirationDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setExtensionNamespaces(Map<String, Extension>) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
setFamilyFriendly(Boolean) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setGalleryLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setGalleryTitle(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setGenres(NewsAttributes.NewsGenre[]) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
setGeoLocation(String) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
setHref(URL) - Method in class crawlercommons.sitemaps.extension.LinkAttributes
 
setKeywords(String[]) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
setLanguage(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
setLastModified(String) - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
setLastModified(String) - Method in class crawlercommons.sitemaps.SiteMapURL
Set when this URL was last modified.
setLastModified(ZonedDateTime) - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
setLastModified(ZonedDateTime) - Method in class crawlercommons.sitemaps.SiteMapURL
Set when this URL was last modified.
setLastModified(Date) - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
setLastModified(Date) - Method in class crawlercommons.sitemaps.SiteMapURL
Set when this URL was last modified.
setLicense(URL) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
setLive(Boolean) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setLoc(URL) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
setMaxCrawlDelay(long) - Method in class crawlercommons.robots.SimpleRobotRulesParser
Set the max value in milliseconds accepted for the Crawl-Delay directive.
setMaxWarnings(int) - Method in class crawlercommons.robots.SimpleRobotRulesParser
Set the max number of warnings about parse errors logged per robots.txt
setName(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
setParams(Map<String, String>) - Method in class crawlercommons.sitemaps.extension.LinkAttributes
 
setPlayerLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setPrice(Float) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
setPrices(VideoAttributes.VideoPrice[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setPriority(double) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL's priority to a value between [0.0 - 1.0] (Default Priority is used if the given priority is out of range).
setPriority(String) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL's priority to a value between [0.0 - 1.0] (Default Priority is used if the given priority is missing or out of range).
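The range rule for both setPriority overloads can be sketched as follows. The default of 0.5 is taken from the sitemaps.org protocol and is assumed, not confirmed, to match the library's default; PrioritySketch and normalize are hypothetical names.

```java
// Hypothetical helper, not part of crawler-commons.
class PrioritySketch {

    // Assumption: 0.5 is the default priority named in the sitemaps.org protocol.
    static final double DEFAULT_PRIORITY = 0.5;

    /** Keeps values in [0.0, 1.0]; anything else falls back to the default. */
    static double normalize(double priority) {
        if (Double.isNaN(priority) || priority < 0.0 || priority > 1.0) {
            return DEFAULT_PRIORITY;
        }
        return priority;
    }

    /** String variant: missing or unparsable input also falls back to the default. */
    static double normalize(String priority) {
        if (priority == null || priority.trim().isEmpty()) {
            return DEFAULT_PRIORITY;
        }
        try {
            return normalize(Double.parseDouble(priority.trim()));
        } catch (NumberFormatException e) {
            return DEFAULT_PRIORITY;
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("0.8"));
        System.out.println(normalize(1.5));
    }
}
```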
setProcessed(boolean) - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
setPublicationDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
setPublicationDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setRating(Float) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setRequiresSubscription(Boolean) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setRestrictedCountries(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setRestrictedPlatforms(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setStockTickers(String[]) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
setStrictNamespace(boolean) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
setStrictNamespace(boolean) - Method in class crawlercommons.sitemaps.SiteMapParser
Sets the parser to allow any XML namespace or just the one from the specification, or any accepted namespace (see SiteMapParser.addAcceptedNamespace(String)).
setTags(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setThumbnailLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setTitle(String) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
setTitle(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
setTitle(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setType(AbstractSiteMap.SitemapType) - Method in class crawlercommons.sitemaps.AbstractSiteMap
 
setUploader(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setUploaderInfo(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
setUrl(String) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL.
setUrl(URL) - Method in class crawlercommons.sitemaps.SiteMapURL
Set the URL.
setURLFilter(URLFilter) - Method in class crawlercommons.sitemaps.SiteMapParser
Use URLFilter to filter URLs, eg.
setURLFilter(Function<String, String>) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
setURLFilter(Function<String, String>) - Method in class crawlercommons.sitemaps.SiteMapParser
Set URL filter function to normalize URLs found in sitemaps or filter URLs away if the function returns null.
setValid(boolean) - Method in class crawlercommons.sitemaps.SiteMapURL
Valid means that it follows the official guidelines that the siteMapURL must be under the base url
setViewCount(Integer) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
SimpleRobotRules - Class in crawlercommons.robots
Result from parsing a single robots.txt file - which means we get a set of rules, and an optional crawl-delay, and an optional sitemap URL.
SimpleRobotRules() - Constructor for class crawlercommons.robots.SimpleRobotRules
 
SimpleRobotRules(SimpleRobotRules.RobotRulesMode) - Constructor for class crawlercommons.robots.SimpleRobotRules
 
SimpleRobotRules.RobotRule - Class in crawlercommons.robots
Single rule that maps from a path prefix to an allow flag.
SimpleRobotRules.RobotRulesMode - Enum in crawlercommons.robots
 
SimpleRobotRulesParser - Class in crawlercommons.robots
This implementation of BaseRobotsParser retrieves a set of rules for an agent with the given name from the robots.txt file of a given domain.
SimpleRobotRulesParser() - Constructor for class crawlercommons.robots.SimpleRobotRulesParser
 
SimpleRobotRulesParser(long, int) - Constructor for class crawlercommons.robots.SimpleRobotRulesParser
 
SiteMap - Class in crawlercommons.sitemaps
 
SiteMap() - Constructor for class crawlercommons.sitemaps.SiteMap
 
SiteMap(String) - Constructor for class crawlercommons.sitemaps.SiteMap
 
SiteMap(String, String) - Constructor for class crawlercommons.sitemaps.SiteMap
 
SiteMap(URL) - Constructor for class crawlercommons.sitemaps.SiteMap
 
SiteMap(URL, Date) - Constructor for class crawlercommons.sitemaps.SiteMap
 
SITEMAP - Static variable in class crawlercommons.sitemaps.Namespace
 
SITEMAP_EXTENSION_NAMESPACES - Static variable in class crawlercommons.sitemaps.Namespace
 
SITEMAP_LEGACY - Static variable in class crawlercommons.sitemaps.Namespace
Legacy schema URIs from prior sitemap protocol versions and frequent variants.
SITEMAP_SUPPORTED_NAMESPACES - Static variable in class crawlercommons.sitemaps.Namespace
 
SiteMapIndex - Class in crawlercommons.sitemaps
 
SiteMapIndex() - Constructor for class crawlercommons.sitemaps.SiteMapIndex
 
SiteMapIndex(URL) - Constructor for class crawlercommons.sitemaps.SiteMapIndex
 
SiteMapParser - Class in crawlercommons.sitemaps
 
SiteMapParser() - Constructor for class crawlercommons.sitemaps.SiteMapParser
SiteMapParser with strict location validation (SiteMapParser.isStrict()) and not allowing partially parsed content.
SiteMapParser(boolean) - Constructor for class crawlercommons.sitemaps.SiteMapParser
SiteMapParser with configurable location validation, not allowing partially parsed content.
SiteMapParser(boolean, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapParser
 
SiteMapTester - Class in crawlercommons.sitemaps
Sitemap tool for recursively fetching all URLs from a sitemap (and all of its children)
SiteMapTester() - Constructor for class crawlercommons.sitemaps.SiteMapTester
 
SiteMapURL - Class in crawlercommons.sitemaps
The SiteMapURL class represents a URL found in a Sitemap.
SiteMapURL(String, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
 
SiteMapURL(String, String, String, String, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
 
SiteMapURL(URL, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
 
SiteMapURL(URL, ZonedDateTime, SiteMapURL.ChangeFrequency, double, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
 
SiteMapURL(URL, Date, SiteMapURL.ChangeFrequency, double, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
 
SiteMapURL.ChangeFrequency - Enum in crawlercommons.sitemaps
Allowed change frequencies
SkipLeadingWhiteSpaceInputStream - Class in crawlercommons.sitemaps
Wraps a stream and skips over leading whitespace (at beginning of file) in the wrapped stream.
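Skipping leading whitespace matters because an XML declaration is only valid at the very start of a document. The sketch below shows one possible approach with java.io.PushbackInputStream. It is not the library's class: SkipLeadingWhitespaceSketch and its methods are hypothetical, and treating a byte as whitespace via Character.isWhitespace is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of crawler-commons.
class SkipLeadingWhitespaceSketch {

    /** Returns a stream positioned at the first non-whitespace byte of the input. */
    static InputStream skipLeadingWhitespace(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 1);
        int b;
        while ((b = pushback.read()) != -1) {
            if (!Character.isWhitespace(b)) {
                pushback.unread(b); // put the first content byte back
                break;
            }
        }
        return pushback;
    }

    /** Convenience for demonstration: reads the remaining bytes as UTF-8 text. */
    static String readSkipping(byte[] content) {
        try (InputStream in = skipLeadingWhitespace(new ByteArrayInputStream(content))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = " \n\t<?xml version=\"1.0\"?><urlset/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readSkipping(content));
    }
}
```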
SkipLeadingWhiteSpaceInputStream(InputStream) - Constructor for class crawlercommons.sitemaps.SkipLeadingWhiteSpaceInputStream
 
sortRules() - Method in class crawlercommons.robots.SimpleRobotRules
In order to match up with Google's convention, we want to match rules from longest to shortest.
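To illustrate why the ordering matters: if rules are sorted by descending prefix length, the first rule that matches a path is also the most specific one. In this sketch, Rule is a stand-in for a path-prefix rule with an allow flag; RuleSortSketch, its methods, and the "no match means allowed" fallback are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of crawler-commons.
class RuleSortSketch {

    /** Stand-in for a rule mapping a path prefix to an allow flag. */
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    /** Sorts in place so that longer (more specific) prefixes come first. */
    static void sortLongestFirst(List<Rule> rules) {
        rules.sort(Comparator.comparingInt((Rule r) -> r.prefix.length()).reversed());
    }

    /** First match wins; with longest-first order this is the most specific rule. */
    static boolean isAllowed(List<Rule> sortedRules, String path) {
        for (Rule rule : sortedRules) {
            if (path.startsWith(rule.prefix)) {
                return rule.allow;
            }
        }
        return true; // assumption: no matching rule means the path is allowed
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("/", false));
        rules.add(new Rule("/public/", true));
        rules.add(new Rule("/public/private/", false));
        sortLongestFirst(rules);
        System.out.println(isAllowed(rules, "/public/page.html"));
    }
}
```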
startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
 
startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
 
startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.LinksHandler
 
startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.MobileHandler
 
startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
 
startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
 
STOCK_TICKERS - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
 
strict - Variable in class crawlercommons.sitemaps.SiteMapParser
True (by default) meaning that invalid URLs should be rejected, as the official docs allow the siteMapURLs to be only under the base url: https://www.sitemaps.org/protocol.html#location
strictNamespace - Variable in class crawlercommons.sitemaps.SiteMapParser
Indicates whether the parser should work with the namespace from the specifications or any namespace.
Strings - Class in crawlercommons.utils
Util functions for manipulating strings.
Strings() - Constructor for class crawlercommons.utils.Strings
 
stripAllBlank(CharSequence) - Static method in class crawlercommons.sitemaps.sax.DelegatorHandler
Trim all whitespace including Unicode whitespace
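String.trim() only removes characters up to U+0020, so Unicode blanks such as the no-break space survive it. A sketch of a trim that also covers those characters is shown below. StripBlankSketch is a hypothetical name, and combining Character.isWhitespace with Character.isSpaceChar is an assumption about what "all whitespace" should include.

```java
// Hypothetical helper, not part of crawler-commons.
class StripBlankSketch {

    /** True for Java whitespace and for Unicode space separators such as U+00A0. */
    static boolean isBlank(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    /** Removes leading and trailing blanks, leaving inner characters untouched. */
    static String stripAllBlank(CharSequence s) {
        int start = 0;
        int end = s.length();
        while (start < end && isBlank(s.charAt(start))) {
            start++;
        }
        while (end > start && isBlank(s.charAt(end - 1))) {
            end--;
        }
        return s.subSequence(start, end).toString();
    }

    public static void main(String[] args) {
        // U+00A0 no-break space, U+2003 em space, U+3000 ideographic space
        String raw = "\u00A0\u2003 http://example.com/ \u3000\n";
        System.out.println("[" + stripAllBlank(raw) + "]");
    }
}
```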
SuffixTrie<V> - Class in crawlercommons.domains
 
SuffixTrie() - Constructor for class crawlercommons.domains.SuffixTrie
 
SuffixTrie.LookupResult<V> - Class in crawlercommons.domains
Wrapper for results when a string is checked for suffixes contained in the suffix trie.
SuffixTrie.Node<V> - Class in crawlercommons.domains
 

T

TAGS - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
TEXT - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
 
THUMBNAIL_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
TIME_ZONE_UTC - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
 
TITLE - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
 
TITLE - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
 
TITLE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
toString() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
 
toString() - Method in class crawlercommons.robots.BaseRobotRules
Returns a string with the crawl delay as well as a list of sitemaps if they exist (and aren't more than 10)
toString() - Method in class crawlercommons.robots.SimpleRobotRules
 
toString() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
 
toString() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
 
toString() - Method in class crawlercommons.sitemaps.extension.MobileAttributes
 
toString() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
 
toString() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
 
toString() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
toString() - Method in class crawlercommons.sitemaps.SiteMap
 
toString() - Method in class crawlercommons.sitemaps.SiteMapIndex
 
toString() - Method in class crawlercommons.sitemaps.SiteMapURL
 

U

unescapePath(String) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
Remove % encoding from path segment in URL for characters which should be unescaped according to RFC3986.
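RFC 3986 defines the unreserved characters as letters, digits, '-', '.', '_' and '~', and percent-encoded forms of these can be decoded without changing the meaning of a URL. The sketch below decodes only those and leaves every other escape untouched. The exact character set the library unescapes may differ, and UnescapePathSketch and its methods are hypothetical names.

```java
// Hypothetical helper, not part of crawler-commons.
class UnescapePathSketch {

    /** RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~". */
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    /** Value of an ASCII hex digit, or -1 if the character is not one. */
    static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Decodes %XX only where XX encodes an unreserved character. */
    static String unescapeUnreserved(String path) {
        StringBuilder sb = new StringBuilder(path.length());
        int i = 0;
        while (i < path.length()) {
            char ch = path.charAt(i);
            if (ch == '%' && i + 2 < path.length()) {
                int hi = hex(path.charAt(i + 1));
                int lo = hex(path.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    int decoded = (hi << 4) | lo;
                    if (isUnreserved(decoded)) {
                        sb.append((char) decoded);
                    } else {
                        sb.append(path, i, i + 3); // keep reserved escapes such as %2F
                    }
                    i += 3;
                    continue;
                }
            }
            sb.append(ch);
            i++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeUnreserved("/%7Euser/a%2Fb/%41%2d"));
    }
}
```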
UNICODE - crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
 
UnknownFormatException - Exception in crawlercommons.sitemaps
Exception thrown if the format of a sitemap failed to parse.
UnknownFormatException() - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
 
UnknownFormatException(String) - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
 
UnknownFormatException(String, Throwable) - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
 
UNSET_CRAWL_DELAY - Static variable in class crawlercommons.robots.BaseRobotRules
 
UPLOADER - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
UPLOADER_INFO - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 
url - Variable in class crawlercommons.sitemaps.AbstractSiteMap
 
urlEquals(URL, URL) - Static method in class crawlercommons.sitemaps.extension.ExtensionMetadata
Compare URLs by their string representation because calling URL.equals(Object) may trigger an unwanted and potentially slow DNS lookup to resolve the host part
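A sketch of the string-based comparison, using only java.net classes. UrlEqualsSketch and the url(String) convenience method are hypothetical, and the null handling is an assumption.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, not part of crawler-commons.
class UrlEqualsSketch {

    /** Compares the string forms, so no host name resolution is triggered. */
    static boolean urlEquals(URL a, URL b) {
        if (a == b) {
            return true; // same instance, or both null
        }
        if (a == null || b == null) {
            return false;
        }
        return a.toString().equals(b.toString());
    }

    /** Convenience for demonstration: builds a URL without a checked exception. */
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(urlEquals(url("https://example.com/a"), url("https://example.com/a")));
        System.out.println(urlEquals(url("https://example.com/a"), url("https://example.com/b")));
    }
}
```

The trade-off is that two URLs that differ only in form, for example in the letter case of the host, compare as unequal.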
urlFilter - Variable in class crawlercommons.sitemaps.sax.DelegatorHandler
 
URLFilter - Class in crawlercommons.filters
 
URLFilter() - Constructor for class crawlercommons.filters.URLFilter
 
urlIsValid(String, String) - Static method in class crawlercommons.sitemaps.SiteMapParser
See if testUrl is under sitemapBaseUrl.
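Under the sitemaps.org location rule, a sitemap at https://example.com/catalog/sitemap.xml may list URLs that start with https://example.com/catalog/. A simplified sketch of such a check is a plain prefix test. The baseOf helper, which derives the base from a sitemap URL, and the class name UrlUnderBaseSketch are hypothetical, and the library's actual check may be more lenient or stricter.

```java
// Hypothetical helper, not part of crawler-commons.
class UrlUnderBaseSketch {

    /** Derives the directory part of a sitemap URL; assumes no '/' inside a query string. */
    static String baseOf(String sitemapUrl) {
        int schemeEnd = sitemapUrl.indexOf("://");
        int pathSearchStart = schemeEnd < 0 ? 0 : schemeEnd + 3;
        int lastSlash = sitemapUrl.lastIndexOf('/');
        if (lastSlash < pathSearchStart) {
            return sitemapUrl + "/"; // host only, no path
        }
        return sitemapUrl.substring(0, lastSlash + 1);
    }

    /** True if testUrl starts with sitemapBaseUrl (exact, case-sensitive prefix). */
    static boolean isUnder(String sitemapBaseUrl, String testUrl) {
        return sitemapBaseUrl != null && testUrl != null && testUrl.startsWith(sitemapBaseUrl);
    }

    public static void main(String[] args) {
        String base = baseOf("https://example.com/catalog/sitemap.xml");
        System.out.println(isUnder(base, "https://example.com/catalog/item1.html"));
        System.out.println(isUnder(base, "https://example.com/other/item2.html"));
    }
}
```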
UserGenerated - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
 

V

valueOf(String) - Static method in enum crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.robots.SimpleRobotRules.RobotRulesMode
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.sitemaps.AbstractSiteMap.SitemapType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.Extension
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
Returns the enum constant of this type with the specified name.
valueOf(String) - Static method in enum crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
Returns the enum constant of this type with the specified name.
values() - Static method in enum crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.robots.SimpleRobotRules.RobotRulesMode
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.sitemaps.AbstractSiteMap.SitemapType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.sitemaps.extension.Extension
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
Returns an array containing the constants of this enum type, in the order they are declared.
values() - Static method in enum crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
Returns an array containing the constants of this enum type, in the order they are declared.
VIDEO - crawlercommons.sitemaps.extension.Extension
Google Video sitemaps, see https://support.google.com/webmasters/answer/80471
VIDEO - Static variable in class crawlercommons.sitemaps.Namespace
 
VideoAttributes - Class in crawlercommons.sitemaps.extension
Data model for Google's extension to the sitemap protocol regarding video indexing, as per http://www.google.com/schemas/sitemap-video/1.1
VideoAttributes() - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes
 
VideoAttributes(URL, String, String, URL, URL) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes
 
VideoAttributes.VideoPrice - Class in crawlercommons.sitemaps.extension
 
VideoAttributes.VideoPriceResolution - Enum in crawlercommons.sitemaps.extension
 
VideoAttributes.VideoPriceType - Enum in crawlercommons.sitemaps.extension
 
VideoHandler - Class in crawlercommons.sitemaps.sax.extension
Handle SAX events in the Google Video sitemap extension namespace.
VideoHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.VideoHandler
 
VideoPrice(String, Float) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
VideoPrice(String, Float, VideoAttributes.VideoPriceType) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
VideoPrice(String, Float, VideoAttributes.VideoPriceType, VideoAttributes.VideoPriceResolution) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
 
VIEW_COUNT - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
 

W

W3C_FULLDATE_FORMATTER - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
DateTimeFormatter for parsing dates in ISO-8601 format
W3C_FULLDATE_FORMATTER_UTC - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
DateTimeFormatter to format dates in ISO-8601 format (UTC time zone 'Z')
W3C_SHORTDATE_FORMATTER - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
DateTimeFormatter for parsing short dates ('1997', '1997-07', '1997-07-16') without daytime and time zone
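A formatter that accepts all three short forms can be built with optional sections and default values for the missing fields. The sketch below is one way to do this with java.time and is not claimed to match the library's formatter definition; ShortDateSketch, SHORT_DATE, and the choice to default month and day to 1 are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical helper, not part of crawler-commons.
class ShortDateSketch {

    // Accepts 'YYYY', 'YYYY-MM' and 'YYYY-MM-DD'; missing month or day defaults to 1.
    static final DateTimeFormatter SHORT_DATE = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)
            .optionalStart()
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .optionalStart()
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .optionalEnd()
            .optionalEnd()
            .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
            .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
            .toFormatter(Locale.ROOT);

    /** Parses a short date; throws DateTimeParseException on any other format. */
    static LocalDate parse(String text) {
        return LocalDate.parse(text, SHORT_DATE);
    }

    public static void main(String[] args) {
        System.out.println(parse("1997"));
        System.out.println(parse("1997-07"));
        System.out.println(parse("1997-07-16"));
    }
}
```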
walkSiteMap(AbstractSiteMap, Consumer<SiteMapURL>) - Method in class crawlercommons.sitemaps.SiteMapParser
Traverse a sitemap, recursively fetching and traversing the content of any enclosed sitemap index, and performing the specified action for each sitemap URL until all URLs have been processed or the action throws an exception.
walkSiteMap(URL, Consumer<SiteMapURL>) - Method in class crawlercommons.sitemaps.SiteMapParser
Fetch a sitemap from the specified URL, recursively fetching and traversing the content of any enclosed sitemap index, and performing the specified action for each sitemap URL until all URLs have been processed or the action throws an exception.
WEEKLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
 
WILD_CARD - Static variable in class crawlercommons.domains.EffectiveTldFinder
 

X

XML - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
 

Y

YEARLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
 

_

_mode - Variable in class crawlercommons.robots.SimpleRobotRules
 
_rules - Variable in class crawlercommons.robots.SimpleRobotRules
 