A B C D E F G H I K L M N O P Q R S T U V W X Y _
All Classes All Packages
A
- AbstractSiteMap - Class in crawlercommons.sitemaps
-
SiteMap or SiteMapIndex
- AbstractSiteMap() - Constructor for class crawlercommons.sitemaps.AbstractSiteMap
- AbstractSiteMap.SitemapType - Enum in crawlercommons.sitemaps
-
Various Sitemap types
- acceptedNamespaces - Variable in class crawlercommons.sitemaps.SiteMapParser
-
Set of namespaces (if SiteMapParser.strictNamespace) accepted by the parser.
- addAcceptedNamespace(String) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Add namespace URI to set of accepted namespaces.
- addAcceptedNamespace(String[]) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Add namespace URIs to set of accepted namespaces.
- addAttributesForExtension(Extension, ExtensionMetadata[]) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Add attributes of a specific sitemap extension
- addChild(char, V) - Method in class crawlercommons.domains.SuffixTrie.Node
- addPrice(VideoAttributes.VideoPrice) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- addRule(String, boolean) - Method in class crawlercommons.robots.SimpleRobotRules
- addSitemap(AbstractSiteMap) - Method in class crawlercommons.sitemaps.SiteMapIndex
-
Add this Sitemap to the list of Sitemaps.
- addSitemap(String) - Method in class crawlercommons.robots.BaseRobotRules
-
Add sitemap URL to rules if not a duplicate
- addSiteMapUrl(SiteMapURL) - Method in class crawlercommons.sitemaps.SiteMap
- addTag(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- ALLOW_ALL - crawlercommons.robots.SimpleRobotRules.RobotRulesMode
- ALLOW_NONE - crawlercommons.robots.SimpleRobotRules.RobotRulesMode
- ALLOW_SOME - crawlercommons.robots.SimpleRobotRules.RobotRulesMode
- ALLOWED_COUNTRIES - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- ALLOWED_PLATFORMS - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- ALWAYS - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- appendCharacterBuffer(char[], int, int) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- appendCharacterBuffer(String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- asMap() - Method in class crawlercommons.sitemaps.extension.ExtensionMetadata
- asMap() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- asMap() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- asMap() - Method in class crawlercommons.sitemaps.extension.MobileAttributes
- asMap() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- asMap() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- ATOM - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
- ATOM_0_3 - Static variable in class crawlercommons.sitemaps.Namespace
- ATOM_1_0 - Static variable in class crawlercommons.sitemaps.Namespace
- attributes - Variable in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
B
- BaseRobotRules - Class in crawlercommons.robots
-
Result from parsing a single robots.txt file – a set of allow/disallow rules to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
- BaseRobotRules() - Constructor for class crawlercommons.robots.BaseRobotRules
- BaseRobotsParser - Class in crawlercommons.robots
-
Robots.txt parser definition.
- BaseRobotsParser() - Constructor for class crawlercommons.robots.BaseRobotsParser
- BasicURLNormalizer - Class in crawlercommons.filters.basic
-
Code borrowed from Apache Nutch.
- BasicURLNormalizer() - Constructor for class crawlercommons.filters.basic.BasicURLNormalizer
- BasicURLNormalizer(BasicURLNormalizer.Builder) - Constructor for class crawlercommons.filters.basic.BasicURLNormalizer
- BasicURLNormalizer.Builder - Class in crawlercommons.filters.basic
-
A builder class for the BasicURLNormalizer.
- BasicURLNormalizer.IdnNormalization - Enum in crawlercommons.filters.basic
- beforeRead(int) - Method in class crawlercommons.sitemaps.SkipLeadingWhiteSpaceInputStream
- Blog - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- build() - Method in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
-
Constructs the custom URL normalizer instance.
C
- CAPTION - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- CATEGORY - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- clearRules() - Method in class crawlercommons.robots.SimpleRobotRules
- commaSeparated - Static variable in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- COMMENT - Static variable in class crawlercommons.domains.EffectiveTldFinder
- compareTo(SimpleRobotRules.RobotRule) - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- contains(String) - Method in class crawlercommons.domains.SuffixTrie
-
Checks whether trie contains a suffix string.
- CONTENT_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- convertToDate(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
- convertToZonedDateTime(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
-
Convert the given date string (in an accepted date format) to a ZonedDateTime; return null if the date is not in a recognized format.
- crawlercommons - package crawlercommons
- CrawlerCommons - Class in crawlercommons
- CrawlerCommons() - Constructor for class crawlercommons.CrawlerCommons
- crawlercommons.domains - package crawlercommons.domains
-
Classes contained within the domains package relate to the definition of Top Level Domains, various domain registrars and the effective handling of such domains.
- crawlercommons.filters - package crawlercommons.filters
-
The filters package contains code and resources for URL filtering.
- crawlercommons.filters.basic - package crawlercommons.filters.basic
- crawlercommons.mimetypes - package crawlercommons.mimetypes
- crawlercommons.robots - package crawlercommons.robots
-
The robots package contains all of the robots.txt rule inference, parsing and utilities contained within Crawler Commons.
- crawlercommons.sitemaps - package crawlercommons.sitemaps
-
The sitemaps package provides all classes relevant to focused sitemap parsing, URL definition and processing.
- crawlercommons.sitemaps.extension - package crawlercommons.sitemaps.extension
- crawlercommons.sitemaps.sax - package crawlercommons.sitemaps.sax
- crawlercommons.sitemaps.sax.extension - package crawlercommons.sitemaps.sax.extension
- crawlercommons.utils - package crawlercommons.utils
- create(Extension) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- currentElement() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- currentElementParent() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
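The null-on-failure contract described for convertToZonedDateTime(String) above can be illustrated with a minimal sketch built on java.time. This is not the crawler-commons implementation: the class name LenientDateSketch, the method parseOrNull, the two accepted formats, and the choice of midnight UTC for date-only input are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of crawler-commons.
final class LenientDateSketch {

    /** Returns the parsed date, or null if the text matches neither supported format. */
    static ZonedDateTime parseOrNull(String text) {
        if (text == null) {
            return null;
        }
        String trimmed = text.trim();
        try {
            // Full timestamp with offset, e.g. 2004-12-23T18:00:15+00:00
            return ZonedDateTime.parse(trimmed, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        } catch (DateTimeParseException e) {
            // Not a full timestamp; try the date-only form next.
        }
        try {
            // Date only, e.g. 2004-12-23; assumed here to mean midnight UTC.
            return LocalDate.parse(trimmed, DateTimeFormatter.ISO_LOCAL_DATE)
                    .atStartOfDay(ZoneOffset.UTC);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("2004-12-23T18:00:15+00:00"));
        System.out.println(parseOrNull("2004-12-23"));
        System.out.println(parseOrNull("not a date"));
    }
}
```

Returning null instead of throwing lets a caller skip a malformed date and keep processing the rest of a sitemap.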
D
- DAILY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- DEFAULT_MAX_CRAWL_DELAY - Static variable in class crawlercommons.robots.SimpleRobotRulesParser
-
Default max Crawl-Delay in milliseconds, see SimpleRobotRulesParser.setMaxCrawlDelay(long)
- DEFAULT_MAX_WARNINGS - Static variable in class crawlercommons.robots.SimpleRobotRulesParser
-
Default max number of warnings logged during parse of any one robots.txt file, see SimpleRobotRulesParser.setMaxWarnings(int)
- DEFAULT_PRIORITY - Static variable in class crawlercommons.sitemaps.SiteMapURL
- DelegatorHandler - Class in crawlercommons.sitemaps.sax
-
Provides a base SAX handler for parsing of XML documents representing sub-classes of AbstractSiteMap.
- DelegatorHandler(URL, boolean) - Constructor for class crawlercommons.sitemaps.sax.DelegatorHandler
- DelegatorHandler(LinkedList<String>, boolean) - Constructor for class crawlercommons.sitemaps.sax.DelegatorHandler
- DESCRIPTION - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- detect(byte[]) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- detect(byte[], int) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- detect(InputStream) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- DOT - Static variable in class crawlercommons.domains.EffectiveTldFinder
- DOT_REGEX - Static variable in class crawlercommons.domains.EffectiveTldFinder
- DURATION - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
E
- EffectiveTLD(String, boolean) - Constructor for class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
-
Parse one non-empty, non-comment line in the public suffix list and hold the public suffix and its properties in the created object.
- EffectiveTldFinder - Class in crawlercommons.domains
-
To determine the actual domain name of a host name or URL requires knowledge of the various domain registrars and their assignment policies.
- EffectiveTldFinder.EffectiveTLD - Class in crawlercommons.domains
-
EffectiveTLD objects hold one line of the public suffix list: the suffix (com, co.uk, etc.), for IDN suffixes both the ASCII and IDN variant (xn--p1ai and рф), and the properties required to parse host/domain names given in the public suffix list (wildcard suffix, exception, in private domain section)
- EMPTY - Static variable in class crawlercommons.sitemaps.Namespace
-
In contradiction to the protocol specification ("The Sitemap must ...
- enableExtension(Extension) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Enable support for a sitemap extension in the parser.
- enableExtensions() - Method in class crawlercommons.sitemaps.SiteMapParser
-
Enable all supported sitemap extensions in the parser.
- endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- equals(Object) - Method in class crawlercommons.robots.BaseRobotRules
- equals(Object) - Method in class crawlercommons.robots.SimpleRobotRules
- equals(Object) - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- equals(Object) - Method in class crawlercommons.sitemaps.extension.ExtensionMetadata
- equals(Object) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.MobileAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- equals(Object) - Method in class crawlercommons.sitemaps.SiteMapURL
- error(SAXParseException) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- escapePath(String) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Convert path segment of URL from Unicode to UTF-8 and escape all characters which should be escaped according to RFC3986.
- escapePath(String, boolean[]) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
- escapePath(String, boolean[]) - Static method in class crawlercommons.robots.SimpleRobotRules
-
Encode/decode (using percent-encoding) all characters where necessary: encode Unicode/non-ASCII characters and decode printable ASCII characters without special semantics.
- ETLD_DATA - Static variable in class crawlercommons.domains.EffectiveTldFinder
- EXCEPTION - Static variable in class crawlercommons.domains.EffectiveTldFinder
- EXPIRATION_DATE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- Extension - Enum in crawlercommons.sitemaps.extension
-
Sitemap extensions supported by the parser.
- ExtensionHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handler to be called for elements in the namespace of a sitemap extension.
- ExtensionHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- ExtensionMetadata - Class in crawlercommons.sitemaps.extension
-
Container for attributes of a SiteMapURL defined by a sitemap extension.
- ExtensionMetadata() - Constructor for class crawlercommons.sitemaps.extension.ExtensionMetadata
- extensionNamespaces - Variable in class crawlercommons.sitemaps.sax.DelegatorHandler
- extensionNamespaces - Variable in class crawlercommons.sitemaps.SiteMapParser
-
Map of sitemap extension namespaces required to find the right extension handler.
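The percent-encoding step that the escapePath entries above describe (convert to UTF-8, then escape according to RFC 3986) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class PathEscapeSketch and its methods are hypothetical, the safe set is the RFC 3986 unreserved characters plus '/', and the input is assumed to be unescaped, so an existing '%' is itself encoded.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of crawler-commons.
final class PathEscapeSketch {

    // RFC 3986 "unreserved" characters, plus '/' kept as path separator (assumption).
    private static boolean isSafe(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~' || b == '/';
    }

    /** Percent-encodes every UTF-8 byte of the path that is not in the safe set. */
    static String escape(String path) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : path.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the byte as unsigned
            if (isSafe(b)) {
                sb.append((char) b);
            } else {
                sb.append('%').append(String.format("%02X", b));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: C3 A9
        System.out.println(escape("/a b/\u00e9")); // prints /a%20b/%C3%A9
    }
}
```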
F
- failedFetch(int) - Method in class crawlercommons.robots.BaseRobotsParser
-
The fetch of robots.txt failed, so return rules appropriate for the given HTTP status code.
- failedFetch(int) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
The fetch of robots.txt failed, so return rules appropriate for the given HTTP status code.
- FAMILY_FRIENDLY - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- fatalError(SAXParseException) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- filter(String) - Method in class crawlercommons.filters.basic.BasicURLNormalizer
- filter(String) - Method in class crawlercommons.filters.URLFilter
-
Returns a modified version of the input URL or null if the URL should be removed
- formatQueryParameters(List<BasicURLNormalizer.NameValuePair>) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Formats a list of query parameter name-value pairs into a query parameter string.
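What "formats a list of query parameter name-value pairs into a query parameter string" means for formatQueryParameters can be shown with a small stand-in. The sketch uses Map.Entry in place of BasicURLNormalizer.NameValuePair, and both the class QueryFormatSketch and the rule that a null value yields a bare name are assumptions, not documented library behavior.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of crawler-commons.
final class QueryFormatSketch {

    /** Joins pairs as name=value separated by '&'; a null value yields just the name (assumption). */
    static String format(List<Map.Entry<String, String>> params) {
        return params.stream()
                .map(p -> p.getValue() == null ? p.getKey() : p.getKey() + "=" + p.getValue())
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> params = new ArrayList<>();
        params.add(new SimpleImmutableEntry<String, String>("q", "sitemap"));
        params.add(new SimpleImmutableEntry<String, String>("page", "2"));
        System.out.println(format(params)); // prints q=sitemap&page=2
    }
}
```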
G
- GALLERY_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- GALLERY_TITLE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- GENRES - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
- GEO_LOCATION - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- get(String) - Method in class crawlercommons.domains.SuffixTrie
-
Get value associated with suffix string in trie.
- getAllowedCountries() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getAllowedPlatforms() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getAndResetCharacterBuffer() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- getAssignedDomain(String) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
- getAssignedDomain(String, boolean) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
- getAssignedDomain(String, boolean, boolean) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name.
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.MobileHandler
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- getAttributes() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Get attributes of sitemap extensions (news, images, videos, etc.)
- getAttributesForExtension(Extension) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Get attributes of a specific sitemap extension
- getBaseUrl() - Method in class crawlercommons.sitemaps.SiteMap
- getCaption() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getCategory() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getChangeFrequency() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Return the URL's change frequency
- getChild(char) - Method in class crawlercommons.domains.SuffixTrie.Node
- getContentLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getCrawlDelay() - Method in class crawlercommons.robots.BaseRobotRules
-
Get Crawl-delay (in milliseconds)
- getCurrency() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- getDateValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getDescription() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getDomain() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- getDuration() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getEffectiveTLD(String) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
- getEffectiveTLD(String, boolean) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
- getEffectiveTLDs() - Static method in class crawlercommons.domains.EffectiveTldFinder
- getException() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- getExpirationDate() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getExpirationDateTime() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getFamilyFriendly() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getFloatValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getGalleryLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getGalleryTitle() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getGenres() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getGeoLocation() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getHref() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- getInstance() - Static method in class crawlercommons.domains.EffectiveTldFinder
-
Get singleton instance of EffectiveTldFinder with default configuration.
- getIntegerValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getKeywords() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getLanguage() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getLastModified() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- getLastModified() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Return when this URL was last modified.
- getLicense() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getLive() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getLoc() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getLongestSuffix(String) - Method in class crawlercommons.domains.SuffixTrie
-
Match the longest suffix of a string contained in trie.
- getMaxCrawlDelay() - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Get configured max crawl delay.
- getMaxWarnings() - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Get max number of logged warnings per robots.txt
- getName() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getNameVariants() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
-
Generate name variants caused by Internationalized Domain Names: every IDN part of a eTLD can be replaced by its punycoded ASCII variant.
- getNumWarnings() - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Get the number of warnings due to invalid rules/lines in the latest processed robots.txt file (see SimpleRobotRulesParser.parseContent(String, byte[], String, String)).
- getParams() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- getPlayerLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getPLD(String) - Static method in class crawlercommons.domains.PaidLevelDomain
-
Extract the PLD (paid-level domain) from the hostname.
- getPLD(URL) - Static method in class crawlercommons.domains.PaidLevelDomain
-
Extract the PLD (paid-level domain) from the URL.
- getPrefix() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- getPrice() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- getPrices() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getPriority() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Return this URL's priority (a value in the range [0.0, 1.0]).
- getPublicationDate() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getPublicationDate() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getPublicationDateTime() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getPublicationDateTime() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getRating() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getRequiresSubscription() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getResolution() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- getRestrictedCountries() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getRestrictedPlatforms() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getRobotRules() - Method in class crawlercommons.robots.SimpleRobotRules
- getSitemap(URL) - Method in class crawlercommons.sitemaps.SiteMapIndex
-
Returns the Sitemap that has the given URL.
- getSiteMap() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- getSitemaps() - Method in class crawlercommons.robots.BaseRobotRules
-
Get URLs of sitemap links found in robots.txt
- getSitemaps() - Method in class crawlercommons.sitemaps.SiteMapIndex
- getSitemaps(boolean) - Method in class crawlercommons.sitemaps.SiteMapIndex
- getSiteMapUrls() - Method in class crawlercommons.sitemaps.SiteMap
- getStockTickers() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getSuffixes(String) - Method in class crawlercommons.domains.SuffixTrie
-
Match all suffixes of a string contained in trie.
- getTags() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getThumbnailLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getTitle() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getTitle() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getTitle() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getType() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- getType() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- getUploader() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getUploaderInfo() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getUrl() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- getUrl() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- getUrl() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Return the URL.
- getURLValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getVersion() - Static method in class crawlercommons.CrawlerCommons
- getViewCount() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getYesNoBooleanValue(String, String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
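The idea behind getAssignedDomain and getPLD above, finding the longest public suffix of a host name and keeping one more label to its left, can be sketched with a tiny hard-coded suffix set. This is not how EffectiveTldFinder is implemented: the real public suffix list is far larger and has wildcard and exception rules that this sketch ignores, and AssignedDomainSketch, its SUFFIXES set, and assignedDomain are hypothetical names.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of crawler-commons.
final class AssignedDomainSketch {

    // Illustrative subset only; no wildcard or exception rules.
    private static final Set<String> SUFFIXES = Set.of("com", "uk", "co.uk");

    /** Returns suffix plus one label, or null if no suffix matches or the host is itself a suffix. */
    static String assignedDomain(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        // Scanning from the left means the first match is the longest suffix.
        for (int i = 0; i < labels.length; i++) {
            String candidate = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(candidate)) {
                return i == 0 ? null : labels[i - 1] + "." + candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(assignedDomain("www.example.co.uk")); // prints example.co.uk
        System.out.println(assignedDomain("example.com"));       // prints example.com
        System.out.println(assignedDomain("co.uk"));             // prints null
    }
}
```

Matching the longest suffix first is what keeps www.example.co.uk from resolving to co.uk under the shorter uk suffix.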
H
- hashCode() - Method in class crawlercommons.robots.BaseRobotRules
- hashCode() - Method in class crawlercommons.robots.SimpleRobotRules
- hashCode() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- hashCode() - Method in class crawlercommons.sitemaps.SiteMapURL
- hasUnprocessedSitemap() - Method in class crawlercommons.sitemaps.SiteMapIndex
- HD - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
- help() - Static method in class crawlercommons.domains.EffectiveTldFinder
- HOURLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- HREF - Static variable in class crawlercommons.sitemaps.extension.LinkAttributes
I
- idnNormalization - Variable in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
- idnNormalization(BasicURLNormalizer.IdnNormalization) - Method in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
-
Configures whether internationalized domain names (IDNs) should be converted to ASCII/Punycode or Unicode.
- IMAGE - crawlercommons.sitemaps.extension.Extension
-
Google Image sitemaps, see https://support.google.com/webmasters/answer/178636
- IMAGE - Static variable in class crawlercommons.sitemaps.Namespace
- ImageAttributes - Class in crawlercommons.sitemaps.extension
-
Data model for Google extension to the sitemap protocol regarding images indexing, as per http://www.google.com/schemas/sitemap-image/1.1
- ImageAttributes() - Constructor for class crawlercommons.sitemaps.extension.ImageAttributes
- ImageAttributes(URL) - Constructor for class crawlercommons.sitemaps.extension.ImageAttributes
- ImageHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the Google Image sitemap extension namespace.
- ImageHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.ImageHandler
- INDEX - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
- initialize(InputStream) - Method in class crawlercommons.domains.EffectiveTldFinder
-
(Re)initialize EffectiveTldFinder with custom public suffix list.
- IS_LIVE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- isAcceptedNamespace(String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- isAllBlank(CharSequence) - Static method in class crawlercommons.sitemaps.sax.DelegatorHandler
-
Return true if character sequence contains only white space including Unicode whitespace, cf.
- isAllow() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- isAllowAll() - Method in class crawlercommons.robots.BaseRobotRules
- isAllowAll() - Method in class crawlercommons.robots.SimpleRobotRules
-
Is our ruleset set up to allow all access?
- isAllowed(String) - Method in class crawlercommons.robots.BaseRobotRules
- isAllowed(String) - Method in class crawlercommons.robots.SimpleRobotRules
- isAllowNone() - Method in class crawlercommons.robots.BaseRobotRules
- isAllowNone() - Method in class crawlercommons.robots.SimpleRobotRules
-
Is our ruleset set up to disallow all access?
- isBlank(String) - Static method in class crawlercommons.utils.Strings
- isConfigured() - Method in class crawlercommons.domains.EffectiveTldFinder
- isDeferVisits() - Method in class crawlercommons.robots.BaseRobotRules
- isExactUserAgentMatching() - Method in class crawlercommons.robots.SimpleRobotRulesParser
- isException() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- isExtensionNamespace(String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- isGzip(String) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- isIndex() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- isIndex() - Method in class crawlercommons.sitemaps.SiteMap
- isIndex() - Method in class crawlercommons.sitemaps.SiteMapIndex
- isProcessed() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- isStrict() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- isStrict() - Method in class crawlercommons.sitemaps.SiteMapParser
- isStrictNamespace() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- isStrictNamespace() - Method in class crawlercommons.sitemaps.SiteMapParser
- isSupported(String) - Static method in class crawlercommons.sitemaps.Namespace
- isText(String) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- isValid() - Method in class crawlercommons.sitemaps.extension.ExtensionMetadata
- isValid() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- isValid() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- isValid() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- isValid() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Is the SiteMapURL under the base URL?
- isValidUserAgentToObey(String) - Static method in class crawlercommons.robots.SimpleRobotRulesParser
-
Validate a user-agent product token as defined in RFC 9309, section 2.2.1
- isWhitespace(char) - Static method in class crawlercommons.sitemaps.sax.DelegatorHandler
-
Check whether character is any Unicode whitespace, including the space characters not covered by Character.isWhitespace(char)
- isWild() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- isXml(String) - Method in class crawlercommons.mimetypes.MimeTypeDetector
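The isWhitespace(char) and isAllBlank(CharSequence) entries above refer to space characters that Character.isWhitespace(char) does not cover. According to the Java documentation these are the non-breaking spaces U+00A0, U+2007 and U+202F. A minimal sketch of such a check follows; the class UnicodeWhitespaceSketch and its method names are hypothetical, and DelegatorHandler may test a different set of characters.

```java
// Hypothetical helper, not part of crawler-commons.
final class UnicodeWhitespaceSketch {

    /** True for Java whitespace plus the three non-breaking spaces Java excludes. */
    static boolean isAnyWhitespace(char c) {
        return Character.isWhitespace(c)
                || c == '\u00A0'  // NO-BREAK SPACE
                || c == '\u2007'  // FIGURE SPACE
                || c == '\u202F'; // NARROW NO-BREAK SPACE
    }

    /** True if every character of the sequence is whitespace (vacuously true when empty). */
    static boolean isAllBlank(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isAnyWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Character.isWhitespace('\u00A0')); // prints false
        System.out.println(isAnyWhitespace('\u00A0'));        // prints true
        System.out.println(isAllBlank(" \t\u00A0"));          // prints true
    }
}
```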
K
- KEYWORDS - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
L
- LANGUAGE - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
- LICENSE - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- LinkAttributes - Class in crawlercommons.sitemaps.extension
-
Data model for Google extension to the sitemap protocol regarding alternate links indexing.
- LinkAttributes() - Constructor for class crawlercommons.sitemaps.extension.LinkAttributes
- LinkAttributes(URL) - Constructor for class crawlercommons.sitemaps.extension.LinkAttributes
- LINKS - crawlercommons.sitemaps.extension.Extension
-
Usage of <xhtml:links> in sitemaps to include localized page versions/variants, see https://support.google.com/webmasters/answer/189077
- LINKS - Static variable in class crawlercommons.sitemaps.Namespace
- LinksHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the sitemap links extension namespace.
- LinksHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.LinksHandler
- LOC - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- LOG - Static variable in class crawlercommons.filters.basic.BasicURLNormalizer
- LOG - Static variable in class crawlercommons.sitemaps.SiteMapParser
- LookupResult(int, V) - Constructor for class crawlercommons.domains.SuffixTrie.LookupResult
M
- main(String[]) - Static method in class crawlercommons.domains.EffectiveTldFinder
- main(String[]) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
- main(String[]) - Static method in class crawlercommons.robots.SimpleRobotRulesParser
- main(String[]) - Static method in class crawlercommons.sitemaps.SiteMapTester
- MAX_BYTES_ALLOWED - Static variable in class crawlercommons.sitemaps.SiteMapParser
-
Sitemaps (including sitemap index files) "must be no larger than 50MB (52,428,800 bytes)" as specified in the Sitemaps XML format (before Nov.
- MAX_DOMAIN_LENGTH_PART - Static variable in class crawlercommons.domains.EffectiveTldFinder
-
Max.
- MimeTypeDetector - Class in crawlercommons.mimetypes
- MimeTypeDetector() - Constructor for class crawlercommons.mimetypes.MimeTypeDetector
- MOBILE - crawlercommons.sitemaps.extension.Extension
-
Mobile sitemaps just contain an empty "mobile" tag to identify a URL as having mobile content, cf.
- MOBILE - Static variable in class crawlercommons.sitemaps.Namespace
- MobileAttributes - Class in crawlercommons.sitemaps.extension
-
Google mobile sitemap attributes, see http://www.google.de/schemas/sitemap-mobile/1.0/ and https://www.google.com/schemas/sitemap-mobile/1.0/sitemap-mobile.xsd: Mobile sitemaps just contain an empty "mobile" tag to identify a URL as having mobile content.
- MobileAttributes() - Constructor for class crawlercommons.sitemaps.extension.MobileAttributes
- MobileHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the Google Mobile sitemap extension namespace.
- MobileHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.MobileHandler
- MONTHLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
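The 50MB limit quoted in the MAX_BYTES_ALLOWED entry above is 50 × 1024 × 1024 = 52,428,800 bytes. A minimal sketch of such a size guard follows, using only the JDK. The class, constant, and method names are hypothetical and are not the crawler-commons implementation.

```java
/** Hypothetical sketch of a sitemap size guard; not part of crawler-commons. */
class SitemapSizeGuardSketch {

    /** 50 MB expressed in bytes: 50 * 1024 * 1024 = 52,428,800. */
    static final int MAX_BYTES = 50 * 1024 * 1024;

    /** Returns true if the content length does not exceed the limit. */
    static boolean withinLimit(byte[] content) {
        return content.length <= MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("limit in bytes: " + MAX_BYTES);
        System.out.println("1 KiB within limit: " + withinLimit(new byte[1024]));
    }
}
```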
N
- NAME - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
- Namespace - Class in crawlercommons.sitemaps
-
supported sitemap formats: https://www.sitemaps.org/protocol.html#otherformats
- Namespace() - Constructor for class crawlercommons.sitemaps.Namespace
- NEVER - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- newBuilder() - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Create a new builder object for creating a customized
BasicURLNormalizer
object.
- NEWS - crawlercommons.sitemaps.extension.Extension
-
Google News sitemaps, see https://support.google.com/news/publisher-center/answer/74288
- NEWS - Static variable in class crawlercommons.sitemaps.Namespace
- NewsAttributes - Class in crawlercommons.sitemaps.extension
-
Data model for Google's extension to the sitemap protocol regarding news indexing, as per http://www.google.com/schemas/sitemap-news/0.9.
- NewsAttributes() - Constructor for class crawlercommons.sitemaps.extension.NewsAttributes
- NewsAttributes(String, String, ZonedDateTime, String) - Constructor for class crawlercommons.sitemaps.extension.NewsAttributes
- NewsAttributes.NewsGenre - Enum in crawlercommons.sitemaps.extension
- NewsHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the Google News sitemap extension namespace.
- NewsHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.NewsHandler
- nextUnprocessedSitemap() - Method in class crawlercommons.sitemaps.SiteMapIndex
- Node() - Constructor for class crawlercommons.domains.SuffixTrie.Node
- NONE - crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
- normalize(String, byte[]) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- normalizeRSSTimestamp(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
-
Converts pubDate of RSS to the ISO-8601 instant format, e.g., '2017-01-05T12:34:54Z' in UTC / GMT time zone, see
DateTimeFormatter.ISO_INSTANT
.
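The normalizeRSSTimestamp(String) entry above describes converting an RSS pubDate to the ISO-8601 instant format. The conversion can be sketched with the JDK's java.time formatters, assuming the pubDate follows RFC 1123. The class and method names below are hypothetical, and the library's own parsing may accept more variants.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical sketch; not the crawler-commons implementation. */
class RssTimestampSketch {

    /**
     * Converts an RFC 1123 style pubDate to the ISO-8601 instant format (UTC, 'Z').
     * Returns null if the input cannot be parsed (an assumption of this sketch).
     */
    static String toIsoInstant(String pubDate) {
        try {
            ZonedDateTime parsed = ZonedDateTime.parse(pubDate.trim(),
                    DateTimeFormatter.RFC_1123_DATE_TIME);
            // Converting to an Instant drops the original offset and yields UTC.
            return DateTimeFormatter.ISO_INSTANT.format(parsed.toInstant());
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toIsoInstant("Thu, 05 Jan 2017 12:34:54 GMT"));
        System.out.println(toIsoInstant("Thu, 05 Jan 2017 13:34:54 +0100"));
    }
}
```

Both calls in main describe the same point in time, so they format to the same instant string.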
O
- OpEd - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- Opinion - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- own - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
P
- PaidLevelDomain - Class in crawlercommons.domains
-
Routines to extract the PLD (paid-level domain, as per the IRLbot paper) from a hostname or URL.
- PaidLevelDomain() - Constructor for class crawlercommons.domains.PaidLevelDomain
- parseContent(String, byte[], String, String) - Method in class crawlercommons.robots.BaseRobotsParser
-
Deprecated. Since 1.4, replaced by
BaseRobotsParser.parseContent(java.lang.String,byte[],java.lang.String,java.util.Collection<java.lang.String>)
. Passing a collection of robot names gives users more control over how user-agent and robot names are matched. Passing a list of names is also more efficient because the robot name string does not have to be split again for every robots.txt file that is parsed.
- parseContent(String, byte[], String, String) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Deprecated.
- parseContent(String, byte[], String, Collection<String>) - Method in class crawlercommons.robots.BaseRobotsParser
-
Parse the robots.txt file in content, and return rules appropriate for processing paths by userAgent.
- parseContent(String, byte[], String, Collection<String>) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Parse the robots.txt file in content, and return rules appropriate for processing paths by userAgent.
- parseQueryParameters(String, int, Set<String>) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Receives the URL query string and parses it into a list of name-value pairs.
- parseRSSTimestamp(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
-
Parse pubDate of RSS feeds.
- parseSiteMap(byte[], URL) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Parse a sitemap, given the content bytes and the URL.
- parseSiteMap(String, byte[], AbstractSiteMap) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Returns a processed copy of an unprocessed sitemap object, i.e.
- parseSiteMap(String, byte[], URL) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Parse a sitemap, given the MIME type, the content bytes, and the URL.
- parseSiteMap(URL) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Returns a SiteMap or SiteMapIndex given an online sitemap URL. Note that this method goes online, fetches the sitemap, and then parses it. It is a convenience method for a user who has a sitemap URL and wants a "keep it simple" way to parse it.
- PLAYER_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- PressRelease - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- PRICES - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- processGzippedXML(URL, byte[]) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Decompress the gzipped content and process the resulting XML Sitemap.
- processText(URL, byte[]) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Process a text-based Sitemap.
- processText(URL, InputStream) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Process a text-based Sitemap.
- processXml(URL, byte[]) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Parse the given XML content.
- processXml(URL, InputSource) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Parse the given XML content.
- PUBLICATION_DATE - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
- PUBLICATION_DATE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- PUNYCODE - crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
- put(String, V) - Method in class crawlercommons.domains.SuffixTrie
-
Insert a string and an associated value into the trie.
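The processGzippedXML(URL, byte[]) entry above mentions decompressing gzipped content before the XML is processed. The decompression step alone can be sketched with java.util.zip; the helper names are hypothetical, and XML parsing is left out. The gzip helper exists only to produce test input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of the decompression step; not the crawler-commons implementation. */
class GzipSketch {

    /** Decompresses gzipped bytes fully into memory. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses bytes with gzip; used here only to build sample input. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<urlset><url><loc>https://example.com/</loc></url></urlset>";
        byte[] roundTrip = gunzip(gzip(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```

A production parser would normally also cap the decompressed size, since a small gzipped payload can expand to far more than the sitemap size limit.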
Q
- queryParamsToRemove(Collection<String>) - Method in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
-
A collection of names of query parameters that should be removed from the URL query.
R
- RATING - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- rent - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
- REQUIRES_SUBSCRIPTION - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- reset() - Method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- reset() - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- reset() - Method in class crawlercommons.sitemaps.sax.extension.MobileHandler
- reset() - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- reset() - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- resetCharacterBuffer() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- RESTRICTED_COUNTRIES - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- RESTRICTED_PLATFORMS - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- RobotRule(String, boolean) - Constructor for class crawlercommons.robots.SimpleRobotRules.RobotRule
-
An allow/disallow rule: a path prefix or pattern and whether it is allowed or disallowed.
- root - Variable in class crawlercommons.domains.SuffixTrie
- RSS - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
- RSS_2_0 - Static variable in class crawlercommons.sitemaps.Namespace
-
RSS and Atom sitemap formats do not have a strict definition.
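The RobotRule(String, boolean) entry above pairs a path prefix or pattern with an allow flag. A minimal sketch of how such rules might be evaluated follows, assuming plain prefix matching only (no wildcards), longest-match precedence, and allow winning ties, as RFC 9309 describes. The Rule class and isAllowed method are hypothetical stand-ins, not the crawler-commons classes.

```java
import java.util.List;

/** Hypothetical sketch of prefix-based allow/disallow matching; not crawler-commons code. */
class RobotRuleSketch {

    /** Stand-in for a rule: a path prefix plus an allow flag. */
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    /**
     * The longest matching prefix decides. On equal length, allow wins.
     * A path matched by no rule is allowed.
     */
    static boolean isAllowed(List<Rule> rules, String path) {
        Rule best = null;
        for (Rule rule : rules) {
            if (!path.startsWith(rule.prefix)) {
                continue;
            }
            if (best == null
                    || rule.prefix.length() > best.prefix.length()
                    || (rule.prefix.length() == best.prefix.length() && rule.allow)) {
                best = rule;
            }
        }
        return best == null || best.allow;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
                new Rule("/private", false),
                new Rule("/private/public", true));
        System.out.println(isAllowed(rules, "/private/secret"));
        System.out.println(isAllowed(rules, "/private/public/page"));
        System.out.println(isAllowed(rules, "/index.html"));
    }
}
```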
S
- Satire - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- SD - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
- setAcceptedNamespaces(Set<String>) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setAllowDocTypeDefinitions(boolean) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Sets if the parser allows a DTD in sitemaps or feeds.
- setAllowedCountries(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setAllowedPlatforms(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setCaption(String) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setCategory(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setChangeFrequency(SiteMapURL.ChangeFrequency) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL's change frequency
- setChangeFrequency(String) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL's change frequency. In case of a bad ChangeFrequency, the current frequency in this instance will be set to null.
- setContentLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setCrawlDelay(long) - Method in class crawlercommons.robots.BaseRobotRules
- setDeferVisits(boolean) - Method in class crawlercommons.robots.BaseRobotRules
-
Indicate to defer visits to the server, e.g.
- setDescription(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setDuration(Integer) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setExactUserAgentMatching(boolean) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Set how the user-agent names in the robots.txt (
User-agent:
lines) are matched with the provided robot names: (with exact matching) follow the Robots Exclusion Protocol RFC 9309 and match the user agent literally but case-insensitively over the full string length: Crawlers set their own name, which is called a product token, to find relevant groups.
- setException(UnknownFormatException) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setExpirationDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setExtensionNamespaces(Map<String, Extension>) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setFamilyFriendly(Boolean) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setGalleryLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setGalleryTitle(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setGenres(NewsAttributes.NewsGenre[]) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setGeoLocation(String) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setHref(URL) - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- setKeywords(String[]) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setLanguage(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setLastModified(String) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setLastModified(String) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set when this URL was last modified.
- setLastModified(ZonedDateTime) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setLastModified(ZonedDateTime) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set when this URL was last modified.
- setLastModified(Date) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setLastModified(Date) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set when this URL was last modified.
- setLicense(URL) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setLive(Boolean) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setLoc(URL) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setMaxCrawlDelay(long) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Set the max value in milliseconds accepted for the Crawl-Delay directive.
- setMaxWarnings(int) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Set the max number of warnings about parse errors logged per robots.txt
- setName(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setParams(Map<String, String>) - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- setPlayerLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setPrice(Float) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- setPrices(VideoAttributes.VideoPrice[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setPriority(double) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL's priority to a value between [0.0 - 1.0] (Default Priority is used if the given priority is out of range).
- setPriority(String) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL's priority to a value between [0.0 - 1.0] (Default Priority is used if the given priority is missing or out of range).
- setProcessed(boolean) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setPublicationDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setPublicationDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setRating(Float) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setRequiresSubscription(Boolean) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setRestrictedCountries(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setRestrictedPlatforms(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setStockTickers(String[]) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setStrictNamespace(boolean) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setStrictNamespace(boolean) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Sets the parser to allow any XML namespace or just the one from the specification, or any accepted namespace (see
SiteMapParser.addAcceptedNamespace(String)
).
- setTags(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setThumbnailLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setTitle(String) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setTitle(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setTitle(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setType(AbstractSiteMap.SitemapType) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setUploader(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setUploaderInfo(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setUrl(String) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL.
- setUrl(URL) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL.
- setURLFilter(URLFilter) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Use
URLFilter
to filter URLs, e.g.
- setURLFilter(Function<String, String>) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setURLFilter(Function<String, String>) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Set URL filter function to normalize URLs found in sitemaps or filter URLs away if the function returns null.
- setValid(boolean) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Valid means that it follows the official guidelines that the siteMapURL must be under the base url
- setViewCount(Integer) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- SimpleRobotRules - Class in crawlercommons.robots
-
Result from parsing a single robots.txt file – a set of allow/disallow rules to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
- SimpleRobotRules() - Constructor for class crawlercommons.robots.SimpleRobotRules
- SimpleRobotRules(SimpleRobotRules.RobotRulesMode) - Constructor for class crawlercommons.robots.SimpleRobotRules
- SimpleRobotRules.RobotRule - Class in crawlercommons.robots
-
Single rule that maps from a path prefix to an allow flag.
- SimpleRobotRules.RobotRulesMode - Enum in crawlercommons.robots
- SimpleRobotRulesParser - Class in crawlercommons.robots
-
Robots.txt parser following RFC 9309, supporting the Sitemap and Crawl-delay extensions.
- SimpleRobotRulesParser() - Constructor for class crawlercommons.robots.SimpleRobotRulesParser
- SimpleRobotRulesParser(long, int) - Constructor for class crawlercommons.robots.SimpleRobotRulesParser
- SiteMap - Class in crawlercommons.sitemaps
- SiteMap() - Constructor for class crawlercommons.sitemaps.SiteMap
- SiteMap(String) - Constructor for class crawlercommons.sitemaps.SiteMap
- SiteMap(String, String) - Constructor for class crawlercommons.sitemaps.SiteMap
- SiteMap(URL) - Constructor for class crawlercommons.sitemaps.SiteMap
- SiteMap(URL, Date) - Constructor for class crawlercommons.sitemaps.SiteMap
- SITEMAP - Static variable in class crawlercommons.sitemaps.Namespace
- SITEMAP_EXTENSION_NAMESPACES - Static variable in class crawlercommons.sitemaps.Namespace
- SITEMAP_LEGACY - Static variable in class crawlercommons.sitemaps.Namespace
-
Legacy schema URIs from prior sitemap protocol versions and frequent variants.
- SITEMAP_SUPPORTED_NAMESPACES - Static variable in class crawlercommons.sitemaps.Namespace
- SiteMapIndex - Class in crawlercommons.sitemaps
- SiteMapIndex() - Constructor for class crawlercommons.sitemaps.SiteMapIndex
- SiteMapIndex(URL) - Constructor for class crawlercommons.sitemaps.SiteMapIndex
- SiteMapParser - Class in crawlercommons.sitemaps
- SiteMapParser() - Constructor for class crawlercommons.sitemaps.SiteMapParser
-
SiteMapParser with strict location validation (
SiteMapParser.isStrict()
) and not allowing partially parsed content.
- SiteMapParser(boolean) - Constructor for class crawlercommons.sitemaps.SiteMapParser
-
SiteMapParser with configurable location validation, not allowing partially parsed content.
- SiteMapParser(boolean, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapParser
- SiteMapTester - Class in crawlercommons.sitemaps
-
Sitemap tool for recursively fetching all URLs from a sitemap (and all of its children)
- SiteMapTester() - Constructor for class crawlercommons.sitemaps.SiteMapTester
- SiteMapURL - Class in crawlercommons.sitemaps
-
The SiteMapURL class represents a URL found in a Sitemap.
- SiteMapURL(String, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL(String, String, String, String, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL(URL, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL(URL, ZonedDateTime, SiteMapURL.ChangeFrequency, double, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL(URL, Date, SiteMapURL.ChangeFrequency, double, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL.ChangeFrequency - Enum in crawlercommons.sitemaps
-
Allowed change frequencies
- SkipLeadingWhiteSpaceInputStream - Class in crawlercommons.sitemaps
-
Wraps a stream and skips over leading whitespace (at beginning of file) in the wrapped stream.
- SkipLeadingWhiteSpaceInputStream(InputStream) - Constructor for class crawlercommons.sitemaps.SkipLeadingWhiteSpaceInputStream
- sortRules() - Method in class crawlercommons.robots.SimpleRobotRules
-
Sort and deduplicate robot rules.
- specialCharactersPathMatching - Static variable in class crawlercommons.robots.SimpleRobotRules
-
Special characters which require percent-encoding for path matching
- splitRobotNames(String) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Split a string listing user-agent / robot names into tokens.
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.LinksHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.MobileHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- STOCK_TICKERS - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
- strict - Variable in class crawlercommons.sitemaps.SiteMapParser
-
True (by default) meaning that invalid URLs should be rejected, as the official docs allow the siteMapURLs to be only under the base url: https://www.sitemaps.org/protocol.html#location
- strictNamespace - Variable in class crawlercommons.sitemaps.SiteMapParser
-
Indicates whether the parser should work with the namespace from the specifications or any namespace.
- Strings - Class in crawlercommons.utils
-
Util functions for manipulating strings.
- Strings() - Constructor for class crawlercommons.utils.Strings
- stripAllBlank(CharSequence) - Static method in class crawlercommons.sitemaps.sax.DelegatorHandler
-
Trim all whitespace including Unicode whitespace
- SuffixTrie<V> - Class in crawlercommons.domains
- SuffixTrie() - Constructor for class crawlercommons.domains.SuffixTrie
- SuffixTrie.LookupResult<V> - Class in crawlercommons.domains
-
Wrapper for results when a string is checked for suffixes contained in the suffix trie.
- SuffixTrie.Node<V> - Class in crawlercommons.domains
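The setURLFilter(Function<String, String>) entry above describes a function that either normalizes a URL or returns null to filter it away. That contract can be sketched with a plain java.util.function.Function. The filter rule shown (keep only http/https, strip the fragment) and the apply helper are hypothetical examples, not library behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical sketch of a normalize-or-drop URL filter; not crawler-commons code. */
class UrlFilterSketch {

    /** Example filter: returns null for non-HTTP(S) URLs, otherwise strips the fragment. */
    static final Function<String, String> FILTER = url -> {
        String lower = url.toLowerCase(Locale.ROOT);
        if (!lower.startsWith("http://") && !lower.startsWith("https://")) {
            return null; // null means: filter this URL away
        }
        int hash = url.indexOf('#');
        return hash >= 0 ? url.substring(0, hash) : url;
    };

    /** Applies the filter to each URL and keeps only the non-null results. */
    static List<String> apply(List<String> urls, Function<String, String> filter) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            String normalized = filter.apply(url);
            if (normalized != null) {
                kept.add(normalized);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("https://example.com/a#top", "ftp://example.com/b");
        System.out.println(apply(urls, FILTER));
    }
}
```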
T
- TAGS - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- TEXT - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
- THUMBNAIL_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- TIME_ZONE_UTC - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
- TITLE - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- TITLE - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
- TITLE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- toString() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- toString() - Method in class crawlercommons.robots.BaseRobotRules
-
Returns a string with the crawl delay as well as a list of sitemaps if they exist (and aren't more than 10).
- toString() - Method in class crawlercommons.robots.SimpleRobotRules
- toString() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.MobileAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- toString() - Method in class crawlercommons.sitemaps.SiteMap
- toString() - Method in class crawlercommons.sitemaps.SiteMapIndex
- toString() - Method in class crawlercommons.sitemaps.SiteMapURL
U
- unescapePath(String) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Remove % encoding from path segment in URL for characters which should be unescaped according to RFC3986.
- UNICODE - crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
- UnknownFormatException - Exception in crawlercommons.sitemaps
-
Exception thrown if the format of a sitemap failed to parse.
- UnknownFormatException() - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
- UnknownFormatException(String) - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
- UnknownFormatException(String, Throwable) - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
- UNSET_CRAWL_DELAY - Static variable in class crawlercommons.robots.BaseRobotRules
- UPLOADER - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- UPLOADER_INFO - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- url - Variable in class crawlercommons.sitemaps.AbstractSiteMap
- urlEquals(URL, URL) - Static method in class crawlercommons.sitemaps.extension.ExtensionMetadata
-
Compare URLs by their string representation because calling
URL.equals(Object)
may trigger an unwanted and potentially slow DNS lookup to resolve the host part
- urlFilter - Variable in class crawlercommons.sitemaps.sax.DelegatorHandler
- URLFilter - Class in crawlercommons.filters
- URLFilter() - Constructor for class crawlercommons.filters.URLFilter
- urlIsValid(String, String) - Static method in class crawlercommons.sitemaps.SiteMapParser
-
See if testUrl is under sitemapBaseUrl.
- USER_AGENT_PRODUCT_TOKEN_MATCHER - Static variable in class crawlercommons.robots.SimpleRobotRulesParser
-
Pattern to match valid user-agent product tokens as defined in RFC 9309, section 2.2.1
- userAgentProductTokenPartialMatch(String, Collection<String>) - Method in class crawlercommons.robots.SimpleRobotRulesParser
- UserGenerated - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
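The urlEquals(URL, URL) entry above explains why URLs are compared by their string representation: URL.equals(Object) may resolve host names. A minimal sketch of that comparison follows. The class and method names, and the null handling, are assumptions of this sketch rather than the library's code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical sketch of a DNS-free URL comparison; not the crawler-commons implementation. */
class UrlEqualsSketch {

    /** Compares two URLs by their string form, which never triggers a DNS lookup. */
    static boolean sameUrl(URL a, URL b) {
        if (a == null || b == null) {
            return a == b; // equal only if both are null
        }
        return a.toString().equals(b.toString());
    }

    /** Small helper to build a URL without a checked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameUrl(url("https://example.com/a"), url("https://example.com/a")));
        System.out.println(sameUrl(url("https://example.com/a"), url("https://example.com/b")));
    }
}
```

The trade-off is that two URLs which differ only textually (for example in host-name case) compare as different, so normalizing URLs beforehand may be worthwhile.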
V
- valueOf(String) - Static method in enum crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.robots.SimpleRobotRules.RobotRulesMode
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.AbstractSiteMap.SitemapType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.Extension
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.robots.SimpleRobotRules.RobotRulesMode
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.AbstractSiteMap.SitemapType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.Extension
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
-
Returns an array containing the constants of this enum type, in the order they are declared.
- VIDEO - crawlercommons.sitemaps.extension.Extension
-
Google Video sitemaps, see https://support.google.com/webmasters/answer/80471
- VIDEO - Static variable in class crawlercommons.sitemaps.Namespace
- VideoAttributes - Class in crawlercommons.sitemaps.extension
-
Data model for Google extension to the sitemap protocol regarding video indexing, as per http://www.google.com/schemas/sitemap-video/1.1
- VideoAttributes() - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes
- VideoAttributes(URL, String, String, URL, URL) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes
- VideoAttributes.VideoPrice - Class in crawlercommons.sitemaps.extension
- VideoAttributes.VideoPriceResolution - Enum in crawlercommons.sitemaps.extension
- VideoAttributes.VideoPriceType - Enum in crawlercommons.sitemaps.extension
- VideoHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the Google Video sitemap extension namespace.
- VideoHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.VideoHandler
- VideoPrice(String, Float) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- VideoPrice(String, Float, VideoAttributes.VideoPriceType) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- VideoPrice(String, Float, VideoAttributes.VideoPriceType, VideoAttributes.VideoPriceResolution) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- VIEW_COUNT - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
W
- W3C_FULLDATE_FORMATTER - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
-
DateTimeFormatter for parsing dates in ISO-8601 format
- W3C_FULLDATE_FORMATTER_UTC - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
-
DateTimeFormatter to format dates in ISO-8601 format (UTC time zone 'Z')
- W3C_SHORTDATE_FORMATTER - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
-
DateTimeFormatter for parsing short dates ('1997', '1997-07', '1997-07-16') without daytime and time zone
- walkSiteMap(AbstractSiteMap, Consumer<SiteMapURL>) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Traverse a sitemap, recursively fetching and traversing the content of any enclosed sitemap index, and performing the specified action for each sitemap URL until all URLs have been processed or the action throws an exception.
- walkSiteMap(URL, Consumer<SiteMapURL>) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Fetch a sitemap from the specified URL, recursively fetching and traversing the content of any enclosed sitemap index, and performing the specified action for each sitemap URL until all URLs have been processed or the action throws an exception.
- WEEKLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- WILD_CARD - Static variable in class crawlercommons.domains.EffectiveTldFinder
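The W3C_SHORTDATE_FORMATTER entry above covers short dates such as '1997', '1997-07', and '1997-07-16'. One way to parse all three forms with a single java.time formatter is sketched below, assuming missing month and day default to 1. The class name and the defaulting choice are assumptions of this sketch and may differ from the library's formatter.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

/** Hypothetical sketch of a short-date formatter; not the crawler-commons constant. */
class ShortDateSketch {

    /** Accepts yyyy, yyyy-MM, or yyyy-MM-dd; absent month/day default to 1. */
    static final DateTimeFormatter SHORT_DATE = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)
            .optionalStart()
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .optionalStart()
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .optionalEnd()
            .optionalEnd()
            .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
            .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
            .toFormatter(Locale.ROOT);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, SHORT_DATE);
    }

    public static void main(String[] args) {
        System.out.println(parse("1997"));
        System.out.println(parse("1997-07"));
        System.out.println(parse("1997-07-16"));
    }
}
```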
X
- XML - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
Y
- YEARLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
_
- _mode - Variable in class crawlercommons.robots.SimpleRobotRules
- _rules - Variable in class crawlercommons.robots.SimpleRobotRules