A B C D E F G H I K L M N O P Q R S T U V W X Y _
All Classes All Packages
A
- AbstractSiteMap - Class in crawlercommons.sitemaps
-
SiteMap or SiteMapIndex
- AbstractSiteMap() - Constructor for class crawlercommons.sitemaps.AbstractSiteMap
- AbstractSiteMap.SitemapType - Enum in crawlercommons.sitemaps
-
Various Sitemap types
- acceptedNamespaces - Variable in class crawlercommons.sitemaps.SiteMapParser
-
Set of namespaces (if SiteMapParser.strictNamespace) accepted by the parser.
- ACCESS - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
-
Accessibility of the news article.
- addAcceptedNamespace(String) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Add namespace URI to set of accepted namespaces.
- addAcceptedNamespace(String[]) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Add namespace URIs to set of accepted namespaces.
- addAttribute(String, String) - Method in class crawlercommons.sitemaps.extension.PageMapDataObject
- addAttributesForExtension(Extension, ExtensionMetadata[]) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Add attributes of a specific sitemap extension
- addChild(char, V) - Method in class crawlercommons.domains.SuffixTrie.Node
- addContentSegment(Integer, URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- addDataObject(PageMapDataObject) - Method in class crawlercommons.sitemaps.extension.PageMap
- addPrice(VideoAttributes.VideoPrice) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- addRule(String, boolean) - Method in class crawlercommons.robots.SimpleRobotRules
-
Add an allow/disallow rule to the ruleset
- addSitemap(AbstractSiteMap) - Method in class crawlercommons.sitemaps.SiteMapIndex
-
Add this Sitemap to the list of Sitemaps.
- addSitemap(String) - Method in class crawlercommons.robots.BaseRobotRules
-
Add sitemap URL to rules if not a duplicate
- addSiteMapUrl(SiteMapURL) - Method in class crawlercommons.sitemaps.SiteMap
- addTag(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- ALLOW_ALL - crawlercommons.robots.SimpleRobotRules.RobotRulesMode
- ALLOW_NONE - crawlercommons.robots.SimpleRobotRules.RobotRulesMode
- ALLOW_SOME - crawlercommons.robots.SimpleRobotRules.RobotRulesMode
- ALLOWED_COUNTRIES - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- ALLOWED_PLATFORMS - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- ALWAYS - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- appendCharacterBuffer(char[], int, int) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- appendCharacterBuffer(String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- asMap() - Method in class crawlercommons.sitemaps.extension.ExtensionMetadata
- asMap() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- asMap() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- asMap() - Method in class crawlercommons.sitemaps.extension.MobileAttributes
- asMap() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- asMap() - Method in class crawlercommons.sitemaps.extension.PageMap
- asMap() - Method in class crawlercommons.sitemaps.extension.PageMapDataObject
- asMap() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- ATOM - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
- ATOM_0_3 - Static variable in class crawlercommons.sitemaps.Namespace
- ATOM_1_0 - Static variable in class crawlercommons.sitemaps.Namespace
- attributes - Variable in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
B
- BaseRobotRules - Class in crawlercommons.robots
-
Result from parsing a single robots.txt file – a set of allow/disallow rules to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
- BaseRobotRules() - Constructor for class crawlercommons.robots.BaseRobotRules
- BaseRobotsParser - Class in crawlercommons.robots
-
Robots.txt parser definition.
- BaseRobotsParser() - Constructor for class crawlercommons.robots.BaseRobotsParser
- BasicURLNormalizer - Class in crawlercommons.filters.basic
-
Converts URLs to a normal form.
- BasicURLNormalizer() - Constructor for class crawlercommons.filters.basic.BasicURLNormalizer
- BasicURLNormalizer(BasicURLNormalizer.Builder) - Constructor for class crawlercommons.filters.basic.BasicURLNormalizer
- BasicURLNormalizer.Builder - Class in crawlercommons.filters.basic
-
A builder class for the BasicURLNormalizer.
- BasicURLNormalizer.IdnNormalization - Enum in crawlercommons.filters.basic
- beforeRead(int) - Method in class crawlercommons.sitemaps.SkipLeadingWhiteSpaceInputStream
- Blog - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- build() - Method in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
-
Constructs the custom URL normalizer instance.
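The BaseRobotRules entry above describes a set of allow/disallow rules used to check whether a given URL is allowed. The following is a minimal, JDK-only sketch of that idea and not the crawler-commons implementation. The class `TinyRobotRules` and its methods are hypothetical, and the sketch assumes plain prefix matching where the longest matching prefix decides; wildcards, `$` anchors, Crawl-delay and sitemap URLs are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the crawler-commons implementation.
class TinyRobotRules {
    // One allow/disallow rule: a path prefix plus an allow flag.
    private static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Register a rule for all paths starting with the given prefix.
    void addRule(String prefix, boolean allow) {
        rules.add(new Rule(prefix, allow));
    }

    // The longest matching prefix decides; on equal length, allow wins.
    // Paths matched by no rule are allowed.
    boolean isAllowed(String path) {
        Rule best = null;
        for (Rule r : rules) {
            if (!path.startsWith(r.prefix)) {
                continue;
            }
            if (best == null
                    || r.prefix.length() > best.prefix.length()
                    || (r.prefix.length() == best.prefix.length() && r.allow)) {
                best = r;
            }
        }
        return best == null || best.allow;
    }

    public static void main(String[] args) {
        TinyRobotRules rules = new TinyRobotRules();
        rules.addRule("/private/", false);
        rules.addRule("/private/public-report.html", true);
        System.out.println(rules.isAllowed("/index.html"));
        System.out.println(rules.isAllowed("/private/data.html"));
        System.out.println(rules.isAllowed("/private/public-report.html"));
    }
}
```

Letting the longest prefix win allows a specific allow rule to carve an exception out of a broader disallow rule, regardless of the order in which the rules were added.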
C
- CAPTION - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- CATEGORY - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.PageMapsHandler
- characters(char[], int, int) - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- clearRules() - Method in class crawlercommons.robots.SimpleRobotRules
- clip - crawlercommons.sitemaps.extension.VideoAttributes.TVShow.VideoType
- commaSeparated - Static variable in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- COMMENT - Static variable in class crawlercommons.domains.EffectiveTldFinder
- compareTo(SimpleRobotRules.RobotRule) - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- contains(String) - Method in class crawlercommons.domains.SuffixTrie
-
Checks whether trie contains a suffix string.
- CONTENT_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- ContentSegment(Integer, URL) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.ContentSegment
- convertToDate(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
- convertToZonedDateTime(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
-
Convert the given date (given in an acceptable DateFormat); returns null if the date is not in the correct format.
- crawlercommons - package crawlercommons
- CrawlerCommons - Class in crawlercommons
-
Crawler-Commons is a set of reusable Java components that implement functionality common to web crawlers: robots.txt and sitemap parsing, or URL normalization.
- CrawlerCommons() - Constructor for class crawlercommons.CrawlerCommons
- crawlercommons.domains - package crawlercommons.domains
-
Classes contained within the domains package relate to the definition of "paid-level" domains or "effective top-level domains", that is, Internet domain names one level below a public suffix defined in the public suffix list.
- crawlercommons.filters - package crawlercommons.filters
-
The filters package contains code and resources for URL filtering.
- crawlercommons.filters.basic - package crawlercommons.filters.basic
-
URL normalizer performing basic normalizations applicable to http:// and https:// URLs.
- crawlercommons.mimetypes - package crawlercommons.mimetypes
-
Utilities for detecting MIME types relevant in the context of crawler-commons.
- crawlercommons.robots - package crawlercommons.robots
-
The robots package contains all of the robots.txt rule inference, parsing and utilities contained within Crawler-Commons.
- crawlercommons.sitemaps - package crawlercommons.sitemaps
-
Classes focused on parsing and processing sitemaps and holding the resulting set of URLs with crawling-related metadata, such as the change frequency of a page.
- crawlercommons.sitemaps.extension - package crawlercommons.sitemaps.extension
-
Extensions to the sitemaps protocol for additional attributes and links to alternate media formats, for example image, video and news sitemaps.
- crawlercommons.sitemaps.sax - package crawlercommons.sitemaps.sax
-
SAX handlers to parse specific elements of XML sitemaps or Atom/RSS feeds.
- crawlercommons.sitemaps.sax.extension - package crawlercommons.sitemaps.sax.extension
-
SAX handlers to parse extensions of XML sitemaps.
- create(Extension) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- currentElement() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- currentElementParent() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
D
- DAILY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- DEFAULT_MAX_CRAWL_DELAY - Static variable in class crawlercommons.robots.SimpleRobotRulesParser
-
Default max Crawl-Delay in milliseconds, see SimpleRobotRulesParser.setMaxCrawlDelay(long)
- DEFAULT_MAX_WARNINGS - Static variable in class crawlercommons.robots.SimpleRobotRulesParser
-
Default max number of warnings logged during parse of any one robots.txt file, see SimpleRobotRulesParser.setMaxWarnings(int)
- DEFAULT_PRIORITY - Static variable in class crawlercommons.sitemaps.SiteMapURL
- DelegatorHandler - Class in crawlercommons.sitemaps.sax
-
Provides a base SAX handler for parsing of XML documents representing sub-classes of AbstractSiteMap.
- DelegatorHandler(URL, boolean) - Constructor for class crawlercommons.sitemaps.sax.DelegatorHandler
- DelegatorHandler(LinkedList<String>, boolean) - Constructor for class crawlercommons.sitemaps.sax.DelegatorHandler
- DESCRIPTION - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- detect(byte[]) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- detect(byte[], int) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- detect(InputStream) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- DOT - Static variable in class crawlercommons.domains.EffectiveTldFinder
- DOT_REGEX - Static variable in class crawlercommons.domains.EffectiveTldFinder
- DURATION - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
E
- EffectiveTLD(String, boolean) - Constructor for class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
-
Parse one non-empty, non-comment line in the public suffix list and hold the public suffix and its properties in the created object.
- EffectiveTldFinder - Class in crawlercommons.domains
-
To determine the actual domain name of a host name or URL requires knowledge of the various domain registrars and their assignment policies.
- EffectiveTldFinder.EffectiveTLD - Class in crawlercommons.domains
-
EffectiveTLD objects hold one line of the public suffix list: the suffix (com, co.uk, etc.), for IDN suffixes both the ASCII and IDN variant (xn--p1ai and рф), and the properties required to parse host/domain names given in the public suffix list: whether it's a wildcard suffix (*.kawasaki.jp), or an exception to a wildcard rule (!
- EMPTY - Static variable in class crawlercommons.sitemaps.Namespace
-
In contradiction to the protocol specification ("The Sitemap must ...
- enableExtension(Extension) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Enable support for a sitemap extension in the parser.
- enableExtensions() - Method in class crawlercommons.sitemaps.SiteMapParser
-
Enable all supported sitemap extensions in the parser.
- endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.PageMapsHandler
- endElement(String, String, String) - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- equals(Object) - Method in class crawlercommons.robots.BaseRobotRules
- equals(Object) - Method in class crawlercommons.robots.SimpleRobotRules
- equals(Object) - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- equals(Object) - Method in class crawlercommons.sitemaps.extension.ExtensionMetadata
- equals(Object) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.MobileAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.PageMap
- equals(Object) - Method in class crawlercommons.sitemaps.extension.PageMapDataObject
- equals(Object) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.ContentSegment
- equals(Object) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- equals(Object) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- equals(Object) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- equals(Object) - Method in class crawlercommons.sitemaps.SiteMapURL
- error(SAXParseException) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- escapePath(String) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Convert path segment of URL from Unicode to UTF-8 and escape all characters which should be escaped according to RFC3986.
- escapePath(String, boolean[]) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
- escapePath(String, boolean[]) - Static method in class crawlercommons.robots.SimpleRobotRules
-
Encode/decode (using percent-encoding) all characters where necessary: encode Unicode/non-ASCII characters and decode printable ASCII characters without special semantics.
- ETLD_DATA - Static variable in class crawlercommons.domains.EffectiveTldFinder
- EXCEPTION - Static variable in class crawlercommons.domains.EffectiveTldFinder
- EXPIRATION_DATE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- Extension - Enum in crawlercommons.sitemaps.extension
-
Sitemap extensions supported by the parser.
- ExtensionHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handler to be called for elements in the namespace of a sitemap extension.
- ExtensionHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- ExtensionMetadata - Class in crawlercommons.sitemaps.extension
-
Container for attributes of a SiteMapURL defined by a sitemap extension.
- ExtensionMetadata() - Constructor for class crawlercommons.sitemaps.extension.ExtensionMetadata
- extensionNamespaces - Variable in class crawlercommons.sitemaps.sax.DelegatorHandler
- extensionNamespaces - Variable in class crawlercommons.sitemaps.SiteMapParser
-
Map of sitemap extension namespaces required to find the right extension handler.
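The escapePath(String) entry above describes converting a URL path to UTF-8 and escaping characters according to RFC 3986. A rough, JDK-only sketch of that kind of percent-encoding follows. It is not the crawler-commons code: the class `PathEscapeSketch` is hypothetical, and it assumes that only the RFC 3986 "unreserved" characters and `/` stay unescaped, so an already encoded `%` would be encoded a second time.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the crawler-commons implementation.
class PathEscapeSketch {
    private static final String HEX = "0123456789ABCDEF";

    // RFC 3986 "unreserved" characters, plus '/' as the path separator.
    // Assumption of this sketch: everything else gets percent-encoded.
    private static boolean isKept(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
                || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~' || b == '/';
    }

    // Encode the path as UTF-8 and percent-encode every byte not kept as-is.
    static String escapePath(String path) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : path.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the byte as unsigned
            if (isKept(b)) {
                sb.append((char) b);
            } else {
                sb.append('%')
                  .append(HEX.charAt(b >> 4))
                  .append(HEX.charAt(b & 0x0F));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The non-ASCII letter and the space are percent-encoded.
        System.out.println(escapePath("/caf\u00e9 menu.html"));
    }
}
```

Working on the UTF-8 bytes rather than on Java `char` values means a non-ASCII character expands to one `%XX` triplet per byte.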
F
- failedFetch(int) - Method in class crawlercommons.robots.BaseRobotsParser
-
The fetch of robots.txt failed, so return rules appropriate for the given HTTP status code.
- failedFetch(int) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
The fetch of robots.txt failed, so return rules appropriate for the given HTTP status code.
- FAMILY_FRIENDLY - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- fatalError(SAXParseException) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- filter(String) - Method in class crawlercommons.filters.basic.BasicURLNormalizer
- filter(String) - Method in class crawlercommons.filters.URLFilter
-
Returns a modified version of the input URL or null if the URL should be removed
- formatQueryParameters(List<BasicURLNormalizer.NameValuePair>) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Formats a list of query parameter name-value pairs into a query parameter string.
- full - crawlercommons.sitemaps.extension.VideoAttributes.TVShow.VideoType
G
- GALLERY_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- GALLERY_TITLE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- GENRES - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
- GEO_LOCATION - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- get(String) - Method in class crawlercommons.domains.SuffixTrie
-
Get value associated with suffix string in trie.
- getAccess() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getAllowedCountries() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getAllowedPlatforms() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getAndResetCharacterBuffer() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- getAssignedDomain(String) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
- getAssignedDomain(String, boolean) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
- getAssignedDomain(String, boolean, boolean) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name.
- getAttribute(String) - Method in class crawlercommons.sitemaps.extension.PageMapDataObject
- getAttributes() - Method in class crawlercommons.sitemaps.extension.PageMapDataObject
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.MobileHandler
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- getAttributes() - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- getAttributes() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Get attributes of sitemap extensions (news, images, videos, etc.)
- getAttributesForExtension(Extension) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Get attributes of a specific sitemap extension
- getBaseUrl() - Method in class crawlercommons.sitemaps.SiteMap
- getCaption() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getCategory() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getChangeFrequency() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Return the URL's change frequency
- getChild(char) - Method in class crawlercommons.domains.SuffixTrie.Node
- getContentLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getContentSegmentLocs() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getCrawlDelay() - Method in class crawlercommons.robots.BaseRobotRules
-
Get Crawl-delay (in milliseconds)
- getCurrency() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- getDateValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getDescription() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getDomain() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- getDuration() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.ContentSegment
- getDuration() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getEffectiveTLD(String) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
- getEffectiveTLD(String, boolean) - Static method in class crawlercommons.domains.EffectiveTldFinder
-
Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
- getEffectiveTLDs() - Static method in class crawlercommons.domains.EffectiveTldFinder
- getEpisodeNumber() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- getEpisodeTitle() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- getException() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- getExpirationDate() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getExpirationDateTime() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getFamilyFriendly() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getFloatValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getGalleryLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getGalleryTitle() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getGenres() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getGeoLocation() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getHref() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- getId() - Method in class crawlercommons.sitemaps.extension.PageMapDataObject
- getInstance() - Static method in class crawlercommons.domains.EffectiveTldFinder
-
Get singleton instance of EffectiveTldFinder with default configuration.
- getIntegerValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getKeywords() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getLanguage() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getLastModified() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- getLastModified() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Return when this URL was last modified.
- getLicense() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getLive() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getLoc() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getLocation() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.ContentSegment
- getLongestSuffix(String) - Method in class crawlercommons.domains.SuffixTrie
-
Match the longest suffix of a string contained in trie.
- getMaxCrawlDelay() - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Get configured max crawl delay.
- getMaxWarnings() - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Get max number of logged warnings per robots.txt
- getName() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getNameVariants() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
-
Generate name variants caused by Internationalized Domain Names: every IDN part of an eTLD can be replaced by its punycoded ASCII variant.
- getNumWarnings() - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Get the number of warnings due to invalid rules/lines in the latest processed robots.txt file (see SimpleRobotRulesParser.parseContent(String, byte[], String, String)).
- getPageMapDataObjects() - Method in class crawlercommons.sitemaps.extension.PageMap
- getParams() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- getPlayerLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getPLD(String) - Static method in class crawlercommons.domains.PaidLevelDomain
-
Extract the PLD (paid-level domain) from the hostname.
- getPLD(URL) - Static method in class crawlercommons.domains.PaidLevelDomain
-
Extract the PLD (paid-level domain) from the URL.
- getPrefix() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- getPremierDate() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- getPrice() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- getPrices() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getPriority() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Return this URL's priority (a value in the range [0.0, 1.0]).
- getPublicationDate() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getPublicationDate() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getPublicationDateTime() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getPublicationDateTime() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getRating() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getRequiresSubscription() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getResolution() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- getRestrictedCountries() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getRestrictedPlatforms() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getRobotRules() - Method in class crawlercommons.robots.SimpleRobotRules
- getSeasonNumber() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- getShowTitle() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- getSitemap(URL) - Method in class crawlercommons.sitemaps.SiteMapIndex
-
Returns the Sitemap that has the given URL.
- getSiteMap() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- getSitemaps() - Method in class crawlercommons.robots.BaseRobotRules
-
Get URLs of sitemap links found in robots.txt
- getSitemaps() - Method in class crawlercommons.sitemaps.SiteMapIndex
- getSitemaps(boolean) - Method in class crawlercommons.sitemaps.SiteMapIndex
- getSiteMapUrls() - Method in class crawlercommons.sitemaps.SiteMap
- getStockTickers() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getSuffix() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- getSuffixes(String) - Method in class crawlercommons.domains.SuffixTrie
-
Match all suffixes of a string contained in trie.
- getTags() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getThumbnailLoc() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getTitle() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- getTitle() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- getTitle() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getTVShow() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getType() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- getType() - Method in class crawlercommons.sitemaps.extension.PageMapDataObject
- getType() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- getUnicodeDomain() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- getUploader() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getUploaderInfo() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getUrl() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- getUrl() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- getUrl() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Return the URL.
- getURLValue(String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- getVersion() - Static method in class crawlercommons.CrawlerCommons
- getVideoType() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- getViewCount() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- getYesNoBooleanValue(String, String) - Static method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
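The getAssignedDomain and getPLD entries above derive the assigned ("paid-level") domain of a host name from its effective TLD. The idea can be sketched with the JDK alone as below. This is an illustration under stated assumptions, not the crawler-commons algorithm: the class `AssignedDomainSketch` and its tiny suffix set are hypothetical, and wildcard rules, exception rules and IDN handling from the public suffix list are ignored.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the crawler-commons implementation.
class AssignedDomainSketch {
    // Tiny hand-picked sample; the real public suffix list is far larger.
    private static final Set<String> SUFFIXES = Set.of("com", "uk", "co.uk");

    // Return the longest known public suffix plus one more label, or null
    // if the host has no known suffix or is itself a public suffix.
    static String assignedDomain(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        // Candidates run from the whole host down to its last label,
        // so the first hit is the longest matching suffix.
        for (int i = 0; i < labels.length; i++) {
            String candidate = String.join(".",
                    Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(candidate)) {
                return i == 0 ? null : labels[i - 1] + "." + candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(assignedDomain("www.example.co.uk"));
        System.out.println(assignedDomain("example.com"));
        System.out.println(assignedDomain("co.uk"));
    }
}
```

Checking the longest candidate first matters: for `www.example.co.uk` a shortest-first search would stop at `uk` and wrongly report `co.uk` as the assigned domain.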
H
- hasAttribute(String) - Method in class crawlercommons.sitemaps.extension.PageMapDataObject
- hashCode() - Method in class crawlercommons.robots.BaseRobotRules
- hashCode() - Method in class crawlercommons.robots.SimpleRobotRules
- hashCode() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- hashCode() - Method in class crawlercommons.sitemaps.extension.PageMap
- hashCode() - Method in class crawlercommons.sitemaps.SiteMapURL
- hasUnprocessedSitemap() - Method in class crawlercommons.sitemaps.SiteMapIndex
- HD - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
- HOST - crawlercommons.sitemaps.SiteMapCrossSubmitValidator.CrossSubmitValidationLevel
-
Host name or full domain name
- HOURLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- HREF - Static variable in class crawlercommons.sitemaps.extension.LinkAttributes
I
- ICANN_DOMAIN - crawlercommons.sitemaps.SiteMapCrossSubmitValidator.CrossSubmitValidationLevel
-
Domain name below a suffix in the ICANN section of the public suffix list.
- idnNormalization - Variable in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
- idnNormalization(BasicURLNormalizer.IdnNormalization) - Method in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
-
Configures whether internationalized domain names (IDNs) should be converted to ASCII/Punycode or Unicode.
- IMAGE - crawlercommons.sitemaps.extension.Extension
-
Google Image sitemaps, see https://support.google.com/webmasters/answer/178636
- IMAGE - Static variable in class crawlercommons.sitemaps.Namespace
- ImageAttributes - Class in crawlercommons.sitemaps.extension
-
Data model for the Google extension to the sitemap protocol regarding image indexing, as per http://www.google.com/schemas/sitemap-image/1.1
- ImageAttributes() - Constructor for class crawlercommons.sitemaps.extension.ImageAttributes
- ImageAttributes(URL) - Constructor for class crawlercommons.sitemaps.extension.ImageAttributes
- ImageHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the Google Image sitemap extension namespace.
- ImageHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.ImageHandler
- INDEX - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
- initialize(InputStream) - Method in class crawlercommons.domains.EffectiveTldFinder
-
(Re)initialize EffectiveTldFinder with custom public suffix list.
- interview - crawlercommons.sitemaps.extension.VideoAttributes.TVShow.VideoType
- IS_LIVE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- isAcceptedNamespace(String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- isAllBlank(CharSequence) - Static method in class crawlercommons.sitemaps.sax.DelegatorHandler
-
Return true if character sequence contains only white space including Unicode whitespace, cf.
- isAllow() - Method in class crawlercommons.robots.SimpleRobotRules.RobotRule
- isAllowAll() - Method in class crawlercommons.robots.BaseRobotRules
- isAllowAll() - Method in class crawlercommons.robots.SimpleRobotRules
-
Is our ruleset set up to allow all access?
- isAllowed(String) - Method in class crawlercommons.robots.BaseRobotRules
- isAllowed(String) - Method in class crawlercommons.robots.SimpleRobotRules
-
Check whether a URL is allowed to be fetched according to the robots rules.
- isAllowed(URL) - Method in class crawlercommons.robots.BaseRobotRules
- isAllowed(URL) - Method in class crawlercommons.robots.SimpleRobotRules
-
Check whether a URL is allowed to be fetched according to the robots rules.
- isAllowNone() - Method in class crawlercommons.robots.BaseRobotRules
- isAllowNone() - Method in class crawlercommons.robots.SimpleRobotRules
-
Is our ruleset set up to disallow all access?
- isConfigured() - Method in class crawlercommons.domains.EffectiveTldFinder
- isDeferVisits() - Method in class crawlercommons.robots.BaseRobotRules
- isExactUserAgentMatching() - Method in class crawlercommons.robots.SimpleRobotRulesParser
- isException() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- isExtensionNamespace(String) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- isGzip(String) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- isIndex() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- isIndex() - Method in class crawlercommons.sitemaps.SiteMap
- isIndex() - Method in class crawlercommons.sitemaps.SiteMapIndex
- isProcessed() - Method in class crawlercommons.sitemaps.AbstractSiteMap
- isStrict() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- isStrict() - Method in class crawlercommons.sitemaps.SiteMapParser
- isStrictNamespace() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- isStrictNamespace() - Method in class crawlercommons.sitemaps.SiteMapParser
- isSupported(String) - Static method in class crawlercommons.sitemaps.Namespace
- isText(String) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- isValid() - Method in class crawlercommons.sitemaps.extension.ExtensionMetadata
-
Validate extension metadata.
- isValid() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- isValid() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- isValid() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- isValid() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- isValid() - Method in class crawlercommons.sitemaps.SiteMapURL
-
Is the siteMapURL under the base URL?
- isValidUserAgentToObey(String) - Static method in class crawlercommons.robots.SimpleRobotRulesParser
-
Validate a user-agent product token as defined in RFC 9309, section 2.2.1
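As an illustration of the product-token grammar referenced above, the following is a minimal sketch and not the library's implementation. It assumes the RFC 9309 rule identifier = 1*(%x2D / %x41-5A / %x5F / %x61-7A), i.e. only ASCII letters, hyphen and underscore. The class and method names (ProductTokenSketch, isValidProductToken) are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of crawler-commons.
final class ProductTokenSketch {
    // RFC 9309, section 2.2.1: one or more of "-", "A"-"Z", "_", "a"-"z".
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_-]+");

    // Returns true if the token consists only of characters allowed in an identifier.
    static boolean isValidProductToken(String token) {
        return token != null && IDENTIFIER.matcher(token).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidProductToken("my-crawler"));
        System.out.println(isValidProductToken("mybot/1.0"));
    }
}
```

Whether the real isValidUserAgentToObey(String) also accepts the wildcard "*" is not stated in this index, so the sketch leaves it out.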
- isWhitespace(char) - Static method in class crawlercommons.sitemaps.sax.DelegatorHandler
-
Check whether character is any Unicode whitespace, including the space characters not covered by Character.isWhitespace(char)
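To show why a check beyond Character.isWhitespace(char) can be needed, here is a small sketch using only the JDK. It is one possible approach, not necessarily how DelegatorHandler does it: it combines Character.isWhitespace with Character.isSpaceChar, which also covers the non-breaking spaces. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of crawler-commons.
final class UnicodeWhitespaceSketch {
    // True for Java whitespace (tab, line feed, ...) and for Unicode space
    // separators such as the no-break space, which Character.isWhitespace
    // deliberately excludes.
    static boolean isAnyWhitespace(char c) {
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    public static void main(String[] args) {
        char noBreakSpace = '\u00A0';
        System.out.println(Character.isWhitespace(noBreakSpace));
        System.out.println(isAnyWhitespace(noBreakSpace));
    }
}
```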
- isWild() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
-
Deprecated. since 1.6 - replaced by EffectiveTldFinder.EffectiveTLD.isWildcard()
- isWildcard() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- isXml(String) - Method in class crawlercommons.mimetypes.MimeTypeDetector
K
- KEYWORDS - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
L
- LANGUAGE - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
-
Language of the news publication in which the article appears.
- LICENSE - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- LinkAttributes - Class in crawlercommons.sitemaps.extension
-
Data model for Google extension to the sitemap protocol regarding alternate links indexing.
- LinkAttributes() - Constructor for class crawlercommons.sitemaps.extension.LinkAttributes
- LinkAttributes(URL) - Constructor for class crawlercommons.sitemaps.extension.LinkAttributes
- LINKS - crawlercommons.sitemaps.extension.Extension
-
Usage of <xhtml:links> in sitemaps to include localized page versions/variants, see https://support.google.com/webmasters/answer/189077
- LINKS - Static variable in class crawlercommons.sitemaps.Namespace
- LinksHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the Google alternate links sitemap extension namespace.
- LinksHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.LinksHandler
- LOC - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- LOG - Static variable in class crawlercommons.filters.basic.BasicURLNormalizer
- LOG - Static variable in class crawlercommons.sitemaps.SiteMapCrossSubmitValidator
- LOG - Static variable in class crawlercommons.sitemaps.SiteMapParser
- LookupResult(int, V) - Constructor for class crawlercommons.domains.SuffixTrie.LookupResult
M
- main(String[]) - Static method in class crawlercommons.domains.EffectiveTldFinder
- main(String[]) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
- main(String[]) - Static method in class crawlercommons.robots.SimpleRobotRulesParser
- main(String[]) - Static method in class crawlercommons.sitemaps.SiteMapTester
- MAX_BYTES_ALLOWED - Static variable in class crawlercommons.sitemaps.SiteMapParser
-
Sitemaps (including sitemap index files) "must be no larger than 50MB (52,428,800 bytes)" as specified in the Sitemaps XML format (before Nov. 2016 the limit was 10MB).
- MAX_DOMAIN_LENGTH_PART - Static variable in class crawlercommons.domains.EffectiveTldFinder
-
Max. length in ASCII characters of a dot-separated segment in host names (applies to domain names as well), cf.
- MimeTypeDetector - Class in crawlercommons.mimetypes
-
Light-weight content type detector, supporting a restricted set of MIME types relevant to parsing sitemaps.
- MimeTypeDetector() - Constructor for class crawlercommons.mimetypes.MimeTypeDetector
- MOBILE - crawlercommons.sitemaps.extension.Extension
-
Mobile sitemaps just contain an empty "mobile" tag to identify a URL as having mobile content, cf.
- MOBILE - Static variable in class crawlercommons.sitemaps.Namespace
- MobileAttributes - Class in crawlercommons.sitemaps.extension
-
Google mobile sitemap attributes, see http://www.google.de/schemas/sitemap-mobile/1.0/ and https://www.google.com/schemas/sitemap-mobile/1.0/sitemap-mobile.xsd: Mobile sitemaps just contain an empty "mobile" tag to identify a URL as having mobile content.
- MobileAttributes() - Constructor for class crawlercommons.sitemaps.extension.MobileAttributes
- MobileHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the Google Mobile sitemap extension namespace.
- MobileHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.MobileHandler
- MONTHLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
N
- NAME - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
-
Name of the news publication in which the article appears.
- Namespace - Class in crawlercommons.sitemaps
-
Supported sitemap formats: https://www.sitemaps.org/protocol.html#otherformats
- Namespace() - Constructor for class crawlercommons.sitemaps.Namespace
- NEVER - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- newBuilder() - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Create a new builder object for creating a customized BasicURLNormalizer object.
- news - crawlercommons.sitemaps.extension.VideoAttributes.TVShow.VideoType
- NEWS - crawlercommons.sitemaps.extension.Extension
-
Google News sitemaps, see https://support.google.com/news/publisher-center/answer/74288
- NEWS - Static variable in class crawlercommons.sitemaps.Namespace
- NewsAttributes - Class in crawlercommons.sitemaps.extension
-
Data model for Google's extension to the sitemap protocol regarding news indexing, as per http://www.google.com/schemas/sitemap-news/0.9.
- NewsAttributes() - Constructor for class crawlercommons.sitemaps.extension.NewsAttributes
- NewsAttributes(String, String, ZonedDateTime, String) - Constructor for class crawlercommons.sitemaps.extension.NewsAttributes
- NewsAttributes.AccessOption - Enum in crawlercommons.sitemaps.extension
- NewsAttributes.NewsGenre - Enum in crawlercommons.sitemaps.extension
- NewsHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the Google News sitemap extension namespace.
- NewsHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.NewsHandler
- nextUnprocessedSitemap() - Method in class crawlercommons.sitemaps.SiteMapIndex
- Node() - Constructor for class crawlercommons.domains.SuffixTrie.Node
- NONE - crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
- normalize(String, byte[]) - Method in class crawlercommons.mimetypes.MimeTypeDetector
- normalizeRSSTimestamp(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
-
Converts pubDate of RSS to the ISO-8601 instant format, e.g., '2017-01-05T12:34:54Z' in UTC / GMT time zone, see DateTimeFormatter.ISO_INSTANT.
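The conversion described for normalizeRSSTimestamp(String) can be approximated with java.time alone. The sketch below is an assumption-laden illustration, not the library's code: it accepts only RFC 1123 style dates (the usual RSS pubDate form) and returns null on unparseable input, which is a choice made for this example. The names RssTimestampSketch and toIsoInstant are hypothetical.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of crawler-commons.
final class RssTimestampSketch {
    // Parses an RFC 1123 date such as "Thu, 05 Jan 2017 12:34:54 GMT" and
    // formats it as an ISO-8601 instant in UTC; returns null if parsing fails.
    static String toIsoInstant(String pubDate) {
        if (pubDate == null) {
            return null;
        }
        try {
            ZonedDateTime parsed =
                    ZonedDateTime.parse(pubDate.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            return DateTimeFormatter.ISO_INSTANT.format(parsed.toInstant());
        } catch (DateTimeParseException e) {
            return null; // sketch-only behaviour; the library may handle this differently
        }
    }

    public static void main(String[] args) {
        System.out.println(toIsoInstant("Thu, 05 Jan 2017 12:34:54 GMT"));
    }
}
```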
O
- OpEd - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- Opinion - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- other - crawlercommons.sitemaps.extension.VideoAttributes.TVShow.VideoType
- own - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
P
- PageMap - Class in crawlercommons.sitemaps.extension
-
Data model for the PageMaps extension to the sitemap protocol used for Google's Programmable Search Engine.
- PageMap() - Constructor for class crawlercommons.sitemaps.extension.PageMap
- PageMapDataObject - Class in crawlercommons.sitemaps.extension
- PageMapDataObject(String, String) - Constructor for class crawlercommons.sitemaps.extension.PageMapDataObject
- PAGEMAPS - crawlercommons.sitemaps.extension.Extension
-
PageMaps is a structured data format that Google created to enable website creators to embed data and notes in their webpages, cf.
- PAGEMAPS - Static variable in class crawlercommons.sitemaps.Namespace
- PageMapsHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in Google's Programmable Search Engine PageMaps extension namespace.
- PageMapsHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.PageMapsHandler
- PaidLevelDomain - Class in crawlercommons.domains
-
Routines to extract the PLD (paid-level domain, as per the IRLbot paper) from a hostname or URL.
- PaidLevelDomain() - Constructor for class crawlercommons.domains.PaidLevelDomain
- parseContent(String, byte[], String, String) - Method in class crawlercommons.robots.BaseRobotsParser
-
Deprecated. since 1.4 - replaced by BaseRobotsParser.parseContent(java.lang.String,byte[],java.lang.String,java.util.Collection<java.lang.String>). Passing a collection of robot names gives users more control over how user-agent and robot names are matched. Passing a list of names is also more efficient, as it does not require splitting the robot name string again for every robots.txt file to be parsed.
- parseContent(String, byte[], String, String) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Deprecated.
- parseContent(String, byte[], String, Collection<String>) - Method in class crawlercommons.robots.BaseRobotsParser
-
Parse the robots.txt file in content, and return rules appropriate for processing paths by userAgent.
- parseContent(String, byte[], String, Collection<String>) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Parse the robots.txt file in content, and return rules appropriate for processing paths by userAgent.
- parseQueryParameters(String, int, Set<String>) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Receives the URL query string and parses it into a list of name-value pairs.
- parseRSSTimestamp(String) - Static method in class crawlercommons.sitemaps.AbstractSiteMap
-
Parse pubDate of RSS feeds.
- parseSiteMap(byte[], URL) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Parse a sitemap, given the content bytes and the URL.
- parseSiteMap(String, byte[], AbstractSiteMap) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Returns a processed copy of an unprocessed sitemap object, i.e. it transfers the value of getLastModified().
- parseSiteMap(String, byte[], URL) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Parse a sitemap, given the MIME type, the content bytes, and the URL.
- parseSiteMap(URL) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Returns a SiteMap or SiteMapIndex given an online sitemap URL. Please note that this method goes online, fetches the sitemap, and then parses it. It is a convenience method for a user who has a sitemap URL and wants a "keep it simple" way to parse it.
- PLAYER_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- PressRelease - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- preview - crawlercommons.sitemaps.extension.VideoAttributes.TVShow.VideoType
- PRICES - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- PRIVATE_DOMAIN - crawlercommons.sitemaps.SiteMapCrossSubmitValidator.CrossSubmitValidationLevel
-
Domain name below a public suffix, cf.
- processGzippedXML(URL, byte[]) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Decompress the gzipped content and process the resulting XML Sitemap.
- processText(URL, byte[]) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Process a text-based Sitemap.
- processText(URL, InputStream) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Process a text-based Sitemap.
- processXml(URL, byte[]) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Parse the given XML content.
- processXml(URL, InputSource) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Parse the given XML content.
- PUBLICATION_DATE - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
- PUBLICATION_DATE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- PUNYCODE - crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
- put(String, V) - Method in class crawlercommons.domains.SuffixTrie
-
Insert a string and an associated value into the trie.
Q
- queryParamsToRemove(Collection<String>) - Method in class crawlercommons.filters.basic.BasicURLNormalizer.Builder
-
A collection of names of query parameters that should be removed from the URL query.
R
- RATING - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- Registration - crawlercommons.sitemaps.extension.NewsAttributes.AccessOption
- rent - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
- REQUIRES_SUBSCRIPTION - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- reset() - Method in class crawlercommons.sitemaps.sax.extension.ExtensionHandler
- reset() - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- reset() - Method in class crawlercommons.sitemaps.sax.extension.MobileHandler
- reset() - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- reset() - Method in class crawlercommons.sitemaps.sax.extension.PageMapsHandler
- reset() - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- resetCharacterBuffer() - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- RESTRICTED_COUNTRIES - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- RESTRICTED_PLATFORMS - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- RobotRule(String, boolean) - Constructor for class crawlercommons.robots.SimpleRobotRules.RobotRule
-
An allow/disallow rule: a path prefix or pattern and whether it is allowed or disallowed.
- root - Variable in class crawlercommons.domains.SuffixTrie
- RSS - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
- RSS_2_0 - Static variable in class crawlercommons.sitemaps.Namespace
-
RSS and Atom sitemap formats do not have a strict definition.
S
- sanitizeRobotNames(Collection<String>) - Static method in class crawlercommons.robots.SimpleRobotRulesParser
-
Sanitize user-agent names for exact user-agent matching according to RFC 9309 (see SimpleRobotRulesParser.setExactUserAgentMatching(boolean)) to be used in SimpleRobotRulesParser.parseContent(String, byte[], String, Collection).
- Satire - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
- SD - crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
- setAcceptedNamespaces(Set<String>) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setAccess(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setAllowDocTypeDefinitions(boolean) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Sets if the parser allows a DTD in sitemaps or feeds.
- setAllowedCountries(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setAllowedPlatforms(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setCaption(String) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setCategory(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setChangeFrequency(SiteMapURL.ChangeFrequency) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL's change frequency
- setChangeFrequency(String) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL's change frequency. In case of a bad ChangeFrequency, the current frequency in this instance will be set to null.
- setContentLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setContentSegmentLocs(VideoAttributes.ContentSegment[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setCrawlDelay(long) - Method in class crawlercommons.robots.BaseRobotRules
- setDeferVisits(boolean) - Method in class crawlercommons.robots.BaseRobotRules
-
Indicate to defer visits to the server, e.g. to wait until the robots.txt becomes available.
- setDescription(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setDuration(Integer) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setEpisodeNumber(Integer) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- setEpisodeTitle(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- setExactUserAgentMatching(boolean) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Set how the user-agent names in the robots.txt (User-agent: lines) are matched with the provided robot names: (with exact matching) follow the Robots Exclusion Protocol RFC 9309 and match the user agent literally but case-insensitively over the full string length: Crawlers set their own name, which is called a product token, to find relevant groups.
- setException(UnknownFormatException) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setExpirationDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setExtensionNamespaces(Map<String, Extension>) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setFamilyFriendly(Boolean) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setGalleryLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setGalleryTitle(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setGenres(NewsAttributes.NewsGenre[]) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setGeoLocation(String) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setHref(URL) - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- setKeywords(String[]) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setLanguage(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setLastModified(String) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setLastModified(String) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set when this URL was last modified.
- setLastModified(ZonedDateTime) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setLastModified(ZonedDateTime) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set when this URL was last modified.
- setLastModified(Date) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setLastModified(Date) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set when this URL was last modified.
- setLicense(URL) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setLive(Boolean) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setLoc(URL) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setMaxCrawlDelay(long) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Set the max value in milliseconds accepted for the Crawl-Delay directive.
- setMaxWarnings(int) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Set the max number of warnings about parse errors logged per robots.txt
- setName(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setParams(Map<String, String>) - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- setPlayerLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setPremierDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- setPrice(Float) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- setPrices(VideoAttributes.VideoPrice[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setPriority(double) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL's priority to a value between [0.0 - 1.0] (Default Priority is used if the given priority is out of range).
- setPriority(String) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL's priority to a value between [0.0 - 1.0] (Default Priority is used if the given priority is missing or out of range).
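The fallback rule described for the two setPriority variants can be sketched as follows. This is an illustration under assumptions, not the SiteMapURL source: the default of 0.5 is taken from the sitemaps.org protocol and is assumed here, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of crawler-commons.
final class PrioritySketch {
    // Assumed default, following the sitemaps.org protocol.
    static final double DEFAULT_PRIORITY = 0.5;

    // Returns the parsed priority if it lies in [0.0, 1.0]; otherwise the default.
    static double normalizePriority(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_PRIORITY;
        }
        try {
            double priority = Double.parseDouble(value.trim());
            // NaN fails both comparisons and therefore also falls back to the default.
            return (priority >= 0.0 && priority <= 1.0) ? priority : DEFAULT_PRIORITY;
        } catch (NumberFormatException e) {
            return DEFAULT_PRIORITY;
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizePriority("0.8"));
        System.out.println(normalizePriority("1.5"));
    }
}
```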
- setProcessed(boolean) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setPublicationDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setPublicationDate(ZonedDateTime) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setRating(Float) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setRequiresSubscription(Boolean) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setRestrictedCountries(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setRestrictedPlatforms(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setSeasonNumber(Integer) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- setShowTitle(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- setStockTickers(String[]) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setStrictNamespace(boolean) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setStrictNamespace(boolean) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Sets the parser to allow any XML namespace or just the one from the specification, or any accepted namespace (see SiteMapParser.addAcceptedNamespace(String)).
- setTags(String[]) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setThumbnailLoc(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setTitle(String) - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- setTitle(String) - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- setTitle(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setTVShow(VideoAttributes.TVShow) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setType(AbstractSiteMap.SitemapType) - Method in class crawlercommons.sitemaps.AbstractSiteMap
- setUploader(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setUploaderInfo(URL) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- setUrl(String) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL.
- setUrl(URL) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Set the URL.
- setURLFilter(URLFilter) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Use URLFilter to filter URLs, e.g. to configure that URLs found in sitemaps are normalized by BasicURLNormalizer.
- setURLFilter(Function<String, String>) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- setURLFilter(Function<String, String>) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Set URL filter function to normalize URLs found in sitemaps or filter URLs away if the function returns null.
- setValid(boolean) - Method in class crawlercommons.sitemaps.SiteMapURL
-
Valid means that it follows the official guidelines that the siteMapURL must be under the base URL.
- setVideoType(VideoAttributes.TVShow.VideoType) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- setVideoType(String) - Method in class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
- setViewCount(Integer) - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- SimpleRobotRules - Class in crawlercommons.robots
-
Result from parsing a single robots.txt file – a set of allow/disallow rules to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
- SimpleRobotRules() - Constructor for class crawlercommons.robots.SimpleRobotRules
- SimpleRobotRules(SimpleRobotRules.RobotRulesMode) - Constructor for class crawlercommons.robots.SimpleRobotRules
- SimpleRobotRules.RobotRule - Class in crawlercommons.robots
-
Single rule that maps from a path prefix to an allow flag.
- SimpleRobotRules.RobotRulesMode - Enum in crawlercommons.robots
- SimpleRobotRulesParser - Class in crawlercommons.robots
-
Robots.txt parser following RFC 9309, supporting the Sitemap and Crawl-delay extensions.
- SimpleRobotRulesParser() - Constructor for class crawlercommons.robots.SimpleRobotRulesParser
- SimpleRobotRulesParser(long, int) - Constructor for class crawlercommons.robots.SimpleRobotRulesParser
- SiteMap - Class in crawlercommons.sitemaps
- SiteMap() - Constructor for class crawlercommons.sitemaps.SiteMap
- SiteMap(String) - Constructor for class crawlercommons.sitemaps.SiteMap
- SiteMap(String, String) - Constructor for class crawlercommons.sitemaps.SiteMap
- SiteMap(URL) - Constructor for class crawlercommons.sitemaps.SiteMap
- SiteMap(URL, Date) - Constructor for class crawlercommons.sitemaps.SiteMap
- SITEMAP - Static variable in class crawlercommons.sitemaps.Namespace
- SITEMAP_EXTENSION_NAMESPACES - Static variable in class crawlercommons.sitemaps.Namespace
- SITEMAP_LEGACY - Static variable in class crawlercommons.sitemaps.Namespace
-
Legacy schema URIs from prior sitemap protocol versions and frequent variants.
- SITEMAP_SUPPORTED_NAMESPACES - Static variable in class crawlercommons.sitemaps.Namespace
- SiteMapCrossSubmitValidator - Class in crawlercommons.sitemaps
-
Validator for sitemap cross submits.
- SiteMapCrossSubmitValidator.CrossSubmitValidationLevel - Enum in crawlercommons.sitemaps
- SiteMapIndex - Class in crawlercommons.sitemaps
- SiteMapIndex() - Constructor for class crawlercommons.sitemaps.SiteMapIndex
- SiteMapIndex(URL) - Constructor for class crawlercommons.sitemaps.SiteMapIndex
- SiteMapParser - Class in crawlercommons.sitemaps
- SiteMapParser() - Constructor for class crawlercommons.sitemaps.SiteMapParser
-
SiteMapParser with strict location validation (SiteMapParser.isStrict()) and not allowing partially parsed content.
- SiteMapParser(boolean) - Constructor for class crawlercommons.sitemaps.SiteMapParser
-
SiteMapParser with configurable location validation, not allowing partially parsed content.
- SiteMapParser(boolean, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapParser
- SiteMapTester - Class in crawlercommons.sitemaps
-
Sitemap tool for recursively fetching all URLs from a sitemap (and all of its children).
- SiteMapTester() - Constructor for class crawlercommons.sitemaps.SiteMapTester
- SiteMapURL - Class in crawlercommons.sitemaps
-
The SiteMapURL class represents a URL found in a Sitemap.
- SiteMapURL(String, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL(String, String, String, String, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL(URL, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL(URL, ZonedDateTime, SiteMapURL.ChangeFrequency, double, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL(URL, Date, SiteMapURL.ChangeFrequency, double, boolean) - Constructor for class crawlercommons.sitemaps.SiteMapURL
- SiteMapURL.ChangeFrequency - Enum in crawlercommons.sitemaps
-
Allowed change frequencies
- SkipLeadingWhiteSpaceInputStream - Class in crawlercommons.sitemaps
-
Wraps a stream and skips over leading whitespace (at beginning of file) in the wrapped stream.
- SkipLeadingWhiteSpaceInputStream(InputStream) - Constructor for class crawlercommons.sitemaps.SkipLeadingWhiteSpaceInputStream
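The idea behind SkipLeadingWhiteSpaceInputStream, dropping whitespace at the start of a stream so that an XML parser sees the document start first, can be sketched with a JDK PushbackInputStream. This is a simplified stand-in and not the library class: it skips eagerly when called and only recognizes whitespace as defined by Character.isWhitespace on single bytes. All names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of crawler-commons.
final class SkipLeadingWhitespaceSketch {
    // Consumes leading whitespace bytes and returns a stream positioned at
    // the first non-whitespace byte (or at end of stream).
    static InputStream skipLeadingWhitespace(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 1);
        int b;
        while ((b = pushback.read()) != -1) {
            if (!Character.isWhitespace(b)) {
                pushback.unread(b); // put the first content byte back
                break;
            }
        }
        return pushback;
    }

    // Convenience wrapper for the demo: returns the remaining content as a string.
    static String stripLeading(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = skipLeadingWhitespace(new ByteArrayInputStream(bytes))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stripLeading("  \n\t<urlset/>"));
    }
}
```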
- sortRules() - Method in class crawlercommons.robots.SimpleRobotRules
-
Sort and deduplicate robot rules.
- specialCharactersPathMatching - Static variable in class crawlercommons.robots.SimpleRobotRules
-
Special characters which require percent-encoding for path matching
- splitRobotNames(String) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Split a string listing user-agent / robot names into tokens.
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.DelegatorHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.ImageHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.LinksHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.MobileHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.NewsHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.PageMapsHandler
- startElement(String, String, String, Attributes) - Method in class crawlercommons.sitemaps.sax.extension.VideoHandler
- STOCK_TICKERS - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
- strict - Variable in class crawlercommons.sitemaps.SiteMapParser
-
True (by default) meaning that invalid URLs should be rejected, as the official docs allow the siteMapURLs to be only under the base URL: https://www.sitemaps.org/protocol.html#location
- strictNamespace - Variable in class crawlercommons.sitemaps.SiteMapParser
-
Indicates whether the parser should work with the namespace from the specifications or any namespace.
- stripAllBlank(CharSequence) - Static method in class crawlercommons.sitemaps.sax.DelegatorHandler
-
Trim all whitespace including Unicode whitespace
- Subscription - crawlercommons.sitemaps.extension.NewsAttributes.AccessOption
- SuffixTrie<V> - Class in crawlercommons.domains
- SuffixTrie() - Constructor for class crawlercommons.domains.SuffixTrie
- SuffixTrie.LookupResult<V> - Class in crawlercommons.domains
-
Wrapper for results when a string is checked for suffixes contained in the suffix trie.
- SuffixTrie.Node<V> - Class in crawlercommons.domains
T
- TAGS - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- TEXT - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
- THUMBNAIL_LOC - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- TIME_ZONE_UTC - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
- TITLE - Static variable in class crawlercommons.sitemaps.extension.ImageAttributes
- TITLE - Static variable in class crawlercommons.sitemaps.extension.NewsAttributes
-
Title of the news article.
- TITLE - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- toString() - Method in class crawlercommons.domains.EffectiveTldFinder.EffectiveTLD
- toString() - Method in class crawlercommons.robots.BaseRobotRules
-
Returns a string with the crawl delay as well as a list of sitemaps if they exist (and aren't more than 10).
- toString() - Method in class crawlercommons.robots.SimpleRobotRules
- toString() - Method in class crawlercommons.sitemaps.extension.ImageAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.LinkAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.MobileAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.NewsAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.PageMap
- toString() - Method in class crawlercommons.sitemaps.extension.PageMapDataObject
- toString() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.ContentSegment
- toString() - Method in class crawlercommons.sitemaps.extension.VideoAttributes
- toString() - Method in class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- toString() - Method in class crawlercommons.sitemaps.SiteMap
- toString() - Method in class crawlercommons.sitemaps.SiteMapIndex
- toString() - Method in class crawlercommons.sitemaps.SiteMapURL
- TVShow() - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.TVShow
U
- unescapePath(String) - Static method in class crawlercommons.filters.basic.BasicURLNormalizer
-
Remove % encoding from path segment in URL for characters which should be unescaped according to RFC3986.
- UNICODE - crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
- UnknownFormatException - Exception in crawlercommons.sitemaps
-
Exception thrown if the format of a sitemap failed to parse.
- UnknownFormatException() - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
- UnknownFormatException(String) - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
- UnknownFormatException(String, Throwable) - Constructor for exception crawlercommons.sitemaps.UnknownFormatException
- UNSET_CRAWL_DELAY - Static variable in class crawlercommons.robots.BaseRobotRules
- UPLOADER - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- UPLOADER_INFO - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
- url - Variable in class crawlercommons.sitemaps.AbstractSiteMap
- urlEquals(URL, URL) - Static method in class crawlercommons.sitemaps.extension.ExtensionMetadata
-
Compare URLs by their string representation because calling URL.equals(Object) may trigger an unwanted and potentially slow DNS lookup to resolve the host part
- urlFilter - Variable in class crawlercommons.sitemaps.sax.DelegatorHandler
- URLFilter - Class in crawlercommons.filters
- URLFilter() - Constructor for class crawlercommons.filters.URLFilter
- urlIsValid(String, String) - Static method in class crawlercommons.sitemaps.SiteMapParser
-
Verify whether the testUrl is under the sitemapBaseUrl.
- USER_AGENT_PRODUCT_TOKEN_MATCHER - Static variable in class crawlercommons.robots.SimpleRobotRulesParser
-
Pattern to match valid user-agent product tokens as defined in RFC 9309, section 2.2.1
- userAgentProductTokenPartialMatch(String, Collection<String>) - Method in class crawlercommons.robots.SimpleRobotRulesParser
-
Check whether user-agent line starts with a valid user-agent product token, but continues with additional characters to be ignored e.g.
- UserGenerated - crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
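The urlEquals entry in this section explains why URLs are compared by their string representation. As a rough illustration of that idea with plain JDK classes (the class UrlEqualsSketch and its methods are hypothetical and not part of crawler-commons):

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UrlEqualsSketch {

    /** Hypothetical helper: build a URL and convert the checked exception. */
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Compare two URLs by their string form; no host name resolution is involved. */
    static boolean sameUrl(URL a, URL b) {
        if (a == null || b == null) {
            return a == b;
        }
        return a.toString().equals(b.toString());
    }

    public static void main(String[] args) {
        URL a = url("https://www.example.com/sitemap.xml");
        URL b = url("https://www.example.com/sitemap.xml");
        URL c = url("https://example.com/sitemap.xml");
        System.out.println(sameUrl(a, b)); // prints true
        System.out.println(sameUrl(a, c)); // prints false
    }
}
```

The trade-off is that two URLs whose host names resolve to the same address, which URL.equals(Object) may treat as equal, are considered different here unless their strings match exactly.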
V
- validate(URL, Collection<String>, SiteMapCrossSubmitValidator.CrossSubmitValidationLevel) - Static method in class crawlercommons.sitemaps.SiteMapCrossSubmitValidator
-
Validate whether the host, ICANN domain or private domain of a single URL is part of a list of domain names.
- validateSiteMapURLs(AbstractSiteMap, Collection<String>, SiteMapCrossSubmitValidator.CrossSubmitValidationLevel) - Static method in class crawlercommons.sitemaps.SiteMapCrossSubmitValidator
-
Validation of a sitemap or recursive validation of a sitemap index.
- validateSiteMapURLs(SiteMap) - Static method in class crawlercommons.sitemaps.SiteMapCrossSubmitValidator
-
Validate whether the URLs submitted in a sitemap are valid, that is, located below the same URL prefix as the sitemap itself.
- validateSiteMapURLs(SiteMap, String) - Static method in class crawlercommons.sitemaps.SiteMapCrossSubmitValidator
-
Validate the URLs in a sitemap against a single cross-submit host.
- validateSiteMapURLs(SiteMap, Collection<String>) - Static method in class crawlercommons.sitemaps.SiteMapCrossSubmitValidator
-
Validate the URLs in a sitemap against a set of cross-submit hosts.
- validateSiteMapURLs(SiteMap, Collection<String>, SiteMapCrossSubmitValidator.CrossSubmitValidationLevel) - Static method in class crawlercommons.sitemaps.SiteMapCrossSubmitValidator
-
Validate the URLs in a sitemap against a set of cross-submit domains.
- validateSiteMapURLs(SiteMap, Predicate<URL>) - Static method in class crawlercommons.sitemaps.SiteMapCrossSubmitValidator
- valueOf(String) - Static method in enum crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.robots.SimpleRobotRules.RobotRulesMode
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.AbstractSiteMap.SitemapType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.Extension
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.NewsAttributes.AccessOption
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.TVShow.VideoType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.SiteMapCrossSubmitValidator.CrossSubmitValidationLevel
-
Returns the enum constant of this type with the specified name.
- valueOf(String) - Static method in enum crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum crawlercommons.filters.basic.BasicURLNormalizer.IdnNormalization
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.robots.SimpleRobotRules.RobotRulesMode
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.AbstractSiteMap.SitemapType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.Extension
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.NewsAttributes.AccessOption
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.NewsAttributes.NewsGenre
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.TVShow.VideoType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceResolution
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.extension.VideoAttributes.VideoPriceType
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.SiteMapCrossSubmitValidator.CrossSubmitValidationLevel
-
Returns an array containing the constants of this enum type, in the order they are declared.
- values() - Static method in enum crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
-
Returns an array containing the constants of this enum type, in the order they are declared.
- VERSION_PATTERN - Static variable in class crawlercommons.domains.EffectiveTldFinder
- VIDEO - crawlercommons.sitemaps.extension.Extension
-
Google Video sitemaps, see https://support.google.com/webmasters/answer/80471
- VIDEO - Static variable in class crawlercommons.sitemaps.Namespace
- VideoAttributes - Class in crawlercommons.sitemaps.extension
-
Data model for the Google extension to the sitemap protocol regarding video indexing, as per http://www.google.com/schemas/sitemap-video/1.1
- VideoAttributes() - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes
- VideoAttributes(URL, String, String, URL, URL) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes
- VideoAttributes.ContentSegment - Class in crawlercommons.sitemaps.extension
- VideoAttributes.TVShow - Class in crawlercommons.sitemaps.extension
- VideoAttributes.TVShow.VideoType - Enum in crawlercommons.sitemaps.extension
- VideoAttributes.VideoPrice - Class in crawlercommons.sitemaps.extension
- VideoAttributes.VideoPriceResolution - Enum in crawlercommons.sitemaps.extension
- VideoAttributes.VideoPriceType - Enum in crawlercommons.sitemaps.extension
- VideoHandler - Class in crawlercommons.sitemaps.sax.extension
-
Handle SAX events in the Google Video sitemap extension namespace.
- VideoHandler() - Constructor for class crawlercommons.sitemaps.sax.extension.VideoHandler
- VideoPrice(String, Float) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- VideoPrice(String, Float, VideoAttributes.VideoPriceType) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- VideoPrice(String, Float, VideoAttributes.VideoPriceType, VideoAttributes.VideoPriceResolution) - Constructor for class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice
- VIEW_COUNT - Static variable in class crawlercommons.sitemaps.extension.VideoAttributes
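Several validateSiteMapURLs entries in this section rely on the rule that a URL listed in a sitemap must lie below the same URL prefix as the sitemap's own location. The sketch below illustrates one way such a prefix check could look using only java.net.URI. The class SitemapPrefixSketch and its methods are hypothetical, and the exact rule applied by SiteMapCrossSubmitValidator may differ (for example in how it treats case or ports).

```java
import java.net.URI;

public class SitemapPrefixSketch {

    /** Prefix of a sitemap location: scheme, authority and the path up to the last '/'. */
    static String basePrefix(String sitemapUrl) {
        URI u = URI.create(sitemapUrl);
        String path = u.getPath() == null ? "" : u.getPath();
        int lastSlash = path.lastIndexOf('/');
        String dir = lastSlash < 0 ? "/" : path.substring(0, lastSlash + 1);
        return u.getScheme() + "://" + u.getAuthority() + dir;
    }

    /** True if testUrl starts with the prefix derived from the sitemap location. */
    static boolean isUnder(String sitemapUrl, String testUrl) {
        return testUrl.startsWith(basePrefix(sitemapUrl));
    }

    public static void main(String[] args) {
        String sitemap = "https://example.com/catalog/sitemap.xml";
        System.out.println(basePrefix(sitemap));                                  // https://example.com/catalog/
        System.out.println(isUnder(sitemap, "https://example.com/catalog/item1")); // true
        System.out.println(isUnder(sitemap, "https://example.com/other/item1"));   // false
    }
}
```

Because the prefix ends with a slash, a sibling directory such as /catalogue/ is not mistaken for a location below /catalog/.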
W
- W3C_FULLDATE_FORMATTER - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
-
DateTimeFormatter for parsing dates in ISO-8601 format
- W3C_FULLDATE_FORMATTER_UTC - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
-
DateTimeFormatter to format dates in ISO-8601 format (UTC time zone 'Z')
- W3C_SHORTDATE_FORMATTER - Static variable in class crawlercommons.sitemaps.AbstractSiteMap
-
DateTimeFormatter for parsing short dates ('1997', '1997-07', '1997-07-16') without time of day and time zone
- walkSiteMap(AbstractSiteMap, Consumer<SiteMapURL>) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Traverse a sitemap, recursively fetching and traversing the content of any enclosed sitemap index, and performing the specified action for each sitemap URL until all URLs have been processed or the action throws an exception.
- walkSiteMap(URL, Consumer<SiteMapURL>) - Method in class crawlercommons.sitemaps.SiteMapParser
-
Fetch a sitemap from the specified URL, recursively fetching and traversing the content of any enclosed sitemap index, and performing the specified action for each sitemap URL until all URLs have been processed or the action throws an exception.
- WEEKLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
- WILD_CARD - Static variable in class crawlercommons.domains.EffectiveTldFinder
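The W3C_SHORTDATE_FORMATTER entry in this section mentions short dates of the form '1997', '1997-07' and '1997-07-16'. The following sketch shows one way to build a java.time formatter that accepts all three forms. It is an illustration under the assumption that missing month and day default to 1, and it is not claimed to match the formatter defined in AbstractSiteMap. The class ShortDateSketch is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

public class ShortDateSketch {

    /** Year, optionally followed by -month, optionally followed by -day. */
    static final DateTimeFormatter SHORT_DATE = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)
            .optionalStart().appendLiteral('-').appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .optionalStart().appendLiteral('-').appendValue(ChronoField.DAY_OF_MONTH, 2)
            .optionalEnd().optionalEnd()
            // Assumption: missing month or day fall back to the first one
            .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
            .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
            .toFormatter(Locale.ROOT);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, SHORT_DATE);
    }

    public static void main(String[] args) {
        System.out.println(parse("1997"));       // 1997-01-01
        System.out.println(parse("1997-07"));    // 1997-07-01
        System.out.println(parse("1997-07-16")); // 1997-07-16
    }
}
```

Input that carries a time of day or a time zone is rejected by this formatter with a DateTimeParseException, so a caller would typically try a full date-time formatter first and fall back to the short form.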
X
- XML - crawlercommons.sitemaps.AbstractSiteMap.SitemapType
Y
- YEARLY - crawlercommons.sitemaps.SiteMapURL.ChangeFrequency
_
- _mode - Variable in class crawlercommons.robots.SimpleRobotRules
- _rules - Variable in class crawlercommons.robots.SimpleRobotRules