All Classes
Class |
Description |
AbstractSiteMap |
SiteMap or SiteMapIndex
|
AbstractSiteMap.SitemapType |
Various Sitemap types
|
BaseRobotRules |
Result from parsing a single robots.txt file – a set of allow/disallow rules
to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
|
BaseRobotsParser |
Robots.txt parser definition.
|
BasicURLNormalizer |
Code borrowed from Apache Nutch.
|
BasicURLNormalizer.Builder |
|
BasicURLNormalizer.IdnNormalization |
|
CrawlerCommons |
|
DelegatorHandler |
Provides a base SAX handler for parsing of XML documents representing
sub-classes of AbstractSiteMap.
|
EffectiveTldFinder |
Determining the actual domain name of a host name or URL requires knowledge
of the various domain registrars and their assignment policies.
|
EffectiveTldFinder.EffectiveTLD |
EffectiveTLD objects hold one line of the public suffix list:
the suffix (com, co.uk, etc.);
for IDN suffixes, both the ASCII and IDN variants
(xn--p1ai and рф);
and the properties required to parse host/domain names given in the
public suffix list (wildcard suffix, exception, in private domain
section).
|
Extension |
Sitemap extensions supported by the parser.
|
ExtensionHandler |
Handler to be called for elements in the namespace of a sitemap extension.
|
ExtensionMetadata |
Container for attributes of a SiteMapURL defined by a sitemap
extension.
|
ImageAttributes |
Data model for the Google extension to the sitemap protocol regarding image
indexing, as per http://www.google.com/schemas/sitemap-image/1.1
|
ImageHandler |
Handle SAX events in the Google Image sitemap extension namespace.
|
LinkAttributes |
Data model for the Google extension to the sitemap protocol regarding the
indexing of alternate links.
|
LinksHandler |
Handle SAX events in the Google alternate links sitemap extension namespace.
|
MimeTypeDetector |
|
MobileAttributes |
Google mobile sitemap attributes, see
http://www.google.de/schemas/sitemap-mobile/1.0/ and
https://www.google.com/schemas/sitemap-mobile/1.0/sitemap-mobile.xsd:
Mobile sitemaps just contain an empty "mobile" tag to identify a
URL as having mobile content.
|
MobileHandler |
Handle SAX events in the Google Mobile sitemap extension namespace.
|
Namespace |
Supported sitemap formats:
https://www.sitemaps.org/protocol.html#otherformats
|
NewsAttributes |
|
NewsAttributes.NewsGenre |
|
NewsHandler |
Handle SAX events in the Google News sitemap extension namespace.
|
PaidLevelDomain |
Routines to extract the PLD (paid-level domain, as per the IRLbot paper) from
a hostname or URL.
|
SimpleRobotRules |
Result from parsing a single robots.txt file – a set of allow/disallow rules
to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
|
SimpleRobotRules.RobotRule |
Single rule that maps from a path prefix to an allow flag.
|
SimpleRobotRules.RobotRulesMode |
|
SimpleRobotRulesParser |
Robots.txt parser following RFC 9309, supporting the Sitemap and Crawl-delay
extensions.
|
SiteMap |
|
SiteMapIndex |
|
SiteMapParser |
|
SiteMapTester |
Sitemap tool for recursively fetching all URLs from a sitemap (and all of
its children)
|
SiteMapURL |
The SiteMapURL class represents a URL found in a sitemap.
|
SiteMapURL.ChangeFrequency |
Allowed change frequencies
|
SkipLeadingWhiteSpaceInputStream |
Wraps a stream and skips over leading whitespace (at beginning of file) in
the wrapped stream.
|
Strings |
Utility functions for manipulating strings.
|
SuffixTrie<V> |
|
SuffixTrie.LookupResult<V> |
Wrapper for results when a string is checked for suffixes contained in
the suffix trie.
|
SuffixTrie.Node<V> |
|
UnknownFormatException |
Exception thrown if a sitemap could not be parsed because its format is unknown.
|
URLFilter |
|
VideoAttributes |
Data model for the Google extension to the sitemap protocol regarding video
indexing, as per http://www.google.com/schemas/sitemap-video/1.1
|
VideoAttributes.VideoPrice |
|
VideoAttributes.VideoPriceResolution |
|
VideoAttributes.VideoPriceType |
|
VideoHandler |
Handle SAX events in the Google Video sitemap extension namespace.
|
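
The robots.txt entries above describe rules that map a path prefix to an allow flag and are checked to decide whether a URL may be fetched. The following is a minimal, self-contained sketch of that idea and is not the crawler-commons implementation: the class PrefixRulesSketch and its methods add and isAllowed are hypothetical names chosen for illustration. It assumes the RFC 9309 matching convention (the longest matching prefix wins, and allow wins a tie) and ignores wildcards, percent-encoding and Crawl-delay.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are NOT crawler-commons classes.
final class PrefixRulesSketch {

    /** One rule: a path prefix and whether matching paths may be fetched. */
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Adds a rule and returns this object so calls can be chained. */
    PrefixRulesSketch add(String prefix, boolean allow) {
        rules.add(new Rule(prefix, allow));
        return this;
    }

    /**
     * Assumed RFC 9309 convention: the longest matching prefix decides,
     * an allow rule wins over a disallow rule of equal length, and a
     * path that matches no rule is allowed.
     */
    boolean isAllowed(String path) {
        Rule best = null;
        for (Rule r : rules) {
            if (!path.startsWith(r.prefix)) {
                continue; // rule does not apply to this path
            }
            boolean longer = best == null || r.prefix.length() > best.prefix.length();
            boolean tieAllow = best != null
                    && r.prefix.length() == best.prefix.length() && r.allow;
            if (longer || tieAllow) {
                best = r;
            }
        }
        return best == null || best.allow;
    }

    public static void main(String[] args) {
        PrefixRulesSketch rules = new PrefixRulesSketch()
                .add("/private/", false)
                .add("/private/press/", true);
        System.out.println(rules.isAllowed("/index.html"));              // true
        System.out.println(rules.isAllowed("/private/data.csv"));        // false
        System.out.println(rules.isAllowed("/private/press/2024.html")); // true
    }
}
```

For real crawling, the parser and rule classes listed in the table should be used instead, since they handle the parts of robots.txt that this sketch leaves out.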