All Classes
| Class |
Description |
| AbstractSiteMap |
SiteMap or SiteMapIndex
|
| AbstractSiteMap.SitemapType |
Various Sitemap types
|
| BaseRobotRules |
Result from parsing a single robots.txt file – a set of allow/disallow rules
to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
|
| BaseRobotsParser |
Robots.txt parser definition.
|
| BasicURLNormalizer |
Code borrowed from Apache Nutch.
|
| BasicURLNormalizer.Builder |
|
| BasicURLNormalizer.IdnNormalization |
|
| CrawlerCommons |
|
| DelegatorHandler |
Provides a base SAX handler for parsing of XML documents representing
sub-classes of AbstractSiteMap.
|
| EffectiveTldFinder |
To determine the actual domain name of a host name or URL requires knowledge
of the various domain registrars and their assignment policies.
|
| EffectiveTldFinder.EffectiveTLD |
EffectiveTLD objects hold one line of the public suffix list:
the suffix (com, co.uk, etc.);
for IDN suffixes, both the ASCII and IDN variants (xn--p1ai and рф);
and the properties required to parse host/domain names given in the
public suffix list (wildcard suffix, exception, in private domain
section).
|
| Extension |
Sitemap extensions supported by the parser.
|
| ExtensionHandler |
Handler to be called for elements in the namespace of a sitemap extension.
|
| ExtensionMetadata |
Container for attributes of a SiteMapURL defined by a sitemap
extension.
|
| ImageAttributes |
Data model for the Google extension to the sitemap protocol for image
indexing, as per http://www.google.com/schemas/sitemap-image/1.1
|
| ImageHandler |
Handle SAX events in the Google Image sitemap extension namespace.
|
| LinkAttributes |
Data model for the Google extension to the sitemap protocol for alternate
link indexing.
|
| LinksHandler |
Handle SAX events in the alternate links sitemap extension namespace.
|
| MimeTypeDetector |
|
| MobileAttributes |
Google mobile sitemap attributes, see
http://www.google.de/schemas/sitemap-mobile/1.0/ and
https://www.google.com/schemas/sitemap-mobile/1.0/sitemap-mobile.xsd:
Mobile sitemaps just contain an empty "mobile" tag to identify a
URL as having mobile content.
|
| MobileHandler |
Handle SAX events in the Google Mobile sitemap extension namespace.
|
| Namespace |
Supported sitemap formats:
https://www.sitemaps.org/protocol.html#otherformats
|
| NewsAttributes |
|
| NewsAttributes.NewsGenre |
|
| NewsHandler |
Handle SAX events in the Google News sitemap extension namespace.
|
| PaidLevelDomain |
Routines to extract the PLD (paid-level domain, as per the IRLbot paper) from
a hostname or URL.
|
| SimpleRobotRules |
Result from parsing a single robots.txt file – a set of allow/disallow rules
to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
|
| SimpleRobotRules.RobotRule |
Single rule that maps from a path prefix to an allow flag.
|
| SimpleRobotRules.RobotRulesMode |
|
| SimpleRobotRulesParser |
Robots.txt parser following RFC 9309, supporting the Sitemap and Crawl-delay
extensions.
|
| SiteMap |
|
| SiteMapIndex |
|
| SiteMapParser |
|
| SiteMapTester |
Sitemap tool for recursively fetching all URLs from a sitemap (and all of
its children)
|
| SiteMapURL |
The SiteMapURL class represents a URL found in a sitemap.
|
| SiteMapURL.ChangeFrequency |
Allowed change frequencies
|
| SkipLeadingWhiteSpaceInputStream |
Wraps a stream and skips over leading whitespace (at beginning of file) in
the wrapped stream.
|
| Strings |
Util functions for manipulating strings.
|
| SuffixTrie<V> |
|
| SuffixTrie.LookupResult<V> |
Wrapper for results when a string is checked for suffixes contained in
the suffix trie.
|
| SuffixTrie.Node<V> |
|
| UnknownFormatException |
Exception thrown if a sitemap's format could not be parsed.
|
| URLFilter |
|
| VideoAttributes |
Data model for the Google extension to the sitemap protocol for video
indexing, as per http://www.google.com/schemas/sitemap-video/1.1
|
| VideoAttributes.VideoPrice |
|
| VideoAttributes.VideoPriceResolution |
|
| VideoAttributes.VideoPriceType |
|
| VideoHandler |
Handle SAX events in the Google Video sitemap extension namespace.
|
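
The SimpleRobotRules.RobotRule entry above describes a rule as a mapping from a path prefix to an allow flag. The following is a minimal, self-contained sketch of that idea only; it is not the library's implementation. The class name PrefixRulesSketch and its methods add and isAllowed are hypothetical names chosen for illustration, and the sketch assumes plain prefix matching (no wildcards or percent-encoding handling), with the longest matching prefix taking precedence and an allow rule winning a tie, as RFC 9309 recommends.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of prefix-to-allow-flag rules; not crawler-commons code. */
class PrefixRulesSketch {

    /** One rule: a path prefix and whether matching paths may be fetched. */
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Registers a rule and returns this object so calls can be chained. */
    PrefixRulesSketch add(String prefix, boolean allow) {
        rules.add(new Rule(prefix, allow));
        return this;
    }

    /**
     * Assumed matching policy: the longest matching prefix wins, an allow rule
     * wins a tie between equally long prefixes, and a path that matches no
     * rule is allowed.
     */
    boolean isAllowed(String path) {
        Rule best = null;
        for (Rule r : rules) {
            if (!path.startsWith(r.prefix)) {
                continue;
            }
            boolean longer = best == null || r.prefix.length() > best.prefix.length();
            boolean tieAllow = best != null
                    && r.prefix.length() == best.prefix.length() && r.allow;
            if (longer || tieAllow) {
                best = r;
            }
        }
        return best == null || best.allow;
    }

    public static void main(String[] args) {
        PrefixRulesSketch rules = new PrefixRulesSketch()
                .add("/private/", false)
                .add("/private/press/", true);

        System.out.println(rules.isAllowed("/index.html"));           // true: no rule matches
        System.out.println(rules.isAllowed("/private/data.html"));    // false: "/private/" matches
        System.out.println(rules.isAllowed("/private/press/a.html")); // true: longer allow prefix wins
    }
}
```

In real use, the rules would come from parsing a robots.txt file (the role of SimpleRobotRulesParser in the table above) rather than being added by hand as in this sketch.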