Package crawlercommons.sitemaps
Classes focused on parsing and processing
sitemaps and holding the resulting set of
URLs with crawling-related metadata, such as the change frequency of a page.
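The following is a minimal standalone sketch of the kind of parsing this package does. It assumes a standard XML sitemap: a <urlset> root whose <url> children each carry a required <loc> and an optional <changefreq>, as defined by sitemaps.org. It uses only the JDK's DOM parser and matches elements by local name in any namespace. The class UrlSetSketch and its Entry type are hypothetical and are not part of crawlercommons.sitemaps. In the library itself, SiteMapParser and SiteMapURL cover this case and much more.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch using only the JDK; not the crawlercommons.sitemaps API.
final class UrlSetSketch {

    /** One <url> entry: its location plus the optional <changefreq> hint (null if absent). */
    static final class Entry {
        final String loc;
        final String changeFreq;

        Entry(String loc, String changeFreq) {
            this.loc = loc;
            this.changeFreq = changeFreq;
        }
    }

    /** Parses a <urlset> document and returns its entries in document order. */
    static List<Entry> parse(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Reject DOCTYPE declarations so untrusted input cannot trigger XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XML sitemap", e);
        }

        List<Entry> entries = new ArrayList<>();
        Element root = doc.getDocumentElement(); // expected to be <urlset>
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !"url".equals(n.getLocalName())) {
                continue; // skip whitespace text nodes and unrelated elements
            }
            String loc = childText((Element) n, "loc");
            if (loc != null) { // <loc> is required; silently drop entries without one
                entries.add(new Entry(loc, childText((Element) n, "changefreq")));
            }
        }
        return entries;
    }

    /** Trimmed text of the first direct child element with the given local name, or null. */
    private static String childText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }
}
```

Only direct children are inspected. As a result, extension elements nested deeper inside a <url>, such as an image's own loc, are not mistaken for the page location.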
Class Summary
AbstractSiteMap: Either a SiteMap or a SiteMapIndex.
Namespace: Supported sitemap formats, see https://www.sitemaps.org/protocol.html#otherformats
SiteMap
SiteMapCrossSubmitValidator: Validator for sitemap cross-submits.
SiteMapIndex
SiteMapParser
SiteMapTester: Sitemap tool for recursively fetching all URLs from a sitemap (and all of its children).
SiteMapURL: Represents a URL found in a sitemap.
SkipLeadingWhiteSpaceInputStream: Wraps a stream and skips over leading whitespace (at the beginning of the file) in the wrapped stream.
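Skipping leading whitespace is presumably useful because an XML declaration (<?xml ...?>) is only legal at the very start of a document. A sitemap served with a few stray spaces or blank lines before it is therefore rejected by a strict XML parser. The following minimal sketch shows one way to do the skipping with the JDK's PushbackInputStream. It assumes plain ASCII whitespace and skips eagerly, as soon as the stream is wrapped. LeadingWhitespaceSkipper and its methods are hypothetical names, and the library's SkipLeadingWhiteSpaceInputStream may work differently, for example by skipping lazily on the first read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the idea; not the source of SkipLeadingWhiteSpaceInputStream.
final class LeadingWhitespaceSkipper {

    /** Returns a stream that yields the bytes of {@code in} after any leading ASCII whitespace. */
    static InputStream skipLeadingWhitespace(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 1);
        int b;
        do {
            b = pushback.read();
        } while (b == ' ' || b == '\t' || b == '\r' || b == '\n');
        if (b != -1) {
            pushback.unread(b); // give back the first significant byte, e.g. '<' of "<?xml"
        }
        return pushback;
    }

    /** Demo helper: strips leading whitespace from a UTF-8 string via the stream above. */
    static String skipAndRead(String text) {
        try (InputStream in = skipLeadingWhitespace(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whitespace after the first significant byte is passed through untouched, so only the document prefix is affected.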
Enum Summary
AbstractSiteMap.SitemapType: The various sitemap types.
SiteMapCrossSubmitValidator.CrossSubmitValidationLevel
SiteMapURL.ChangeFrequency: Allowed change frequencies.
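The sitemaps.org protocol defines seven values for a URL's <changefreq> element: always, hourly, daily, weekly, monthly, yearly, and never. The sketch below uses a hypothetical ChangeFreq enum, not the library's SiteMapURL.ChangeFrequency, to show one lenient way of mapping the raw element text onto such an enum. It assumes that an unrecognized value should be treated as absent rather than failing the whole sitemap.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical stand-in for SiteMapURL.ChangeFrequency; the constants are the seven
// <changefreq> values defined by the sitemaps.org protocol.
enum ChangeFreq {
    ALWAYS, HOURLY, DAILY, WEEKLY, MONTHLY, YEARLY, NEVER;

    /** Leniently maps raw <changefreq> text; empty if it is not one of the seven values. */
    static Optional<ChangeFreq> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            // Sitemaps use lowercase values; normalize case and surrounding whitespace.
            return Optional.of(valueOf(raw.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // unknown value: treat as absent
        }
    }
}
```

Locale.ROOT keeps the upper-casing independent of the JVM's default locale, which matters for letters such as the Turkish dotted i.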
Exception Summary
UnknownFormatException: Exception thrown when the format of a sitemap cannot be recognized or parsed.
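As a rough illustration of when a format error like this can arise, the following sketch sniffs the first bytes of a downloaded sitemap and throws when they match none of the formats it knows. Those formats are gzip-compressed data, XML (covering <urlset>, <sitemapindex>, RSS, and Atom), and a plain-text list of URLs. The detection rules, the FormatSniffer class, and its unchecked UnknownSitemapFormatException are all hypothetical. The real parser may decide the format differently, for example from the HTTP content type.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names and detection rules are illustrative, not crawler-commons code.
final class FormatSniffer {

    enum Format { GZIP, XML, TEXT }

    /** Hypothetical analogue of UnknownFormatException (unchecked here for brevity). */
    static final class UnknownSitemapFormatException extends RuntimeException {
        UnknownSitemapFormatException(String message) {
            super(message);
        }
    }

    static Format sniff(byte[] content) {
        // gzip streams start with the magic bytes 0x1f 0x8b; decompress, then sniff again.
        if (content.length >= 2 && (content[0] & 0xff) == 0x1f && (content[1] & 0xff) == 0x8b) {
            return Format.GZIP;
        }
        // Look only at a short, whitespace-trimmed prefix of the document.
        String head = new String(content, 0, Math.min(content.length, 64), StandardCharsets.UTF_8).trim();
        if (head.startsWith("<")) {
            return Format.XML; // <urlset>, <sitemapindex>, RSS, or Atom
        }
        if (head.startsWith("http://") || head.startsWith("https://")) {
            return Format.TEXT; // plain-text sitemap: one URL per line
        }
        throw new UnknownSitemapFormatException("unrecognized sitemap content: " + head);
    }
}
```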