Crawler-commons 1.5 API

Packages 
crawlercommons  
crawlercommons.domains
Classes contained within the domains package relate to the definition of "paid-level domains" or "effective top-level domains", that is, Internet domain names one level below a public suffix as defined in the public suffix list.
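For illustration, a minimal sketch using EffectiveTldFinder from this package to look up the paid-level domain of a host name; the host name and expected output are made-up examples, and actual results depend on the public suffix list bundled with the release.

    import crawlercommons.domains.EffectiveTldFinder;

    public class DomainExample {
        public static void main(String[] args) {
            // "co.uk" is a public suffix, so the paid-level domain is
            // one level below it: "example.co.uk"
            String domain = EffectiveTldFinder.getAssignedDomain("www.example.co.uk");
            System.out.println(domain); // expected: example.co.uk
        }
    }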
crawlercommons.filters
The filters package contains code and resources for URL filtering.
crawlercommons.filters.basic
URL normalizer performing basic normalizations applicable to http:// and https:// URLs.
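A minimal sketch of the normalizer in this package; the input URL is a made-up example and the exact set of normalizations applied may vary between releases.

    import crawlercommons.filters.basic.BasicURLNormalizer;

    public class NormalizerExample {
        public static void main(String[] args) {
            BasicURLNormalizer normalizer = new BasicURLNormalizer();
            // lower-case scheme and host, drop the default port, resolve path segments
            String normalized = normalizer.filter("HTTP://www.Example.com:80/a/../b.html");
            System.out.println(normalized); // expected: http://www.example.com/b.html
        }
    }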
crawlercommons.mimetypes
Utilities for detecting MIME types relevant in the context of crawler-commons.
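A hedged sketch of MIME type detection; a MimeTypeDetector with a no-argument constructor and a detect(byte[]) method is an assumption based on the class's use by the sitemap parser, so the exact signatures should be checked against the package Javadoc.

    import java.nio.charset.StandardCharsets;

    import crawlercommons.mimetypes.MimeTypeDetector;

    public class MimeExample {
        public static void main(String[] args) {
            // detect(byte[]) is assumed here; verify against the Javadoc
            MimeTypeDetector detector = new MimeTypeDetector();
            byte[] content = "<?xml version=\"1.0\"?><urlset/>"
                    .getBytes(StandardCharsets.UTF_8);
            System.out.println(detector.detect(content)); // e.g. an XML MIME type
        }
    }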
crawlercommons.robots
The robots package contains Crawler-Commons' robots.txt parsing, rule inference, and related utilities.
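A minimal sketch of parsing a fetched robots.txt with SimpleRobotRulesParser from this package; the robots.txt content, URLs, and the agent name "mybot" are made-up, and the parseContent overload taking a collection of agent names is assumed to match the 1.5 API.

    import java.nio.charset.StandardCharsets;
    import java.util.Collections;

    import crawlercommons.robots.BaseRobotRules;
    import crawlercommons.robots.SimpleRobotRulesParser;

    public class RobotsExample {
        public static void main(String[] args) {
            byte[] robotsTxt = "User-agent: mybot\nDisallow: /private/\n"
                    .getBytes(StandardCharsets.UTF_8);
            SimpleRobotRulesParser parser = new SimpleRobotRulesParser();
            // parse the rules that apply to the agent "mybot"
            BaseRobotRules rules = parser.parseContent(
                    "https://www.example.com/robots.txt", robotsTxt,
                    "text/plain", Collections.singleton("mybot"));
            System.out.println(rules.isAllowed("https://www.example.com/private/x")); // false
            System.out.println(rules.isAllowed("https://www.example.com/index.html")); // true
        }
    }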
crawlercommons.sitemaps
Classes focused on parsing and processing sitemaps and holding the resulting set of URLs with crawling-related metadata, such as the change frequency of a page.
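A minimal sketch of parsing a sitemap with SiteMapParser and reading the per-URL metadata; the inline sitemap content and URL are made-up examples.

    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    import crawlercommons.sitemaps.AbstractSiteMap;
    import crawlercommons.sitemaps.SiteMap;
    import crawlercommons.sitemaps.SiteMapParser;
    import crawlercommons.sitemaps.SiteMapURL;

    public class SitemapExample {
        public static void main(String[] args) throws Exception {
            byte[] content = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                    + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                    + "<url><loc>https://www.example.com/</loc>"
                    + "<changefreq>daily</changefreq></url></urlset>")
                    .getBytes(StandardCharsets.UTF_8);
            SiteMapParser parser = new SiteMapParser();
            AbstractSiteMap sm = parser.parseSiteMap(content,
                    new URL("https://www.example.com/sitemap.xml"));
            if (!sm.isIndex()) { // a sitemap index would point to further sitemaps instead
                for (SiteMapURL u : ((SiteMap) sm).getSiteMapUrls()) {
                    System.out.println(u.getUrl() + " " + u.getChangeFrequency());
                }
            }
        }
    }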
crawlercommons.sitemaps.extension
Extensions to the sitemaps protocol for additional attributes and links to alternate media formats, for example image, video and news sitemaps.
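A sketch of parsing an image sitemap via the extension support; enableExtension and getAttributesForExtension are assumed to follow the extension API of the 1.x releases, and the sitemap content is a made-up example.

    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    import crawlercommons.sitemaps.SiteMap;
    import crawlercommons.sitemaps.SiteMapParser;
    import crawlercommons.sitemaps.SiteMapURL;
    import crawlercommons.sitemaps.extension.Extension;
    import crawlercommons.sitemaps.extension.ExtensionMetadata;

    public class ImageSitemapExample {
        public static void main(String[] args) throws Exception {
            SiteMapParser parser = new SiteMapParser();
            parser.enableExtension(Extension.IMAGE); // also parse <image:image> elements
            byte[] content = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                    + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\""
                    + " xmlns:image=\"http://www.google.com/schemas/sitemap-image/1.1\">"
                    + "<url><loc>https://www.example.com/page.html</loc>"
                    + "<image:image><image:loc>https://www.example.com/img.jpg</image:loc>"
                    + "</image:image></url></urlset>")
                    .getBytes(StandardCharsets.UTF_8);
            SiteMap sm = (SiteMap) parser.parseSiteMap(content,
                    new URL("https://www.example.com/sitemap.xml"));
            for (SiteMapURL u : sm.getSiteMapUrls()) {
                ExtensionMetadata[] images = u.getAttributesForExtension(Extension.IMAGE);
                if (images != null) {
                    for (ExtensionMetadata img : images) {
                        System.out.println(img); // image attributes, e.g. the image location
                    }
                }
            }
        }
    }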
crawlercommons.sitemaps.sax
SAX handlers to parse specific elements of XML sitemaps or Atom/RSS feeds.
crawlercommons.sitemaps.sax.extension
SAX handlers to parse extensions of XML sitemaps.