Package crawlercommons.robots
Class BaseRobotRules
- java.lang.Object
  - crawlercommons.robots.BaseRobotRules
- All Implemented Interfaces: Serializable
- Direct Known Subclasses: SimpleRobotRules
public abstract class BaseRobotRules extends Object implements Serializable
Result from parsing a single robots.txt file – a set of allow/disallow rules to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
- See Also: Serialized Form
-
-
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| static long | UNSET_CRAWL_DELAY | |
-
Constructor Summary
| Constructor | Description |
|---|---|
| BaseRobotRules() | |
-
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| void | addSitemap(String sitemap) | Add sitemap URL to rules if not a duplicate |
| boolean | equals(Object obj) | |
| long | getCrawlDelay() | Get Crawl-delay (in milliseconds) |
| List<String> | getSitemaps() | Get URLs of sitemap links found in robots.txt |
| int | hashCode() | |
| abstract boolean | isAllowAll() | |
| abstract boolean | isAllowed(String url) | |
| abstract boolean | isAllowNone() | |
| boolean | isDeferVisits() | |
| void | setCrawlDelay(long crawlDelay) | |
| void | setDeferVisits(boolean deferVisits) | Indicate to defer visits to the server, e.g. to wait until the robots.txt becomes available. |
| String | toString() | Returns a string with the crawl delay as well as a list of sitemaps if they exist (and aren't more than 10). |
-
-
-
Field Detail
-
UNSET_CRAWL_DELAY
public static final long UNSET_CRAWL_DELAY
- See Also:
- Constant Field Values
-
-
Method Detail
-
isAllowed
public abstract boolean isAllowed(String url)
-
isAllowAll
public abstract boolean isAllowAll()
-
isAllowNone
public abstract boolean isAllowNone()
-
getCrawlDelay
public long getCrawlDelay()
Get Crawl-delay (in milliseconds)
- Returns:
- Crawl-delay defined in the robots.txt for the given agent name, or UNSET_CRAWL_DELAY if not defined.
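Because getCrawlDelay() can return the UNSET_CRAWL_DELAY sentinel, a caller generally needs to check for it before using the value as a wait time. The following is a minimal sketch of that check, not code from the library: the class CrawlDelaySketch, the method effectiveDelayMillis, and the fallback parameter are hypothetical, and the stand-in constant's value (Long.MIN_VALUE) is an assumption made only so the example compiles on its own. Real code would compare against BaseRobotRules.UNSET_CRAWL_DELAY directly.

```java
final class CrawlDelaySketch {
    // Stand-in for BaseRobotRules.UNSET_CRAWL_DELAY. The concrete value is an
    // assumption for this sketch; see "Constant Field Values" for the real one.
    static final long UNSET_CRAWL_DELAY = Long.MIN_VALUE;

    /**
     * Returns the delay taken from the rules if one was set, otherwise the
     * crawler's own default (hypothetical parameter, in milliseconds).
     */
    static long effectiveDelayMillis(long crawlDelayFromRules, long defaultDelayMillis) {
        if (crawlDelayFromRules == UNSET_CRAWL_DELAY) {
            return defaultDelayMillis;
        }
        return crawlDelayFromRules;
    }

    public static void main(String[] args) {
        // No Crawl-delay in robots.txt: fall back to the default.
        System.out.println(effectiveDelayMillis(UNSET_CRAWL_DELAY, 1000L)); // prints 1000
        // Crawl-delay of 5 seconds in robots.txt: use it as-is.
        System.out.println(effectiveDelayMillis(5000L, 1000L)); // prints 5000
    }
}
```

Treating the sentinel explicitly avoids passing a meaningless value to a sleep or scheduling call.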
-
setCrawlDelay
public void setCrawlDelay(long crawlDelay)
- Parameters:
crawlDelay - Crawl-delay in milliseconds
-
isDeferVisits
public boolean isDeferVisits()
- Returns:
- whether to defer visits to the server
-
setDeferVisits
public void setDeferVisits(boolean deferVisits)
Indicate to defer visits to the server, e.g. to wait until the robots.txt becomes available.
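One way a crawler might act on this flag is to check isDeferVisits() before consulting isAllowed(String). The sketch below illustrates that ordering under stated assumptions: RulesView is a hypothetical stand-in interface exposing only the two documented methods, and the Decision enum and decide method are invented for illustration, not part of crawler-commons.

```java
final class DeferVisitsSketch {
    // Hypothetical outcomes a crawler scheduler might distinguish.
    enum Decision { FETCH, SKIP, RETRY_LATER }

    // Hypothetical stand-in for the two BaseRobotRules methods used here.
    interface RulesView {
        boolean isDeferVisits();
        boolean isAllowed(String url);
    }

    /** Deferral wins over allow/disallow: the rules are not yet trustworthy. */
    static Decision decide(RulesView rules, String url) {
        if (rules.isDeferVisits()) {
            return Decision.RETRY_LATER;
        }
        return rules.isAllowed(url) ? Decision.FETCH : Decision.SKIP;
    }

    // Builds a stand-in with fixed answers, for demonstration only.
    static RulesView fixedRules(boolean defer, boolean allowed) {
        return new RulesView() {
            @Override public boolean isDeferVisits() { return defer; }
            @Override public boolean isAllowed(String url) { return allowed; }
        };
    }

    public static void main(String[] args) {
        String url = "https://example.com/page";
        System.out.println(decide(fixedRules(true, true), url));   // prints RETRY_LATER
        System.out.println(decide(fixedRules(false, true), url));  // prints FETCH
        System.out.println(decide(fixedRules(false, false), url)); // prints SKIP
    }
}
```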
-
addSitemap
public void addSitemap(String sitemap)
Add sitemap URL to rules if not a duplicate
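The "if not a duplicate" behavior can be pictured with an insertion-ordered set. The sketch below shows one way such a check could work; it is not necessarily how the library implements it. SitemapListSketch is a hypothetical stand-in that mirrors only the addSitemap and getSitemaps signatures, and the example URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SitemapListSketch {
    // LinkedHashSet rejects duplicates while keeping first-seen order.
    private final Set<String> sitemaps = new LinkedHashSet<>();

    /** Adds the sitemap URL unless an identical string was already added. */
    void addSitemap(String sitemap) {
        sitemaps.add(sitemap);
    }

    /** Returns a copy of the collected sitemap URLs in insertion order. */
    List<String> getSitemaps() {
        return new ArrayList<>(sitemaps);
    }

    public static void main(String[] args) {
        SitemapListSketch rules = new SitemapListSketch();
        rules.addSitemap("https://example.com/sitemap.xml");
        rules.addSitemap("https://example.com/sitemap.xml"); // duplicate, ignored
        rules.addSitemap("https://example.com/news.xml");
        System.out.println(rules.getSitemaps().size()); // prints 2
    }
}
```

Note that this sketch compares URLs as exact strings, so two spellings of the same address would both be kept.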
-
-