Package crawlercommons.robots
Class BaseRobotRules
java.lang.Object
    crawlercommons.robots.BaseRobotRules

All Implemented Interfaces:
    Serializable
Direct Known Subclasses:
    SimpleRobotRules
public abstract class BaseRobotRules extends Object implements Serializable
Result from parsing a single robots.txt file: a set of rules plus a crawl delay.
See Also:
    Serialized Form
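As a usage sketch (not part of this Javadoc): rules are normally obtained from SimpleRobotRulesParser rather than constructed directly. The parseContent overload shown below, taking a robot-names String, is the classic crawler-commons signature; the robots.txt content and URLs are illustrative only.

import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;
import java.nio.charset.StandardCharsets;

public class RobotsCheck {
    public static void main(String[] args) {
        // Illustrative robots.txt content; a real crawler would fetch this over HTTP.
        byte[] content = ("User-agent: *\n"
                + "Disallow: /private/\n"
                + "Crawl-delay: 5\n"
                + "Sitemap: https://example.com/sitemap.xml\n")
                .getBytes(StandardCharsets.UTF_8);

        SimpleRobotRulesParser parser = new SimpleRobotRulesParser();
        BaseRobotRules rules = parser.parseContent(
                "https://example.com/robots.txt", content, "text/plain", "mycrawler");

        System.out.println(rules.isAllowed("https://example.com/private/page.html")); // false
        System.out.println(rules.isAllowed("https://example.com/index.html"));        // true
        System.out.println(rules.getSitemaps()); // [https://example.com/sitemap.xml]
    }
}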
Field Summary
Fields
Modifier and Type    Field                Description
static long          UNSET_CRAWL_DELAY
-
Constructor Summary
Constructors
Constructor          Description
BaseRobotRules()
-
Method Summary
Modifier and Type    Method                                Description
void                 addSitemap(String sitemap)            Add sitemap URL to rules if not a duplicate
boolean              equals(Object obj)
long                 getCrawlDelay()
List<String>         getSitemaps()                         Get URLs of sitemap links found in robots.txt
int                  hashCode()
abstract boolean     isAllowAll()
abstract boolean     isAllowed(String url)
abstract boolean     isAllowNone()
boolean              isDeferVisits()
void                 setCrawlDelay(long crawlDelay)
void                 setDeferVisits(boolean deferVisits)
String               toString()                            Returns a string with the crawl delay as well as a list of sitemaps if they exist (and aren't more than 10)
Field Detail
UNSET_CRAWL_DELAY
public static final long UNSET_CRAWL_DELAY
See Also:
    Constant Field Values
Method Detail
isAllowed
public abstract boolean isAllowed(String url)
isAllowAll
public abstract boolean isAllowAll()
isAllowNone
public abstract boolean isAllowNone()
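A small sketch of how the three allow checks typically combine, testing the blanket policies before the per-URL match. The helper name shouldFetch is hypothetical and not part of this API.

import crawlercommons.robots.BaseRobotRules;

final class RobotsGate {
    // Check blanket policies first, then fall back to the per-URL rule match.
    static boolean shouldFetch(BaseRobotRules rules, String url) {
        if (rules.isAllowNone()) {
            return false; // everything is disallowed
        }
        if (rules.isAllowAll()) {
            return true;  // everything is allowed
        }
        return rules.isAllowed(url);
    }
}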
getCrawlDelay
public long getCrawlDelay()
setCrawlDelay
public void setCrawlDelay(long crawlDelay)
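A sketch of honoring the delay between requests. UNSET_CRAWL_DELAY is the documented sentinel for "no Crawl-delay directive seen"; treating the value as milliseconds is an assumption based on how SimpleRobotRulesParser converts the directive, so verify against your crawler-commons version.

import crawlercommons.robots.BaseRobotRules;

final class PoliteScheduler {
    // Sleep between requests when robots.txt declared a crawl delay.
    static void waitForCrawlDelay(BaseRobotRules rules) throws InterruptedException {
        long delay = rules.getCrawlDelay();
        if (delay != BaseRobotRules.UNSET_CRAWL_DELAY) {
            Thread.sleep(delay); // assumed to be milliseconds (see note above)
        }
    }
}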
isDeferVisits
public boolean isDeferVisits()
setDeferVisits
public void setDeferVisits(boolean deferVisits)
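The defer-visits flag is conventionally set by the parser when robots.txt could not be fetched reliably (for example, a server error), signalling that the crawler should retry the host later rather than assume a policy. A sketch, with requeueLater as a hypothetical scheduling hook:

import crawlercommons.robots.BaseRobotRules;

final class DeferralCheck {
    // requeueLater is a hypothetical stand-in for the crawler's scheduler.
    static boolean maybeDefer(BaseRobotRules rules, Runnable requeueLater) {
        if (rules.isDeferVisits()) {
            requeueLater.run(); // try this host again later
            return true;
        }
        return false;
    }
}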
addSitemap
public void addSitemap(String sitemap)
Add sitemap URL to rules if not a duplicate
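A sketch using the concrete SimpleRobotRules subclass, since BaseRobotRules itself is abstract; per the description above, the repeated call is expected to be ignored as a duplicate.

import crawlercommons.robots.SimpleRobotRules;

final class SitemapDemo {
    public static void main(String[] args) {
        SimpleRobotRules rules = new SimpleRobotRules();
        rules.addSitemap("https://example.com/sitemap.xml");
        rules.addSitemap("https://example.com/sitemap.xml"); // duplicate: not added again
        System.out.println(rules.getSitemaps().size()); // 1
    }
}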