Package crawlercommons.robots
Class BaseRobotRules
- java.lang.Object
  - crawlercommons.robots.BaseRobotRules
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
SimpleRobotRules
public abstract class BaseRobotRules extends Object implements Serializable
Result from parsing a single robots.txt file: a set of allow/disallow rules to check whether a given URL is allowed, and optionally a Crawl-delay and Sitemap URLs.
- See Also:
- Serialized Form
-
Field Summary
Modifier and Type    Field                Description
static long          UNSET_CRAWL_DELAY
-
Constructor Summary
Constructor          Description
BaseRobotRules()
-
Method Summary
Modifier and Type    Method                                 Description
void                 addSitemap(String sitemap)             Add sitemap URL to rules if not a duplicate
boolean              equals(Object obj)
long                 getCrawlDelay()                        Get Crawl-delay (in milliseconds)
List<String>         getSitemaps()                          Get URLs of sitemap links found in robots.txt
int                  hashCode()
abstract boolean     isAllowAll()
abstract boolean     isAllowed(String url)
abstract boolean     isAllowNone()
boolean              isDeferVisits()
void                 setCrawlDelay(long crawlDelay)
void                 setDeferVisits(boolean deferVisits)    Indicate to defer visits to the server, e.g. to wait until the robots.txt becomes available.
String               toString()                             Returns a string with the crawl delay as well as a list of sitemaps if they exist (and aren't more than 10).
-
Field Detail
-
UNSET_CRAWL_DELAY
public static final long UNSET_CRAWL_DELAY
- See Also:
- Constant Field Values
-
Method Detail
-
isAllowed
public abstract boolean isAllowed(String url)
-
isAllowAll
public abstract boolean isAllowAll()
-
isAllowNone
public abstract boolean isAllowNone()
-
getCrawlDelay
public long getCrawlDelay()
Get Crawl-delay (in milliseconds)
- Returns:
- Crawl-delay defined in the robots.txt for the given agent name, or UNSET_CRAWL_DELAY if not defined.
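Because getCrawlDelay() returns either a delay in milliseconds or the UNSET_CRAWL_DELAY sentinel, callers usually need to check for the sentinel before using the value. The following is a minimal, self-contained sketch of that check. The class CrawlDelayExample, the helper effectiveDelay, and the fallback delay are hypothetical, and the constant is a local stand-in whose value is assumed for illustration (the actual value is listed under Constant Field Values).

```java
import java.time.Duration;

public class CrawlDelayExample {
    // Local stand-in for BaseRobotRules.UNSET_CRAWL_DELAY. The value used here is an
    // assumption for illustration; consult "Constant Field Values" for the real one.
    static final long UNSET_CRAWL_DELAY = Long.MIN_VALUE;

    // Hypothetical helper: turn the result of getCrawlDelay() into a usable delay,
    // falling back to a crawler-wide default when robots.txt defined no Crawl-delay.
    static Duration effectiveDelay(long crawlDelayMillis, Duration defaultDelay) {
        if (crawlDelayMillis == UNSET_CRAWL_DELAY) {
            return defaultDelay;
        }
        return Duration.ofMillis(crawlDelayMillis);
    }

    public static void main(String[] args) {
        Duration fallback = Duration.ofSeconds(1);
        // No Crawl-delay in robots.txt: the fallback applies.
        System.out.println(effectiveDelay(UNSET_CRAWL_DELAY, fallback).toMillis()); // 1000
        // Crawl-delay of 2500 ms was parsed from robots.txt.
        System.out.println(effectiveDelay(2500L, fallback).toMillis()); // 2500
    }
}
```

With the real class, the first argument would be the value returned by getCrawlDelay() on a rules instance.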
-
setCrawlDelay
public void setCrawlDelay(long crawlDelay)
- Parameters:
crawlDelay - Crawl-Delay in milliseconds
-
isDeferVisits
public boolean isDeferVisits()
- Returns:
- whether to defer visits to the server
-
setDeferVisits
public void setDeferVisits(boolean deferVisits)
Indicate to defer visits to the server, e.g. to wait until the robots.txt becomes available.
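One way a crawler might act on the defer-visits flag is to check it before consulting isAllowed(String). The sketch below is an assumption about caller-side usage, not part of this class: the Rules interface is a hypothetical stand-in that mirrors only the two documented method signatures, and the Decision enum, decide, and fixedRules are illustrative names.

```java
public class DeferVisitsExample {
    // Hypothetical stand-in mirroring two documented BaseRobotRules methods.
    interface Rules {
        boolean isDeferVisits();
        boolean isAllowed(String url);
    }

    // Illustrative outcomes for a fetch scheduler (not part of the library).
    enum Decision { FETCH, SKIP, RETRY_LATER }

    // If visits are deferred (e.g. robots.txt is not yet available),
    // postpone the fetch instead of treating the URL as allowed or disallowed.
    static Decision decide(Rules rules, String url) {
        if (rules.isDeferVisits()) {
            return Decision.RETRY_LATER;
        }
        return rules.isAllowed(url) ? Decision.FETCH : Decision.SKIP;
    }

    // Test helper: rules with fixed answers, regardless of the URL.
    static Rules fixedRules(boolean deferVisits, boolean allowed) {
        return new Rules() {
            @Override public boolean isDeferVisits() { return deferVisits; }
            @Override public boolean isAllowed(String url) { return allowed; }
        };
    }

    public static void main(String[] args) {
        String url = "https://example.com/page.html";
        System.out.println(decide(fixedRules(true, true), url));   // RETRY_LATER
        System.out.println(decide(fixedRules(false, true), url));  // FETCH
        System.out.println(decide(fixedRules(false, false), url)); // SKIP
    }
}
```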
-
addSitemap
public void addSitemap(String sitemap)
Add sitemap URL to rules if not a duplicate
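The "if not a duplicate" behaviour can be illustrated with an insertion-ordered set. This is a sketch of the documented contract only, not the library's actual implementation; the class SitemapListExample and its internals are hypothetical, and only the method names addSitemap and getSitemaps come from this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SitemapListExample {
    // LinkedHashSet drops duplicates while keeping first-seen order.
    private final Set<String> sitemaps = new LinkedHashSet<>();

    // Adds the sitemap URL unless an identical URL was already added.
    public void addSitemap(String sitemap) {
        sitemaps.add(sitemap);
    }

    // Returns a copy so callers cannot modify the internal set.
    public List<String> getSitemaps() {
        return new ArrayList<>(sitemaps);
    }

    public static void main(String[] args) {
        SitemapListExample rules = new SitemapListExample();
        rules.addSitemap("https://example.com/sitemap.xml");
        rules.addSitemap("https://example.com/sitemap-news.xml");
        rules.addSitemap("https://example.com/sitemap.xml"); // duplicate, ignored
        System.out.println(rules.getSitemaps().size()); // 2
    }
}
```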
-