public abstract class BaseRobotsParser extends Object implements Serializable
| Constructor and Description |
|---|
| BaseRobotsParser() |
| Modifier and Type | Method and Description |
|---|---|
| abstract BaseRobotRules | failedFetch(int httpStatusCode) The fetch of robots.txt failed, so return rules appropriate given the HTTP status code. |
| abstract BaseRobotRules | parseContent(String url, byte[] content, String contentType, String robotNames) Parse the robots.txt file in content, and return rules appropriate for processing paths by the crawler(s) named in robotNames. |
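
As a rough illustration of how a subclass might fill in the two abstract methods summarized above, here is a minimal, self-contained sketch. It does not use the real library types: `SketchRules`, `SketchRobotsParser`, and `LenientParser` are hypothetical stand-ins that only mirror the method signatures shown in the table. The status-code policy (4xx allows everything, any other failure allows nothing) and the one-line "parser" are assumptions made for the example, not the behavior of any Crawler-Commons implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for BaseRobotRules: records only an allow-all / allow-none decision.
enum SketchRules { ALLOW_ALL, ALLOW_NONE }

// Hypothetical stand-in mirroring the two abstract method signatures listed above.
abstract class SketchRobotsParser {
    abstract SketchRules parseContent(String url, byte[] content, String contentType, String robotNames);
    abstract SketchRules failedFetch(int httpStatusCode);
}

class LenientParser extends SketchRobotsParser {
    @Override
    SketchRules parseContent(String url, byte[] content, String contentType, String robotNames) {
        // Deliberately naive (assumption): ignores robotNames and user-agent groups,
        // and treats any "Disallow: /" line as blocking the whole site.
        String text = new String(content, StandardCharsets.UTF_8);
        for (String line : text.split("\\R")) {
            if (line.trim().matches("(?i)disallow:\\s*/")) {
                return SketchRules.ALLOW_NONE;
            }
        }
        return SketchRules.ALLOW_ALL;
    }

    @Override
    SketchRules failedFetch(int httpStatusCode) {
        // Assumed policy: a 4xx response means there is no robots.txt, so nothing is restricted.
        if (httpStatusCode >= 400 && httpStatusCode < 500) {
            return SketchRules.ALLOW_ALL;
        }
        // Assumed policy: any other failure (for example 5xx) is treated conservatively.
        return SketchRules.ALLOW_NONE;
    }
}

class RobotsSketchDemo {
    public static void main(String[] args) {
        SketchRobotsParser parser = new LenientParser();
        byte[] body = "User-agent: *\nDisallow: /\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(parser.parseContent("http://example.com/robots.txt", body, "text/plain", "mybot"));
        System.out.println(parser.failedFetch(404));
        System.out.println(parser.failedFetch(503));
    }
}
```

A caller would typically invoke `parseContent` when the robots.txt fetch succeeds and `failedFetch` otherwise, passing the non-2xx status code it received. A real subclass would return rules that can answer per-path questions rather than a two-valued enum.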
public abstract BaseRobotRules parseContent(String url, byte[] content, String contentType, String robotNames)
url - URL that content was fetched from (for reporting purposes)
content - raw bytes from the site's robots.txt file
contentType - HTTP response header (mime-type)
robotNames - name(s) of crawler, to be used when processing file contents (just the name portion, without version or other details)

public abstract BaseRobotRules failedFetch(int httpStatusCode)
httpStatusCode - a failure status code (NOT 2xx)

Copyright © 2009–2016 Crawler-Commons. All rights reserved.