public abstract class BaseRobotsParser extends Object implements Serializable
Constructor and Description |
---|
BaseRobotsParser() |
Modifier and Type | Method and Description |
---|---|
abstract BaseRobotRules | failedFetch(int httpStatusCode): The fetch of robots.txt failed, so return rules appropriate for the given HTTP status code. |
abstract BaseRobotRules | parseContent(String url, byte[] content, String contentType, String robotNames): Parse the robots.txt file in content, and return rules appropriate for processing paths by the crawler(s) named in robotNames. |
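To make the parseContent contract concrete, here is a deliberately simplified, self-contained sketch. It is not crawler-commons' implementation: SketchRules and SketchRobotsParser are hypothetical stand-ins for BaseRobotRules and a concrete subclass, robotNames is assumed to be a comma-separated list of names, and only User-agent and Disallow lines are honored (no Allow, wildcards, or Crawl-delay).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for BaseRobotRules (not the crawler-commons class):
// holds the Disallow path prefixes that apply to one crawler.
final class SketchRules {
    private final List<String> disallowed;

    SketchRules(List<String> disallowed) {
        this.disallowed = disallowed;
    }

    // A path is allowed unless it starts with a non-empty Disallow prefix.
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (!prefix.isEmpty() && path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical concrete parser mirroring the parseContent signature.
final class SketchRobotsParser {
    // Simplification: rules from the "*" group and from groups naming one of
    // robotNames are merged, rather than the specific group overriding "*".
    static SketchRules parseContent(String url, byte[] content,
                                    String contentType, String robotNames) {
        // url and contentType are kept for signature parity; this sketch
        // assumes the body is UTF-8 text and does not report errors.
        List<String> names = new ArrayList<>();
        for (String n : robotNames.split(",")) {
            names.add(n.trim().toLowerCase(Locale.ROOT));
        }
        String text = new String(content, StandardCharsets.UTF_8);
        List<String> disallowed = new ArrayList<>();
        boolean inMatchingGroup = false;
        boolean lastWasAgent = false;
        for (String rawLine : text.split("\r?\n")) {
            String line = rawLine.replaceAll("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                if (!lastWasAgent) {
                    inMatchingGroup = false;
                }
                String agent = value.toLowerCase(Locale.ROOT);
                if (agent.equals("*") || names.contains(agent)) {
                    inMatchingGroup = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                if (field.equals("disallow") && inMatchingGroup) {
                    disallowed.add(value);
                }
            }
        }
        return new SketchRules(disallowed);
    }
}
```

For example, with a file containing "User-agent: *" / "Disallow: /private", a crawler named "mybot" is blocked from /private/page but may fetch /public. A production parser also has to handle Allow lines, path wildcards, group precedence, and malformed input.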
public abstract BaseRobotRules parseContent(String url, byte[] content, String contentType, String robotNames)
Parameters:
url - URL that content was fetched from (for reporting purposes)
content - raw bytes from the site's robots.txt file
contentType - value of the HTTP Content-Type response header (MIME type)
robotNames - name(s) of the crawler, to be used when processing file contents (just the name portion, without version or other details)

public abstract BaseRobotRules failedFetch(int httpStatusCode)
Parameters:
httpStatusCode - a failure status code (NOT 2xx)

Copyright © 2009–2016 Crawler-Commons. All rights reserved.
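The page does not say which rules failedFetch should return for a given status code. The following is a hedged sketch of one reasonable policy, not necessarily what any crawler-commons subclass does. FetchFailureRules and FailedFetchSketch are hypothetical names. The 4xx and 5xx branches follow the convention described in RFC 9309 (a 4xx robots.txt means no restrictions, a 5xx means assume full disallow). Treating 3xx and other codes as a temporary full disallow is a conservative choice made only for this sketch.

```java
// Hypothetical stand-in for the rules object a failedFetch implementation returns.
enum FetchFailureRules { ALLOW_ALL, DISALLOW_ALL }

final class FailedFetchSketch {
    static FetchFailureRules failedFetch(int httpStatusCode) {
        if (httpStatusCode >= 200 && httpStatusCode < 300) {
            // The contract says the code is NOT 2xx; a successful fetch
            // should be handed to parseContent instead.
            throw new IllegalArgumentException("2xx is not a failed fetch: " + httpStatusCode);
        }
        if (httpStatusCode >= 400 && httpStatusCode < 500) {
            // robots.txt is unavailable (e.g. 404, 410): crawl without restrictions.
            return FetchFailureRules.ALLOW_ALL;
        }
        // 5xx (server unreachable), plus 3xx and anything else in this sketch:
        // treat as a temporary failure and crawl nothing for now.
        return FetchFailureRules.DISALLOW_ALL;
    }
}
```

A caller would typically check the robots.txt response status first, passing 2xx bodies to parseContent and every other code to failedFetch.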