Package | Description
---|---
crawlercommons.robots | The robots package contains all of the robots.txt rule inference, parsing and utilities contained within Crawler Commons.
Modifier and Type | Class and Description
---|---
class | SimpleRobotRules: Result from parsing a single robots.txt file, i.e. a set of rules, an optional crawl-delay, and an optional sitemap URL.
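A minimal sketch of how a caller might read those three pieces of information from a parsed result. It assumes the BaseRobotRules accessors isAllowed(String), getCrawlDelay() and getSitemaps(); the URLs are placeholders:

```java
import java.util.List;

import crawlercommons.robots.BaseRobotRules;

public class RobotRulesSummary {
    /** Print the three pieces of information a parsed robots.txt result carries. */
    static void summarize(BaseRobotRules rules) {
        // Path rules: ask whether a specific URL may be fetched.
        boolean allowed = rules.isAllowed("https://www.example.com/private/page.html");
        System.out.println("allowed: " + allowed);

        // Optional crawl-delay (unset if the robots.txt file did not declare one).
        System.out.println("crawl delay: " + rules.getCrawlDelay());

        // Optional sitemap URL(s) declared in the file.
        List<String> sitemaps = rules.getSitemaps();
        System.out.println("sitemaps: " + sitemaps);
    }
}
```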
Modifier and Type | Method and Description
---|---
abstract BaseRobotRules | BaseRobotsParser.failedFetch(int httpStatusCode): The fetch of robots.txt failed, so return rules appropriate given the HTTP status code.
abstract BaseRobotRules | BaseRobotsParser.parseContent(String url, byte[] content, String contentType, String robotNames): Parse the robots.txt file in content, and return rules appropriate for processing paths by the named user agent(s).
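The two methods are typically used together when fetching robots.txt: parseContent on a successful response, failedFetch otherwise. A minimal sketch, assuming the concrete SimpleRobotRulesParser from this package and the String-based robotNames signature listed above; the URL and agent name are placeholders:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

import crawlercommons.robots.BaseRobotRules;
import crawlercommons.robots.SimpleRobotRulesParser;

public class RobotsFetchExample {
    public static void main(String[] args) throws Exception {
        String robotsUrl = "https://www.example.com/robots.txt"; // placeholder
        SimpleRobotRulesParser parser = new SimpleRobotRulesParser();

        HttpURLConnection conn = (HttpURLConnection) new URL(robotsUrl).openConnection();
        int status = conn.getResponseCode();

        BaseRobotRules rules;
        if (status == HttpURLConnection.HTTP_OK) {
            try (InputStream in = conn.getInputStream()) {
                // Successful fetch: parse the raw bytes for the crawler's agent name(s).
                rules = parser.parseContent(robotsUrl, in.readAllBytes(),
                        conn.getContentType(), "mycrawler");
            }
        } else {
            // Failed fetch: derive default rules from the HTTP status code
            // (e.g. a 404 usually yields allow-all, a 5xx allow-none).
            rules = parser.failedFetch(status);
        }

        System.out.println("allowed: "
                + rules.isAllowed("https://www.example.com/index.html"));
    }
}
```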
Copyright © 2009–2021 Crawler-Commons. All rights reserved.