Package | Description |
---|---|
crawlercommons.fetcher.http | This package concerns the fetching of files over HTTP. Extending BaseHttpFetcher (which itself extends BaseFetcher), SimpleHttpFetcher provides the Crawler Commons HTTP fetching implementation. |
crawlercommons.robots | The robots package contains all of the robots.txt rule inference, parsing and utilities contained within Crawler Commons. |
Modifier and Type | Class and Description |
---|---|
class | SimpleHttpFetcher. Deprecated as of release 0.6. We recommend directly using Apache HttpClient, async-http-client, or any other robust, industrial-strength HTTP client. |
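
The deprecation note above points callers toward a general-purpose HTTP client. As a rough illustration only, the sketch below uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) as a stand-in for "any other robust client"; it is not the Crawler Commons API. The class name `JdkFetchSketch`, the helper names, the timeout values and the example user agent are all assumptions made for this example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper class, not part of Crawler Commons.
class JdkFetchSketch {

    // Builds a client with a connect timeout and normal redirect handling.
    // The 10-second timeout is an arbitrary value chosen for illustration.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }

    // Builds a GET request that identifies the crawler via the User-Agent header.
    // The 30-second request timeout is likewise an assumption.
    static HttpRequest newGet(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = newGet("https://example.com/robots.txt", "examplebot/1.0");
        // Only inspects the request; nothing is sent over the network here.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually fetch the resource, the request could be passed to `newClient().send(request, HttpResponse.BodyHandlers.ofByteArray())`, which throws `IOException` and `InterruptedException` and therefore needs handling at the call site.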
Modifier and Type | Method and Description |
---|---|
static BaseHttpFetcher | RobotUtils.createFetcher(BaseHttpFetcher fetcher) |
static BaseHttpFetcher | RobotUtils.createFetcher(UserAgent userAgent, int maxThreads) |
Modifier and Type | Method and Description |
---|---|
static BaseHttpFetcher | RobotUtils.createFetcher(BaseHttpFetcher fetcher) |
static BaseRobotRules | RobotUtils.getRobotRules(BaseHttpFetcher fetcher, BaseRobotsParser parser, URL robotsUrl). Externally visible, static method for use in tools and for testing. |
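
`getRobotRules` takes the robots.txt location as a `URL`. The sketch below shows one way a caller might derive that URL from an arbitrary page URL, assuming the conventional `/robots.txt` path at the root of the same scheme, host and port. The class `RobotsUrlSketch` and the method `robotsUrlFor` are hypothetical names for this example and are not part of Crawler Commons.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class, not part of Crawler Commons.
class RobotsUrlSketch {

    // Returns the robots.txt URL for the site that serves pageUrl.
    // Keeps scheme, host and port; drops path, query and fragment.
    static URL robotsUrlFor(String pageUrl) {
        try {
            URI page = URI.create(pageUrl);
            URI robots = new URI(page.getScheme(), null, page.getHost(),
                    page.getPort(), "/robots.txt", null, null);
            return robots.toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable absolute URL: " + pageUrl, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(robotsUrlFor("https://example.com/a/b?x=1"));
    }
}
```

The resulting `URL` could then be supplied as the `robotsUrl` argument of `RobotUtils.getRobotRules(fetcher, parser, robotsUrl)`, together with a fetcher and parser obtained elsewhere.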
Copyright © 2009–2016 Crawler-Commons. All rights reserved.