Serialized Form
-
Package crawlercommons.robots
-
Class crawlercommons.robots.BaseRobotRules extends Object implements Serializable
-
Serialized Fields
-
_crawlDelay
long _crawlDelay
-
_deferVisits
boolean _deferVisits
-
_sitemaps
LinkedHashSet<String> _sitemaps
-
-
-
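The fields listed above are what Java serialization writes for this class. The following is a minimal sketch of such a round trip, not the library's own code: `RulesSnapshot` and `SerializedFormDemo` are hypothetical stand-ins that only mirror the three field names and types shown here, so the example runs without crawler-commons on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;

// Hypothetical stand-in mirroring the serialized fields of BaseRobotRules.
// It is an assumption for illustration, not the crawler-commons class.
class RulesSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;

    long _crawlDelay;
    boolean _deferVisits;
    LinkedHashSet<String> _sitemaps = new LinkedHashSet<>();
}

class SerializedFormDemo {
    // Serializes the value to an in-memory buffer and reads it back as a new object.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        RulesSnapshot rules = new RulesSnapshot();
        rules._crawlDelay = 5000L;
        rules._sitemaps.add("http://foo.org/abc/sitemap.xml");

        RulesSnapshot copy = roundTrip(rules);
        System.out.println(copy._crawlDelay + " " + copy._deferVisits + " " + copy._sitemaps);
    }
}
```

Every non-transient field takes part in the round trip, so each field type listed on this page must itself be serializable for the same approach to work on the real classes.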
Class crawlercommons.robots.BaseRobotsParser extends Object implements Serializable
-
Class crawlercommons.robots.SimpleRobotRules extends BaseRobotRules implements Serializable
-
Serialized Fields
-
_mode
SimpleRobotRules.RobotRulesMode _mode
-
_rules
ArrayList<SimpleRobotRules.RobotRule> _rules
-
-
-
Class crawlercommons.robots.SimpleRobotRules.RobotRule extends Object implements Serializable
-
Serialized Fields
-
_allow
boolean _allow
-
_prefix
String _prefix
-
-
-
Class crawlercommons.robots.SimpleRobotRulesParser extends BaseRobotsParser implements Serializable
-
Serialized Fields
-
_exactUserAgentMatching
boolean _exactUserAgentMatching
-
_maxCrawlDelay
long _maxCrawlDelay
-
_maxWarnings
int _maxWarnings
-
_numWarningsDuringLastParse
ThreadLocal<Integer> _numWarningsDuringLastParse
-
-
-
-
Package crawlercommons.sitemaps
-
Class crawlercommons.sitemaps.AbstractSiteMap extends Object implements Serializable
-
Serialized Fields
-
lastModified
Date lastModified
W3C date the Sitemap was last modified
-
processed
boolean processed
Indicates whether the Sitemap has been processed.
-
type
AbstractSiteMap.SitemapType type
This Sitemap's type
-
url
URL url
-
-
-
Class crawlercommons.sitemaps.SiteMap extends AbstractSiteMap implements Serializable
-
Serialized Fields
-
baseUrl
String baseUrl
The base URL for the Sitemap is where the Sitemap was found. If found at http://foo.org/abc/sitemap.xml, then baseUrl is http://foo.org/abc/. Sitemaps can only contain URLs that are under the base URL.
-
urlList
List<SiteMapURL> urlList
URLs found in this Sitemap
-
-
-
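The base URL rule in the baseUrl description can be sketched as plain string handling. This is an illustrative assumption about how the rule might be checked, not the library's implementation: `SitemapBaseUrl`, `baseUrlOf`, and `isUnderBase` are hypothetical names, and the sketch assumes the sitemap URL has a path component.

```java
// Hypothetical helper illustrating the base URL rule; not part of crawler-commons.
class SitemapBaseUrl {
    // Keeps everything up to and including the last '/' of the sitemap's own URL.
    // Assumes the URL contains a path, e.g. http://foo.org/abc/sitemap.xml.
    static String baseUrlOf(String sitemapUrl) {
        return sitemapUrl.substring(0, sitemapUrl.lastIndexOf('/') + 1);
    }

    // A URL is acceptable only if it starts with the base URL.
    static boolean isUnderBase(String baseUrl, String candidateUrl) {
        return candidateUrl.startsWith(baseUrl);
    }

    public static void main(String[] args) {
        String base = baseUrlOf("http://foo.org/abc/sitemap.xml");
        System.out.println(base); // prints http://foo.org/abc/
        System.out.println(isUnderBase(base, "http://foo.org/abc/page.html")); // prints true
        System.out.println(isUnderBase(base, "http://foo.org/other/page.html")); // prints false
    }
}
```

A prefix comparison on raw strings ignores details such as case differences in the host name or default ports, so a production check would likely normalize both URLs first.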
Class crawlercommons.sitemaps.SiteMapIndex extends AbstractSiteMap implements Serializable
-
Serialized Fields
-
sitemaps
List<AbstractSiteMap> sitemaps
URLs found in this Sitemap Index
-
-
-
Class crawlercommons.sitemaps.SiteMapURL extends Object implements Serializable
-
Serialized Fields
-
attributes
Map<Extension,ExtensionMetadata[]> attributes
Attributes from sitemap extensions (news, image, video sitemaps, etc.)
-
changeFreq
SiteMapURL.ChangeFrequency changeFreq
How often the URL changes (optional)
-
lastModified
Date lastModified
When the URL was last modified (optional)
-
priority
double priority
Value in the range [0.0, 1.0] (optional)
-
url
URL url
URL found in Sitemap (required)
-
valid
boolean valid
May be false if the URL isn't found under the base path, as described at http://www.sitemaps.org/protocol.html#location
-
-
-
Class crawlercommons.sitemaps.UnknownFormatException extends Exception implements Serializable
-
-
Package crawlercommons.sitemaps.extension
-
Class crawlercommons.sitemaps.extension.ExtensionMetadata extends Object implements Serializable
-
Class crawlercommons.sitemaps.extension.ImageAttributes extends ExtensionMetadata implements Serializable
-
Serialized Fields
-
caption
String caption
Image caption attribute found under image/caption (optional)
-
geoLocation
String geoLocation
Image geo location attribute found under image/geo_location (optional)
-
license
URL license
Image license attribute found under image/license (optional)
-
loc
URL loc
Image location attribute found under image/loc (required)
-
title
String title
Image title attribute found under image/title (optional)
-
-
-
Class crawlercommons.sitemaps.extension.LinkAttributes extends ExtensionMetadata implements Serializable
-
Class crawlercommons.sitemaps.extension.MobileAttributes extends ExtensionMetadata implements Serializable
-
Class crawlercommons.sitemaps.extension.NewsAttributes extends ExtensionMetadata implements Serializable
-
Serialized Fields
-
genres
NewsAttributes.NewsGenre[] genres
News genres found under news/genres (required if applicable)
-
keywords
String[] keywords
News keywords found under news/keywords (optional). See https://support.google.com/news/publisher/answer/116037 for examples.
-
language
String language
News publication language found under news/publication/language (required)
-
name
String name
News publication name found under news/publication/name (required)
-
publicationDate
ZonedDateTime publicationDate
News publication date found under news/publication_date (required)
-
stockTickers
String[] stockTickers
News stock tickers found under news/stock_tickers (optional)
-
title
String title
News title found under news/title (required)
-
-
-
Class crawlercommons.sitemaps.extension.VideoAttributes extends ExtensionMetadata implements Serializable
-
Serialized Fields
-
allowedCountries
String[] allowedCountries
Video allowed countries found under video/restriction (optional). Whitelist of countries, filled if the video/restriction node has an attribute named relationship with a value of allow.
-
allowedPlatforms
String[] allowedPlatforms
Video allowed platforms found under video/platform (optional). Whitelist of platforms, filled if the video/platform node has an attribute named relationship with a value of allow.
-
category
String category
Video category found under video/category (optional)
-
contentLoc
URL contentLoc
Video content location found under video/content_loc (depends). If not specified, the player location must be specified.
-
description
String description
Video description found under video/description (required)
-
duration
Integer duration
Video duration in seconds found under video/duration (recommended). Must be an integer between 0 and 28800 (8 hours).
-
expirationDate
ZonedDateTime expirationDate
Video expiration date found under video/expiration_date (recommended if applicable)
-
familyFriendly
Boolean familyFriendly
Video family friendly attribute found under video/family_friendly (optional)
-
galleryLoc
URL galleryLoc
Video gallery location found under video/gallery_loc (optional)
-
galleryTitle
String galleryTitle
Video gallery title found under video/gallery_loc[@title] (optional)
-
isLive
Boolean isLive
Whether the video is a live stream, found under video/live (optional)
-
playerLoc
URL playerLoc
Video player location found under video/player_loc (depends). If not specified, the content location must be specified.
-
prices
VideoAttributes.VideoPrice[] prices
Video prices found under video/price (optional)
-
publicationDate
ZonedDateTime publicationDate
Video publication date found under video/publication_date (optional)
-
rating
Float rating
Video rating found under video/rating (optional). Must be a float value between 0.0 and 5.0.
-
requiresSubscription
Boolean requiresSubscription
Video requires subscription (free or paid) found under video/requires_subscription (optional)
-
restrictedCountries
String[] restrictedCountries
Video restricted countries found under video/restriction (optional). Blacklist of countries, filled if the video/restriction node has an attribute named relationship with a value of deny.
-
restrictedPlatforms
String[] restrictedPlatforms
Video restricted platforms found under video/platform (optional). Blacklist of platforms, filled if the video/platform node has an attribute named relationship with a value of deny.
-
tags
String[] tags
Video tags found under video/tag (optional). Up to 32 tags can be specified.
-
thumbnailLoc
URL thumbnailLoc
Video thumbnail URL found under video/thumbnail_loc (required)
-
title
String title
Video title found under video/title (required)
-
uploader
String uploader
Video uploader found under video/uploader (optional)
-
uploaderInfo
URL uploaderInfo
Video uploader location (optional). Must be on the same domain as the <loc> this property refers to.
-
viewCount
Integer viewCount
Video view count found under video/view_count (optional)
-
-
-
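Several of the VideoAttributes field descriptions state value constraints: the duration range, the rating range, the tag limit, and the requirement that a content or player location be present. The sketch below collects those checks in one place. It is an assumption for illustration only: `VideoConstraintCheck` and `violations` are hypothetical names, the locations are passed as plain strings, and nothing here claims to match how the library validates its own objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for the constraints stated in the field descriptions;
// not part of crawler-commons.
class VideoConstraintCheck {
    // Returns one message per violated constraint; an empty list means all checks passed.
    // Optional values may be null, in which case their range check is skipped.
    static List<String> violations(Integer duration, Float rating, String[] tags,
                                   String contentLoc, String playerLoc) {
        List<String> problems = new ArrayList<>();
        if (duration != null && (duration < 0 || duration > 28800)) {
            problems.add("duration must be between 0 and 28800 seconds");
        }
        if (rating != null && (rating < 0.0f || rating > 5.0f)) {
            problems.add("rating must be between 0.0 and 5.0");
        }
        if (tags != null && tags.length > 32) {
            problems.add("at most 32 tags can be specified");
        }
        if (contentLoc == null && playerLoc == null) {
            problems.add("either contentLoc or playerLoc must be specified");
        }
        return problems;
    }

    public static void main(String[] args) {
        // A ten-minute video with a player location only: no violations expected.
        System.out.println(violations(600, 4.5f, new String[] {"demo"},
                null, "http://foo.org/player"));
        // Out-of-range duration and rating, and no location at all.
        System.out.println(violations(30000, 6.0f, null, null, null));
    }
}
```

Returning a list of messages instead of throwing on the first failure lets a caller report every problem with an entry at once.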
Class crawlercommons.sitemaps.extension.VideoAttributes.VideoPrice extends Object implements Serializable
-
Serialized Fields
-
currency
String currency
Video price currency found under video/price[@currency] (required)
-
price
Float price
Video price
-
resolution
VideoAttributes.VideoPriceResolution resolution
Video price resolution found under video/price[@resolution]
-
type
VideoAttributes.VideoPriceType type
Video price type (rent vs own) found under video/price[@type] (optional, defaults to own)
-
-
-