Package crawlercommons.sitemaps
Class AbstractSiteMap
- java.lang.Object
-
- crawlercommons.sitemaps.AbstractSiteMap
-
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
SiteMap, SiteMapIndex
public abstract class AbstractSiteMap extends Object implements Serializable
Abstract base class for a SiteMap or SiteMapIndex.
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | AbstractSiteMap.SitemapType | Various Sitemap types
-
Field Summary
Fields
Modifier and Type | Field | Description
protected static ZoneId | TIME_ZONE_UTC |
protected URL | url |
static DateTimeFormatter | W3C_FULLDATE_FORMATTER | DateTimeFormatter for parsing dates in ISO-8601 format
static DateTimeFormatter | W3C_FULLDATE_FORMATTER_UTC | DateTimeFormatter to format dates in ISO-8601 format (UTC time zone 'Z')
static DateTimeFormatter | W3C_SHORTDATE_FORMATTER | DateTimeFormatter for parsing short dates ('1997', '1997-07', '1997-07-16') without time of day and time zone
-
Constructor Summary
Constructors
Constructor | Description
AbstractSiteMap() |
-
Method Summary
Modifier and Type | Method | Description
static Date | convertToDate(String date) |
static ZonedDateTime | convertToZonedDateTime(String date) | Converts the given date string (in an accepted W3C Datetime format); returns null if the date is not in a valid format.
Date | getLastModified() |
AbstractSiteMap.SitemapType | getType() |
URL | getUrl() |
boolean | isIndex() |
boolean | isProcessed() |
static String | normalizeRSSTimestamp(String pubDate) | Converts pubDate of RSS to the ISO-8601 instant format, e.g., '2017-01-05T12:34:54Z' in UTC / GMT time zone, see DateTimeFormatter.ISO_INSTANT.
static ZonedDateTime | parseRSSTimestamp(String pubDate) | Parse pubDate of RSS feeds.
void | setLastModified(String lastModified) |
void | setLastModified(ZonedDateTime lastModified) |
void | setLastModified(Date lastModified) |
void | setProcessed(boolean processed) |
void | setType(AbstractSiteMap.SitemapType type) |
-
-
-
Field Detail
-
TIME_ZONE_UTC
protected static final ZoneId TIME_ZONE_UTC
-
W3C_FULLDATE_FORMATTER
public static final DateTimeFormatter W3C_FULLDATE_FORMATTER
DateTimeFormatter for parsing dates in ISO-8601 format
-
W3C_FULLDATE_FORMATTER_UTC
public static final DateTimeFormatter W3C_FULLDATE_FORMATTER_UTC
DateTimeFormatter to format dates in ISO-8601 format (UTC time zone 'Z')
-
W3C_SHORTDATE_FORMATTER
public static final DateTimeFormatter W3C_SHORTDATE_FORMATTER
DateTimeFormatter for parsing short dates ('1997', '1997-07', '1997-07-16') without time of day and time zone
-
url
protected URL url
-
-
Method Detail
-
isIndex
public boolean isIndex()
-
getUrl
public URL getUrl()
- Returns:
- the URL of the Sitemap
-
setType
public void setType(AbstractSiteMap.SitemapType type)
- Parameters:
type
- the Sitemap type to set
-
getType
public AbstractSiteMap.SitemapType getType()
- Returns:
- the Sitemap type
-
setProcessed
public void setProcessed(boolean processed)
- Parameters:
processed
- indicates whether the Sitemap has been processed
-
isProcessed
public boolean isProcessed()
- Returns:
- true if the Sitemap has been processed, i.e., it contains at least one SiteMapURL
-
setLastModified
public void setLastModified(Date lastModified)
- Parameters:
lastModified
- the last-modified date
-
setLastModified
public void setLastModified(ZonedDateTime lastModified)
- Parameters:
lastModified
- the last-modified date and time
-
setLastModified
public void setLastModified(String lastModified)
- Parameters:
lastModified
- the last-modified date and time as a string. If parsing of the given string fails, the last-modified field is set to null.
-
getLastModified
public Date getLastModified()
- Returns:
- the lastModified date of the Sitemap
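The setters accept a legacy Date, a ZonedDateTime, or a string, while the getter returns a Date. The following is a minimal sketch of converting between the two date types using only the standard library. The class LastModifiedSketch and its method names are hypothetical and not part of crawler-commons, and how AbstractSiteMap stores the value internally is not stated in this documentation.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Date;

// Hypothetical helper, not part of crawler-commons.
class LastModifiedSketch {

    // ZonedDateTime -> legacy Date (the time zone is dropped, the instant is kept).
    static Date toDate(ZonedDateTime lastModified) {
        return lastModified == null ? null : Date.from(lastModified.toInstant());
    }

    // Legacy Date -> ZonedDateTime in UTC.
    static ZonedDateTime toUtc(Date lastModified) {
        return lastModified == null ? null : lastModified.toInstant().atZone(ZoneOffset.UTC);
    }
}
```

A Date carries no time zone, so a round trip preserves the instant but not the original offset.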
-
convertToZonedDateTime
public static ZonedDateTime convertToZonedDateTime(String date)
Converts the given date string, which must be in an accepted W3C Datetime format, to a ZonedDateTime. Returns null if the date is not in a valid format.
Dates must follow the W3C Datetime format, which is similar to ISO-8601 but allows dates with different precisions:
- Year: YYYY (e.g. 1997)
- Year and month: YYYY-MM (e.g. 1997-07)
- Complete date: YYYY-MM-DD (e.g. 1997-07-16)
- Complete date plus hours and minutes: YYYY-MM-DDThh:mmTZD (e.g. 1997-07-16T19:20+01:00)
- Complete date plus hours, minutes and seconds: YYYY-MM-DDThh:mm:ssTZD (e.g. 1997-07-16T19:20:30+01:00)
- Complete date plus hours, minutes, seconds and a decimal fraction of a second: YYYY-MM-DDThh:mm:ss.sTZD (e.g. 1997-07-16T19:20:30.45+01:00)
Note: Greenwich time (UTC) is assumed if the date string does not specify a time zone.
- Parameters:
date
- the date to be parsed
- Returns:
- the zoned date time equivalent to the date string, or null if parsing failed
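The precision levels listed above can be handled with the standard java.time formatters. The following is a minimal sketch under stated assumptions, not the crawler-commons implementation: the class W3cDateSketch, its SHORT_DATE formatter, and the parse method are hypothetical names, and missing month or day values are assumed to default to 1.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;

// Hypothetical helper, not part of crawler-commons.
class W3cDateSketch {

    // Year, optionally followed by month and day; absent parts default to 1 (assumption).
    static final DateTimeFormatter SHORT_DATE = new DateTimeFormatterBuilder()
            .appendPattern("uuuu[-MM[-dd]]")
            .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
            .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
            .toFormatter();

    // Returns the parsed date time, or null if no format matches.
    static ZonedDateTime parse(String date) {
        if (date == null) {
            return null;
        }
        String s = date.trim();
        try {
            // Date and time with an offset or 'Z'; seconds and fraction are optional.
            return ZonedDateTime.parse(s, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        } catch (DateTimeParseException e) {
            // fall through to the next format
        }
        try {
            // Date and time without a zone: UTC is assumed.
            return LocalDateTime.parse(s, DateTimeFormatter.ISO_LOCAL_DATE_TIME).atZone(ZoneOffset.UTC);
        } catch (DateTimeParseException e) {
            // fall through to the next format
        }
        try {
            // Short dates without time of day: midnight UTC is assumed.
            return LocalDate.parse(s, SHORT_DATE).atStartOfDay(ZoneOffset.UTC);
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

Trying the most specific format first keeps each formatter simple. A production parser might instead use a single formatter with optional sections.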
-
convertToDate
public static Date convertToDate(String date)
- Parameters:
date
- the date string to convert
- Returns:
- the date, or null if parsing of the date string fails
-
normalizeRSSTimestamp
public static String normalizeRSSTimestamp(String pubDate)
Converts pubDate of RSS to the ISO-8601 instant format, e.g., '2017-01-05T12:34:54Z' in UTC / GMT time zone, see DateTimeFormatter.ISO_INSTANT.
- Parameters:
pubDate
- date time of pubDate in RFC822 format
- Returns:
- the date converted to "yyyy-MM-dd'T'HH:mm:ssZ" format, or the original value if it does not follow RFC822
-
parseRSSTimestamp
public static ZonedDateTime parseRSSTimestamp(String pubDate)
Parse pubDate of RSS feeds.
- Parameters:
pubDate
- date time of pubDate in RFC822 format
- Returns:
- date time or null if parsing failed
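The contract of parseRSSTimestamp and normalizeRSSTimestamp (null or the original value on failure, ISO-8601 instant on success) can be illustrated with the JDK's built-in formatters. This is a minimal sketch, not the crawler-commons implementation: the class RssDateSketch and its methods are hypothetical, and it assumes the pubDate is accepted by DateTimeFormatter.RFC_1123_DATE_TIME, which covers only part of what RFC822 allows (for example, four-digit years only).

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of crawler-commons.
class RssDateSketch {

    // Returns the parsed date time, or null if parsing failed.
    static ZonedDateTime parse(String pubDate) {
        if (pubDate == null) {
            return null;
        }
        try {
            return ZonedDateTime.parse(pubDate.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Returns the ISO-8601 instant in UTC, or the original value if parsing failed.
    static String normalize(String pubDate) {
        ZonedDateTime parsed = parse(pubDate);
        if (parsed == null) {
            return pubDate;
        }
        // Truncate to whole seconds so the output has no fractional part.
        return DateTimeFormatter.ISO_INSTANT.format(parsed.toInstant().truncatedTo(ChronoUnit.SECONDS));
    }
}
```

In this sketch, normalize("Thu, 05 Jan 2017 13:34:54 +0100") yields "2017-01-05T12:34:54Z", and an unparseable string is returned unchanged.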
-
-