Package crawlercommons.domains
Class EffectiveTldFinder
- java.lang.Object
-
- crawlercommons.domains.EffectiveTldFinder
-
public class EffectiveTldFinder extends Object
Determining the actual domain name of a host name or URL requires knowledge of the various domain registrars and their assignment policies. The best publicly available knowledge base is the public suffix list maintained at publicsuffix.org. This class implements the publicsuffix.org ruleset and uses a copy of the public suffix list. For more information, see:
- publicsuffix.org
- Wikipedia article about the public suffix list
- Mozilla's Effective TLD Service: for historical reasons, the class name stems from the term "effective top-level domain" (eTLD)
To (re)initialize the class with a custom copy of the public suffix list, call EffectiveTldFinder.getInstance().initialize(InputStream). Updates to the public suffix list can be found here:
- https://publicsuffix.org/list/public_suffix_list.dat
- https://publicsuffix.org/list/effective_tld_names.dat (same as public_suffix_list.dat)
- https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat
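The list files above are line-based text: entire lines starting with "//" are comments, and each rule line is read only up to the first whitespace. The following is a minimal sketch of reading such a stream, not the parser used by this class; the class name SuffixListReaderSketch and the method readRules are hypothetical names introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of crawler-commons.
class SuffixListReaderSketch {

    /** Returns the rule lines of a stream in public suffix list format. */
    public static List<String> readRules(InputStream in) {
        List<String> rules = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // Skip blank lines and whole-line comments.
                if (line.isEmpty() || line.startsWith("//")) {
                    continue;
                }
                // A rule ends at the first whitespace; the rest is ignored.
                rules.add(line.split("\\s+")[0]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rules;
    }

    public static void main(String[] args) {
        // Tiny hand-written sample, not the real list.
        String sample = "// sample\ncom\n\nco.uk\n*.ck\n!www.ck\n";
        InputStream in = new ByteArrayInputStream(
                sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(readRules(in));
    }
}
```

A stream with the same format is what initialize(InputStream) expects as its argument.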
ICANN vs. Private Domains
The public suffix list (see section "divisions") is subdivided into "ICANN" and "PRIVATE" domains. To restrict the EffectiveTldFinder to "ICANN" domains only, pass "true" as the flag excludePrivate to getAssignedDomain(String, boolean, boolean) or getEffectiveTLD(String, boolean), respectively. This excludes the eTLDs from the PRIVATE domain section of the public suffix list when a domain or eTLD is matched.
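The effect of excluding the PRIVATE section can be illustrated with a small stand-alone sketch. It is not this class's implementation: the class PrivateSectionSketch, its methods, and the three-entry rule table are assumptions made for the example (github.io is listed in the PRIVATE section of the real list; com and io are ICANN suffixes).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not part of crawler-commons.
class PrivateSectionSketch {

    // Illustrative subset: suffix -> true if it is in the PRIVATE section.
    private static final Map<String, Boolean> RULES = new HashMap<>();
    static {
        RULES.put("com", false);
        RULES.put("io", false);
        RULES.put("github.io", true);
    }

    /** Longest matching suffix of host, or null if no rule matches. */
    public static String longestSuffix(String host, boolean excludePrivate) {
        String candidate = host;
        while (true) {
            Boolean isPrivate = RULES.get(candidate);
            if (isPrivate != null && !(excludePrivate && isPrivate)) {
                return candidate;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            // Drop the leftmost label and try the shorter suffix.
            candidate = candidate.substring(dot + 1);
        }
    }

    /** The matched suffix plus one label to its left, or null. */
    public static String domainBelowSuffix(String host, boolean excludePrivate) {
        String suffix = longestSuffix(host, excludePrivate);
        if (suffix == null || suffix.length() == host.length()) {
            return null;
        }
        String prefix = host.substring(0, host.length() - suffix.length() - 1);
        return prefix.substring(prefix.lastIndexOf('.') + 1) + "." + suffix;
    }

    public static void main(String[] args) {
        System.out.println(domainBelowSuffix("user.github.io", false));
        System.out.println(domainBelowSuffix("user.github.io", true));
    }
}
```

With the PRIVATE rule in play the sketch keeps user.github.io; with excludePrivate set it falls back to the shorter ICANN suffix io and yields github.io.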
-
-
Nested Class Summary
Nested Classes
Modifier and Type: static class
Class: EffectiveTldFinder.EffectiveTLD
Description: EffectiveTLD objects hold one line of the public suffix list: the suffix (com, co.uk, etc.); for IDN suffixes, both the ASCII and the IDN variant (xn--p1ai and рф); and the properties required to parse host/domain names given in the public suffix list (wildcard suffix, exception, in private domain section).
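The relation between the ASCII and IDN variants of a suffix can be reproduced with the JDK's java.net.IDN class. This is a minimal sketch of the conversion only; the class name IdnSuffixSketch is hypothetical, and nothing here implies that EffectiveTLD performs the conversion this way.

```java
import java.net.IDN;

// Hypothetical illustration, not part of crawler-commons.
class IdnSuffixSketch {

    /** Converts a Unicode suffix to its ASCII (Punycode) form. */
    public static String toAscii(String suffix) {
        return IDN.toASCII(suffix);
    }

    /** Converts an ASCII (Punycode) suffix to its Unicode form. */
    public static String toUnicode(String suffix) {
        return IDN.toUnicode(suffix);
    }

    public static void main(String[] args) {
        // "\u0440\u0444" is the Cyrillic suffix written with Unicode escapes.
        String unicodeSuffix = "\u0440\u0444";
        System.out.println(toAscii(unicodeSuffix));
        System.out.println(toUnicode("xn--p1ai").equals(unicodeSuffix));
    }
}
```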
-
Method Summary
Modifier and Type, Method: Description
- static String getAssignedDomain(String hostname): This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
- static String getAssignedDomain(String hostname, boolean strict): This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
- static String getAssignedDomain(String hostname, boolean strict, boolean excludePrivate): This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name.
- static EffectiveTldFinder.EffectiveTLD getEffectiveTLD(String hostname): Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
- static EffectiveTldFinder.EffectiveTLD getEffectiveTLD(String hostname, boolean excludePrivate): Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
- static Map<String,EffectiveTldFinder.EffectiveTLD> getEffectiveTLDs()
- static EffectiveTldFinder getInstance(): Get singleton instance of EffectiveTldFinder with default configuration.
- static void help()
- boolean initialize(InputStream effectiveTldDataStream): (Re)initialize EffectiveTldFinder with custom public suffix list.
- boolean isConfigured()
- static void main(String[] args)
-
-
-
Field Detail
-
ETLD_DATA
public static final String ETLD_DATA
- See Also:
- Constant Field Values
-
COMMENT
public static final String COMMENT
- See Also:
- Constant Field Values
-
DOT_REGEX
public static final String DOT_REGEX
- See Also:
- Constant Field Values
-
EXCEPTION
public static final String EXCEPTION
- See Also:
- Constant Field Values
-
WILD_CARD
public static final String WILD_CARD
- See Also:
- Constant Field Values
-
DOT
public static final char DOT
- See Also:
- Constant Field Values
-
MAX_DOMAIN_LENGTH_PART
public static final int MAX_DOMAIN_LENGTH_PART
Maximum length in ASCII characters of a dot-separated segment in host names (applies to domain names as well), cf. https://tools.ietf.org/html/rfc1034#section-3.1 and https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_hostnames
Note: We only have to validate domain names and not the host names passed as input. For domain names, verifying the segment length also implies that the entire domain name stays within the limit of 253 characters. Wildcard suffixes only allow two additional segments (2*63+1 = 127 chars), and all wildcard suffixes are far from reaching the critical length of 126 characters.
- See Also:
- Constant Field Values
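The segment check and the arithmetic in the note can be sketched as follows. The value 63 is an assumption inferred from the "2*63+1" in the note and from RFC 1034; this page does not state the constant's value. The class SegmentLengthSketch and all of its member names are hypothetical.

```java
// Hypothetical illustration, not part of crawler-commons.
class SegmentLengthSketch {

    // Assumed per-segment limit (RFC 1034, section 3.1).
    static final int MAX_SEGMENT_LENGTH = 63;
    // Limit for an entire domain name, as stated in the note.
    static final int MAX_DOMAIN_LENGTH = 253;
    // Two extra segments plus the dot between them: 2*63+1 = 127.
    static final int MAX_WILDCARD_EXTRA = 2 * MAX_SEGMENT_LENGTH + 1;
    // Suffix length at which the total could exceed 253: 253-127 = 126.
    static final int CRITICAL_SUFFIX_LENGTH = MAX_DOMAIN_LENGTH - MAX_WILDCARD_EXTRA;

    /** True if no dot-separated segment of an ASCII name exceeds the limit. */
    public static boolean segmentsWithinLimit(String asciiDomain) {
        for (String segment : asciiDomain.split("\\.")) {
            if (segment.length() > MAX_SEGMENT_LENGTH) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StringBuilder longLabel = new StringBuilder();
        for (int i = 0; i < MAX_SEGMENT_LENGTH + 1; i++) {
            longLabel.append('a');
        }
        System.out.println(segmentsWithinLimit("www.example.com"));
        System.out.println(segmentsWithinLimit(longLabel + ".com"));
        System.out.println(CRITICAL_SUFFIX_LENGTH);
    }
}
```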
-
-
Method Detail
-
getInstance
public static EffectiveTldFinder getInstance()
Get singleton instance of EffectiveTldFinder with default configuration.
- Returns:
- singleton instance of EffectiveTldFinder
-
initialize
public boolean initialize(InputStream effectiveTldDataStream)
(Re)initialize EffectiveTldFinder with custom public suffix list.
- Parameters:
effectiveTldDataStream - content of public suffix list as input stream
- Returns:
- true if (re)initialization was successful
-
getEffectiveTLDs
public static Map<String,EffectiveTldFinder.EffectiveTLD> getEffectiveTLDs()
-
getEffectiveTLD
public static EffectiveTldFinder.EffectiveTLD getEffectiveTLD(String hostname)
Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
- Parameters:
hostname - the hostname for which to find the EffectiveTldFinder.EffectiveTLD
- Returns:
- the EffectiveTldFinder.EffectiveTLD
-
getEffectiveTLD
public static EffectiveTldFinder.EffectiveTLD getEffectiveTLD(String hostname, boolean excludePrivate)
Get EffectiveTLD for host name using the singleton instance of EffectiveTldFinder.
- Parameters:
hostname - the hostname for which to find the EffectiveTldFinder.EffectiveTLD
excludePrivate - do not return an effective TLD from the PRIVATE section, instead return the shorter eTLD not in the PRIVATE section
- Returns:
- the EffectiveTldFinder.EffectiveTLD
-
getAssignedDomain
public static String getAssignedDomain(String hostname)
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
- Parameters:
hostname - a string for which to obtain a NIC-assigned domain name
- Returns:
- the NIC-assigned domain name or, as fall-back, the hostname if no FQDN with valid TLD is found
-
getAssignedDomain
public static String getAssignedDomain(String hostname, boolean strict)
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
- Parameters:
hostname - a string for which to obtain a NIC-assigned domain name
strict - do not return the hostname as fall-back if a FQDN with valid TLD cannot be determined
- Returns:
- the NIC-assigned domain name, null if strict and no FQDN with valid TLD is found
-
getAssignedDomain
public static String getAssignedDomain(String hostname, boolean strict, boolean excludePrivate)
This method uses the effective TLD to determine which component of a FQDN is the NIC-assigned domain name.
- Parameters:
hostname - a string for which to obtain a NIC-assigned domain name
strict - do not return the hostname as fall-back if a FQDN with valid TLD cannot be determined
excludePrivate - do not return a domain which is below an eTLD from the PRIVATE section, return the shorter domain which is below the "ICANN" registry suffix
- Returns:
- the NIC-assigned domain name, null if strict and no FQDN with valid TLD is found
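The documented fall-back behaviour (hostname when not strict, null when strict) can be shown with a stand-alone sketch that also handles wildcard and exception rules in public suffix list syntax. It is a simplified illustration under stated assumptions, not the code of getAssignedDomain: the class AssignedDomainSketch, its methods, and the five-rule set are hypothetical, and treating a host that equals its own suffix as "not found" is a choice made only for this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not part of crawler-commons.
class AssignedDomainSketch {

    // Illustrative rules in public suffix list syntax, not the real list.
    private static final Set<String> RULES = new HashSet<>(
            Arrays.asList("com", "uk", "co.uk", "*.ck", "!www.ck"));

    private static String join(String[] labels, int from) {
        return String.join(".", Arrays.copyOfRange(labels, from, labels.length));
    }

    /** Number of labels in the matched suffix, or 0 if no rule matches. */
    static int suffixLabels(String[] labels) {
        int best = 0;
        for (int i = 0; i < labels.length; i++) {
            int n = labels.length - i;
            String candidate = join(labels, i);
            if (RULES.contains("!" + candidate)) {
                // Exception rule: the suffix is one label shorter.
                return n - 1;
            }
            if (RULES.contains(candidate)) {
                best = Math.max(best, n);
            }
            // Wildcard rule: any single label in front of the remainder.
            if (i + 1 < labels.length && RULES.contains("*." + join(labels, i + 1))) {
                best = Math.max(best, n);
            }
        }
        return best;
    }

    /** Suffix plus one label; falls back to hostname, or null if strict. */
    public static String assignedDomain(String hostname, boolean strict) {
        String[] labels = hostname.toLowerCase(Locale.ROOT).split("\\.");
        int n = suffixLabels(labels);
        if (n == 0 || n >= labels.length) {
            return strict ? null : hostname;
        }
        return join(labels, labels.length - n - 1);
    }

    public static void main(String[] args) {
        System.out.println(assignedDomain("www.example.co.uk", true));
        System.out.println(assignedDomain("foo.bar.ck", true));
        System.out.println(assignedDomain("localhost", false));
        System.out.println(assignedDomain("localhost", true));
    }
}
```

In this sketch an unknown name such as localhost comes back unchanged in non-strict mode and as null in strict mode, which mirrors the Returns description above.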
-
isConfigured
public boolean isConfigured()
-
help
public static void help()
-
main
public static void main(String[] args) throws IOException
- Throws:
IOException
-
-