Package crawlercommons.domains
Class EffectiveTldFinder
- java.lang.Object
  - crawlercommons.domains.EffectiveTldFinder
public class EffectiveTldFinder
extends Object

Determining the actual domain name of a host name or URL requires knowledge of the various domain registrars and their assignment policies. The best publicly available knowledge base is the public suffix list maintained and available at publicsuffix.org. This class implements the publicsuffix.org ruleset and uses a copy of the public suffix list. For more information, see:
- publicsuffix.org
- Wikipedia article about the public suffix list
- Mozilla's Effective TLD Service: for historic reasons, the class name stems from the term "effective top-level domain" (eTLD)

A custom or updated copy of the public suffix list can be loaded by calling EffectiveTldFinder.getInstance().initialize(InputStream). Updates to the public suffix list can be found here:
- https://publicsuffix.org/list/public_suffix_list.dat
- https://publicsuffix.org/list/effective_tld_names.dat (same as public_suffix_list.dat)
- https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat
ICANN vs. Private Domains
The public suffix list (see section "divisions") is subdivided into "ICANN" and "PRIVATE" domains. To restrict the EffectiveTldFinder to "ICANN" domains only, pass true as the excludePrivate flag to getAssignedDomain(String, boolean, boolean) or getEffectiveTLD(String, boolean). This excludes the eTLDs in the PRIVATE section of the public suffix list when a domain or eTLD is matched.
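The ICANN/PRIVATE division can be illustrated with a minimal standalone sketch that reads a list in public suffix list format from an InputStream (the same input type initialize(InputStream) accepts). It assumes the publicsuffix.org file conventions: lines starting with // are comments, blank lines are skipped, the rule is the first whitespace-delimited token on a line, and the PRIVATE section is delimited by the comment markers ===BEGIN PRIVATE DOMAINS=== and ===END PRIVATE DOMAINS===. The class PslSectionParser and its methods are hypothetical; this is not the crawler-commons parser.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reads rules in public suffix list format and records
// whether each rule comes from the ICANN or the PRIVATE section.
final class PslSectionParser {

    /** One rule and the section it was read from. */
    static final class Rule {
        final String text;
        final boolean privateSection;

        Rule(String text, boolean privateSection) {
            this.text = text;
            this.privateSection = privateSection;
        }
    }

    static List<Rule> parse(InputStream in) {
        List<Rule> rules = new ArrayList<>();
        boolean inPrivate = false;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.startsWith("//")) {
                    // Section boundaries are marked by special comment lines.
                    if (line.contains("===BEGIN PRIVATE DOMAINS===")) {
                        inPrivate = true;
                    } else if (line.contains("===END PRIVATE DOMAINS===")) {
                        inPrivate = false;
                    }
                    continue;
                }
                if (line.isEmpty()) {
                    continue;
                }
                // Only the first whitespace-delimited token on a line is the rule.
                rules.add(new Rule(line.split("\\s+", 2)[0], inPrivate));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rules;
    }

    /** Convenience overload for list content held in a String. */
    static List<Rule> parse(String content) {
        return parse(new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns only the rule texts from the ICANN section (an "excludePrivate" view). */
    static List<String> icannRules(List<Rule> rules) {
        List<String> result = new ArrayList<>();
        for (Rule rule : rules) {
            if (!rule.privateSection) {
                result.add(rule.text);
            }
        }
        return result;
    }
}
```

Keeping the section flag per rule, rather than dropping PRIVATE rules while reading, lets one loaded list serve both the default and the ICANN-only lookups.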
Nested Class Summary

Nested Classes
| Modifier and Type | Class | Description |
|---|---|---|
| static class | EffectiveTldFinder.EffectiveTLD | EffectiveTLD objects hold one line of the public suffix list: the suffix (com, co.uk, etc.); for IDN suffixes, both the ASCII and the IDN variant (xn--p1ai and рф); and the properties required to parse host/domain names given in the public suffix list (wildcard suffix, exception, in private domain section). |
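The per-line data described for EffectiveTldFinder.EffectiveTLD can be sketched as a small value class. This is a hypothetical illustration, not the crawler-commons class: the name SuffixEntry and its fields are assumptions, the exception and wildcard markers (! and *.) follow the publicsuffix.org rule syntax, and java.net.IDN supplies the conversion between the ASCII and IDN forms (for example xn--p1ai and рф).

```java
import java.net.IDN;
import java.util.Locale;

// Hypothetical value class: what one public suffix list line carries.
final class SuffixEntry {
    final String asciiSuffix;     // ASCII (Punycode) form, e.g. "xn--p1ai"
    final String unicodeSuffix;   // IDN form, e.g. "рф"
    final boolean wildcard;       // rule was written as "*.suffix"
    final boolean exception;      // rule was written as "!suffix"
    final boolean privateDomain;  // rule was read from the PRIVATE section

    private SuffixEntry(String asciiSuffix, String unicodeSuffix,
                        boolean wildcard, boolean exception, boolean privateDomain) {
        this.asciiSuffix = asciiSuffix;
        this.unicodeSuffix = unicodeSuffix;
        this.wildcard = wildcard;
        this.exception = exception;
        this.privateDomain = privateDomain;
    }

    /** Parses a raw rule such as "com", "*.ck", "!www.ck" or "рф". */
    static SuffixEntry fromRule(String rule, boolean privateDomain) {
        boolean exception = rule.startsWith("!");
        if (exception) {
            rule = rule.substring(1);
        }
        boolean wildcard = rule.startsWith("*.");
        if (wildcard) {
            rule = rule.substring(2);
        }
        // IDN.toASCII leaves all-ASCII labels unchanged (no case folding),
        // so lower-case explicitly to get a canonical lookup key.
        String ascii = IDN.toASCII(rule).toLowerCase(Locale.ROOT);
        String unicode = IDN.toUnicode(ascii);
        return new SuffixEntry(ascii, unicode, wildcard, exception, privateDomain);
    }
}
```

Storing both forms means a host name can be matched whether it arrives in Punycode or in Unicode, provided it is converted the same way before lookup.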
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| static String | getAssignedDomain(String hostname) | This method uses the effective TLD to determine which component of an FQDN is the NIC-assigned domain name (aka "Paid Level Domain"). |
| static String | getAssignedDomain(String hostname, boolean strict) | This method uses the effective TLD to determine which component of an FQDN is the NIC-assigned domain name (aka "Paid Level Domain"). |
| static String | getAssignedDomain(String hostname, boolean strict, boolean excludePrivate) | This method uses the effective TLD to determine which component of an FQDN is the NIC-assigned domain name. |
| static EffectiveTldFinder.EffectiveTLD | getEffectiveTLD(String hostname) | Get the EffectiveTLD for a host name using the singleton instance of EffectiveTldFinder. |
| static EffectiveTldFinder.EffectiveTLD | getEffectiveTLD(String hostname, boolean excludePrivate) | Get the EffectiveTLD for a host name using the singleton instance of EffectiveTldFinder. |
| static Map<String,EffectiveTldFinder.EffectiveTLD> | getEffectiveTLDs() | |
| static EffectiveTldFinder | getInstance() | Get the singleton instance of EffectiveTldFinder with the default configuration. |
| static void | help() | |
| boolean | initialize(InputStream effectiveTldDataStream) | (Re)initialize EffectiveTldFinder with a custom public suffix list. |
| boolean | isConfigured() | |
| static void | main(String[] args) | |
 
Field Detail
ETLD_DATA
public static final String ETLD_DATA
See Also:
- Constant Field Values

COMMENT
public static final String COMMENT
See Also:
- Constant Field Values

DOT_REGEX
public static final String DOT_REGEX
See Also:
- Constant Field Values

EXCEPTION
public static final String EXCEPTION
See Also:
- Constant Field Values

WILD_CARD
public static final String WILD_CARD
See Also:
- Constant Field Values

DOT
public static final char DOT
See Also:
- Constant Field Values
 
MAX_DOMAIN_LENGTH_PART
public static final int MAX_DOMAIN_LENGTH_PART
Maximum length, in ASCII characters, of a dot-separated segment in host names (this applies to domain names as well); cf. https://tools.ietf.org/html/rfc1034#section-3.1 and https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_hostnames.
Note: only domain names have to be validated, not the host names passed as input. For domain names, verifying the segment length also ensures that the entire domain name stays within the limit of 253 characters: below a wildcard suffix, only two additional segments are allowed (2*63+1 = 127 characters, including the dot between them), which leaves 126 characters for the suffix and its leading dot, and all wildcard suffixes are far shorter than that.
See Also:
- Constant Field Values
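The length rules in the note above can be checked with a small standalone helper. It assumes the limits stated there (63 ASCII characters per dot-separated segment, per RFC 1034, and 253 characters overall) and expects an already ASCII-encoded (Punycode) name; DomainLengthCheck and its methods are hypothetical and not part of crawler-commons.

```java
// Hypothetical helper illustrating the limits described for MAX_DOMAIN_LENGTH_PART.
final class DomainLengthCheck {
    static final int MAX_SEGMENT_LENGTH = 63;  // RFC 1034, section 3.1
    static final int MAX_DOMAIN_LENGTH = 253;  // textual form without a trailing dot

    /** True if every dot-separated segment is non-empty and at most 63 characters long. */
    static boolean segmentsValid(String asciiName) {
        // limit -1 keeps trailing empty strings, so "a..b" and "a." are rejected
        for (String segment : asciiName.split("\\.", -1)) {
            if (segment.isEmpty() || segment.length() > MAX_SEGMENT_LENGTH) {
                return false;
            }
        }
        return true;
    }

    /** True if the segments are valid and the whole name fits into 253 characters. */
    static boolean lengthValid(String asciiName) {
        return asciiName.length() <= MAX_DOMAIN_LENGTH && segmentsValid(asciiName);
    }

    /** Characters left for ".suffix" when two maximal segments precede a wildcard suffix. */
    static int roomForWildcardSuffix() {
        int twoSegments = 2 * MAX_SEGMENT_LENGTH + 1; // 127, including the dot between them
        return MAX_DOMAIN_LENGTH - twoSegments;        // 126
    }
}
```

Four maximal segments (4*63 + 3 dots = 255 characters) pass the per-segment check but exceed the overall limit, which is why the per-segment check alone is only sufficient for the short, bounded domain names discussed above.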
 
 
Method Detail
getInstance
public static EffectiveTldFinder getInstance()
Get the singleton instance of EffectiveTldFinder with the default configuration.
Returns:
- singleton instance of EffectiveTldFinder
 
initialize
public boolean initialize(InputStream effectiveTldDataStream)
(Re)initialize EffectiveTldFinder with a custom public suffix list.
Parameters:
- effectiveTldDataStream - content of the public suffix list as an input stream
Returns:
- true if (re)initialization was successful

getEffectiveTLDs
public static Map<String,EffectiveTldFinder.EffectiveTLD> getEffectiveTLDs()

getEffectiveTLD
public static EffectiveTldFinder.EffectiveTLD getEffectiveTLD(String hostname)
Get the EffectiveTLD for a host name using the singleton instance of EffectiveTldFinder.
Parameters:
- hostname - the hostname for which to find the EffectiveTldFinder.EffectiveTLD
Returns:
- the EffectiveTldFinder.EffectiveTLD
 
getEffectiveTLD
public static EffectiveTldFinder.EffectiveTLD getEffectiveTLD(String hostname, boolean excludePrivate)
Get the EffectiveTLD for a host name using the singleton instance of EffectiveTldFinder.
Parameters:
- hostname - the hostname for which to find the EffectiveTldFinder.EffectiveTLD
- excludePrivate - do not return an effective TLD from the PRIVATE section; instead, return the shorter eTLD that is not in the PRIVATE section
Returns:
- the EffectiveTldFinder.EffectiveTLD
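The matching behind getEffectiveTLD, including the excludePrivate flag, can be sketched with the rule semantics from publicsuffix.org: a matching exception rule (!) prevails, otherwise the longest matching rule wins, and *. matches exactly one label. SuffixMatcher is a hypothetical class, not the crawler-commons code; unlike the official algorithm, it returns null instead of applying the implicit "*" default rule when no rule matches.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of public suffix matching; not the crawler-commons implementation.
final class SuffixMatcher {
    // raw rule text ("com", "*.ck", "!www.ck") -> true if it comes from the PRIVATE section
    private final Map<String, Boolean> rules = new HashMap<>();

    void addRule(String rule, boolean privateSection) {
        rules.put(rule, privateSection);
    }

    private boolean has(String rule, boolean excludePrivate) {
        Boolean isPrivate = rules.get(rule);
        return isPrivate != null && !(excludePrivate && isPrivate);
    }

    /** Returns the effective TLD of hostname, or null if no rule matches. */
    String effectiveTld(String hostname, boolean excludePrivate) {
        String[] labels = hostname.toLowerCase(Locale.ROOT).split("\\.");
        // Pass 1: an exception rule "!x.y" prevails; the eTLD is then "y".
        for (int i = 0; i < labels.length - 1; i++) {
            if (has("!" + join(labels, i), excludePrivate)) {
                return join(labels, i + 1);
            }
        }
        // Pass 2: scan from the longest candidate; the first plain or wildcard match wins.
        for (int i = 0; i < labels.length; i++) {
            String candidate = join(labels, i);
            boolean wildcardMatch = i + 1 < labels.length
                    && has("*." + join(labels, i + 1), excludePrivate);
            if (has(candidate, excludePrivate) || wildcardMatch) {
                return candidate;
            }
        }
        return null; // the official algorithm would apply the implicit "*" rule here
    }

    private static String join(String[] labels, int from) {
        return String.join(".", Arrays.copyOfRange(labels, from, labels.length));
    }
}
```

With the rules com, uk, co.uk, *.ck, !www.ck and the PRIVATE rule blogspot.com loaded, myblog.blogspot.com yields blogspot.com, or com when excludePrivate is true; skipping PRIVATE rules is enough to fall back to the shorter ICANN suffix.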
 
getAssignedDomain
public static String getAssignedDomain(String hostname)
This method uses the effective TLD to determine which component of an FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
Parameters:
- hostname - a string for which to obtain a NIC-assigned domain name
Returns:
- the NIC-assigned domain name or, as a fallback, the hostname if no FQDN with a valid TLD is found
 
getAssignedDomain
public static String getAssignedDomain(String hostname, boolean strict)
This method uses the effective TLD to determine which component of an FQDN is the NIC-assigned domain name (aka "Paid Level Domain").
Parameters:
- hostname - a string for which to obtain a NIC-assigned domain name
- strict - do not return the hostname as a fallback if an FQDN with a valid TLD cannot be determined
Returns:
- the NIC-assigned domain name, or null if strict is true and no FQDN with a valid TLD is found
 
getAssignedDomain
public static String getAssignedDomain(String hostname, boolean strict, boolean excludePrivate)
This method uses the effective TLD to determine which component of an FQDN is the NIC-assigned domain name.
Parameters:
- hostname - a string for which to obtain a NIC-assigned domain name
- strict - do not return the hostname as a fallback if an FQDN with a valid TLD cannot be determined
- excludePrivate - do not return a domain below an eTLD from the PRIVATE section; instead, return the shorter domain below the "ICANN" registry suffix
Returns:
- the NIC-assigned domain name, or null if strict is true and no FQDN with a valid TLD is found
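The contract shared by the three getAssignedDomain overloads (one label in front of the eTLD, the strict flag, the hostname fallback, and the effect of excluding PRIVATE suffixes) can be illustrated with a standalone sketch. AssignedDomainSketch, its methods, and the suffix sets in the example are hypothetical; the sketch supports only plain suffix rules (no wildcard or exception rules), it is not the crawler-commons implementation, and excluding PRIVATE suffixes is modelled simply by passing a smaller suffix set.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of deriving the assigned ("paid-level") domain from an eTLD.
// Only plain suffix rules are supported; wildcard and exception rules are omitted.
final class AssignedDomainSketch {

    /** Longest suffix of host (at label boundaries) contained in suffixes, or null. */
    static String effectiveTld(String host, Set<String> suffixes) {
        String candidate = host;
        while (true) {
            if (suffixes.contains(candidate)) {
                return candidate;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    /** Returns label + "." + eTLD; otherwise null if strict, else the hostname unchanged. */
    static String assignedDomain(String hostname, Set<String> suffixes, boolean strict) {
        String host = hostname.toLowerCase(Locale.ROOT);
        String etld = effectiveTld(host, suffixes);
        if (etld != null && host.length() > etld.length()) {
            // Everything before ".<eTLD>"; its last label is the registered name.
            String prefix = host.substring(0, host.length() - etld.length() - 1);
            String label = prefix.substring(prefix.lastIndexOf('.') + 1);
            if (!label.isEmpty()) {
                return label + "." + etld;
            }
        }
        return strict ? null : hostname;
    }
}
```

A host that is itself a public suffix (such as co.uk) has no assigned domain below it, so it takes the same strict/fallback path as a host with an unknown TLD.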
 
isConfigured
public boolean isConfigured()

help
public static void help()

main
public static void main(String[] args) throws IOException
Throws:
- IOException
 
 