Class SimpleRobotRules

    • Method Detail

      • clearRules

        public void clearRules()
      • addRule

        public void addRule(String prefix,
                            boolean allow)
      • escapePath

        public static String escapePath(String urlPathQuery,
                                        boolean[] additionalEncodedBytes)
        Normalize percent-encoding where necessary: encode Unicode/non-ASCII characters, and decode percent-encoded printable ASCII characters that have no special semantics.
        Parameters:
        urlPathQuery - path and query component of the URL
        additionalEncodedBytes - boolean array to request bytes (ASCII characters) to be percent-encoded in addition to other characters requiring encoding (Unicode/non-ASCII and characters not allowed in URLs).
        Returns:
        properly percent-encoded URL path and query
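        The normalization described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class EscapePathSketch and its normalize method are hypothetical, and it assumes that "printable ASCII characters without special semantics" means the unreserved set of RFC 3986 (letters, digits, "-", ".", "_", "~") and that the boolean array is indexed by byte value.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of percent-encoding normalization; not crawler-commons code.
final class EscapePathSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Assumption: only RFC 3986 "unreserved" characters are safe to decode.
    static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    static String normalize(String pathQuery, boolean[] extraEncoded) {
        byte[] bytes = pathQuery.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder(bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            int b = bytes[i] & 0xFF;
            boolean wasEscaped = false;
            // Read an existing %XX escape, if the two following bytes are hex digits.
            if (b == '%' && i + 2 < bytes.length) {
                int hi = Character.digit((char) (bytes[i + 1] & 0xFF), 16);
                int lo = Character.digit((char) (bytes[i + 2] & 0xFF), 16);
                if (hi >= 0 && lo >= 0) {
                    b = hi * 16 + lo;
                    wasEscaped = true;
                    i += 2;
                }
            }
            // Bytes the caller explicitly asked to have encoded.
            boolean extra = extraEncoded != null && b < extraEncoded.length && extraEncoded[b];
            boolean keepLiteral = wasEscaped
                    ? isUnreserved(b) && !extra          // decode only harmless escapes
                    : b >= 0x21 && b <= 0x7E && !extra;  // keep literal printable ASCII
            if (keepLiteral) {
                out.append((char) b);
            } else {
                // Encode (or re-encode) with upper-case hex digits.
                out.append('%').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        boolean[] extra = new boolean[128];
        extra['*'] = true; // request that '*' be encoded as well
        System.out.println(normalize("/caf\u00e9", null));
        System.out.println(normalize("/%61bc", null));
        System.out.println(normalize("/a*b", extra));
    }
}
```

        In this sketch, escapes of characters that may carry meaning in a URL (for example %2F) stay encoded and are only normalized to upper-case hex, so two spellings of the same path compare equal after normalization.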
      • sortRules

        public void sortRules()
        Sort and deduplicate robot rules. This method must be called after the robots.txt has been processed and before rule matching. The ordering is implemented in SimpleRobotRules.RobotRule.compareTo(RobotRule) and defined by RFC 9309, section 2.2.2:
        "The most specific match found MUST be used. The most specific match is the match that has the most octets. Duplicate rules in a group MAY be deduplicated."
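        The ordering quoted from RFC 9309 can be sketched as follows. This is an illustration under stated assumptions, not the code behind RobotRule.compareTo: the Rule class and helper methods are hypothetical, prefix length is measured in characters (equal to octets only for ASCII or already percent-encoded prefixes), matching is a plain prefix test without wildcards, and ties between equivalent rules favor "allow", as RFC 9309 recommends.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of "most specific rule first" ordering; not crawler-commons code.
final class RuleSortSketch {
    static final class Rule {
        final String prefix;
        final boolean allow;

        Rule(String prefix, boolean allow) {
            this.prefix = prefix;
            this.allow = allow;
        }
    }

    // Longest prefix first; for equal prefixes, allow before disallow.
    static final Comparator<Rule> MOST_SPECIFIC_FIRST = (a, b) -> {
        int byLength = Integer.compare(b.prefix.length(), a.prefix.length());
        if (byLength != 0) {
            return byLength;
        }
        int byAllow = Boolean.compare(b.allow, a.allow);
        if (byAllow != 0) {
            return byAllow;
        }
        return a.prefix.compareTo(b.prefix);
    };

    // A TreeSet keeps the rules ordered and drops rules that compare as equal.
    static List<Rule> sortAndDedup(List<Rule> rules) {
        TreeSet<Rule> set = new TreeSet<>(MOST_SPECIFIC_FIRST);
        set.addAll(rules);
        return new ArrayList<>(set);
    }

    // With sorted rules, the first matching rule is the most specific one.
    static boolean isAllowed(List<Rule> sortedRules, String path) {
        for (Rule rule : sortedRules) {
            if (path.startsWith(rule.prefix)) {
                return rule.allow;
            }
        }
        return true; // no rule matched: allowed by default
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("/", false));
        rules.add(new Rule("/public/", true));
        rules.add(new Rule("/public/", true)); // duplicate, removed by sortAndDedup
        rules.add(new Rule("/public/tmp/", false));

        List<Rule> sorted = sortAndDedup(rules);
        System.out.println(sorted.size());
        System.out.println(isAllowed(sorted, "/public/index.html"));
        System.out.println(isAllowed(sorted, "/public/tmp/cache"));
    }
}
```

        Sorting once up front is what lets matching stop at the first hit, which is why the sort has to happen before any rule matching.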
      • isAllowAll

        public boolean isAllowAll()
        Is our ruleset set up to allow all access?

        Note: This is decided only based on the SimpleRobotRules.RobotRulesMode without inspecting the set of allow/disallow rules.

        Specified by:
        isAllowAll in class BaseRobotRules
        Returns:
        true if all URLs are allowed.
      • isAllowNone

        public boolean isAllowNone()
        Is our ruleset set up to disallow all access?

        Note: This is decided only based on the SimpleRobotRules.RobotRulesMode without inspecting the set of allow/disallow rules.

        Specified by:
        isAllowNone in class BaseRobotRules
        Returns:
        true if no URLs are allowed.
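        The note on isAllowAll and isAllowNone has a practical consequence, shown in the small sketch below. The class ModeOnlySketch is hypothetical and only mimics the documented behavior; its enum constants are assumed to mirror those of SimpleRobotRules.RobotRulesMode. A ruleset whose rules happen to block everything still does not report "allow none" unless its mode says so.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the answers depend on the mode only, never on the rules.
final class ModeOnlySketch {
    // Assumed to mirror the constants of SimpleRobotRules.RobotRulesMode.
    enum Mode { ALLOW_ALL, ALLOW_NONE, ALLOW_SOME }

    private final Mode mode;
    private final List<String> disallowedPrefixes = new ArrayList<>();

    ModeOnlySketch(Mode mode) {
        this.mode = mode;
    }

    void addDisallow(String prefix) {
        disallowedPrefixes.add(prefix);
    }

    boolean isAllowAll() {
        return mode == Mode.ALLOW_ALL; // rules are not inspected
    }

    boolean isAllowNone() {
        return mode == Mode.ALLOW_NONE; // rules are not inspected
    }

    public static void main(String[] args) {
        ModeOnlySketch rules = new ModeOnlySketch(Mode.ALLOW_SOME);
        rules.addDisallow("/"); // effectively blocks every path
        System.out.println(rules.isAllowNone());
    }
}
```

        Callers that need a definitive answer for a specific URL should therefore match that URL against the rules rather than rely on these two methods alone.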
      • toString

        public String toString()
        Description copied from class: BaseRobotRules
        Returns a string with the crawl delay and, if any sitemaps exist and there are no more than 10 of them, a list of those sitemaps.
        Overrides:
        toString in class BaseRobotRules