public class BasicURLNormalizer extends URLFilter
/./
or /../
http://
Modifier and Type | Class and Description |
---|---|
static class | BasicURLNormalizer.Builder: A builder class for the BasicURLNormalizer. |
static class | BasicURLNormalizer.IdnNormalization |
Modifier and Type | Field and Description |
---|---|
static org.slf4j.Logger | LOG |
Constructor and Description |
---|
BasicURLNormalizer() |
BasicURLNormalizer(BasicURLNormalizer.Builder builder) |
Modifier and Type | Method and Description |
---|---|
String | filter(String urlString): Returns a modified version of the input URL, or null if the URL should be removed. |
static String | formatQueryParameters(List<crawlercommons.filters.basic.BasicURLNormalizer.NameValuePair> parameters): Formats a list of query parameter name-value pairs into a query parameter string. |
static void | main(String[] args) |
static BasicURLNormalizer.Builder | newBuilder(): Creates a new builder object for creating a customized BasicURLNormalizer object. |
static List<crawlercommons.filters.basic.BasicURLNormalizer.NameValuePair> | parseQueryParameters(String s, int queryStartIdx, Set<String> queryElementsToRemove): Receives the URL query string and parses it into a list of name-value pairs. |
static String | unescapePath(String path): Removes % encoding from the path segment of a URL for characters which should be unescaped according to RFC 3986. |
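The two query-parameter helpers in the method summary, parseQueryParameters and formatQueryParameters, form a parse-then-format round trip. The following is a minimal stand-alone sketch of that idea, not the crawler-commons implementation. The class QueryParamsSketch, its Pair type (a stand-in for NameValuePair) and the methods parse and format are hypothetical names. The sketch also assumes that queryStartIdx points at the first character after the '?', which the source does not state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a query-parameter round trip; not the library code. */
class QueryParamsSketch {

    /** Hypothetical stand-in for the library's NameValuePair type. */
    static final class Pair {
        final String name;
        final String value; // null when the parameter has no '='

        Pair(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /**
     * Parses the query part of a URL file string (path + query + fragment).
     * Assumption: queryStartIdx is the index of the first character after '?'.
     */
    static List<Pair> parse(String file, int queryStartIdx, Set<String> namesToRemove) {
        List<Pair> pairs = new ArrayList<>();
        if (queryStartIdx < 0 || queryStartIdx >= file.length()) {
            return pairs; // no query part
        }
        // The query ends at the fragment marker, if there is one.
        int end = file.indexOf('#', queryStartIdx);
        String query = file.substring(queryStartIdx, end < 0 ? file.length() : end);
        for (String part : query.split("&")) {
            if (part.isEmpty()) {
                continue; // skip empty elements such as "a=1&&b=2"
            }
            int eq = part.indexOf('=');
            String name = eq < 0 ? part : part.substring(0, eq);
            String value = eq < 0 ? null : part.substring(eq + 1);
            if (namesToRemove.contains(name)) {
                continue; // ignored parameter name
            }
            pairs.add(new Pair(name, value));
        }
        return pairs;
    }

    /** Joins name-value pairs back into a query string, in list order. */
    static String format(List<Pair> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Pair p : pairs) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(p.name);
            if (p.value != null) {
                sb.append('=').append(p.value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String file = "/search?q=java&utm_source=x&page=2#top";
        List<Pair> pairs = parse(file, file.indexOf('?') + 1,
                Collections.singleton("utm_source"));
        System.out.println(format(pairs));
    }
}
```

With the input in main, the sketch drops utm_source and the fragment and prints q=java&page=2. The real methods may differ in details such as ordering or escaping of the parameters.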
public BasicURLNormalizer()
public BasicURLNormalizer(BasicURLNormalizer.Builder builder)
public String filter(String urlString)
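Description copied from class: URLFilter
Returns a modified version of the input URL, or null if the URL should be removed.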
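The filter contract is to return a modified URL, or null when the URL should be removed. It can be illustrated with a small stand-alone sketch built on java.net.URI. This is not the BasicURLNormalizer implementation. The class FilterContractSketch and the method normalizeOrNull are hypothetical, and the chosen steps (collapsing /./ and /../ segments, dropping the default port for http:// and https://, lower-casing scheme and host) are assumptions based on the fragments in the class description.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical sketch of a "normalize or return null" URL filter. */
class FilterContractSketch {

    /**
     * Returns a normalized form of the URL, or null if it cannot be parsed
     * or lacks a scheme or host. Any fragment is dropped (a simplification
     * made by this sketch).
     */
    static String normalizeOrNull(String urlString) {
        try {
            // URI.normalize() collapses "/./" and resolvable "/../" segments.
            URI uri = new URI(urlString.trim()).normalize();
            String scheme = uri.getScheme();
            String host = uri.getHost();
            if (scheme == null || host == null) {
                return null;
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            int port = uri.getPort();
            // Drop the port when it is the default for the scheme.
            if ((scheme.equals("http") && port == 80)
                    || (scheme.equals("https") && port == 443)) {
                port = -1;
            }
            String path = uri.getRawPath();
            if (path == null || path.isEmpty()) {
                path = "/"; // an empty path is written as the root path
            }
            StringBuilder sb = new StringBuilder();
            sb.append(scheme).append("://").append(host.toLowerCase(Locale.ROOT));
            if (port != -1) {
                sb.append(':').append(port);
            }
            sb.append(path);
            if (uri.getRawQuery() != null) {
                sb.append('?').append(uri.getRawQuery());
            }
            return sb.toString();
        } catch (URISyntaxException e) {
            return null; // unparsable input is removed
        }
    }

    public static void main(String[] args) {
        System.out.println(normalizeOrNull("http://Example.com:80/a/./b/../c"));
        System.out.println(normalizeOrNull("not a url"));
    }
}
```

In this sketch the first call yields http://example.com/a/c and the second yields null, because a URI containing spaces fails to parse. Callers of a filter with this contract need to check for null before using the result.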
public static List<crawlercommons.filters.basic.BasicURLNormalizer.NameValuePair> parseQueryParameters(String s, int queryStartIdx, Set<String> queryElementsToRemove)
s - a String containing the URL file (as per java.net.URL.getFile(), i.e., the path + query + fragment)
queryStartIdx - the index position of the query part in the string
queryElementsToRemove - a set of query parameter names to be ignored while parsing the query parameters
public static String formatQueryParameters(List<crawlercommons.filters.basic.BasicURLNormalizer.NameValuePair> parameters)
parameters - the query parameter name-value pairs
public static String unescapePath(String path)
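As a rough illustration of what unescaping a path according to RFC 3986 can mean, the sketch below decodes only percent-encoded unreserved characters (letters, digits, '-', '.', '_', '~', per RFC 3986 section 2.3). It leaves every other escape in place and upper-cases its hex digits. The class UnescapePathSketch and its methods are hypothetical, and the set of characters that the real unescapePath decodes may be different.

```java
/** Hypothetical sketch of RFC 3986 style path unescaping; not the library code. */
class UnescapePathSketch {

    /** Unreserved characters per RFC 3986, section 2.3. */
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    /** Returns the value of an ASCII hex digit, or -1 if it is not one. */
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Decodes %XX escapes of unreserved characters; keeps all other escapes. */
    static String unescapeUnreserved(String path) {
        StringBuilder sb = new StringBuilder(path.length());
        int i = 0;
        while (i < path.length()) {
            char ch = path.charAt(i);
            if (ch == '%' && i + 2 < path.length()) {
                int hi = hexValue(path.charAt(i + 1));
                int lo = hexValue(path.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    int decoded = (hi << 4) | lo;
                    if (isUnreserved(decoded)) {
                        sb.append((char) decoded);
                    } else {
                        // Keep the escape, with upper-case hex digits.
                        sb.append('%')
                          .append(Character.toUpperCase(path.charAt(i + 1)))
                          .append(Character.toUpperCase(path.charAt(i + 2)));
                    }
                    i += 3;
                    continue;
                }
            }
            sb.append(ch); // ordinary character or malformed escape
            i++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeUnreserved("/%7Euser/a%2db"));
        System.out.println(unescapeUnreserved("/a%2fb"));
    }
}
```

Here /%7Euser/a%2db becomes /~user/a-b, while /a%2fb becomes /a%2Fb: an encoded slash is a reserved character, so decoding it would change how the path splits into segments.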
public static BasicURLNormalizer.Builder newBuilder()
Create a new builder object for creating a customized BasicURLNormalizer object.
public static void main(String[] args) throws IOException
Throws: IOException
Copyright © 2009–2021 Crawler-Commons. All rights reserved.