Package org.attribyte.api.http.util
Class RobotsTxt
java.lang.Object
org.attribyte.api.http.util.RobotsTxt
A parsed robots.txt file.
Field Summary
Fields: NO_ROBOTS
Constructor Summary
Constructors: RobotsTxt(r, agents) - Parse robots.txt from a character stream.
Method Summary
Modifier and Type | Method | Description
final boolean | isAllowed(userAgent, path) | Determine if a user agent is allowed for the specified path.
final boolean | isAllowed(userAgent, path, checkWildcard) | Determine if a user agent is allowed for the specified path, optionally checking the wildcard record.
static RobotsTxt | parse(String host, Client httpClient, String userAgent, Set<String> preserveAgents, org.attribyte.api.Logger logger) | Creates a robots.txt from the standard location (/robots.txt).
Field Details
NO_ROBOTS
Constructor Details
RobotsTxt
Parse robots.txt from a character stream.
Parameters:
r - A reader from which the robots.txt is read.
agents - A list of user agents that, if listed in the file, should be preserved. The wildcard (*) is always preserved.
Throws:
IOException - on input error.
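The constructor reads records from a character stream and keeps only the preserved agents plus the wildcard. The following is a minimal sketch of that kind of parse, not the library's implementation. Everything in it is an assumption made for illustration: the class name RobotsParseSketch, the Map<String, List<String>> representation (agent name to disallowed path prefixes), lowercase agent names in the preserve set, and the decision to handle only User-agent and Disallow lines (Allow is ignored here).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the library's implementation: groups Disallow prefixes by agent,
// keeping only agents in the preserve set plus the wildcard "*".
class RobotsParseSketch {

    static Map<String, List<String>> parse(Reader r, Set<String> agents) throws IOException {
        Map<String, List<String>> disallowed = new HashMap<>();
        List<String> currentAgents = new ArrayList<>();
        boolean inRules = false; // becomes true once a rule line follows the User-agent lines
        BufferedReader br = new BufferedReader(r);
        String line;
        while ((line = br.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) line = line.substring(0, hash); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (inRules) { // a User-agent line after rules starts a new record
                    currentAgents.clear();
                    inRules = false;
                }
                String agent = value.toLowerCase(Locale.ROOT);
                // Assumption: the preserve set holds lowercase names; "*" is always kept.
                if (agent.equals("*") || agents.contains(agent)) currentAgents.add(agent);
            } else if (field.equals("disallow")) {
                inRules = true;
                for (String agent : currentAgents) {
                    List<String> prefixes = disallowed.computeIfAbsent(agent, k -> new ArrayList<>());
                    if (!value.isEmpty()) prefixes.add(value); // an empty Disallow blocks nothing
                }
            }
        }
        return disallowed;
    }

    public static void main(String[] args) throws IOException {
        String txt = "User-agent: *\nDisallow: /private/\n\n"
                + "User-agent: OtherBot\nDisallow: /\n\n"
                + "User-agent: mybot\nDisallow:\n";
        Map<String, List<String>> rules = parse(new StringReader(txt), Set.of("mybot"));
        System.out.println(rules.get("*"));                // [/private/]
        System.out.println(rules.get("mybot"));            // []
        System.out.println(rules.containsKey("otherbot")); // false
    }
}
```

Dropping non-preserved agents at parse time keeps the in-memory result small when a site lists rules for many crawlers you never impersonate.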
Method Details
parse
public static RobotsTxt parse(String host, Client httpClient, String userAgent, Set<String> preserveAgents, org.attribyte.api.Logger logger)
Creates a robots.txt from the standard location (/robots.txt).
Parameters:
host - The hostname. The URL will be created as [host]/robots.txt.
httpClient - The HTTP client for making the request.
userAgent - The User-Agent sent with the request.
preserveAgents - The set of agents to preserve. Agents not contained in this set will be ignored during parse.
logger - A logger for errors. May be null. If specified, HTTP errors during parse will be logged at the warn level.
Returns:
The parsed robots.txt.
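The fetch step behind parse builds [host]/robots.txt, makes the request, and logs failures only when a logger is supplied. The sketch below shows that flow under explicit assumptions: RobotsFetchSketch, robotsUrl, and fetchBody are hypothetical names; a java.util.function.Function stands in for the library's Client; java.util.logging.Logger stands in for org.attribyte.api.Logger; and returning null on failure is this sketch's choice, not documented library behavior.

```java
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical sketch of the fetch step behind parse(...): the Function stands in for the
// HTTP client and java.util.logging.Logger stands in for org.attribyte.api.Logger.
class RobotsFetchSketch {

    // Standard location, as documented: [host]/robots.txt
    static String robotsUrl(String host) {
        return host + "/robots.txt";
    }

    // Returns the body, or null on failure. The logger is optional ("May be null").
    static String fetchBody(String host, Function<String, String> fetch, Logger logger) {
        String url = robotsUrl(host);
        try {
            return fetch.apply(url);
        } catch (RuntimeException e) {
            if (logger != null) {
                logger.warning("Unable to fetch " + url + ": " + e.getMessage()); // warn level
            }
            return null;
        }
    }

    public static void main(String[] args) {
        String ok = fetchBody("https://example.com", url -> "User-agent: *\nDisallow:\n", null);
        String failed = fetchBody("https://example.com",
                url -> { throw new IllegalStateException("HTTP 503"); },
                Logger.getLogger("robots"));
        System.out.println(ok != null);     // true
        System.out.println(failed == null); // true (a warning is also logged)
    }
}
```

Passing the transport in as a function keeps the sketch testable without a network; the real method takes a Client instead.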
isAllowed
Determine if a user agent is allowed for the specified path.
Parameters:
userAgent - The user agent string.
path - The path.
Returns:
true if the agent is allowed for the path, false otherwise.
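A basic allowed-check looks up the agent's own record, falls back to the wildcard record, and rejects the path if any disallowed prefix matches. The following is a minimal sketch of that idea, not this class's code. RobotsAllowSketch and the agent-to-prefixes map are hypothetical, the agent is matched by exact lowercase name rather than by searching a full User-Agent string, and Allow lines are not considered.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: Disallow-only, prefix-based check over agent -> disallowed prefixes.
class RobotsAllowSketch {

    static boolean isAllowed(Map<String, List<String>> disallowed, String userAgent, String path) {
        // Prefer the agent's own record; fall back to the wildcard record.
        List<String> prefixes = disallowed.get(userAgent.toLowerCase(Locale.ROOT));
        if (prefixes == null) prefixes = disallowed.get("*");
        if (prefixes == null) return true; // no applicable record: everything is allowed
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rules = Map.of(
                "*", List.of("/private/"),
                "mybot", List.of());
        System.out.println(isAllowed(rules, "MyBot", "/private/a"));    // true
        System.out.println(isAllowed(rules, "OtherBot", "/private/a")); // false
        System.out.println(isAllowed(rules, "OtherBot", "/public"));    // true
    }
}
```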
isAllowed
Determine if a user agent is allowed for the specified path.
Technically, the treatment of Allow here does not follow the draft specification (http://www.robotstxt.org/wc/norobots-rfc.html), which says a single list should be processed, matching all records in the order they appear. In practice, however, I have found that people often write rules that don't make sense under that model, such as disallowing everything and then allowing specific paths.
Parameters:
userAgent - The user agent string.
path - The path.
checkWildcard - Should the wildcard record be checked? (This gives a way to know if a user agent is explicitly disallowed by name.)
Returns:
true if the agent is allowed for the path, false otherwise.
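The note above contrasts ordered matching with a more forgiving reading of Allow. The sketch below makes the difference concrete for the "disallow all, then allow" case and shows how a checkWildcard flag can isolate an agent's own record. It is illustrative only: RobotsAllowRules, Rule, firstMatch, and lenient are hypothetical names, and "any matching Allow overrides any matching Disallow" is an assumed lenient policy, not necessarily the one this class implements.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch contrasting ordered (first-match) and lenient handling of Allow.
class RobotsAllowRules {

    // One Allow or Disallow line.
    static final class Rule {
        final boolean allow;
        final String prefix;
        Rule(boolean allow, String prefix) { this.allow = allow; this.prefix = prefix; }
    }

    // Draft-spec reading: rules are checked in order and the first matching prefix decides.
    static boolean firstMatch(List<Rule> rules, String path) {
        for (Rule rule : rules) {
            if (path.startsWith(rule.prefix)) return rule.allow;
        }
        return true; // nothing matched
    }

    // Assumed lenient reading: any matching Allow wins over any matching Disallow.
    static boolean lenient(List<Rule> rules, String path) {
        boolean disallowed = false;
        for (Rule rule : rules) {
            if (!path.startsWith(rule.prefix)) continue;
            if (rule.allow) return true;
            disallowed = true;
        }
        return !disallowed;
    }

    // With checkWildcard = false only the agent's own record is consulted,
    // so a false result means the agent is disallowed by name.
    static boolean isAllowed(Map<String, List<Rule>> records, String userAgent, String path, boolean checkWildcard) {
        List<Rule> rules = records.get(userAgent.toLowerCase(Locale.ROOT));
        if (rules == null && checkWildcard) rules = records.get("*");
        return rules == null || lenient(rules, path);
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(new Rule(false, "/"), new Rule(true, "/public/"));
        System.out.println(firstMatch(rules, "/public/a")); // false
        System.out.println(lenient(rules, "/public/a"));    // true
        Map<String, List<Rule>> records = Map.of("*", List.of(new Rule(false, "/")));
        System.out.println(isAllowed(records, "MyBot", "/x", true));  // false
        System.out.println(isAllowed(records, "MyBot", "/x", false)); // true
    }
}
```

Under first-match, "Disallow: /" listed before "Allow: /public/" blocks everything, which is rarely what the site owner intended; the lenient reading honors the later Allow.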