Package com.attribyte.parser
Class Util
java.lang.Object
com.attribyte.parser.Util
-
Field Summary
Fields
Modifier and Type | Field | Description
static final org.jsoup.nodes.Element | EMPTY_ELEMENT | An element with empty content.
Constructor Summary
Constructors
Util()
Method Summary
Modifier and Type | Method | Description
static String | childrenToString(org.jsoup.nodes.Element element) | Converts all children of an HTML element to a string.
static String | childText(parent, propertyName) | Gets the text for a JSON object property.
static String | childText(parent, childName) | Gets the text of a child element.
static String | cleanSpecialCharacters(str) | Cleans special characters, including emoji, from a string.
static boolean | containsImage(Entry.Builder entry, String src) | Determines if an entry contains the image.
static String | domain(link) | Gets the top private domain for the link.
static boolean | endsWithIgnoreInvisible(String match, String source) | Determines if a string ends with another string, ignoring trailing "invisible" characters in the source.
static org.jsoup.nodes.Element | firstChild(org.jsoup.nodes.Element parent, String childName) | Gets the first child element with the given tag name.
static org.jsoup.nodes.Element | firstMatch(org.jsoup.nodes.Element parent, String pattern) | Finds the first element matching the pattern.
static String | host(link) | Gets the host for a link.
static String | httpURL(url, defaultProtocol) | Checks whether a URL is http/https.
static Map<String,org.jsoup.nodes.Element> | map(org.jsoup.nodes.Element parent, Set<String> tagNames) | Gets a map of tag name to the first child element with that tag name, for each name in the specified set.
static String | protocol(link) | Gets the protocol from a link.
static String | slugify(str) | Creates a slug from a string.
static | splitOnMultipleLinebreaks(String str, boolean unescapeHTMLEntities) | Splits on multiple line breaks.
static org.jsoup.nodes.Document | splitSimpleHTML(String str) |
static boolean | startsWithIgnoreInvisible(String match, String source) | Determines if a string starts with another string, ignoring leading "invisible" characters in the source.
static void | unzip(File zipFile, File targetDir, Consumer<File> fileConsumer) | Unzips a file.
-
Field Details
-
EMPTY_ELEMENT
public static final org.jsoup.nodes.Element EMPTY_ELEMENT
An element with empty content.
-
-
Constructor Details
-
Util
public Util()
-
-
Method Details
-
map
public static Map<String,org.jsoup.nodes.Element> map(org.jsoup.nodes.Element parent, Set<String> tagNames)
Gets a map of tag name to the first child element with that tag name, for each name in the specified set.
Parameters:
parent - The parent element.
tagNames - The set of tag names.
Returns:
The element map.
-
childText
Gets the text of a child element.
Parameters:
parent - The parent element.
childName - The tag name of the child.
Returns:
The text, or an empty string if none.
-
childText
Gets the text for a JSON object property.
Parameters:
parent - The parent node. May be null.
propertyName - The name of the property.
Returns:
The text, or an empty string if none.
-
firstChild
Gets the first child element with the given tag name.
Parameters:
parent - The parent element.
childName - The tag name of the child.
Returns:
The first matching child, or null if none.
-
firstMatch
Finds the first element matching the pattern.
Parameters:
parent - The parent element.
pattern - The pattern.
Returns:
The element, or null if not found.
-
childrenToString
Converts all children of an HTML element to a string.
Parameters:
element - The element.
Returns:
The HTML string.
-
cleanSpecialCharacters
Cleans special characters, including emoji, from a string.
Parameters:
str - The string.
Returns:
The cleaned string.
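The documentation does not say which characters count as "special". The following is a minimal sketch, assuming "special" means code points in the Unicode general categories So (other symbol, which covers most emoji), Cf (format, e.g. zero-width space), Co (private use), Cs (unpaired surrogates) and Cn (unassigned). The class name SpecialCharCleaner is hypothetical and this is not the library's implementation.

```java
import java.util.Set;

// Hypothetical sketch: the real cleanSpecialCharacters rules are not documented.
final class SpecialCharCleaner {

    // Unicode general categories treated as "special" in this sketch (an assumption).
    private static final Set<Integer> REMOVED_TYPES = Set.of(
            (int) Character.OTHER_SYMBOL,  // So: most emoji, e.g. U+1F600
            (int) Character.FORMAT,        // Cf: zero-width space, zero-width joiner, BOM
            (int) Character.PRIVATE_USE,   // Co
            (int) Character.SURROGATE,     // Cs: only unpaired halves, since codePoints() decodes pairs
            (int) Character.UNASSIGNED);   // Cn

    static String cleanSpecialCharacters(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length());
        // Iterate by code point so emoji outside the BMP are seen as single characters.
        str.codePoints()
           .filter(cp -> !REMOVED_TYPES.contains(Character.getType(cp)))
           .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

Category-based filtering keeps the rule short but is coarse: it also removes symbols such as the copyright sign, and a variation selector (U+FE0F, category Mn) that follows an emoji survives.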
-
protocol
Gets the protocol from a link.
Parameters:
link - The link.
Returns:
The protocol, or null if not found or the link is invalid.
-
host
Gets the host for a link.
Parameters:
link - The link.
Returns:
The host, or null if the link is an invalid URL.
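Both protocol and host can be approximated with java.net.URI. The class below is a minimal sketch, assuming that any link URI cannot parse counts as invalid; the name LinkParts is hypothetical, and the library's own parsing rules may differ.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of protocol(link) and host(link) built on java.net.URI.
final class LinkParts {

    static String protocol(String link) {
        URI uri = parse(link);
        return uri == null ? null : uri.getScheme(); // null when the link has no scheme
    }

    static String host(String link) {
        URI uri = parse(link);
        return uri == null ? null : uri.getHost(); // null when no host can be parsed
    }

    // Returns null instead of throwing for null or syntactically invalid links.
    private static URI parse(String link) {
        if (link == null) {
            return null;
        }
        try {
            return new URI(link.trim());
        } catch (URISyntaxException e) {
            return null;
        }
    }
}
```

With this sketch, protocol("https://test.attribyte.com/a") returns "https", host of the same link returns "test.attribyte.com", and host("/relative/path") returns null because a relative reference has no host.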
-
domain
Gets the top private domain for the link, that is, the public suffix plus one more label. For example:
test.attribyte.com -> attribyte.com
test.blogspot.com -> test.blogspot.com (blogspot.com is itself a public suffix)
Parameters:
link - The link.
Returns:
The domain, or null if invalid.
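The blogspot.com example follows from the Public Suffix List, where blogspot.com is listed as a suffix. A full implementation would consult the whole list (for example through Guava's InternetDomainName.topPrivateDomain()). The following is only a simplified sketch, assuming a tiny hard-coded suffix set and ignoring the list's wildcard and exception rules; the class name TopPrivateDomain and its suffix set are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: the suffix set is illustrative, not the real Public Suffix List.
final class TopPrivateDomain {

    private static final Set<String> PUBLIC_SUFFIXES = Set.of("com", "org", "co.uk", "blogspot.com");

    // Returns the longest matching public suffix plus one more label, or null.
    static String topPrivateDomain(String host) {
        if (host == null || host.isEmpty()) {
            return null;
        }
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        // A host that is itself a public suffix has no private domain.
        if (PUBLIC_SUFFIXES.contains(String.join(".", labels))) {
            return null;
        }
        // Try suffixes from longest to shortest; the first hit is the longest match.
        for (int i = 1; i < labels.length; i++) {
            String suffix = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (PUBLIC_SUFFIXES.contains(suffix)) {
                return labels[i - 1] + "." + suffix;
            }
        }
        return null; // no known public suffix
    }
}
```

With the illustrative suffix set, test.attribyte.com yields attribyte.com and test.blogspot.com yields test.blogspot.com, matching the examples above, because the longer suffix blogspot.com wins over com.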
-
httpURL
Checks whether a URL is http/https.
Parameters:
url - The URL.
defaultProtocol - The protocol used if url starts with //. If null, https is used.
Returns:
The URL, or null if it is not an http/https URL.
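A minimal sketch of the documented contract, assuming the method returns the (possibly protocol-completed) URL when its scheme is http or https and null otherwise. The class name HttpUrls is hypothetical and this is not the library's source.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of the documented httpURL behavior.
final class HttpUrls {

    static String httpURL(String url, String defaultProtocol) {
        if (url == null) {
            return null;
        }
        String candidate = url.trim();
        if (candidate.startsWith("//")) {
            // Protocol-relative URL: apply the default protocol, or https when none is given.
            String protocol = defaultProtocol != null ? defaultProtocol : "https";
            candidate = protocol + ":" + candidate;
        }
        try {
            String scheme = new URI(candidate).getScheme();
            if (scheme != null && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return candidate;
            }
        } catch (URISyntaxException e) {
            // An unparseable URL is treated as not http/https.
        }
        return null;
    }
}
```

For example, httpURL("//cdn.example.com/a.png", null) returns "https://cdn.example.com/a.png", while httpURL("ftp://example.com/f", "http") returns null.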
-
containsImage
Determines if an entry contains the image.
Parameters:
entry - The entry.
src - The image source.
Returns:
true if the entry contains the image.
-
endsWithIgnoreInvisible
Determines if a string ends with another string, ignoring trailing "invisible" characters in the source.
Parameters:
match - The string to match at the end.
source - The source string.
Returns:
true if the source ends with the match string, ignoring invisible characters.
-
startsWithIgnoreInvisible
Determines if a string starts with another string, ignoring leading "invisible" characters in the source.
Parameters:
match - The string to match at the start.
source - The source string.
Returns:
true if the source starts with the match string, ignoring invisible characters.
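The documentation leaves "invisible" undefined. The sketch below assumes it covers whitespace, space separators such as the no-break space, and Unicode format characters such as U+200B and U+FEFF; the class name InvisibleMatch is hypothetical. It skips invisible code points at the relevant end of the source and then does an ordinary prefix or suffix comparison.

```java
// Hypothetical sketch: the set of "invisible" characters is an assumption.
final class InvisibleMatch {

    static boolean isInvisible(int cp) {
        return Character.isWhitespace(cp)
                || Character.isSpaceChar(cp)                   // includes no-break space
                || Character.getType(cp) == Character.FORMAT;  // zero-width space, BOM, etc.
    }

    static boolean endsWithIgnoreInvisible(String match, String source) {
        if (match == null || source == null) {
            return false;
        }
        int end = source.length();
        // Walk backwards over trailing invisible code points.
        while (end > 0) {
            int cp = source.codePointBefore(end);
            if (!isInvisible(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return source.substring(0, end).endsWith(match);
    }

    static boolean startsWithIgnoreInvisible(String match, String source) {
        if (match == null || source == null) {
            return false;
        }
        int start = 0;
        // Walk forwards over leading invisible code points.
        while (start < source.length()) {
            int cp = source.codePointAt(start);
            if (!isInvisible(cp)) {
                break;
            }
            start += Character.charCount(cp);
        }
        return source.startsWith(match, start);
    }
}
```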
-
slugify
Creates a slug from a string. Does not strip markup.
Parameters:
str - The string.
Returns:
The slug for the string.
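The exact slug rules are not documented. A common approach, shown here as a minimal sketch, is to strip accents via Unicode decomposition, lower-case, and collapse every run of non-alphanumeric characters into a single hyphen. The class name Slugs is hypothetical, and the library's own rules (length limits, transliteration) may differ.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of a slug function; not the library's implementation.
final class Slugs {

    static String slugify(String str) {
        if (str == null) {
            return "";
        }
        // Decompose accented letters into base letter plus combining mark, then drop the marks.
        String s = Normalizer.normalize(str, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        s = s.toLowerCase(Locale.ROOT);
        // Collapse every run of non-alphanumeric characters into a single hyphen.
        s = s.replaceAll("[^a-z0-9]+", "-");
        // Trim hyphens left at either end.
        return s.replaceAll("^-+|-+$", "");
    }
}
```

Because markup is not stripped, tag names end up in the slug: with this sketch, "Café <b>Menu</b>" becomes "cafe-b-menu-b".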
-
splitSimpleHTML
-
splitOnMultipleLinebreaks
Splits on multiple line breaks. Single breaks are replaced with a space.
Parameters:
str - The string.
unescapeHTMLEntities - Attempt to unescape any recognized HTML entities.
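The return type is not shown above, so the following sketch assumes a List<String> of parts. It treats two or more line breaks (optionally separated by spaces or tabs) as a split point and turns any remaining single break into a space. The class name Paragraphs is hypothetical, and its entity handling covers only a few named entities, far fewer than "any recognized HTML entities".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the List<String> return type and break rules are assumptions.
final class Paragraphs {

    static List<String> splitOnMultipleLinebreaks(String str, boolean unescapeHTMLEntities) {
        List<String> parts = new ArrayList<>();
        if (str == null) {
            return parts;
        }
        // Two or more line breaks, possibly with blank-looking whitespace between them, end a part.
        for (String block : str.split("(?:\\r?\\n[ \\t]*){2,}")) {
            // Any single remaining line break becomes a space.
            String joined = block.replaceAll("[ \\t]*\\r?\\n[ \\t]*", " ").trim();
            if (unescapeHTMLEntities) {
                joined = unescapeBasicEntities(joined);
            }
            if (!joined.isEmpty()) {
                parts.add(joined);
            }
        }
        return parts;
    }

    // Handles only a few common entities; &amp; is replaced last so "&amp;lt;" becomes "&lt;", not "<".
    private static String unescapeBasicEntities(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"")
                .replace("&#39;", "'").replace("&amp;", "&");
    }
}
```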
-
unzip
public static void unzip(@NonNull File zipFile, @NonNull File targetDir, @Nullable Consumer<File> fileConsumer) throws IOException
Unzips a file.
Parameters:
zipFile - The zip file.
targetDir - The target output directory.
fileConsumer - A function called before each file or directory is written.
Throws:
IOException - on error.
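A minimal sketch of the documented unzip contract using java.util.zip, not the library's source. It calls the optional consumer before each file or directory is written, and it adds a guard (an assumption, not stated in the documentation) that rejects entries whose names would resolve outside the target directory. The class name Unzipper is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch of unzip(zipFile, targetDir, fileConsumer).
final class Unzipper {

    static void unzip(File zipFile, File targetDir, Consumer<File> fileConsumer) throws IOException {
        Path root = targetDir.toPath().toAbsolutePath().normalize();
        try (InputStream in = Files.newInputStream(zipFile.toPath());
             ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Reject names such as "../evil" that would escape the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (fileConsumer != null) {
                    fileConsumer.accept(out.toFile()); // called before the file or directory is written
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry: the stream reports end-of-data at the entry boundary.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                }
                zin.closeEntry();
            }
        }
    }
}
```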
-