Class Util

java.lang.Object
com.attribyte.parser.Util

public class Util extends Object
  • Field Details

    • EMPTY_ELEMENT

      public static final org.jsoup.nodes.Element EMPTY_ELEMENT
      An element with empty content.
  • Constructor Details

    • Util

      public Util()
  • Method Details

    • map

      public static Map<String,org.jsoup.nodes.Element> map(org.jsoup.nodes.Element parent, Set<String> tagNames)
      Gets a map from tag name to the first child element with that tag, for all names in the specified set.
      Parameters:
      parent - The parent element.
      tagNames - The set of tag names.
      Returns:
      The element map.
    • childText

      public static String childText(org.jsoup.nodes.Element parent, String childName)
      Gets the text of a child element.
      Parameters:
      parent - The parent element.
      childName - The tag name of the child.
      Returns:
      The text or an empty string if none.
    • childText

      public static String childText(com.fasterxml.jackson.databind.JsonNode parent, String propertyName)
      Gets the text for a JSON object property.
      Parameters:
      parent - The parent node. May be null.
      propertyName - The name of the property.
      Returns:
      The text or an empty string if none.
    • firstChild

      public static org.jsoup.nodes.Element firstChild(org.jsoup.nodes.Element parent, String childName)
      Gets the first child element with the specified tag name.
      Parameters:
      parent - The parent element.
      childName - The tag name of the child.
      Returns:
      The first matching child or null if none.
    • firstMatch

      public static org.jsoup.nodes.Element firstMatch(org.jsoup.nodes.Element parent, String pattern)
      Finds the first element matching the pattern.
      Parameters:
      parent - The parent element.
      pattern - The pattern.
      Returns:
      The element or null if not found.
    • childrenToString

      public static String childrenToString(org.jsoup.nodes.Element element)
      Converts all children of an HTML element to a string.
      Parameters:
      element - The element.
      Returns:
      The HTML string.
    • cleanSpecialCharacters

      public static String cleanSpecialCharacters(String str)
      Cleans special characters from a string, including emoji.
      Parameters:
      str - The string.
      Returns:
      The cleaned string.
    • protocol

      public static String protocol(String link)
      Gets the protocol from a link.
      Parameters:
      link - The link.
      Returns:
      The protocol or null if not found/invalid link.
    • host

      public static String host(String link)
      Gets the host for a link.
      Parameters:
      link - The link.
      Returns:
      The host or null if link is an invalid URL.
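
The null-on-invalid contract described for protocol and host can be illustrated with the standard library alone. The following is a minimal sketch, assuming java.net.URI as the parser; LinkSketch and its parse helper are hypothetical names, and the actual Util implementation may parse links differently.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical stand-in for illustration; not the com.attribyte.parser.Util code. */
final class LinkSketch {

  /** Returns the lower-cased scheme, or null if the link has none or cannot be parsed. */
  static String protocol(String link) {
    URI uri = parse(link);
    if (uri == null || uri.getScheme() == null) {
      return null;
    }
    return uri.getScheme().toLowerCase(Locale.ROOT);
  }

  /** Returns the host component, or null if the link has none or cannot be parsed. */
  static String host(String link) {
    URI uri = parse(link);
    return uri == null ? null : uri.getHost();
  }

  /** Parses the link, mapping null input and syntax errors to null instead of throwing. */
  private static URI parse(String link) {
    if (link == null) {
      return null;
    }
    try {
      return new URI(link.trim());
    } catch (URISyntaxException e) {
      return null;
    }
  }
}
```

In this sketch a caller checks for null rather than catching an exception, which matches the documented return values above.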
    • domain

      public static String domain(String link)
      Gets the (top, private) domain for the link.

      For example: test.attribyte.com -> attribyte.com, test.blogspot.com -> test.blogspot.com.

      Parameters:
      link - The link.
      Returns:
      The domain or null if invalid.
    • httpURL

      public static String httpURL(String url, String defaultProtocol)
      Checks whether a URL is http/https.
      Parameters:
      url - The URL.
      defaultProtocol - The protocol used if url starts with //. If null, https is used.
      Returns:
      The URL, or null if it is not http/https.
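
The handling of protocol-relative URLs (those starting with //) can be sketched as follows. This is an illustration under stated assumptions: HttpUrlSketch is a hypothetical class, defaultProtocol is assumed to be a bare scheme name such as "http" without a trailing colon, and the real method may normalize or validate the URL further.

```java
import java.util.Locale;

/** Hypothetical stand-in for illustration; not the com.attribyte.parser.Util code. */
final class HttpUrlSketch {

  /**
   * Returns the URL if it is http/https, prefixing a protocol when it starts with "//".
   * Returns null for any other input.
   */
  static String httpURL(String url, String defaultProtocol) {
    if (url == null) {
      return null;
    }
    String trimmed = url.trim();
    if (trimmed.startsWith("//")) {
      // Assumption: defaultProtocol is a bare scheme name; https is used when it is null.
      String protocol = defaultProtocol == null ? "https" : defaultProtocol;
      return protocol + ":" + trimmed;
    }
    String lower = trimmed.toLowerCase(Locale.ROOT);
    boolean isHttp = lower.startsWith("http://") || lower.startsWith("https://");
    return isHttp ? trimmed : null;
  }
}
```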
    • containsImage

      public static boolean containsImage(Entry.Builder entry, String src)
      Determine if an entry contains the image.
      Parameters:
      entry - The entry.
      src - The image source.
      Returns:
      Does the entry contain the image?
    • endsWithIgnoreInvisible

      public static boolean endsWithIgnoreInvisible(String match, String source)
      Determine if a string ends with another string, ignoring trailing "invisible" characters in the source.
      Parameters:
      match - The string to match at the end.
      source - The source string.
      Returns:
      Does the source end with the match string (ignoring invisible characters).
    • startsWithIgnoreInvisible

      public static boolean startsWithIgnoreInvisible(String match, String source)
      Determine if a string starts with another string, ignoring leading "invisible" characters in the source.
      Parameters:
      match - The string to match at the start.
      source - The source string.
      Returns:
      Does the source start with the match string (ignoring invisible characters).
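
The two "ignore invisible" checks can be sketched by skipping over invisible characters at the relevant end of the source before comparing. The documentation does not say which characters count as invisible, so the isInvisible predicate below is an assumption (whitespace, space separators, ISO control characters, and the zero-width characters U+200B, U+200C, U+200D and U+FEFF), and InvisibleSketch is a hypothetical class.

```java
/** Hypothetical stand-in for illustration; not the com.attribyte.parser.Util code. */
final class InvisibleSketch {

  /** Assumption: the set of characters treated as "invisible" in this sketch. */
  static boolean isInvisible(char c) {
    return Character.isWhitespace(c)
        || Character.isSpaceChar(c)
        || Character.isISOControl(c)
        || c == '\u200B' || c == '\u200C' || c == '\u200D' || c == '\uFEFF';
  }

  /** True if source ends with match once trailing invisible characters are skipped. */
  static boolean endsWithIgnoreInvisible(String match, String source) {
    int end = source.length();
    while (end > 0 && isInvisible(source.charAt(end - 1))) {
      end--;
    }
    return source.substring(0, end).endsWith(match);
  }

  /** True if source starts with match once leading invisible characters are skipped. */
  static boolean startsWithIgnoreInvisible(String match, String source) {
    int start = 0;
    while (start < source.length() && isInvisible(source.charAt(start))) {
      start++;
    }
    return source.startsWith(match, start);
  }
}
```

Only the source is trimmed in this sketch; the match string is compared exactly as given.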
    • slugify

      public static String slugify(String str)
      Creates a slug from a string.

      Does not strip markup.

      Parameters:
      str - The string.
      Returns:
      The slug for the string.
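
One common way to build a slug is to strip diacritics, lower-case the text, and collapse every run of non-alphanumeric characters into a hyphen. The sketch below follows that approach; it is not necessarily the algorithm Util.slugify uses, and SlugSketch is a hypothetical class. It also shows why "does not strip markup" matters: tag names survive into the slug.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical stand-in for illustration; not the com.attribyte.parser.Util code. */
final class SlugSketch {

  /** Builds a lower-case, hyphen-separated ASCII slug. Markup is not removed first. */
  static String slugify(String str) {
    if (str == null) {
      return "";
    }
    // Decompose accented characters, then drop the combining marks.
    String withoutMarks = Normalizer.normalize(str, Normalizer.Form.NFD)
        .replaceAll("\\p{M}+", "");
    return withoutMarks.toLowerCase(Locale.ROOT)
        .replaceAll("[^a-z0-9]+", "-")   // collapse everything else into single hyphens
        .replaceAll("^-+|-+$", "");      // trim leading and trailing hyphens
  }
}
```

With this sketch, a caller that wants a clean slug from HTML input would need to extract the text first, since an input such as <b>Hi</b> yields b-hi-b.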
    • splitSimpleHTML

      public static org.jsoup.nodes.Document splitSimpleHTML(String str)
    • splitOnMultipleLinebreaks

      public static List<String> splitOnMultipleLinebreaks(String str, boolean unescapeHTMLEntities)
      Splits a string on runs of multiple line breaks. Single breaks are replaced with a space.
      Parameters:
      str - The string.
      unescapeHTMLEntities - Attempt to unescape any recognized HTML entities.
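
The splitting rule (two or more consecutive line breaks separate paragraphs, a single break becomes a space) can be sketched with regular expressions. This is an illustration only: LinebreakSketch is a hypothetical class, the unescapeHTMLEntities flag is omitted because entity decoding needs a library, and dropping empty paragraphs is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration; not the com.attribyte.parser.Util code. */
final class LinebreakSketch {

  /** Splits on runs of two or more line breaks; single breaks become spaces. */
  static List<String> splitOnMultipleLinebreaks(String str) {
    List<String> paragraphs = new ArrayList<>();
    if (str == null) {
      return paragraphs;
    }
    // Normalize Windows and old Mac line endings so only '\n' remains.
    String normalized = str.replace("\r\n", "\n").replace('\r', '\n');
    // A paragraph boundary is a break followed by at least one more break,
    // allowing spaces or tabs on the otherwise blank lines in between.
    for (String block : normalized.split("\\n(?:[ \\t]*\\n)+")) {
      String joined = block.replace('\n', ' ').trim();
      if (!joined.isEmpty()) {
        paragraphs.add(joined); // assumption: empty paragraphs are dropped
      }
    }
    return paragraphs;
  }
}
```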
    • unzip

      public static void unzip(@NonNull File zipFile, @NonNull File targetDir, @Nullable Consumer<File> fileConsumer) throws IOException
      Unzip a file.
      Parameters:
      zipFile - The zip file.
      targetDir - The target output directory.
      fileConsumer - A function called before each file or directory is written.
      Throws:
      IOException - on error.
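
The role of fileConsumer, called before each file or directory is written, can be sketched with java.util.zip. This is a minimal illustration, not the Util implementation: UnzipSketch is a hypothetical class, and the check that rejects entries resolving outside the target directory is an addition of this sketch that the documentation above does not mention.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical stand-in for illustration; not the com.attribyte.parser.Util code. */
final class UnzipSketch {

  /** Extracts zipFile into targetDir, notifying fileConsumer before each write. */
  static void unzip(File zipFile, File targetDir, Consumer<File> fileConsumer)
      throws IOException {
    Path root = targetDir.toPath().toAbsolutePath().normalize();
    try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zipFile.toPath()))) {
      ZipEntry entry;
      while ((entry = in.getNextEntry()) != null) {
        Path out = root.resolve(entry.getName()).normalize();
        // Assumption of this sketch: refuse entries such as "../x" that leave the target.
        if (!out.startsWith(root)) {
          throw new IOException("Entry escapes target directory: " + entry.getName());
        }
        if (fileConsumer != null) {
          fileConsumer.accept(out.toFile()); // called before anything is written
        }
        if (entry.isDirectory()) {
          Files.createDirectories(out);
        } else {
          Files.createDirectories(out.getParent());
          // Copies only the current entry; the zip stream itself stays open.
          Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }
}
```

A caller might pass a consumer that logs each path or collects the extracted files into a list; passing null skips the callback, mirroring the @Nullable parameter.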