Class DefaultContentCleaner

java.lang.Object
com.attribyte.parser.DefaultContentCleaner
All Implemented Interfaces:
ContentCleaner

public class DefaultContentCleaner extends Object implements ContentCleaner
  • Field Summary

    Fields inherited from interface com.attribyte.parser.ContentCleaner

    NOOP
  • Constructor Summary

    Constructors
    Constructor
    Description
    DefaultContentCleaner()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    protected int
    cleanAndMarkEmbeds(org.jsoup.nodes.Document doc, String defaultProtocol)
    Marks iframes by placing a q tag before every iframe with the cite attribute set to the frame source.
    protected int
    cleanAndMarkImages(org.jsoup.nodes.Document doc, String defaultProtocol)
    Removes any images without a source and marks images (if withImages=false) by placing a q tag before every image with the cite attribute set to the image source and class set to image.
    protected int
    cleanAndMarkTwitterBlockquotes(org.jsoup.nodes.Document doc)
     
    protected String
    getDocumentProtocol(String link)
    Gets the protocol to be prepended to image sources that start with //.
    void
    init(Properties props)
    Initializes the cleaner.
    protected int
    massageLinks(org.jsoup.nodes.Document doc, String defaultProtocol)
    Cleans links, adding a protocol if one is missing.
    String
    toCleanContent(org.jsoup.nodes.Document doc)
    Converts the document to a cleaned string.
    org.jsoup.nodes.Document
    transform(org.jsoup.nodes.Document doc)
    Transforms the content of a document.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • DefaultContentCleaner

      public DefaultContentCleaner()
  • Method Details

    • transform

      public org.jsoup.nodes.Document transform(org.jsoup.nodes.Document doc)
      Description copied from interface: ContentCleaner
      Transforms the content of a document.

      The input document is modified, not copied!

      Specified by:
      transform in interface ContentCleaner
      Parameters:
      doc - The document
      Returns:
      The input document.
    • toCleanContent

      public String toCleanContent(org.jsoup.nodes.Document doc)
      Description copied from interface: ContentCleaner
      Converts the document to a cleaned string.
      Specified by:
      toCleanContent in interface ContentCleaner
      Parameters:
      doc - The document.
      Returns:
      The clean content.
    • init

      public void init(Properties props)
      Description copied from interface: ContentCleaner
      Initializes the cleaner.
      Specified by:
      init in interface ContentCleaner
      Parameters:
      props - The properties.
    • getDocumentProtocol

      protected String getDocumentProtocol(String link)
      Gets the protocol to be prepended to image sources that start with //.
      Parameters:
      link - The link.
      Returns:
      The protocol.
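
      The idea of resolving a protocol-relative source can be sketched with the standard library alone. This is a minimal illustration, not the class's actual implementation: the helper names documentProtocol and withProtocol are hypothetical, and it is an assumption that the protocol is carried with its trailing colon (for example "https:") and that a fallback is used when the link has no scheme.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class ProtocolSketch {

    // Hypothetical helper: returns the scheme of the document link followed by ':',
    // or the fallback when the link is null, has no scheme, or cannot be parsed.
    static String documentProtocol(String link, String fallback) {
        if (link == null) {
            return fallback;
        }
        try {
            String scheme = new URI(link.trim()).getScheme();
            return scheme != null ? scheme.toLowerCase() + ":" : fallback;
        } catch (URISyntaxException e) {
            return fallback;
        }
    }

    // Prepends the protocol only to protocol-relative sources (those starting with "//").
    static String withProtocol(String src, String protocol) {
        return src.startsWith("//") ? protocol + src : src;
    }

    public static void main(String[] args) {
        String protocol = documentProtocol("https://example.com/post/1", "http:");
        // Prints https://cdn.example.com/a.png
        System.out.println(withProtocol("//cdn.example.com/a.png", protocol));
    }
}
```

      Sources that already carry a scheme, and ordinary relative paths, pass through withProtocol unchanged.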
    • cleanAndMarkEmbeds

      protected int cleanAndMarkEmbeds(org.jsoup.nodes.Document doc, String defaultProtocol)
      Marks iframes by placing a q tag before every iframe with the cite attribute set to the frame source. The q tag is also given the class iframe.
      Parameters:
      doc - The document.
      defaultProtocol - The default protocol.
      Returns:
      The number of modifications.
    • cleanAndMarkTwitterBlockquotes

      protected int cleanAndMarkTwitterBlockquotes(org.jsoup.nodes.Document doc)
    • massageLinks

      protected int massageLinks(org.jsoup.nodes.Document doc, String defaultProtocol)
      Cleans links, adding a protocol if one is missing.

      Converts "mailto" links to q elements with the class mailto (q.mailto), using the link value as the cite attribute.

      Parameters:
      doc - The document.
      defaultProtocol - The default protocol for protocol-less links.
      Returns:
      The number of modifications.
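
      The behavior described above, including the returned modification count, might look roughly like the following when reduced to plain strings. This is a sketch under stated assumptions rather than the library's code: the class LinkMassageSketch and its Result holder are hypothetical, defaultProtocol is assumed to include the trailing colon, and it is assumed that the address after "mailto:" is what ends up in cite.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LinkMassageSketch {

    // Hypothetical result holder: the rewritten entries plus how many were changed.
    static final class Result {
        final List<String> links = new ArrayList<>();
        int modifications = 0;
    }

    // Assumption: defaultProtocol includes the trailing colon, e.g. "https:".
    static Result massage(List<String> hrefs, String defaultProtocol) {
        Result result = new Result();
        for (String href : hrefs) {
            if (href.startsWith("//")) {
                // Protocol-less link: prepend the default protocol.
                result.links.add(defaultProtocol + href);
                result.modifications++;
            } else if (href.regionMatches(true, 0, "mailto:", 0, 7)) {
                // mailto link: replaced by a q element with class "mailto".
                // Assumption: the address after "mailto:" becomes the cite value.
                result.links.add("<q class=\"mailto\" cite=\"" + href.substring(7) + "\"></q>");
                result.modifications++;
            } else {
                // Anything else is left untouched and not counted.
                result.links.add(href);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = massage(
                Arrays.asList("//example.com/a", "mailto:me@example.com", "https://example.com/b"),
                "https:");
        System.out.println(r.links + " modifications=" + r.modifications);
    }
}
```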
    • cleanAndMarkImages

      protected int cleanAndMarkImages(org.jsoup.nodes.Document doc, String defaultProtocol)
      Removes any images without a source and marks images (if withImages=false) by placing a q tag before every image with the cite attribute set to the image source and class set to image.
      Parameters:
      doc - The document.
      defaultProtocol - The default protocol.
      Returns:
      The number of changes.
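
      To make the marker format concrete, here is a small standard-library sketch of the output this description implies. It is illustrative only: ImageMarkSketch and its methods are hypothetical names, the withImages setting is not modeled, and the attribute escaping is an assumption about what a safe marker would need, not something the source states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ImageMarkSketch {

    // Escapes the characters that would break a double-quoted HTML attribute.
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    // Builds the q marker: class "image", cite set to the image source.
    static String marker(String src) {
        return "<q class=\"image\" cite=\"" + escapeAttribute(src) + "\"></q>";
    }

    // Drops images without a source; for the rest, emits the marker
    // immediately before the image itself.
    static List<String> markImages(List<String> sources) {
        List<String> out = new ArrayList<>();
        for (String src : sources) {
            if (src == null || src.trim().isEmpty()) {
                continue; // No source: the image is removed.
            }
            out.add(marker(src));
            out.add("<img src=\"" + escapeAttribute(src) + "\">");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String fragment : markImages(Arrays.asList("https://example.com/a.png", "", null))) {
            System.out.println(fragment);
        }
    }
}
```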