Package com.attribyte.parser
Class DefaultContentCleaner
java.lang.Object
com.attribyte.parser.DefaultContentCleaner
- All Implemented Interfaces:
ContentCleaner
Field Summary
Fields inherited from interface com.attribyte.parser.ContentCleaner
NOOP
Constructor Summary
Constructors
DefaultContentCleaner()
Method Summary
- protected int cleanAndMarkEmbeds(org.jsoup.nodes.Document doc, String defaultProtocol): Marks iframes by placing a q tag before every iframe with the cite attribute set to the frame source.
- protected int cleanAndMarkImages(org.jsoup.nodes.Document doc, String defaultProtocol): Removes any images without a source and, if withImages=false, marks images by placing a q tag before every image with the cite attribute set to the image source and class set to image.
- protected int cleanAndMarkTwitterBlockquotes(org.jsoup.nodes.Document doc)
- protected String getDocumentProtocol(String link): Gets the protocol to be prepended to image sources that start with //.
- void init(Properties props): Initializes the cleaner.
- protected int massageLinks(org.jsoup.nodes.Document doc, String defaultProtocol): Cleans links, adding a protocol if missing.
- toCleanContent(org.jsoup.nodes.Document doc): Converts the document to a cleaned string.
- org.jsoup.nodes.Document transform(org.jsoup.nodes.Document doc): Transforms the content of a document.

Constructor Details
DefaultContentCleaner
public DefaultContentCleaner()

Method Details
transform
public org.jsoup.nodes.Document transform(org.jsoup.nodes.Document doc)
Description copied from interface: ContentCleaner
Transforms the content of a document. The input document is modified, not copied!
- Specified by: transform in interface ContentCleaner
- Parameters: doc - The document.
- Returns: The input document.

toCleanContent
Description copied from interface: ContentCleaner
Converts the document to a cleaned string.
- Specified by: toCleanContent in interface ContentCleaner
- Parameters: doc - The document.
- Returns: The clean content.

init
public void init(Properties props)
Description copied from interface: ContentCleaner
Initializes the cleaner.
- Specified by: init in interface ContentCleaner
- Parameters: props - The properties.

getDocumentProtocol
protected String getDocumentProtocol(String link)
Gets the protocol to be prepended to image sources that start with //.
- Parameters: link - The link.
- Returns: The protocol.
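The idea behind getDocumentProtocol can be sketched with java.net.URI alone. ProtocolSketch, protocolOf and absolutize are hypothetical names, not part of this library, and the sketch assumes the protocol is returned with its trailing colon (for example "https:") so it can be prepended directly to a source like //cdn.example.com/a.png; the actual return format is not documented here.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical sketch, not the library's implementation.
// Assumption: the protocol is returned with its trailing colon ("https:").
class ProtocolSketch {

    // Returns "scheme:" for an absolute link, or the fallback when the link
    // is null, relative, or not a valid URI.
    static String protocolOf(String link, String fallback) {
        if (link == null) {
            return fallback;
        }
        try {
            String scheme = new URI(link.trim()).getScheme();
            return scheme != null ? scheme.toLowerCase(Locale.ROOT) + ":" : fallback;
        } catch (URISyntaxException e) {
            return fallback;
        }
    }

    // Prepends the protocol only to protocol-relative sources ("//host/path").
    static String absolutize(String src, String protocol) {
        return src.startsWith("//") ? protocol + src : src;
    }

    public static void main(String[] args) {
        String protocol = protocolOf("https://example.com/story", "http:");
        // prints https://cdn.example.com/a.png
        System.out.println(absolutize("//cdn.example.com/a.png", protocol));
    }
}
```

Falling back to a caller-supplied value keeps the helper total: a relative or malformed link yields the fallback rather than null.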

cleanAndMarkEmbeds
protected int cleanAndMarkEmbeds(org.jsoup.nodes.Document doc, String defaultProtocol)
Marks iframes by placing a q tag before every iframe with the cite attribute set to the frame source. The class iframe is also added to the q tag.
- Parameters:
  doc - The document.
  defaultProtocol - The default protocol.
- Returns: The number of modifications.
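The marker that cleanAndMarkEmbeds places before each iframe can be sketched with plain strings so the example runs without jsoup. EmbedMarkerSketch, markerFor and escapeAttr are hypothetical; that protocol-relative sources receive defaultProtocol, and that defaultProtocol ends in a colon, are assumptions rather than documented behavior. Because the sketch builds HTML text directly, it escapes the attribute values itself.

```java
// Hypothetical sketch of the documented marker: a q tag whose cite attribute
// holds the frame source and whose class is "iframe".
// Assumption: "//..." sources get defaultProtocol (e.g. "https:") prepended.
class EmbedMarkerSketch {

    // Escapes characters that are unsafe inside a double-quoted HTML attribute.
    static String escapeAttr(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the marker HTML that would be inserted before an iframe.
    static String markerFor(String src, String cssClass, String defaultProtocol) {
        String cite = src.startsWith("//") ? defaultProtocol + src : src;
        return "<q cite=\"" + escapeAttr(cite) + "\" class=\"" + escapeAttr(cssClass) + "\"></q>";
    }

    public static void main(String[] args) {
        // prints <q cite="https://www.youtube.com/embed/abc?a=1&amp;b=2" class="iframe"></q>
        System.out.println(markerFor("//www.youtube.com/embed/abc?a=1&b=2", "iframe", "https:"));
    }
}
```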

cleanAndMarkTwitterBlockquotes
protected int cleanAndMarkTwitterBlockquotes(org.jsoup.nodes.Document doc)
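
massageLinks
protected int massageLinks(org.jsoup.nodes.Document doc, String defaultProtocol)
Cleans links, adding a protocol if missing. Converts "mailto" links to q.mailto (a q tag with class mailto), with the value as the cite attribute.
- Parameters:
  doc - The document.
  defaultProtocol - The default protocol for protocol-less links.
- Returns: The number of modifications.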
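Two of the href rules described for massageLinks are sketched below as plain string functions. LinkMassageSketch and its methods are hypothetical. The docs do not say which links count as protocol-less or what exactly "the value" of a mailto link is, so the sketch assumes only //host/path links get the default protocol and that the cite value is the address after the mailto: prefix.

```java
import java.util.Locale;

// Hypothetical sketch of two massageLinks rules.
// Assumptions: only "//host/path" links are protocol-less, defaultProtocol
// includes its colon, and the mailto cite value is the bare address.
class LinkMassageSketch {

    private static final String MAILTO = "mailto:";

    // True if the href uses the mailto scheme (scheme names are case-insensitive).
    static boolean isMailto(String href) {
        return href.trim().toLowerCase(Locale.ROOT).startsWith(MAILTO);
    }

    // Value that would become the cite attribute of q.mailto (assumed: the address).
    static String mailtoCite(String href) {
        return href.trim().substring(MAILTO.length());
    }

    // Adds defaultProtocol to protocol-relative links; other links are returned trimmed.
    static String addProtocolIfMissing(String href, String defaultProtocol) {
        String trimmed = href.trim();
        return trimmed.startsWith("//") ? defaultProtocol + trimmed : trimmed;
    }

    public static void main(String[] args) {
        System.out.println(addProtocolIfMissing("//example.com/page", "https:")); // https://example.com/page
        System.out.println(isMailto("MAILTO:editor@example.com"));                // true
        System.out.println(mailtoCite("mailto:editor@example.com"));              // editor@example.com
    }
}
```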

cleanAndMarkImages
protected int cleanAndMarkImages(org.jsoup.nodes.Document doc, String defaultProtocol)
Removes any images without a source and, if withImages=false, marks images by placing a q tag before every image with the cite attribute set to the image source and class set to image.
- Parameters:
  doc - The document.
  defaultProtocol - The default protocol.
- Returns: The number of changes.
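The documented cleanAndMarkImages rules can be sketched over a list of src values standing in for jsoup img elements. ImageCleanSketch and cleanAndMark are hypothetical; the sketch assumes a missing source means null or whitespace-only and that each removal or marker counts as one change, and it leaves out the defaultProtocol handling.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the documented cleanAndMarkImages rules.
// Assumptions: a missing source is null or whitespace-only, and each removal
// or marker counts as one change.
class ImageCleanSketch {

    // Removes entries without a source. When withImages is false, records each
    // remaining source in markedSources (standing in for inserting
    // <q cite="source" class="image"> before the image). Returns the change count.
    static int cleanAndMark(List<String> srcs, boolean withImages, List<String> markedSources) {
        int changes = 0;
        Iterator<String> it = srcs.iterator();
        while (it.hasNext()) {
            String src = it.next();
            if (src == null || src.trim().isEmpty()) {
                it.remove(); // image without a source is dropped
                changes++;
            } else if (!withImages) {
                markedSources.add(src); // image would be marked with a q tag
                changes++;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        List<String> srcs = new ArrayList<>(List.of("https://example.com/a.png", " ", "https://example.com/b.png"));
        List<String> marked = new ArrayList<>();
        int changes = cleanAndMark(srcs, false, marked);
        // prints 3 [https://example.com/a.png, https://example.com/b.png]
        System.out.println(changes + " " + marked);
    }
}
```

Using an Iterator lets the sketch remove entries while scanning without a ConcurrentModificationException.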