Package com.attribyte.parser.model
Class Page
java.lang.Object
com.attribyte.parser.model.Page
An immutable HTML page.
- Author: Matt Hamer
Field Summary
Fields (Modifier and Type, Field, Description):
- static final com.google.common.collect.ImmutableSet<String> altFeedTypes
- final com.google.common.collect.ImmutableList<Anchor> anchors: All unique anchors found in the document.
- final com.google.common.collect.ImmutableList<Audio> audios: All audio streams found in the document.
- final String author: The author name.
- final String canonicalLink: The canonical link to this page.
- static final com.google.common.collect.ImmutableSet<String> feedTypes: The values of link 'type' attribute recognized as feeds.
- final com.google.common.collect.ImmutableList<Image> images: All unique images found in the document.
- final com.google.common.collect.ImmutableList<Link> links: All unique links found in the document.
- final com.google.common.collect.ImmutableList<Audio> metaAudios: All audio streams found in the metadata of the document.
- final com.google.common.collect.ImmutableList<Image> metaImages: All unique images found in the metadata of the document.
- final com.google.common.collect.ImmutableList<Video> metaVideos: All videos found in the metadata of the document.
- final Date publishTime: The publish time.
- final com.google.common.collect.ImmutableSet<Link> selfLinks: A set of all 'self' links discovered in the document, including the canonical link.
- final String siteName: The name of the site for this page.
- final String summary: A summary.
- final String title: The page title.
- final com.google.common.collect.ImmutableList<Video> videos: All videos found in the document.
Constructor Summary
Constructors (Constructor, Description):
- Page(org.jsoup.nodes.Document doc, String canonicalLink, Collection<Link> selfLinks, String siteName, String title, String summary, String author, Date publishTime, Collection<Anchor> anchors, Collection<Image> images, Collection<Image> metaImages, Collection<Video> videos, Collection<Video> metaVideos, Collection<Audio> audios, Collection<Audio> metaAudios, Collection<Link> links): Creates a page.
Method Summary
Methods (Modifier and Type, Method, Description):
- externalAnchors: Gets a list of all anchors that link to external sites in the order they appear on the page.
- feedLinks: Gets a list of potential feed links.
- iconLinks: Gets a list of icon links.
- links: Gets a list of links with specified relation and media type.
- void writeDebug(File debugOutputDir): Write debug output.
Field Details
- canonicalLink: The canonical link to this page.
- selfLinks: A set of all 'self' links discovered in the document, including the canonical link.
- siteName: The name of the site for this page.
- title: The page title.
- summary: A summary.
- author: The author name.
- publishTime: The publish time.
- anchors: All unique anchors found in the document.
- metaImages: All unique images found in the metadata of the document.
- images: All unique images found in the document.
- links: All unique links found in the document.
- metaVideos: All videos found in the metadata of the document.
- videos: All videos found in the document.
- metaAudios: All audio streams found in the metadata of the document.
- audios: All audio streams found in the document.
- feedTypes: The values of link 'type' attribute recognized as feeds.
- altFeedTypes
Constructor Details
-
Page
public Page(org.jsoup.nodes.Document doc, String canonicalLink, Collection<Link> selfLinks, String siteName, String title, String summary, String author, Date publishTime, Collection<Anchor> anchors, Collection<Image> images, Collection<Image> metaImages, Collection<Video> videos, Collection<Video> metaVideos, Collection<Audio> audios, Collection<Audio> metaAudios, Collection<Link> links)
Creates a page.
- Parameters:
  - doc - The parsed document.
  - canonicalLink - The canonical link.
  - selfLinks - A collection of self-links.
  - siteName - The site name.
  - title - The title.
  - summary - The summary.
  - author - The author.
  - publishTime - The publish time.
  - anchors - A collection of anchors.
  - images - A collection of images.
  - metaImages - A collection of images found in metadata.
  - videos - A collection of videos.
  - metaVideos - A collection of videos found in metadata.
  - audios - A collection of audio streams.
  - metaAudios - A collection of audio streams found in metadata.
  - links - A collection of links.
-
-
Method Details
-
writeDebug
Writes debug output.
- Parameters:
  - debugOutputDir - The output directory.
- Throws:
  - IOException - On write error.
-
links
Gets a list of links with the specified relation and media type.
- Parameters:
  - rel - The relation. May be null to match any.
  - type - The media type. May be null to match any.
- Returns:
  - The list of links.
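The null-as-wildcard matching described for rel and type can be sketched as below. This is a minimal illustration, not the library's implementation: the nested Link class is a hypothetical stand-in for the library's Link type, its field names are assumptions, and case-insensitive comparison is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class LinkFilterSketch {

    // Hypothetical stand-in for the library's Link type; field names are assumptions.
    static final class Link {
        final String href;
        final String rel;
        final String type;

        Link(String href, String rel, String type) {
            this.href = href;
            this.rel = rel;
            this.type = type;
        }
    }

    // Returns the links matching rel and type, preserving input order.
    // A null rel or type matches any value, as the parameter docs describe.
    // Assumption: comparison ignores case.
    static List<Link> links(List<Link> all, String rel, String type) {
        List<Link> matched = new ArrayList<>();
        for (Link link : all) {
            boolean relMatches = rel == null || rel.equalsIgnoreCase(link.rel);
            boolean typeMatches = type == null || type.equalsIgnoreCase(link.type);
            if (relMatches && typeMatches) {
                matched.add(link);
            }
        }
        return matched;
    }
}
```

Under these assumptions, calling links(all, "alternate", null) would return every 'alternate' link regardless of media type, and links(all, null, null) would return all links.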
-
externalAnchors
Gets a list of all anchors that link to external sites in the order they appear on the page.
- Returns:
  - The list of external anchors.
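The page does not say how an anchor is judged external. One plausible rule, sketched here purely as an assumption, is to compare the host of each anchor's href with the host of the page URL. The class and method names are hypothetical, and plain strings stand in for the library's Anchor type.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class ExternalAnchorSketch {

    // Returns the hrefs whose host differs from the page's host, in input order.
    // Assumption: hrefs without a host (relative or opaque) count as internal.
    static List<String> externalAnchors(String pageUrl, List<String> hrefs) {
        String pageHost = URI.create(pageUrl).getHost();
        List<String> external = new ArrayList<>();
        for (String href : hrefs) {
            String host;
            try {
                host = URI.create(href).getHost();
            } catch (IllegalArgumentException e) {
                continue; // Assumption: skip hrefs that are not valid URIs.
            }
            if (host != null && !host.equalsIgnoreCase(pageHost)) {
                external.add(href);
            }
        }
        return external;
    }
}
```

A real implementation might also treat subdomains of the page's site as internal; this sketch compares hosts exactly, ignoring case only.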
-
feedLinks
Gets a list of potential feed links. Links with known feed 'type' attributes are added first in the order they appear. Links with 'alternate' as the relationship are added next.
- Returns:
  - The list of feed links.
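The two-pass ordering described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: FeedLink is a hypothetical stand-in for Link, the two media types in FEED_TYPES are illustrative values (the actual contents of feedTypes are not listed on this page), and skipping links already added in the first pass is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class FeedLinkSketch {

    // Hypothetical stand-in for the library's Link type; field names are assumptions.
    static final class FeedLink {
        final String href;
        final String rel;
        final String type;

        FeedLink(String href, String rel, String type) {
            this.href = href;
            this.rel = rel;
            this.type = type;
        }
    }

    // Assumption: illustrative feed media types only, not the real feedTypes set.
    static final Set<String> FEED_TYPES =
            Set.of("application/rss+xml", "application/atom+xml");

    // Pass 1 adds links with a known feed type, in document order.
    // Pass 2 adds links whose rel is 'alternate', in document order.
    // The LinkedHashSet keeps insertion order and drops links already added.
    static List<FeedLink> feedLinks(List<FeedLink> all) {
        LinkedHashSet<FeedLink> ordered = new LinkedHashSet<>();
        for (FeedLink link : all) {
            if (link.type != null && FEED_TYPES.contains(link.type.toLowerCase(Locale.ROOT))) {
                ordered.add(link);
            }
        }
        for (FeedLink link : all) {
            if ("alternate".equalsIgnoreCase(link.rel)) {
                ordered.add(link);
            }
        }
        return new ArrayList<>(ordered);
    }
}
```

With this ordering, a caller that wants the most likely feed can take the first element, since links with a recognized feed type precede links that are merely marked 'alternate'.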
-
iconLinks
Gets a list of icon links.
- Returns:
  - The list of icon links in the order they appear.
-