Class Page

java.lang.Object
com.attribyte.parser.model.Page

public class Page extends Object
An immutable HTML page.
Author:
Matt Hamer
  • Field Details

    • siteName

      public final String siteName
      The name of the site for this page.
    • title

      public final String title
      The page title.
    • summary

      public final String summary
      A summary.
    • author

      public final String author
      The author name.
    • publishTime

      public final Date publishTime
      The publish time.
    • anchors

      public final com.google.common.collect.ImmutableList<Anchor> anchors
      All unique anchors found in the document.
    • metaImages

      public final com.google.common.collect.ImmutableList<Image> metaImages
      All unique images found in the metadata of the document.
    • images

      public final com.google.common.collect.ImmutableList<Image> images
      All unique images found in the document.
    • metaVideos

      public final com.google.common.collect.ImmutableList<Video> metaVideos
      All videos found in the metadata of the document.
    • videos

      public final com.google.common.collect.ImmutableList<Video> videos
      All videos found in the document.
    • metaAudios

      public final com.google.common.collect.ImmutableList<Audio> metaAudios
      All audio streams found in the metadata of the document.
    • audios

      public final com.google.common.collect.ImmutableList<Audio> audios
      All audio streams found in the document.
    • feedTypes

      public static final com.google.common.collect.ImmutableSet<String> feedTypes
      The values of the link 'type' attribute that are recognized as feeds.
    • altFeedTypes

      public static final com.google.common.collect.ImmutableSet<String> altFeedTypes
  • Constructor Details

    • Page

      public Page(org.jsoup.nodes.Document doc, String canonicalLink, Collection<Link> selfLinks, String siteName, String title, String summary, String author, Date publishTime, Collection<Anchor> anchors, Collection<Image> images, Collection<Image> metaImages, Collection<Video> videos, Collection<Video> metaVideos, Collection<Audio> audios, Collection<Audio> metaAudios, Collection<Link> links)
      Creates a page.
      Parameters:
      doc - The parsed document.
      canonicalLink - The canonical link.
      selfLinks - A collection of self-links.
      siteName - The site name.
      title - The title.
      summary - The summary.
      author - The author.
      publishTime - The publish time.
      anchors - A collection of anchors.
      images - A collection of images.
      metaImages - A collection of images found in metadata.
      videos - A collection of videos.
      metaVideos - A collection of videos found in metadata.
      audios - A collection of audio streams.
      metaAudios - A collection of audio streams found in metadata.
      links - A collection of links.
  • Method Details

    • writeDebug

      public void writeDebug(File debugOutputDir) throws IOException
      Writes debug output.
      Parameters:
      debugOutputDir - The output directory.
      Throws:
      IOException - On write error.
    • links

      public List<Link> links(String rel, String type)
      Gets a list of links with the specified relation and media type.
      Parameters:
      rel - The relation. May be null to match any.
      type - The media type. May be null to match any.
      Returns:
      The list of links.
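
      The null-matches-any behavior described for rel and type can be illustrated with a minimal, self-contained sketch. It does not use the library: the Link class below is a hypothetical stand-in, the method takes the full list as an extra parameter, and the case-insensitive comparison is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

class LinkFilterSketch {

    // Hypothetical stand-in for the library's Link type; it carries only what the sketch needs.
    static final class Link {
        final String rel;
        final String type;
        final String href;

        Link(String rel, String type, String href) {
            this.rel = rel;
            this.type = type;
            this.href = href;
        }
    }

    // Returns the links that match both arguments, preserving input order.
    // A null argument places no constraint on that attribute.
    // Case-insensitive comparison is an assumption of this sketch.
    static List<Link> links(List<Link> all, String rel, String type) {
        List<Link> matches = new ArrayList<>();
        for (Link link : all) {
            boolean relMatches = rel == null || rel.equalsIgnoreCase(link.rel);
            boolean typeMatches = type == null || type.equalsIgnoreCase(link.type);
            if (relMatches && typeMatches) {
                matches.add(link);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Link> all = List.of(
            new Link("icon", "image/png", "/favicon.png"),
            new Link("alternate", "application/rss+xml", "/feed.xml"),
            new Link("icon", "image/x-icon", "/favicon.ico"));

        System.out.println(links(all, "icon", null).size());      // links with rel "icon", any type
        System.out.println(links(all, null, "image/png").size()); // links of type "image/png", any rel
        System.out.println(links(all, null, null).size());        // no constraint at all
    }
}
```

      Passing null for both arguments returns every link, which is the natural reading of "may be null to match any".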
    • externalAnchors

      public List<Anchor> externalAnchors()
      Gets a list of all anchors that link to external sites in the order they appear on the page.
      Returns:
      The list of external anchors.
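
      This page does not say how an anchor is judged to be external. One plausible reading, sketched below without the library, is that an anchor is external when its href has a host that differs from the page's host. The method name, the pageHost parameter, the use of plain href strings in place of Anchor objects, and the treatment of subdomains as external are all assumptions of the sketch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class ExternalAnchorSketch {

    // Returns the hrefs whose host differs from pageHost, in input order.
    // Relative hrefs and opaque URIs (such as mailto:) have no host and are treated as internal.
    static List<String> externalHrefs(String pageHost, List<String> hrefs) {
        List<String> external = new ArrayList<>();
        for (String href : hrefs) {
            String host;
            try {
                host = URI.create(href).getHost();
            } catch (IllegalArgumentException e) {
                continue; // skip hrefs that are not valid URIs
            }
            if (host != null && !host.equalsIgnoreCase(pageHost)) {
                external.add(href);
            }
        }
        return external;
    }

    public static void main(String[] args) {
        List<String> hrefs = List.of(
            "/about",
            "https://example.com/a",
            "https://other.org/b",
            "https://news.example.net/c");

        // Only the hrefs on hosts other than example.com are printed, in page order.
        for (String href : externalHrefs("example.com", hrefs)) {
            System.out.println(href);
        }
    }
}
```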
    • feedLinks

      public List<Link> feedLinks()
      Gets a list of potential feed links.

      Links with known feed 'type' attributes are added first, in the order they appear. Links with 'alternate' as the relationship are added next.

      Returns:
      The list of feed links.
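
      The two-pass ordering described for feedLinks can be sketched as follows. This is an illustration, not the library's code: Link is a hypothetical stand-in, the two media types in ASSUMED_FEED_TYPES are placeholders because the contents of feedTypes are not listed on this page, and skipping a link that was already added in the first pass is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class FeedLinkOrderSketch {

    // Hypothetical stand-in for the library's Link type.
    static final class Link {
        final String rel;
        final String type;
        final String href;

        Link(String rel, String type, String href) {
            this.rel = rel;
            this.type = type;
            this.href = href;
        }
    }

    // Assumed values for illustration only; the real set is Page.feedTypes.
    static final Set<String> ASSUMED_FEED_TYPES =
        Set.of("application/rss+xml", "application/atom+xml");

    static List<Link> feedLinks(List<Link> all) {
        // LinkedHashSet keeps insertion order and ignores a second add of the same object.
        LinkedHashSet<Link> ordered = new LinkedHashSet<>();

        // Pass 1: links whose type is a known feed type, in document order.
        for (Link link : all) {
            if (link.type != null
                    && ASSUMED_FEED_TYPES.contains(link.type.toLowerCase(Locale.ROOT))) {
                ordered.add(link);
            }
        }
        // Pass 2: links with rel "alternate", in document order.
        for (Link link : all) {
            if ("alternate".equalsIgnoreCase(link.rel)) {
                ordered.add(link);
            }
        }
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        List<Link> all = List.of(
            new Link("alternate", "text/html", "/en"),
            new Link("alternate", "application/rss+xml", "/feed.xml"),
            new Link(null, "application/atom+xml", "/atom.xml"));

        // Typed feed links come first, then the remaining "alternate" link.
        for (Link link : feedLinks(all)) {
            System.out.println(link.href);
        }
    }
}
```

      Because links with a known feed type come first, a caller that wants only the most likely feed can take the first element of the returned list.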
    • iconLinks

      public List<Link> iconLinks()
      Gets a list of icon links.
      Returns:
      The list of icon links in the order they appear.