Class Document

All Implemented Interfaces:
Cloneable

public class Document extends Element
An HTML Document.
  • Field Details

  • Constructor Details

  • Method Details

    • createShell

      public static Document createShell(String baseUri)
      Create a valid, empty shell of a document, suitable for adding more elements to.
      Parameters:
      baseUri - base URI of the document
      Returns:
      document with html, head, and body elements.
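
      A minimal sketch of building a page from an empty shell. The class name CreateShellExample, the helper buildPage, and the example.com base URI are illustrative and not part of the library.

```java
import org.jsoup.nodes.Document;

public class CreateShellExample {
    // Builds a small page starting from an empty shell document.
    static Document buildPage() {
        // The shell already contains html, head, and body elements.
        Document doc = Document.createShell("https://example.com/");
        doc.body().appendElement("h1").text("Hello");
        doc.body().appendElement("p").text("First paragraph");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(buildPage().outerHtml());
    }
}
```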
    • location

      public String location()
      Get the URL this Document was parsed from. If the starting URL is a redirect, this will return the final URL from which the document was served.
      Returns:
      location
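
      A short sketch of reading the location of a document parsed from a string. It assumes that, for string input, the base URI passed to Jsoup.parse is what location() reports; the redirect case needs a network fetch and is not shown. LocationExample and locationOf are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LocationExample {
    // Parses HTML with an explicit base URI and returns the document's location.
    static String locationOf(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        return doc.location();
    }

    public static void main(String[] args) {
        System.out.println(locationOf("<p>Hi</p>", "https://example.com/docs/"));
    }
}
```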
    • documentType

      public DocumentType documentType()
      Returns this Document's doctype.
      Returns:
      document type, or null if not set
    • head

      public Element head()
      Accessor to the document's head element.
      Returns:
      head
    • body

      public Element body()
      Accessor to the document's body element.
      Returns:
      body
    • title

      public String title()
      Get the string contents of the document's title element.
      Returns:
      Trimmed title, or empty string if none set.
    • title

      public void title(String title)
      Set the document's title element. Updates the existing element, or adds a title element to the head if one is not present.
      Parameters:
      title - string to set as title
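
      The getter and setter can be combined as in the sketch below. TitleExample and its two helpers are illustrative names only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TitleExample {
    // Returns the trimmed title of the parsed HTML, or "" when there is none.
    static String readTitle(String html) {
        return Jsoup.parse(html).title();
    }

    // Sets the title twice; the second call updates the element created by the first.
    static Document retitle(String html) {
        Document doc = Jsoup.parse(html);
        doc.title("Draft");
        doc.title("Final report");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(readTitle("<title>  Hello   world </title>"));
        System.out.println(retitle("<p>No title yet</p>").head().html());
    }
}
```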
    • createElement

      public Element createElement(String tagName)
      Create a new Element, with this document's base URI. Does not make the new element a child of this document.
      Parameters:
      tagName - element tag name (e.g. a)
      Returns:
      new element
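
      The following sketch illustrates that the created element carries the document's base URI but stays detached until it is appended explicitly. CreateElementExample, detachedLink, and the example.com URL are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CreateElementExample {
    // Creates a link that is not yet part of the document tree.
    static Element detachedLink(Document doc) {
        Element link = doc.createElement("a");
        link.attr("href", "page.html").text("Next page");
        return link;
    }

    public static void main(String[] args) {
        Document doc = Document.createShell("https://example.com/");
        Element link = detachedLink(doc);
        System.out.println(doc.select("a").size());   // link is not in the document yet
        System.out.println(link.absUrl("href"));      // resolved against the base URI
        doc.body().appendChild(link);                 // now attach it
        System.out.println(doc.select("a").size());
    }
}
```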
    • normalise

      public Document normalise()
      Normalise the document. This happens after the parse phase so generally does not need to be called. Moves any text content that is not in the body element into the body.
      Returns:
      this document after normalisation
    • normaliseTextNodes

      private void normaliseTextNodes(Element element)
    • normaliseStructure

      private void normaliseStructure(String tag, Element htmlEl)
    • findFirstElementByTagName

      private Element findFirstElementByTagName(String tag, Node node)
    • outerHtml

      public String outerHtml()
      Description copied from class: Node
      Get the outer HTML of this node. For example, on a p element, may return <p>Para</p>.
      Overrides:
      outerHtml in class Node
      Returns:
      outer HTML
    • text

      public Element text(String text)
      Set the text of the body of this document. Any existing nodes within the body will be cleared.
      Overrides:
      text in class Element
      Parameters:
      text - unencoded text
      Returns:
      this document
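
      A small sketch of replacing the body content with plain text. Because the text is unencoded, characters such as < and & are passed as-is and escaped on output. TextExample and replaceBody are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TextExample {
    // Replaces everything inside the body with a single piece of text.
    static Document replaceBody(String html, String text) {
        Document doc = Jsoup.parse(html);
        doc.text(text); // existing body children are cleared first
        return doc;
    }

    public static void main(String[] args) {
        Document doc = replaceBody("<p>Old</p><div>More</div>", "5 < 6 & 7");
        System.out.println(doc.body().text()); // the raw text
        System.out.println(doc.body().html()); // the escaped markup
    }
}
```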
    • nodeName

      public String nodeName()
      Description copied from class: Node
      Get the node name of this node. Use for debugging purposes and not logic switching (for that, use instanceof).
      Overrides:
      nodeName in class Element
      Returns:
      node name
    • charset

      public void charset(Charset charset)
      Sets the charset used in this document. This method is equivalent to OutputSettings.charset(Charset) but in addition it updates the charset / encoding element within the document.

      This enables meta charset update.

      If there is no element with charset / encoding information yet, one will be created. Obsolete charset / encoding definitions are removed.

      Elements used:

      • Html: <meta charset="CHARSET">
      • Xml: <?xml version="1.0" encoding="CHARSET"?>
      Parameters:
      charset - Charset
    • charset

      public Charset charset()
      Returns the charset used in this document. This method is equivalent to Document.OutputSettings.charset().
      Returns:
      Current Charset
    • updateMetaCharsetElement

      public void updateMetaCharsetElement(boolean update)
      Sets whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.

      If set to false (the default), no elements are modified.

      Parameters:
      update - if true, the element is updated on charset changes; if false, it is not
    • updateMetaCharsetElement

      public boolean updateMetaCharsetElement()
      Returns whether the element with charset information in this document is updated on changes through Document.charset(Charset) or not.
      Returns:
      true if the element is updated on charset changes, false if not
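
      The difference between Document.charset(Charset) and OutputSettings.charset(Charset) can be sketched as follows, for an HTML shell document that starts without a meta charset element. CharsetExample and its helpers are hypothetical names; the behaviour shown is what the descriptions above imply and should be checked against the jsoup version in use.

```java
import java.nio.charset.StandardCharsets;
import org.jsoup.nodes.Document;

public class CharsetExample {
    // Sets the charset through the document, which also maintains <meta charset>.
    static Document viaDocument() {
        Document doc = Document.createShell("");
        doc.charset(StandardCharsets.ISO_8859_1);
        return doc;
    }

    // Sets the charset through the output settings only; the markup is left untouched.
    static Document viaOutputSettings() {
        Document doc = Document.createShell("");
        doc.outputSettings().charset(StandardCharsets.ISO_8859_1);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(viaDocument().head().html());
        System.out.println(viaOutputSettings().select("meta[charset]").size());
    }
}
```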
    • clone

      public Document clone()
      Description copied from class: Node
      Create a stand-alone, deep copy of this node, and all of its children. The cloned node will have no siblings or parent node. As a stand-alone object, any changes made to the clone or any of its children will not impact the original node.

      The cloned node may be adopted into another Document or node structure using Element.appendChild(Node).

      Overrides:
      clone in class Element
      Returns:
      a stand-alone cloned node, including clones of any children
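
      A brief sketch showing that edits to a clone leave the original document unchanged. CloneExample and cloneAndExtend are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CloneExample {
    // Clones the document and adds a paragraph to the copy only.
    static Document cloneAndExtend(Document original) {
        Document copy = original.clone();
        copy.body().appendElement("p").text("Two");
        return copy;
    }

    public static void main(String[] args) {
        Document original = Jsoup.parse("<p>One</p>");
        Document copy = cloneAndExtend(original);
        System.out.println(original.select("p").size()); // still one paragraph
        System.out.println(copy.select("p").size());     // two paragraphs
    }
}
```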
    • ensureMetaCharsetElement

      private void ensureMetaCharsetElement()
      Ensures that a meta charset element (HTML) or an XML declaration (XML) exists and carries the current encoding. This only applies when updateMetaCharset is set to true; otherwise this method does nothing.
      • An existing element gets updated with the current charset
      • If there's no element yet it will be inserted
      • Obsolete elements are removed

      Elements used:

      • Html: <meta charset="CHARSET">
      • Xml: <?xml version="1.0" encoding="CHARSET"?>
    • outputSettings

      public Document.OutputSettings outputSettings()
      Get the document's current output settings.
      Returns:
      the document's current output settings.
    • outputSettings

      public Document outputSettings(Document.OutputSettings outputSettings)
      Set the document's output settings.
      Parameters:
      outputSettings - new output settings.
      Returns:
      this document, for chaining.
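
      As an illustration, the sketch below installs a fresh OutputSettings object with pretty-printing turned off, so the HTML is serialised without added line breaks or indentation. OutputSettingsExample and compactBody are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class OutputSettingsExample {
    // Serialises the body of the given HTML without pretty-printing.
    static String compactBody(String html) {
        Document.OutputSettings settings = new Document.OutputSettings().prettyPrint(false);
        // The setter returns the document, so calls can be chained.
        Document doc = Jsoup.parse(html).outputSettings(settings);
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(compactBody("<div><p>Hi</p></div>"));
    }
}
```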
    • quirksMode

      public Document.QuirksMode quirksMode()
    • quirksMode

      public Document quirksMode(Document.QuirksMode quirksMode)
    • parser

      public Parser parser()
      Get the parser that was used to parse this document.
      Returns:
      the parser
    • parser

      public Document parser(Parser parser)
      Set the parser used to create this document. This parser is then used when further parsing within this document is required.
      Parameters:
      parser - the configured parser to use when further parsing is required for this document.
      Returns:
      this document, for chaining.
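
      A hedged sketch of reading and replacing the parser. It assumes a jsoup version in which parsing records the parser on the resulting document; ParserExample and parseXml are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class ParserExample {
    // Parses the input with the XML parser instead of the default HTML parser.
    static Document parseXml(String xml) {
        return Jsoup.parse(xml, "", Parser.xmlParser());
    }

    public static void main(String[] args) {
        Document doc = parseXml("<root><item/></root>");
        System.out.println(doc.parser() != null);      // parser used for this document
        System.out.println(doc.select("item").size());
        // Switch to the HTML parser for any further parsing within this document.
        doc.parser(Parser.htmlParser());
    }
}
```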