Class StartTagType
- Direct Known Subclasses:
StartTagTypeGenericImplementation
A start tag type is any TagType that starts with the character '<' (as with all tag types), but whose second character is not '/'.
This includes types for many tags which stand alone, without a corresponding end tag, and would not intuitively be categorised as a "start tag". For example, an HTML comment in a document is represented as a single start tag that spans the whole comment, and does not have an end tag at all.
The singleton instances of all the standard start tag types are available in this class as static fields.
Because all StartTagType instances must be singletons, the '==' operator can be used to test for a particular tag type instead of the equals(Object) method.
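The identity comparison can be illustrated without the library. The sketch below is a minimal, self-contained stand-in: DemoTagType and its two constants are hypothetical names invented for illustration and are not part of this API. It only shows that when every instance is a singleton constant, a reference comparison is enough to identify a type.

```java
public class SingletonTypeSketch {
    // Hypothetical stand-in for a tag type class; NOT the library's StartTagType.
    public static final class DemoTagType {
        public static final DemoTagType COMMENT = new DemoTagType("comment");
        public static final DemoTagType NORMAL = new DemoTagType("normal");

        private final String description;

        // Private constructor: the constants above are the only instances.
        private DemoTagType(String description) {
            this.description = description;
        }

        @Override
        public String toString() {
            return description;
        }
    }

    // Identity comparison is sufficient because each type has exactly one instance.
    public static boolean isComment(DemoTagType type) {
        return type == DemoTagType.COMMENT;
    }

    public static void main(String[] args) {
        System.out.println(DemoTagType.COMMENT + " is comment: " + isComment(DemoTagType.COMMENT));
        System.out.println(DemoTagType.NORMAL + " is comment: " + isComment(DemoTagType.NORMAL));
    }
}
```

With the real classes, the same pattern would compare a tag's type against one of the static fields listed in the Field Summary.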
-
Field Summary
Fields
Modifier and Type          Field                       Description
static final StartTagType  CDATA_SECTION               The tag type given to a CDATA section (<![CDATA[ ... ]]>).
static final StartTagType  COMMENT                     The tag type given to an HTML comment (<!-- ... -->).
static final StartTagType  DOCTYPE_DECLARATION         The tag type given to a document type declaration (<!DOCTYPE ... >).
static final StartTagType  MARKUP_DECLARATION          The tag type given to a markup declaration (<!ELEMENT ... > | <!ATTLIST ... > | <!ENTITY ... > | <!NOTATION ... >).
static final StartTagType  NORMAL                      The tag type given to a normal HTML or XML start tag (<name ... >).
static final StartTagType  SERVER_COMMON               The tag type given to a common server tag (<% ... %>).
static final StartTagType  SERVER_COMMON_COMMENT       The tag type given to a common server comment tag (<%-- ... --%>).
static final StartTagType  SERVER_COMMON_ESCAPED       The tag type given to an escaped common server tag (<\% ... %>).
static final StartTagType  UNREGISTERED                The tag type given to an unregistered start tag (< ... >).
static final StartTagType  XML_DECLARATION             The tag type given to an XML declaration (<?xml ... ?>).
static final StartTagType  XML_PROCESSING_INSTRUCTION  The tag type given to an XML processing instruction (<?PITarget ... ?>).
-
Constructor Summary
Constructors
Modifier   Constructor                                                                                                                                                                                                Description
protected  StartTagType(String description, String startDelimiter, String closingDelimiter, EndTagType correspondingEndTagType, boolean isServerTag, boolean hasAttributes, boolean isNameAfterPrefixRequired)  Constructs a new StartTagType object with the specified properties.
-
Method Summary
Modifier and Type           Method                                                                                     Description
boolean                     atEndOfAttributes(Source source, int pos, boolean isClosingSlashIgnored)                   Indicates whether the specified source document position is at the end of a tag's attributes.
protected final StartTag    constructStartTag(Source source, int begin, int end, String name, Attributes attributes)  Internal method for the construction of a StartTag object of this type.
final EndTagType            getCorrespondingEndTagType()                                                               Returns the type of end tag required to pair with a start tag of this type to form an element.
final boolean               hasAttributes()                                                                            Indicates whether a start tag of this type contains attributes.
final boolean               isNameAfterPrefixRequired()                                                                Indicates whether a valid XML tag name is required directly after the prefix.
protected final Attributes  parseAttributes(Source source, int startTagBegin, String tagName)                          Internal method for the parsing of Attributes.
Methods inherited from class net.htmlparser.jericho.TagType
constructTagAt, deregister, getClosingDelimiter, getDescription, getNamePrefix, getRegisteredTagTypes, getStartDelimiter, getTagTypesIgnoringEnclosedMarkup, isServerTag, isValidPosition, register, setTagTypesIgnoringEnclosedMarkup, tagEncloses, toString
-
Field Details
-
UNREGISTERED
The tag type given to an unregistered start tag (< ... >).
See the documentation of the Tag.isUnregistered() method for details.
- Properties:
  Property                   Value
  Description                unregistered
  StartDelimiter             <
  ClosingDelimiter           >
  IsServerTag                false
  NamePrefix                 (empty string)
  CorrespondingEndTagType    null
  HasAttributes              false
  IsNameAfterPrefixRequired  false
- Example:
<"This is not recognised as any of the predefined tag types in this library">
-
NORMAL
The tag type given to a normal HTML or XML start tag (<name ... >).
- Properties:
  Property                   Value
  Description                normal
  StartDelimiter             <
  ClosingDelimiter           >
  IsServerTag                false
  NamePrefix                 (empty string)
  CorrespondingEndTagType    EndTagType.NORMAL
  HasAttributes              true
  IsNameAfterPrefixRequired  true
- Example:
<div class="NormalDivTag">
-
COMMENT
The tag type given to an HTML comment (<!-- ... -->).
An HTML comment is an area of the source document enclosed by the delimiters <!-- on the left and --> on the right.
The HTML 4.01 specification section 3.2.4 states that the end of comment delimiter may contain white space between the "--" and ">" characters, but this library does not recognise end of comment delimiters containing white space.
In the default configuration, any non-server tag appearing within an HTML comment is ignored by the parser. See the documentation of the tag parsing process for more information.
- Properties:
  Property                   Value
  Description                comment
  StartDelimiter             <!--
  ClosingDelimiter           -->
  IsServerTag                false
  NamePrefix                 !--
  CorrespondingEndTagType    null
  HasAttributes              false
  IsNameAfterPrefixRequired  false
- Example:
<!-- This is a comment -->
-
XML_DECLARATION
The tag type given to an XML declaration (<?xml ... ?>).
An XML declaration is often referred to in texts as a special type of processing instruction with the reserved PITarget name of "xml". Technically it is not an XML processing instruction at all, but is still a type of SGML processing instruction.
According to section 2.8 of the XML 1.0 specification, a valid XML declaration can specify only "version", "encoding" and "standalone" attributes in that order. This library parses the attributes of an XML declaration in the same way as those of a normal tag, without checking that they conform to the specification.
- Properties:
  Property                   Value
  Description                XML declaration
  StartDelimiter             <?xml
  ClosingDelimiter           ?>
  IsServerTag                false
  NamePrefix                 ?xml
  CorrespondingEndTagType    null
  HasAttributes              true
  IsNameAfterPrefixRequired  false
- Example:
<?xml version="1.0" encoding="UTF-8"?>
-
XML_PROCESSING_INSTRUCTION
The tag type given to an XML processing instruction (<?PITarget ... ?>).
An XML processing instruction is a specific form of SGML processing instruction with the following two additional constraints:
- it must be closed with '?>' instead of just a single '>' character.
- it requires a PITarget (essentially a name following the '<?' start delimiter).
This library does not include a predefined generic tag type for SGML processing instructions, as the only forms in which they are found in HTML documents are the more specific XML processing instruction and the XML declaration, both of which have their own dedicated predefined tag type.
There is no restriction on the contents of an XML processing instruction. In particular, it cannot be assumed that the processing instruction contains attributes, in contrast to the XML declaration.
Note that registering the PHPTagTypes.PHP_SHORT tag type overrides this tag type. This is because they both have the same start delimiter, so the one registered latest takes precedence over the other. See the documentation of the PHPTagTypes class for more information.
- Properties:
  Property                   Value
  Description                XML processing instruction
  StartDelimiter             <?
  ClosingDelimiter           ?>
  IsServerTag                false
  NamePrefix                 ?
  CorrespondingEndTagType    null
  HasAttributes              false
  IsNameAfterPrefixRequired  true
- Example:
<?xml-stylesheet href="standardstyle.css" type="text/css"?>
-
DOCTYPE_DECLARATION
The tag type given to a document type declaration (<!DOCTYPE ... >).
Information about the document type declaration can be found in the HTML 4.01 specification section 7.2, and the XML 1.0 specification section 2.8.
The "!DOCTYPE" tag name is required to be in upper case in the source document, but all tag properties are stored in lower case because this library performs all parsing in lower case.
- Properties:
  Property                   Value
  Description                document type declaration
  StartDelimiter             <!doctype
  ClosingDelimiter           >
  IsServerTag                false
  NamePrefix                 !doctype
  CorrespondingEndTagType    null
  HasAttributes              false
  IsNameAfterPrefixRequired  false
- Example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
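The case handling described for this tag type can be sketched as follows. This is a standalone illustration, not the library's parsing code: the helper name startsWithDelimiterAt is hypothetical, and using a case-insensitive region match against the lower-case start delimiter is an assumption about one way to obtain the same effect as parsing in lower case.

```java
public class DelimiterCaseSketch {
    // Lower-case start delimiter, as listed in the properties of this tag type.
    public static final String DOCTYPE_START_DELIMITER = "<!doctype";

    // Hypothetical helper: reports whether the delimiter occurs in the text at pos, ignoring case.
    public static boolean startsWithDelimiterAt(String text, int pos, String lowerCaseDelimiter) {
        return text.regionMatches(true, pos, lowerCaseDelimiter, 0, lowerCaseDelimiter.length());
    }

    public static void main(String[] args) {
        String source = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\">";
        // The upper-case tag in the source still matches the lower-case delimiter.
        System.out.println(startsWithDelimiterAt(source, 0, DOCTYPE_START_DELIMITER));
    }
}
```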
-
MARKUP_DECLARATION
The tag type given to a markup declaration (<!ELEMENT ... > | <!ATTLIST ... > | <!ENTITY ... > | <!NOTATION ... >).
The name of a markup declaration tag must be one of "!element", "!attlist", "!entity" or "!notation". These tag names are required to be in upper case in the source document, but all tag properties are stored in lower case because this library performs all parsing in lower case.
Markup declarations usually appear inside a document type definition (DTD), which is usually an external document to the HTML or XML document, but they can also appear directly within the document type declaration, which is why they must be recognised by the parser.
- Properties:
  Property                   Value
  Description                markup declaration
  StartDelimiter             <!
  ClosingDelimiter           >
  IsServerTag                false
  NamePrefix                 !
  CorrespondingEndTagType    null
  HasAttributes              false
  IsNameAfterPrefixRequired  true
- Example:
<!ELEMENT BODY O O (%flow;)* +(INS|DEL) -- document body -->
-
CDATA_SECTION
The tag type given to a CDATA section (<![CDATA[ ... ]]>).
A CDATA section is a specific form of a marked section. This library does not include a predefined generic tag type for marked sections, as the only type of marked sections found in HTML documents are CDATA sections.
The HTML 4.01 specification section B.3.5 and the XML 1.0 specification section 2.7 contain definitions for a CDATA section.
There is inconsistency between the SGML and HTML/XML specifications in the definition of a marked section. SGML requires the presence of a space between the "<![" prefix and the keyword, and allows a space after the keyword. The XML specification forbids these spaces, and the examples given in the HTML specification do not include them either. This library only recognises CDATA sections that do not include the spaces.
The "![CDATA[" tag name is required to be in upper case in the source document according to the HTML/XML specifications, but all tag properties are stored in lower case because this makes it more efficient for the library to perform case-insensitive parsing of all tags.
In the default configuration, any non-server tag appearing within a CDATA section is ignored by the parser. See the documentation of the tag parsing process for more information.
- Properties:
  Property                   Value
  Description                CDATA section
  StartDelimiter             <![cdata[
  ClosingDelimiter           ]]>
  IsServerTag                false
  NamePrefix                 ![cdata[
  CorrespondingEndTagType    null
  HasAttributes              false
  IsNameAfterPrefixRequired  false
- Example:
- This example shows the recommended practice of enclosing scripts inside a CDATA section:
<script type="text/javascript">
//<![CDATA[
function min(a,b) {return a<b ? a : b;}
//]]>
</script>
-
SERVER_COMMON
The tag type given to a common server tag (<% ... %>).
Common server tags include ASP, JSP, PSP, ASP-style PHP, eRuby, and Mason substitution tags.
This tag, the escaped common server tag and the common server comment tag are the only standard tag types that define server tags. They are included as standard tag types because of the common server tag's widespread use in many platforms, including those listed above.
- Properties:
  Property                   Value
  Description                common server tag
  StartDelimiter             <%
  ClosingDelimiter           %>
  IsServerTag                true
  NamePrefix                 %
  CorrespondingEndTagType    null
  HasAttributes              false
  IsNameAfterPrefixRequired  false
- Example:
<%@ include file="header.html" %>
-
SERVER_COMMON_ESCAPED
The tag type given to an escaped common server tag (<\% ... %>).
Some of the platforms that support the common server tag also support a mechanism to escape that tag by adding a backslash (\) before the percent (%) character. Although rarely used, this tag type allows the parser to recognise these escaped tags in addition to the common server tag itself.
- Properties:
  Property                   Value
  Description                escaped common server tag
  StartDelimiter             <\%
  ClosingDelimiter           %>
  IsServerTag                true
  NamePrefix                 \%
  CorrespondingEndTagType    null
  HasAttributes              false
  IsNameAfterPrefixRequired  false
- Example:
<\%@ include file="header.html" %>
-
SERVER_COMMON_COMMENT
The tag type given to a common server comment tag (<%-- ... --%>).
Some of the platforms that support the common server tag, such as JSP, also support a server-based comment tag that allows nested server tags.
- Properties:
  Property                   Value
  Description                common server comment tag
  StartDelimiter             <%--
  ClosingDelimiter           --%>
  IsServerTag                true
  NamePrefix                 %--
  CorrespondingEndTagType    null
  HasAttributes              false
  IsNameAfterPrefixRequired  false
- Example:
<%-- this server side comment contains a <%="nested"%> server tag --%>
-
-
Constructor Details
-
StartTagType
protected StartTagType(String description, String startDelimiter, String closingDelimiter, EndTagType correspondingEndTagType, boolean isServerTag, boolean hasAttributes, boolean isNameAfterPrefixRequired)
Constructs a new StartTagType object with the specified properties.
(implementation assistance method)
As StartTagType is an abstract class, this constructor is only called from sub-class constructors.
- Parameters:
description - a description of the new start tag type useful for debugging purposes.
startDelimiter - the start delimiter of the new start tag type.
closingDelimiter - the closing delimiter of the new start tag type.
correspondingEndTagType - the corresponding end tag type of the new start tag type.
isServerTag - indicates whether the new start tag type is a server tag.
hasAttributes - indicates whether the new start tag type has attributes.
isNameAfterPrefixRequired - indicates whether a name is required after the prefix.
-
-
Method Details
-
getCorrespondingEndTagType
Returns the type of end tag required to pair with a start tag of this type to form an element.
(property method)
This can be represented by the following expression, which is always true given an arbitrary element that has an end tag:
element.getStartTag().getStartTagType().getCorrespondingEndTagType() == element.getEndTag().getEndTagType()
- Standard Tag Type Values:
  Start Tag Type              Corresponding End Tag Type
  UNREGISTERED                null
  NORMAL                      EndTagType.NORMAL
  COMMENT                     null
  XML_DECLARATION             null
  XML_PROCESSING_INSTRUCTION  null
  DOCTYPE_DECLARATION         null
  MARKUP_DECLARATION          null
  CDATA_SECTION               null
  SERVER_COMMON               null
  SERVER_COMMON_ESCAPED       null
  SERVER_COMMON_COMMENT       null
- Extended Tag Type Values:
-
hasAttributes
public final boolean hasAttributes()
Indicates whether a start tag of this type contains attributes.
(property method)
The attributes start at the end of the name and continue until the closing delimiter is encountered. If the character sequence representing the closing delimiter occurs within a quoted attribute value it is not recognised as the end of the tag.
The atEndOfAttributes(Source, int pos, boolean isClosingSlashIgnored) method can be overridden to provide more control over where the attributes end.
- Standard Tag Type Values:
  Start Tag Type              Has Attributes
  UNREGISTERED                false
  NORMAL                      true
  COMMENT                     false
  XML_DECLARATION             true
  XML_PROCESSING_INSTRUCTION  false
  DOCTYPE_DECLARATION         false
  MARKUP_DECLARATION          false
  CDATA_SECTION               false
  SERVER_COMMON               false
  SERVER_COMMON_ESCAPED       false
  SERVER_COMMON_COMMENT       false
- Extended Tag Type Values:
- Returns:
true if a start tag of this type contains attributes, otherwise false.
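The rule that a closing delimiter inside a quoted attribute value does not end the tag can be sketched with a small scanner. This is a simplified, standalone illustration rather than the library's attribute parser: findEndOfAttributes is a hypothetical helper, and it assumes values are quoted with matching single or double quotes and contain no escape sequences.

```java
public class AttributeEndSketch {
    // Hypothetical helper: returns the index of the first occurrence of closingDelimiter
    // at or after 'from' that is not inside a quoted attribute value, or -1 if there is none.
    public static int findEndOfAttributes(String text, int from, String closingDelimiter) {
        char quote = 0; // 0 means "not inside a quoted value"
        for (int i = from; i < text.length(); i++) {
            char c = text.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0; // matching quote ends the value
            } else if (c == '"' || c == '\'') {
                quote = c; // opening quote starts a value
            } else if (text.startsWith(closingDelimiter, i)) {
                return i; // delimiter found outside any quoted value
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String tag = "<a title=\"x > y\" href=\"z\">";
        // Scanning starts just after the tag name "a"; the '>' inside the title value is skipped.
        System.out.println(findEndOfAttributes(tag, 2, ">"));
    }
}
```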
-
isNameAfterPrefixRequired
public final boolean isNameAfterPrefixRequired()
Indicates whether a valid XML tag name is required directly after the prefix.
(property method)
If this property is true, the name of the tag consists of the prefix followed by an XML tag name.
If this property is false, the name of the tag consists of only the prefix.
- Standard Tag Type Values:
  Start Tag Type              Name After Prefix Required
  UNREGISTERED                false
  NORMAL                      true
  COMMENT                     false
  XML_DECLARATION             false
  XML_PROCESSING_INSTRUCTION  true
  DOCTYPE_DECLARATION         false
  MARKUP_DECLARATION          true
  CDATA_SECTION               false
  SERVER_COMMON               false
  SERVER_COMMON_ESCAPED       false
  SERVER_COMMON_COMMENT       false
- Extended Tag Type Values:
- Returns:
true if a valid XML tag name is required directly after the prefix, otherwise false.
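How the two settings of this property shape a tag name can be sketched as below. The helper tagName and the character test isNameChar are hypothetical and deliberately simplified (the real XML name rules allow more characters), and lower-casing the result follows the statement elsewhere in this class that tag properties are stored in lower case. It is a sketch under those assumptions, not the library's implementation.

```java
import java.util.Locale;

public class TagNameSketch {
    // Simplified name-character test; an assumption, not the full XML Name production.
    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == ':' || c == '.';
    }

    // Hypothetical helper: derives the tag name from the text following '<' at tagBegin.
    // Returns null if the prefix does not match or a required name is missing.
    public static String tagName(String text, int tagBegin, String prefix, boolean nameAfterPrefixRequired) {
        int nameBegin = tagBegin + 1; // skip '<'
        if (!text.regionMatches(true, nameBegin, prefix, 0, prefix.length())) return null;
        if (!nameAfterPrefixRequired) return prefix; // name is the prefix alone
        int prefixEnd = nameBegin + prefix.length();
        int nameEnd = prefixEnd;
        while (nameEnd < text.length() && isNameChar(text.charAt(nameEnd))) nameEnd++;
        if (nameEnd == prefixEnd) return null; // no name follows the prefix
        return text.substring(nameBegin, nameEnd).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Prefix "?" with a required name: the name is the prefix plus the PITarget.
        System.out.println(tagName("<?xml-stylesheet href=\"a.css\"?>", 0, "?", true));
        // Prefix "!--" with no name required: the name is just the prefix.
        System.out.println(tagName("<!-- note -->", 0, "!--", false));
    }
}
```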
-
atEndOfAttributes
Indicates whether the specified source document position is at the end of a tag's attributes.
(default implementation method)
This method is called internally while parsing attributes to detect where they should end.
It can be assumed that the specified position is not inside a quoted attribute value.
The default implementation simply compares the parse text at the specified position with the closing delimiter, and is equivalent to:
source.getParseText().containsAt(getClosingDelimiter(),pos)
The isClosingSlashIgnored parameter is only relevant in the NORMAL start tag type, which makes use of it to cater for the '/' character that can occur before the closing delimiter in empty-element tags. Its value is always false when passed to other start tag types.
- Parameters:
source - the Source document.
pos - the character position in the source document.
isClosingSlashIgnored - indicates whether the name of the start tag being tested is incompatible with an empty-element tag.
- Returns:
true if the specified source document position is at the end of a tag's attributes, otherwise false.
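The idea of a default check that a subclass can refine is sketched below using plain strings. DemoType and SlashAwareType are hypothetical classes written for this illustration only. The first mirrors the default comparison against the closing delimiter; the second is an invented override that also accepts a '/' directly before the delimiter, and it does not reproduce how the NORMAL type actually uses isClosingSlashIgnored.

```java
public class EndOfAttributesOverrideSketch {
    // Hypothetical stand-in for a start tag type; NOT the library's class.
    public static class DemoType {
        private final String closingDelimiter;

        public DemoType(String closingDelimiter) {
            this.closingDelimiter = closingDelimiter;
        }

        public String getClosingDelimiter() {
            return closingDelimiter;
        }

        // Default behaviour: the closing delimiter must occur exactly at pos.
        public boolean atEndOfAttributes(String parseText, int pos) {
            return parseText.startsWith(getClosingDelimiter(), pos);
        }
    }

    // Hypothetical override: a '/' directly before the delimiter also counts as the end.
    public static class SlashAwareType extends DemoType {
        public SlashAwareType() {
            super(">");
        }

        @Override
        public boolean atEndOfAttributes(String parseText, int pos) {
            return super.atEndOfAttributes(parseText, pos)
                || parseText.startsWith("/" + getClosingDelimiter(), pos);
        }
    }

    public static void main(String[] args) {
        String tag = "<br />";
        System.out.println(new DemoType(">").atEndOfAttributes(tag, 4));
        System.out.println(new SlashAwareType().atEndOfAttributes(tag, 4));
    }
}
```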
-
constructStartTag
protected final StartTag constructStartTag(Source source, int begin, int end, String name, Attributes attributes)
Internal method for the construction of a StartTag object of this type.
(implementation assistance method)
Intended for use from within the constructTagAt(Source, int pos) method.
-
parseAttributes
Internal method for the parsing of Attributes.
(implementation assistance method)
Intended for use from within the constructTagAt(Source, int pos) method.
The returned Attributes segment begins at startTagBegin+1+tagName.length(), and ends straight after the last attribute found before the tag's closing delimiter.
Only returns null if the segment contains a major syntactical error or more than the default maximum number of minor syntactical errors.
- Parameters:
source - the Source document.
startTagBegin - the position in the source document at which the start tag is to begin.
tagName - the name of the start tag to be constructed.
- Returns:
the Attributes of the start tag to be constructed, or null if too many errors occur while parsing.
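The start position of the attributes segment is simple arithmetic, worked through below. The helper attributesBegin is a hypothetical name used only for this illustration. It skips the single '<' character and then the tag name, where the name includes any prefix such as "?xml".

```java
public class AttributesBeginSketch {
    // Position at which the attributes segment begins: skip '<' (1 char) and the tag name.
    public static int attributesBegin(int startTagBegin, String tagName) {
        return startTagBegin + 1 + tagName.length();
    }

    public static void main(String[] args) {
        String source = "<p><div class=\"a\">";
        // The <div> start tag begins at position 3 in this source text.
        int begin = attributesBegin(3, "div");
        // Shows the computed position and the remainder of the source from there.
        System.out.println(begin + ": [" + source.substring(begin) + "]");
    }
}
```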
-