Class NumericCharacterReference
- All Implemented Interfaces:
- CharSequence, Comparable<Segment>
A numeric character reference can be one of two types:
- Decimal Character Reference
  A numeric character reference specifying the Unicode code point in decimal notation. This is signified by the absence of an 'x' character after the '#' (e.g. "&#62;").
- Hexadecimal Character Reference
  A numeric character reference specifying the Unicode code point in hexadecimal notation. This is signified by the presence of an 'x' character after the '#' (e.g. "&#x3e;").
Static methods to encode and decode strings and single characters can be found in the CharacterReference superclass.
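The distinction between the two reference types can be sketched with the JDK alone. The class and method names below are hypothetical and are not part of this library; accepting an uppercase 'X' and requiring a terminating semicolon are assumptions of the sketch, not statements about how the library parses.

```java
public class NumericReferenceKind {
    /** True if the text has the shape of a hexadecimal reference, i.e. an 'x' follows the '#'. */
    public static boolean isHexadecimal(String ref) {
        return ref.length() > 2 && ref.startsWith("&#")
                && (ref.charAt(2) == 'x' || ref.charAt(2) == 'X');
    }

    /** Decodes a terminated numeric character reference to its code point. */
    public static int codePointOf(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            throw new IllegalArgumentException("not a numeric character reference: " + ref);
        }
        boolean hex = isHexadecimal(ref);
        // Skip "&#" (or "&#x") and drop the trailing ';'.
        String digits = ref.substring(hex ? 3 : 2, ref.length() - 1);
        return Integer.parseInt(digits, hex ? 16 : 10);
    }

    public static void main(String[] args) {
        System.out.println(isHexadecimal("&#62;"));   // false
        System.out.println(isHexadecimal("&#x3e;"));  // true
        System.out.println(codePointOf("&#62;") == codePointOf("&#x3e;")); // true
    }
}
```

Both forms name the same code point (62, the '>' character); only the radix of the digits differs.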
NumericCharacterReference instances are obtained using one of the following methods:
Field Summary
Fields inherited from class net.htmlparser.jericho.CharacterReference
INVALID_CODE_POINT
Method Summary
Modifier and Type | Method | Description
static String | encode(CharSequence unencodedText) | Encodes the specified text, escaping special characters into numeric character references.
static String | encodeDecimal(CharSequence unencodedText) | Encodes the specified text, escaping special characters into decimal character references.
static String | encodeHexadecimal(CharSequence unencodedText) | Encodes the specified text, escaping special characters into hexadecimal character references.
String | getCharacterReferenceString() | Returns the correct encoded form of this numeric character reference.
static String | getCharacterReferenceString(int codePoint) | Returns the numeric character reference encoded form of the specified Unicode code point.
String | getDebugInfo() | Returns a string representation of this object useful for debugging purposes.
boolean | isDecimal() | Indicates whether this numeric character reference specifies the Unicode code point in decimal format.
boolean | isHexadecimal() | Indicates whether this numeric character reference specifies the Unicode code point in hexadecimal format.
Methods inherited from class net.htmlparser.jericho.CharacterReference
appendCharTo, decode, decode, decodeCollapseWhiteSpace, encode, encodeWithWhiteSpaceFormatting, getChar, getCodePoint, getCodePointFromCharacterReferenceString, getDecimalCharacterReferenceString, getDecimalCharacterReferenceString, getEncodingFilterWriter, getHexadecimalCharacterReferenceString, getHexadecimalCharacterReferenceString, getUnicodeText, getUnicodeText, isTerminated, parse, reencode, requiresEncoding
Methods inherited from class net.htmlparser.jericho.Segment
charAt, compareTo, encloses, encloses, equals, getAllCharacterReferences, getAllElements, getAllElements, getAllElements, getAllElements, getAllElements, getAllElementsByClass, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTagsByClass, getAllTags, getAllTags, getBegin, getChildElements, getEnd, getFirstElement, getFirstElement, getFirstElement, getFirstElement, getFirstElementByClass, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTagByClass, getFormControls, getFormFields, getMaxDepthIndicator, getNodeIterator, getRenderer, getRowColumnVector, getSource, getStyleURISegments, getTextExtractor, getURIAttributes, hashCode, ignoreWhenParsing, isWhiteSpace, isWhiteSpace, length, parseAttributes, subSequence, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.CharSequence
chars, codePoints, isEmpty
Method Details
isDecimal
public boolean isDecimal()
Indicates whether this numeric character reference specifies the Unicode code point in decimal format.
A numeric character reference in decimal format is referred to in this library as a decimal character reference.
- Returns:
- true if this numeric character reference specifies the Unicode code point in decimal format, otherwise false.
isHexadecimal
public boolean isHexadecimal()
Indicates whether this numeric character reference specifies the Unicode code point in hexadecimal format.
A numeric character reference in hexadecimal format is referred to in this library as a hexadecimal character reference.
- Returns:
- true if this numeric character reference specifies the Unicode code point in hexadecimal format, otherwise false.
encode
public static String encode(CharSequence unencodedText)
Encodes the specified text, escaping special characters into numeric character references.
Each character is encoded only if the requiresEncoding(char) method would return true for that character.
This method encodes all character references in decimal format, and is exactly the same as calling encodeDecimal(CharSequence).
To encode text using both character entity references and numeric character references, use the CharacterReference.encode(CharSequence) method instead.
To encode text using hexadecimal character references only, use the encodeHexadecimal(CharSequence) method instead.
- Parameters:
- unencodedText - the text to encode.
- Returns:
- the encoded string.
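The encoding rule described above (leave a character alone unless a predicate says it needs escaping, otherwise emit a decimal reference) can be sketched without the library. Everything in this block is hypothetical: `needsEncoding` is a stand-in predicate chosen for illustration and is an assumption, not the library's requiresEncoding(char) rule, and the sketch walks code points rather than chars so that supplementary characters yield a single reference.

```java
public class DecimalEncodeSketch {
    /** Stand-in predicate (an assumption): markup-significant ASCII and anything above 127. */
    public static boolean needsEncoding(int codePoint) {
        return codePoint == '<' || codePoint == '>' || codePoint == '&'
                || codePoint == '"' || codePoint > 127;
    }

    /** Replaces each code point selected by needsEncoding with a decimal character reference. */
    public static String encodeDecimal(CharSequence text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int codePoint = Character.codePointAt(text, i);
            if (needsEncoding(codePoint)) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeDecimal("1 < 2 & caf\u00e9")); // 1 &#60; 2 &#38; caf&#233;
    }
}
```

A hexadecimal variant would differ only in how the digits are written, for example by appending "&#x" followed by Integer.toHexString(codePoint).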
encodeDecimal
public static String encodeDecimal(CharSequence unencodedText)
Encodes the specified text, escaping special characters into decimal character references.
Each character is encoded only if the requiresEncoding(char) method would return true for that character.
To encode text using both character entity references and numeric character references, use the CharacterReference.encode(CharSequence) method instead.
To encode text using hexadecimal character references only, use the encodeHexadecimal(CharSequence) method instead.
- Parameters:
- unencodedText - the text to encode.
- Returns:
- the encoded string.
encodeHexadecimal
public static String encodeHexadecimal(CharSequence unencodedText)
Encodes the specified text, escaping special characters into hexadecimal character references.
Each character is encoded only if the requiresEncoding(char) method would return true for that character.
To encode text using both character entity references and numeric character references, use the CharacterReference.encode(CharSequence) method instead.
To encode text using decimal character references only, use the encodeDecimal(CharSequence) method instead.
- Parameters:
- unencodedText - the text to encode.
- Returns:
- the encoded string.
getCharacterReferenceString
public String getCharacterReferenceString()
Returns the correct encoded form of this numeric character reference.
The returned string uses the same radix as the original character reference in the source document, i.e. decimal format if isDecimal() is true, and hexadecimal format if isHexadecimal() is true.
Note that the returned string is not necessarily the same as the original source text used to create this object. This library recognises certain invalid forms of character references, as detailed in the decode(CharSequence) method.
To retrieve the original source text, use the toString() method instead.
- Example:
- CharacterReference.parse("&#62;").getCharacterReferenceString() returns "&#62;"
- Specified by:
- getCharacterReferenceString in class CharacterReference
- Returns:
- the correct encoded form of this numeric character reference.
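The idea of re-emitting a reference in a canonical form while keeping the radix of the source text can be sketched as follows. The class and method are hypothetical. The leniencies shown here (a missing semicolon, an uppercase 'X', leading zeros) are illustrative assumptions only; the invalid forms the library actually recognises are those documented for decode(CharSequence), and this sketch does not reproduce them.

```java
public class CanonicalFormSketch {
    /** Re-emits a numeric character reference in canonical form, preserving its radix. */
    public static String canonicalForm(String sourceText) {
        if (!sourceText.startsWith("&#")) {
            throw new IllegalArgumentException("not a numeric character reference: " + sourceText);
        }
        boolean hex = sourceText.length() > 2
                && Character.toLowerCase(sourceText.charAt(2)) == 'x';
        int start = hex ? 3 : 2;
        // Tolerate a missing terminating semicolon (assumption of this sketch).
        int end = sourceText.endsWith(";") ? sourceText.length() - 1 : sourceText.length();
        int codePoint = Integer.parseInt(sourceText.substring(start, end), hex ? 16 : 10);
        return hex
                ? "&#x" + Integer.toHexString(codePoint) + ";"
                : "&#" + codePoint + ";";
    }

    public static void main(String[] args) {
        System.out.println(canonicalForm("&#062"));  // &#62;
        System.out.println(canonicalForm("&#X3E;")); // &#x3e;
    }
}
```

In both calls the output differs from the input text, which mirrors the distinction drawn above between the canonical encoded form and the original source text.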
getCharacterReferenceString
public static String getCharacterReferenceString(int codePoint)
Returns the numeric character reference encoded form of the specified Unicode code point.
This method returns the character reference in decimal format, and is exactly the same as calling CharacterReference.getDecimalCharacterReferenceString(int codePoint).
To get either the character entity reference or numeric character reference, use the CharacterReference.getCharacterReferenceString(int codePoint) method instead.
To get the character reference in hexadecimal format, use the CharacterReference.getHexadecimalCharacterReferenceString(int codePoint) method instead.
- Examples:
- NumericCharacterReference.getCharacterReferenceString(62) returns "&#62;"
- NumericCharacterReference.getCharacterReferenceString('>') returns "&#62;"
- Returns:
- the numeric character reference encoded form of the specified Unicode code point.
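The two examples above give the same result because a Java char argument is widened to int, so '>' and 62 denote the same code point. A minimal JDK-only sketch of the formatting step, using hypothetical helper names that are not part of this library and assuming lowercase hexadecimal digits:

```java
public class CodePointReferenceSketch {
    /** Formats a code point as a decimal character reference. */
    public static String decimalReference(int codePoint) {
        return "&#" + codePoint + ";";
    }

    /** Formats a code point as a hexadecimal character reference (lowercase digits assumed). */
    public static String hexadecimalReference(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        System.out.println(decimalReference(62));           // &#62;
        System.out.println(decimalReference('>'));          // &#62; (char widened to int)
        System.out.println(hexadecimalReference(0x1F600));  // &#x1f600;
    }
}
```

Taking an int rather than a char lets the same helper cover supplementary code points above U+FFFF, which do not fit in a single char.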
getDebugInfo
public String getDebugInfo()
Description copied from class: Segment
Returns a string representation of this object useful for debugging purposes.
- Overrides:
- getDebugInfo in class Segment
- Returns:
- a string representation of this object useful for debugging purposes.