Package org.cyberneko.html
Class HTMLScanner
java.lang.Object
org.cyberneko.html.HTMLScanner
- All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponent, org.apache.xerces.xni.parser.XMLDocumentScanner, org.apache.xerces.xni.parser.XMLDocumentSource, org.apache.xerces.xni.XMLLocator, HTMLComponent
public class HTMLScanner
extends Object
implements org.apache.xerces.xni.parser.XMLDocumentScanner, org.apache.xerces.xni.XMLLocator, HTMLComponent
A simple HTML scanner. This scanner makes no attempt to balance tags
or fix other problems in the source document; it scans what it can
and generates XNI document "events", ignoring errors of all kinds.
This component recognizes the following features:
- http://cyberneko.org/html/features/augmentations
- http://cyberneko.org/html/features/report-errors
- http://apache.org/xml/features/scanner/notify-char-refs
- http://apache.org/xml/features/scanner/notify-builtin-refs
- http://cyberneko.org/html/features/scanner/notify-builtin-refs
- http://cyberneko.org/html/features/scanner/fix-mswindows-refs
- http://cyberneko.org/html/features/scanner/script/strip-cdata-delims
- http://cyberneko.org/html/features/scanner/script/strip-comment-delims
- http://cyberneko.org/html/features/scanner/style/strip-cdata-delims
- http://cyberneko.org/html/features/scanner/style/strip-comment-delims
- http://cyberneko.org/html/features/scanner/ignore-specified-charset
- http://cyberneko.org/html/features/scanner/cdata-sections
- http://cyberneko.org/html/features/override-doctype
- http://cyberneko.org/html/features/insert-doctype
- http://cyberneko.org/html/features/parse-noscript-content
- http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe
- http://cyberneko.org/html/features/scanner/allow-selfclosing-tags
This component recognizes the following properties:
- http://cyberneko.org/html/properties/names/elems
- http://cyberneko.org/html/properties/names/attrs
- http://cyberneko.org/html/properties/default-encoding
- http://cyberneko.org/html/properties/error-reporter
- http://cyberneko.org/html/properties/doctype/pubid
- http://cyberneko.org/html/properties/doctype/sysid
- Version:
- $Id: HTMLScanner.java,v 1.19 2005/06/14 05:52:37 andyc Exp $
- Author:
- Andy Clark, Marc Guillemot, Ahmed Ashour
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
- class HTMLScanner.ContentScanner: The primary HTML document scanner.
- static class HTMLScanner.CurrentEntity: Current entity.
- protected static class HTMLScanner.LocationItem: Location infoset item.
- static class HTMLScanner.PlaybackInputStream: A playback input stream.
- static interface HTMLScanner.Scanner: Basic scanner interface.
- class HTMLScanner.SpecialScanner: Special scanner used for elements whose content needs to be scanned as plain text, ignoring markup such as elements and entity references.
Field Summary
Fields (Modifier and Type, Field: Description)
- static final String ALLOW_SELFCLOSING_IFRAME: Allows self closing <iframe/> tag.
- static final String ALLOW_SELFCLOSING_TAGS: Allows self closing tags, e.g. <div/> (XHTML).
- protected static final String AUGMENTATIONS: Include infoset augmentations.
- static final String CDATA_SECTIONS: Scan CDATA sections.
- protected static final boolean DEBUG_CALLBACKS: Set to true to debug callbacks.
- protected static final int DEFAULT_BUFFER_SIZE: Default buffer size.
- protected static final String DEFAULT_ENCODING: Default encoding.
- protected static final String DOCTYPE_PUBID: Doctype declaration public identifier.
- protected static final String DOCTYPE_SYSID: Doctype declaration system identifier.
- protected static final String ERROR_REPORTER: Error reporter.
- protected boolean fAllowSelfclosingIframe: Allows self closing iframe tags.
- protected boolean fAllowSelfclosingTags: Allows self closing tags.
- protected boolean fAugmentations: Augmentations.
- protected int fBeginCharacterOffset: Beginning character offset in the file.
- protected int fBeginColumnNumber: Beginning column number.
- protected int fBeginLineNumber: Beginning line number.
- protected HTMLScanner.PlaybackInputStream fByteStream: The playback byte stream.
- protected boolean fCDATASections: CDATA sections.
- protected HTMLScanner.Scanner fContentScanner: Content scanner.
- protected HTMLScanner.CurrentEntity fCurrentEntity: Current entity.
- protected final Stack fCurrentEntityStack: The current entity stack.
- protected String fDefaultIANAEncoding: Default encoding.
- protected String fDoctypePubid: Doctype declaration public identifier.
- protected String fDoctypeSysid: Doctype declaration system identifier.
- protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler: The document handler.
- protected int fElementCount: Element count.
- protected int fElementDepth: Element depth.
- protected int fEndCharacterOffset: Ending character offset in the file.
- protected int fEndColumnNumber: Ending column number.
- protected int fEndLineNumber: Ending line number.
- protected HTMLErrorReporter fErrorReporter: Error reporter.
- protected boolean fFixWindowsCharRefs: Fix Microsoft Windows® character entity references.
- protected String fIANAEncoding: Auto-detected IANA encoding.
- protected boolean fIgnoreSpecifiedCharset: Ignore specified character set.
- protected boolean fInsertDoctype: Insert document type declaration.
- protected boolean fIso8859Encoding: True if the encoding matches "ISO-8859-*".
- static final String FIX_MSWINDOWS_REFS: Fix Microsoft Windows® character entity references.
- protected String fJavaEncoding: Auto-detected Java encoding.
- protected short fNamesAttrs: Modify HTML attribute names.
- protected short fNamesElems: Modify HTML element names.
- protected boolean fNormalizeAttributes: Normalize attribute values.
- protected boolean fNotifyCharRefs: Notify character entity references.
- protected boolean fNotifyHtmlBuiltinRefs: Notify HTML built-in general entity references.
- protected boolean fNotifyXmlBuiltinRefs: Notify XML built-in general entity references.
- protected boolean fOverrideDoctype: Override doctype declaration public and system identifiers.
- protected boolean fParseNoFramesContent: Parse noframes content.
- protected boolean fParseNoScriptContent: Parse noscript content.
- protected boolean fReportErrors: Report errors.
- protected HTMLScanner.Scanner fScanner: The current scanner.
- protected short fScannerState: The current scanner state.
- protected boolean fScriptStripCDATADelims: Strip CDATA delimiters from SCRIPT tags.
- protected boolean fScriptStripCommentDelims: Strip comment delimiters from SCRIPT tags.
- protected HTMLScanner.SpecialScanner fSpecialScanner: Special scanner used for elements whose content needs to be scanned as plain text, ignoring markup such as elements and entity references.
- protected final org.apache.xerces.util.XMLStringBuffer fStringBuffer: String buffer.
- protected boolean fStyleStripCDATADelims: Strip CDATA delimiters from STYLE tags.
- protected boolean fStyleStripCommentDelims: Strip comment delimiters from STYLE tags.
- static final String HTML_4_01_FRAMESET_PUBID: HTML 4.01 frameset public identifier ("-//W3C//DTD HTML 4.01 Frameset//EN").
- static final String HTML_4_01_FRAMESET_SYSID: HTML 4.01 frameset system identifier ("http://www.w3.org/TR/html4/frameset.dtd").
- static final String HTML_4_01_STRICT_PUBID: HTML 4.01 strict public identifier ("-//W3C//DTD HTML 4.01//EN").
- static final String HTML_4_01_STRICT_SYSID: HTML 4.01 strict system identifier ("http://www.w3.org/TR/html4/strict.dtd").
- static final String HTML_4_01_TRANSITIONAL_PUBID: HTML 4.01 transitional public identifier ("-//W3C//DTD HTML 4.01 Transitional//EN").
- static final String HTML_4_01_TRANSITIONAL_SYSID: HTML 4.01 transitional system identifier ("http://www.w3.org/TR/html4/loose.dtd").
- static final String IGNORE_SPECIFIED_CHARSET: Ignore specified charset found in the <meta http-equiv='Content-Type' content='text/html;charset=…'> tag or in the <?xml … encoding='…'> processing instruction.
- static final String INSERT_DOCTYPE: Insert document type declaration.
- protected static final String NAMES_ATTRS: Modify HTML attribute names: { "upper", "lower", "default" }.
- protected static final String NAMES_ELEMS: Modify HTML element names: { "upper", "lower", "default" }.
- protected static final short NAMES_LOWERCASE: Lowercase HTML names.
- protected static final short NAMES_NO_CHANGE: Don't modify HTML names.
- protected static final short NAMES_UPPERCASE: Uppercase HTML names.
- protected static final String NORMALIZE_ATTRIBUTES: Normalize attribute values.
- static final String NOTIFY_CHAR_REFS: Notify character entity references (e.g. &#32;, &#x20;, etc).
- static final String NOTIFY_HTML_BUILTIN_REFS: Notify handler of built-in entity references (e.g. &nobr;, &copy;, etc).
- static final String NOTIFY_XML_BUILTIN_REFS: Notify handler of built-in entity references (e.g. &amp;, &lt;, etc).
- static final String OVERRIDE_DOCTYPE: Override doctype declaration public and system identifiers.
- static final String PARSE_NOSCRIPT_CONTENT: Parse <noscript>...</noscript> content.
- protected static final String REPORT_ERRORS: Report errors.
- static final String SCRIPT_STRIP_CDATA_DELIMS: Strip XHTML CDATA delimiters ("<![CDATA[" and "]]>") from SCRIPT tag contents.
- static final String SCRIPT_STRIP_COMMENT_DELIMS: Strip HTML comment delimiters ("<!--" and "-->") from SCRIPT tag contents.
- protected static final short STATE_CONTENT: State: content.
- protected static final short STATE_END_DOCUMENT: State: end document.
- protected static final short STATE_MARKUP_BRACKET: State: markup bracket.
- protected static final short STATE_START_DOCUMENT: State: start document.
- static final String STYLE_STRIP_CDATA_DELIMS: Strip XHTML CDATA delimiters ("<![CDATA[" and "]]>") from STYLE tag contents.
- static final String STYLE_STRIP_COMMENT_DELIMS: Strip HTML comment delimiters ("<!--" and "-->") from STYLE tag contents.
- protected static final HTMLEventInfo SYNTHESIZED_ITEM: Synthesized event info item.
Constructor Summary
Constructors
- HTMLScanner()
Method Summary
Methods (Modifier and Type, Method: Description)
- protected static boolean builtinXmlRef(String name): Returns true if the name is a built-in XML general entity reference.
- void cleanup(boolean closeall): Cleans up used resources.
- void evaluateInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource): Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
- static String expandSystemId(String systemId, String baseSystemId): Expands a system id and returns the system id as a URI, if it can be expanded.
- protected static String fixURI: Fixes a platform dependent filename to standard URI form.
- protected int fixWindowsCharacter(int origChar): Fixes Microsoft Windows® specific characters.
- String getBaseSystemId(): Returns the base system identifier.
- int getCharacterOffset(): Returns the character offset.
- int getColumnNumber(): Returns the current column number.
- org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler(): Returns the document handler.
- String getEncoding(): Returns the encoding.
- String getExpandedSystemId(): Returns the expanded system identifier.
- Boolean getFeatureDefault(String featureId): Returns the default state for a feature.
- int getLineNumber(): Returns the current line number.
- String getLiteralSystemId(): Returns the literal system identifier.
- protected static final short getNamesValue(String value): Converts HTML names string value to constant value.
- Object getPropertyDefault(String propertyId): Returns the default state for a property.
- String getPublicId(): Returns the public identifier.
- String[] getRecognizedFeatures(): Returns recognized features.
- String[] getRecognizedProperties(): Returns recognized properties.
- protected static String getValue: Returns the value of the specified attribute, ignoring case.
- String getXMLVersion(): Returns the XML version.
- protected final org.apache.xerces.xni.Augmentations locationAugs(): Returns an augmentations object with a location item added.
- protected static final String modifyName(String name, short mode): Modifies the given name based on the specified mode.
- void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource): Pushes an input source onto the current entity stack.
- protected int read(): Reads a single character.
- protected int readPreservingBufferContent: Reads a single character, preserving the old buffer content.
- void reset(org.apache.xerces.xni.parser.XMLComponentManager manager): Resets the component.
- protected final org.apache.xerces.xni.XMLResourceIdentifier resourceId(): Returns an empty resource identifier.
- protected void scanDoctype: Scans a DOCTYPE line.
- boolean scanDocument(boolean complete): Scans the document.
- protected int scanEntityRef(org.apache.xerces.util.XMLStringBuffer str, boolean content): Scans an entity reference.
- protected String scanLiteral: Scans a quoted literal.
- protected String scanName(boolean strict): Scans a name.
- void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler): Sets the document handler.
- void setFeature(String featureId, boolean state): Sets a feature.
- void setInputSource(org.apache.xerces.xni.parser.XMLInputSource source): Sets the input source.
- void setProperty(String propertyId, Object value): Sets a property.
- protected void setScanner(HTMLScanner.Scanner scanner): Sets the scanner.
- protected void setScannerState(short state): Sets the scanner state.
- protected boolean skip: Returns true if the specified text is present and is skipped.
- protected boolean skipMarkup(boolean balance): Skips markup.
- protected int skipNewlines: Skips newlines and returns the number of newlines skipped.
- protected boolean skipSpaces: Skips whitespace.
- protected final org.apache.xerces.xni.Augmentations synthesizedAugs(): Returns an augmentations object with a synthesized item added.
-
Field Details
-
HTML_4_01_STRICT_PUBID
HTML 4.01 strict public identifier ("-//W3C//DTD HTML 4.01//EN").
-
HTML_4_01_STRICT_SYSID
HTML 4.01 strict system identifier ("http://www.w3.org/TR/html4/strict.dtd").
-
HTML_4_01_TRANSITIONAL_PUBID
HTML 4.01 transitional public identifier ("-//W3C//DTD HTML 4.01 Transitional//EN").
-
HTML_4_01_TRANSITIONAL_SYSID
HTML 4.01 transitional system identifier ("http://www.w3.org/TR/html4/loose.dtd").
-
HTML_4_01_FRAMESET_PUBID
HTML 4.01 frameset public identifier ("-//W3C//DTD HTML 4.01 Frameset//EN").
-
HTML_4_01_FRAMESET_SYSID
HTML 4.01 frameset system identifier ("http://www.w3.org/TR/html4/frameset.dtd").
-
AUGMENTATIONS
Include infoset augmentations.
-
REPORT_ERRORS
Report errors.
-
NOTIFY_CHAR_REFS
Notify character entity references (e.g. &#32;, &#x20;, etc).
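As an illustration of the kind of reference this feature concerns, the sketch below decodes a decimal or hexadecimal numeric character reference into its code point. The class CharRefSketch and its decode method are hypothetical names chosen for this example; they are not part of NekoHTML, and the scanner's own handling may differ.

```java
// Hypothetical helper, not NekoHTML code: decodes "&#32;" or "&#x20;" style text.
final class CharRefSketch {
    /** Returns the code point named by the reference, or -1 if it is malformed. */
    static int decode(String ref) {
        if (ref == null || !ref.startsWith("&#") || !ref.endsWith(";")) {
            return -1;
        }
        // Text between "&#" and ";", e.g. "32" or "x20".
        String body = ref.substring(2, ref.length() - 1);
        boolean hex = body.startsWith("x") || body.startsWith("X");
        try {
            int cp = hex ? Integer.parseInt(body.substring(1), 16)
                         : Integer.parseInt(body, 10);
            return Character.isValidCodePoint(cp) ? cp : -1;
        } catch (NumberFormatException e) {
            return -1; // empty or non-numeric body
        }
    }
}
```

Both spellings of the space character decode to the same code point, which is why a handler may want to be told which form appeared in the source.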
-
NOTIFY_XML_BUILTIN_REFS
Notify handler of built-in entity references (e.g. &amp;, &lt;, etc).
Note: This only applies to the five pre-defined XML general entities. Specifically, "amp", "lt", "gt", "quot", and "apos". This is done for compatibility with the Xerces feature.
To be notified of the built-in entity references in HTML, set the http://cyberneko.org/html/features/scanner/notify-builtin-refs feature to true.
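The distinction between the five XML entities and the larger HTML set can be sketched as a simple membership test. XmlBuiltinRefSketch is a hypothetical class for illustration only, and the case-sensitive comparison is an assumption based on XML entity names being case-sensitive.

```java
import java.util.Set;

// Hypothetical sketch, not NekoHTML code: tests for the five pre-defined XML entities.
final class XmlBuiltinRefSketch {
    private static final Set<String> XML_BUILTINS =
            Set.of("amp", "lt", "gt", "quot", "apos");

    /** Returns true only for "amp", "lt", "gt", "quot" and "apos" (case-sensitive). */
    static boolean isBuiltinXmlRef(String name) {
        // Set.of(...) rejects null lookups, so guard first.
        return name != null && XML_BUILTINS.contains(name);
    }
}
```

An HTML-only entity name such as "nbsp" fails this test, so it would fall under the HTML built-in notification feature instead.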
-
NOTIFY_HTML_BUILTIN_REFS
Notify handler of built-in entity references (e.g. &nobr;, &copy;, etc).
Note: This includes the five pre-defined XML general entities.
-
FIX_MSWINDOWS_REFS
Fix Microsoft Windows® character entity references.
-
SCRIPT_STRIP_COMMENT_DELIMS
Strip HTML comment delimiters ("<!--" and "-->") from SCRIPT tag contents.
-
SCRIPT_STRIP_CDATA_DELIMS
Strip XHTML CDATA delimiters ("<![CDATA[" and "]]>") from SCRIPT tag contents.
-
STYLE_STRIP_COMMENT_DELIMS
Strip HTML comment delimiters ("<!--" and "-->") from STYLE tag contents.
-
STYLE_STRIP_CDATA_DELIMS
Strip XHTML CDATA delimiters ("<![CDATA[" and "]]>") from STYLE tag contents.
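The effect of the four strip features can be pictured with a small string-based sketch. StripDelimsSketch is a hypothetical class; it works on a complete string, whereas the scanner processes a character stream and may treat whitespace and partial delimiters differently.

```java
// Hypothetical sketch, not NekoHTML code: removes one enclosing delimiter pair.
final class StripDelimsSketch {
    /** Strips open/close if the trimmed text is wrapped in them; otherwise returns text unchanged. */
    static String strip(String text, String open, String close) {
        String t = text.trim();
        boolean wrapped = t.length() >= open.length() + close.length()
                && t.startsWith(open) && t.endsWith(close);
        return wrapped ? t.substring(open.length(), t.length() - close.length()) : text;
    }

    static String stripComment(String content) {
        return strip(content, "<!--", "-->");
    }

    static String stripCdata(String content) {
        return strip(content, "<![CDATA[", "]]>");
    }
}
```

Content that is not wrapped in the delimiters is returned untouched, so applying the sketch to ordinary script text is harmless.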
-
IGNORE_SPECIFIED_CHARSET
Ignore specified charset found in the <meta http-equiv='Content-Type' content='text/html;charset=…'> tag or in the <?xml … encoding='…'> processing instruction.
-
CDATA_SECTIONS
Scan CDATA sections.
-
OVERRIDE_DOCTYPE
Override doctype declaration public and system identifiers.
-
INSERT_DOCTYPE
Insert document type declaration.
-
PARSE_NOSCRIPT_CONTENT
Parse <noscript>...</noscript> content.
-
ALLOW_SELFCLOSING_IFRAME
Allows self closing <iframe/> tag.
-
ALLOW_SELFCLOSING_TAGS
Allows self closing tags, e.g. <div/> (XHTML).
-
NORMALIZE_ATTRIBUTES
Normalize attribute values.
-
NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
-
NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.
-
DEFAULT_ENCODING
Default encoding.
-
ERROR_REPORTER
Error reporter.
-
DOCTYPE_PUBID
Doctype declaration public identifier.
-
DOCTYPE_SYSID
Doctype declaration system identifier.
-
STATE_CONTENT
protected static final short STATE_CONTENT
State: content.
-
STATE_MARKUP_BRACKET
protected static final short STATE_MARKUP_BRACKET
State: markup bracket.
-
STATE_START_DOCUMENT
protected static final short STATE_START_DOCUMENT
State: start document.
-
STATE_END_DOCUMENT
protected static final short STATE_END_DOCUMENT
State: end document.
-
NAMES_NO_CHANGE
protected static final short NAMES_NO_CHANGE
Don't modify HTML names.
-
NAMES_UPPERCASE
protected static final short NAMES_UPPERCASE
Uppercase HTML names.
-
NAMES_LOWERCASE
protected static final short NAMES_LOWERCASE
Lowercase HTML names.
-
DEFAULT_BUFFER_SIZE
protected static final int DEFAULT_BUFFER_SIZE
Default buffer size.
-
DEBUG_CALLBACKS
protected static final boolean DEBUG_CALLBACKS
Set to true to debug callbacks.
-
SYNTHESIZED_ITEM
Synthesized event info item.
-
fAugmentations
protected boolean fAugmentations
Augmentations.
-
fReportErrors
protected boolean fReportErrors
Report errors.
-
fNotifyCharRefs
protected boolean fNotifyCharRefs
Notify character entity references.
-
fNotifyXmlBuiltinRefs
protected boolean fNotifyXmlBuiltinRefs
Notify XML built-in general entity references.
-
fNotifyHtmlBuiltinRefs
protected boolean fNotifyHtmlBuiltinRefs
Notify HTML built-in general entity references.
-
fFixWindowsCharRefs
protected boolean fFixWindowsCharRefs
Fix Microsoft Windows® character entity references.
-
fScriptStripCDATADelims
protected boolean fScriptStripCDATADelims
Strip CDATA delimiters from SCRIPT tags.
-
fScriptStripCommentDelims
protected boolean fScriptStripCommentDelims
Strip comment delimiters from SCRIPT tags.
-
fStyleStripCDATADelims
protected boolean fStyleStripCDATADelims
Strip CDATA delimiters from STYLE tags.
-
fStyleStripCommentDelims
protected boolean fStyleStripCommentDelims
Strip comment delimiters from STYLE tags.
-
fIgnoreSpecifiedCharset
protected boolean fIgnoreSpecifiedCharset
Ignore specified character set.
-
fCDATASections
protected boolean fCDATASections
CDATA sections.
-
fOverrideDoctype
protected boolean fOverrideDoctype
Override doctype declaration public and system identifiers.
-
fInsertDoctype
protected boolean fInsertDoctype
Insert document type declaration.
-
fNormalizeAttributes
protected boolean fNormalizeAttributes
Normalize attribute values.
-
fParseNoScriptContent
protected boolean fParseNoScriptContent
Parse noscript content.
-
fParseNoFramesContent
protected boolean fParseNoFramesContent
Parse noframes content.
-
fAllowSelfclosingIframe
protected boolean fAllowSelfclosingIframe
Allows self closing iframe tags.
-
fAllowSelfclosingTags
protected boolean fAllowSelfclosingTags
Allows self closing tags.
-
fNamesElems
protected short fNamesElems
Modify HTML element names.
-
fNamesAttrs
protected short fNamesAttrs
Modify HTML attribute names.
-
fDefaultIANAEncoding
protected String fDefaultIANAEncoding
Default encoding.
-
fErrorReporter
protected HTMLErrorReporter fErrorReporter
Error reporter.
-
fDoctypePubid
protected String fDoctypePubid
Doctype declaration public identifier.
-
fDoctypeSysid
protected String fDoctypeSysid
Doctype declaration system identifier.
-
fBeginLineNumber
protected int fBeginLineNumber
Beginning line number.
-
fBeginColumnNumber
protected int fBeginColumnNumber
Beginning column number.
-
fBeginCharacterOffset
protected int fBeginCharacterOffset
Beginning character offset in the file.
-
fEndLineNumber
protected int fEndLineNumber
Ending line number.
-
fEndColumnNumber
protected int fEndColumnNumber
Ending column number.
-
fEndCharacterOffset
protected int fEndCharacterOffset
Ending character offset in the file.
-
fByteStream
protected HTMLScanner.PlaybackInputStream fByteStream
The playback byte stream.
-
fCurrentEntity
protected HTMLScanner.CurrentEntity fCurrentEntity
Current entity.
-
fCurrentEntityStack
protected final Stack fCurrentEntityStack
The current entity stack.
-
fScanner
protected HTMLScanner.Scanner fScanner
The current scanner.
-
fScannerState
protected short fScannerState
The current scanner state.
-
fDocumentHandler
protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
The document handler.
-
fIANAEncoding
protected String fIANAEncoding
Auto-detected IANA encoding.
-
fJavaEncoding
protected String fJavaEncoding
Auto-detected Java encoding.
-
fIso8859Encoding
protected boolean fIso8859Encoding
True if the encoding matches "ISO-8859-*".
-
fElementCount
protected int fElementCount
Element count.
-
fElementDepth
protected int fElementDepth
Element depth.
-
fContentScanner
protected HTMLScanner.Scanner fContentScanner
Content scanner.
-
fSpecialScanner
protected HTMLScanner.SpecialScanner fSpecialScanner
Special scanner used for elements whose content needs to be scanned as plain text, ignoring markup such as elements and entity references. For example: <SCRIPT> and <COMMENT>.
-
fStringBuffer
protected final org.apache.xerces.util.XMLStringBuffer fStringBuffer
String buffer.
-
-
Constructor Details
-
HTMLScanner
public HTMLScanner()
-
-
Method Details
-
pushInputSource
public void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). When the pushed entity is exhausted, the scanner resumes where it left off at the time the source was pushed.
Note: This functionality is experimental at this time and is subject to change in future releases of NekoHTML.
Parameters:
inputSource - The new input source to start scanning.
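The push-and-resume behaviour described above can be sketched with a stack of readers. EntityStackSketch is a hypothetical class built on java.io.Reader; it is not the scanner's implementation, which works with XMLInputSource objects and its own entity type.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not NekoHTML code: a stack of character sources.
final class EntityStackSketch {
    private final Deque<Reader> stack = new ArrayDeque<>();

    EntityStackSketch(Reader original) {
        stack.push(original);
    }

    /** Content pushed here is read before the rest of the current source. */
    void pushInputSource(Reader source) {
        stack.push(source);
    }

    /** Reads one character, falling back to the previous source when one ends. */
    int read() throws IOException {
        while (!stack.isEmpty()) {
            int c = stack.peek().read();
            if (c != -1) {
                return c;
            }
            stack.pop().close(); // exhausted: resume the source underneath
        }
        return -1;
    }

    /** Reads "A", pushes "xy", then drains everything that remains. */
    static String demo() {
        try {
            EntityStackSketch s = new EntityStackSketch(new StringReader("AB"));
            StringBuilder out = new StringBuilder();
            out.append((char) s.read());
            s.pushInputSource(new StringReader("xy"));
            int c;
            while ((c = s.read()) != -1) {
                out.append((char) c);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the demo the pushed text appears between the two halves of the original source, which mirrors how output written by an embedded script would be spliced into the document.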
-
evaluateInputSource
public void evaluateInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
Parameters:
inputSource - The new input source to start evaluating.
-
cleanup
public void cleanup(boolean closeall)
Cleans up used resources. For example, if scanning is terminated early, then this method ensures all remaining open streams are closed.
Parameters:
closeall - Close all streams, including the original. This is used in cases when the application has opened the original document stream and should be responsible for closing it.
-
getEncoding
Returns the encoding.
Specified by: getEncoding in interface org.apache.xerces.xni.XMLLocator
-
getPublicId
Returns the public identifier.
Specified by: getPublicId in interface org.apache.xerces.xni.XMLLocator
-
getBaseSystemId
Returns the base system identifier.
Specified by: getBaseSystemId in interface org.apache.xerces.xni.XMLLocator
-
getLiteralSystemId
Returns the literal system identifier.
Specified by: getLiteralSystemId in interface org.apache.xerces.xni.XMLLocator
-
getExpandedSystemId
Returns the expanded system identifier.
Specified by: getExpandedSystemId in interface org.apache.xerces.xni.XMLLocator
-
getLineNumber
public int getLineNumber()
Returns the current line number.
Specified by: getLineNumber in interface org.apache.xerces.xni.XMLLocator
-
getColumnNumber
public int getColumnNumber()
Returns the current column number.
Specified by: getColumnNumber in interface org.apache.xerces.xni.XMLLocator
-
getXMLVersion
Returns the XML version.
Specified by: getXMLVersion in interface org.apache.xerces.xni.XMLLocator
-
getCharacterOffset
public int getCharacterOffset()
Returns the character offset.
Specified by: getCharacterOffset in interface org.apache.xerces.xni.XMLLocator
-
getFeatureDefault
Returns the default state for a feature.
Specified by: getFeatureDefault in interface HTMLComponent
Specified by: getFeatureDefault in interface org.apache.xerces.xni.parser.XMLComponent
-
getPropertyDefault
Returns the default state for a property.
Specified by: getPropertyDefault in interface HTMLComponent
Specified by: getPropertyDefault in interface org.apache.xerces.xni.parser.XMLComponent
-
getRecognizedFeatures
Returns recognized features.
Specified by: getRecognizedFeatures in interface org.apache.xerces.xni.parser.XMLComponent
-
getRecognizedProperties
Returns recognized properties.
Specified by: getRecognizedProperties in interface org.apache.xerces.xni.parser.XMLComponent
-
reset
public void reset(org.apache.xerces.xni.parser.XMLComponentManager manager) throws org.apache.xerces.xni.parser.XMLConfigurationException
Resets the component.
Specified by: reset in interface org.apache.xerces.xni.parser.XMLComponent
Throws: org.apache.xerces.xni.parser.XMLConfigurationException
-
setFeature
Sets a feature.
Specified by: setFeature in interface org.apache.xerces.xni.parser.XMLComponent
-
setProperty
public void setProperty(String propertyId, Object value) throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a property.
Specified by: setProperty in interface org.apache.xerces.xni.parser.XMLComponent
Throws: org.apache.xerces.xni.parser.XMLConfigurationException
-
setInputSource
Sets the input source.
Specified by: setInputSource in interface org.apache.xerces.xni.parser.XMLDocumentScanner
Throws: IOException
-
scanDocument
public boolean scanDocument(boolean complete) throws org.apache.xerces.xni.XNIException, IOException
Scans the document.
Specified by: scanDocument in interface org.apache.xerces.xni.parser.XMLDocumentScanner
Throws: org.apache.xerces.xni.XNIException, IOException
-
setDocumentHandler
public void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
Sets the document handler.
Specified by: setDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
-
getDocumentHandler
public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
Returns the document handler.
Specified by: getDocumentHandler in interface org.apache.xerces.xni.parser.XMLDocumentSource
-
getValue
Returns the value of the specified attribute, ignoring case.
-
expandSystemId
Expands a system id and returns the system id as a URI, if it can be expanded. A return value of null means that the identifier is already expanded. An exception thrown indicates a failure to expand the id.
Parameters:
systemId - The systemId to be expanded.
Returns:
Returns the URI string representing the expanded system identifier. A null value indicates that the given system identifier is already expanded.
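The contract described above (expand a relative identifier against a base, return null when nothing needs expanding) can be approximated with java.net.URI. SystemIdSketch is a hypothetical class; using URI.resolve and treating "absolute" as "already expanded" are assumptions for illustration, not the library's algorithm.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch, not NekoHTML code: expands a system id against a base.
final class SystemIdSketch {
    /** Returns the resolved URI string, or null if systemId is already absolute. */
    static String expand(String systemId, String baseSystemId) {
        try {
            URI id = new URI(systemId);
            if (id.isAbsolute()) {
                return null; // assumed meaning of "already expanded"
            }
            return new URI(baseSystemId).resolve(id).toString();
        } catch (URISyntaxException e) {
            // Mirrors "an exception thrown indicates a failure to expand the id".
            throw new IllegalArgumentException("cannot expand " + systemId, e);
        }
    }
}
```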
-
fixURI
Fixes a platform dependent filename to standard URI form.
Parameters:
str - The string to fix.
Returns:
Returns the fixed URI string.
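One plausible reading of "standard URI form" is sketched below for Windows-style paths. FixUriSketch is a hypothetical class; the two rules it applies (backslashes become slashes, a drive-letter path gains a leading slash) are assumptions for illustration, and the real method may apply different or additional rules.

```java
// Hypothetical sketch, not NekoHTML code: normalizes a Windows-style filename.
final class FixUriSketch {
    static String fixUri(String str) {
        // Use forward slashes regardless of the platform that produced the name.
        String s = str.replace('\\', '/');
        // Prefix "C:/..." style paths with a slash so they read as a path.
        if (s.length() >= 2 && s.charAt(1) == ':' && Character.isLetter(s.charAt(0))) {
            s = "/" + s;
        }
        return s;
    }
}
```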
-
modifyName
Modifies the given name based on the specified mode.
-
getNamesValue
Converts HTML names string value to constant value.
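How the "upper", "lower" and "default" property values could map onto the NAMES_* modes, and how a mode could then be applied to a name, is sketched below. NamesSketch is a hypothetical class, and the numeric constant values are assumptions since this page does not list them.

```java
import java.util.Locale;

// Hypothetical sketch, not NekoHTML code: names-mode conversion and application.
final class NamesSketch {
    // Assumed values; only the constant names come from the documentation.
    static final short NAMES_NO_CHANGE = 0;
    static final short NAMES_UPPERCASE = 1;
    static final short NAMES_LOWERCASE = 2;

    /** Maps "upper" and "lower" to their modes; anything else means no change. */
    static short getNamesValue(String value) {
        if ("lower".equals(value)) {
            return NAMES_LOWERCASE;
        }
        if ("upper".equals(value)) {
            return NAMES_UPPERCASE;
        }
        return NAMES_NO_CHANGE;
    }

    /** Returns the name converted according to mode. */
    static String modifyName(String name, short mode) {
        switch (mode) {
            case NAMES_UPPERCASE:
                return name.toUpperCase(Locale.ENGLISH);
            case NAMES_LOWERCASE:
                return name.toLowerCase(Locale.ENGLISH);
            default:
                return name;
        }
    }
}
```

A fixed locale is used in the sketch so that case conversion of ASCII tag names does not depend on the default locale of the machine.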
-
fixWindowsCharacter
protected int fixWindowsCharacter(int origChar)
Fixes Microsoft Windows® specific characters.
Details about this common problem can be found at http://www.cs.tut.fi/~jkorpela/www/windows-chars.html
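The usual form of this problem is a numeric reference in the range 0x80 to 0x9F that was meant as a Windows-1252 character. The sketch below remaps such values with the JDK's windows-1252 charset. WindowsCharSketch is a hypothetical class, and delegating to the charset decoder is an assumption for illustration; the method itself may use a different mapping.

```java
import java.nio.charset.Charset;

// Hypothetical sketch, not NekoHTML code: remaps Windows-1252 code values.
final class WindowsCharSketch {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Returns the Unicode code point intended by a 0x80-0x9F value, else origChar. */
    static int fixWindowsCharacter(int origChar) {
        if (origChar < 0x80 || origChar > 0x9F) {
            return origChar; // outside the problematic range
        }
        String decoded = new String(new byte[] { (byte) origChar }, CP1252);
        int fixed = decoded.codePointAt(0);
        // Bytes with no Windows-1252 mapping decode to U+FFFD; keep the original then.
        return fixed == 0xFFFD ? origChar : fixed;
    }
}
```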
-
read
Reads a single character.
Throws: IOException
-
setScanner
Sets the scanner.
-
setScannerState
protected void setScannerState(short state)
Sets the scanner state.
-
scanDoctype
Scans a DOCTYPE line.
Throws: IOException
-
scanLiteral
Scans a quoted literal.
Throws: IOException
-
scanName
Scans a name.
Throws: IOException
-
scanEntityRef
protected int scanEntityRef(org.apache.xerces.util.XMLStringBuffer str, boolean content) throws IOException
Scans an entity reference.
Throws: IOException
-
skip
Returns true if the specified text is present and is skipped.
Throws: IOException
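The "present and skipped" contract can be sketched over an in-memory buffer: the position advances only when the text matches at the current offset. SkipSketch is a hypothetical class; the real method reads from the scanner's current entity and its parameter list is not shown on this page.

```java
// Hypothetical sketch, not NekoHTML code: conditional skip over a string buffer.
final class SkipSketch {
    private final String buffer;
    private int offset;

    SkipSketch(String buffer) {
        this.buffer = buffer;
    }

    /** Advances past text and returns true only if it appears at the current offset. */
    boolean skip(String text) {
        if (buffer.startsWith(text, offset)) {
            offset += text.length();
            return true;
        }
        return false; // nothing consumed on a mismatch
    }

    int offset() {
        return offset;
    }
}
```

Leaving the offset untouched on a mismatch lets a caller probe for several alternatives in turn, for example "<!--" before falling back to a generic "<".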
-
skipMarkup
Skips markup.
Throws: IOException
-
skipSpaces
Skips whitespace.
Throws: IOException
-
skipNewlines
Skips newlines and returns the number of newlines skipped.
Throws: IOException
-
locationAugs
protected final org.apache.xerces.xni.Augmentations locationAugs()
Returns an augmentations object with a location item added.
-
synthesizedAugs
protected final org.apache.xerces.xni.Augmentations synthesizedAugs()
Returns an augmentations object with a synthesized item added.
-
resourceId
protected final org.apache.xerces.xni.XMLResourceIdentifier resourceId()
Returns an empty resource identifier.
-
builtinXmlRef
Returns true if the name is a built-in XML general entity reference.
-
readPreservingBufferContent
Reads a single character, preserving the old buffer content.
Throws: IOException
-