<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta
http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link
rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica} BODY {font-family:
Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right:
0em} H1 {font-family: Verdana,Arial,Helvetica} H2 {font-family:
Verdana,Arial,Helvetica} H3 {font-family: Verdana,Arial,Helvetica} A:link,
A:visited, A:active { text-decoration: underline }
</style><title>The parser
interfaces</title></head><body bgcolor="#8b7765"
text="#000000" link="#a06060" vlink="#000000"><table border="0"
width="100%" cellpadding="5" cellspacing="0"
align="center"><tr><td width="120"><a href="http://swpat.ffii.org/"><img
src="epatents.png" alt="Action against software patents"
/></a></td><td width="180"><a href="http://www.gnome.org/"><img
src="gnome2.png" alt="Gnome2 Logo" /></a><a href="http://www.w3.org/Status"><img
src="w3c.png" alt="W3C Logo" /></a><a href="http://www.redhat.com/"><img
src="redhat.gif" alt="Red Hat Logo" /></a><div
align="left"><a href="http://xmlsoft.org/"><img
src="Libxml2-Logo-180x168.gif" alt="Made with Libxml2 Logo"
/></a></div></td><td><table border="0"
width="90%" cellpadding="2" cellspacing="0" align="center"
bgcolor="#000000"><tr><td><table width="100%" border="0"
cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td
align="center"><h1>The XML C parser and toolkit of
Gnome</h1><h2>The parser
interfaces</h2></td></tr></table></td></tr></table></td></tr></table><table
border="0" cellpadding="4" cellspacing="0" width="100%"
align="center"><tr><td bgcolor="#8b7765"><table
border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td
valign="top" width="200" bgcolor="#8b7765"><table border="0"
cellspacing="0" cellpadding="1" width="100%"
bgcolor="#000000"><tr><td><table width="100%" border="0"
cellspacing="1" cellpadding="3"><tr><td colspan="1"
bgcolor="#eecfa1" align="center"><center>Developer
Menu</center></td></tr><tr><td
bgcolor="#fffacd"><form action="search.php"
enctype="application/x-www-form-urlencoded" method="get"><input
name="query" type="text" size="20" value="" /><input name="submit"
type="submit" value="Search …" /></form><ul><li><a
href="index.html" style="font-weight:bold">Main
Menu</a></li><li><a href="html/index.html"
style="font-weight:bold">Reference
Manual</a></li><li><a href="examples/index.html"
style="font-weight:bold">Code
Examples</a></li><li><a href="guidelines.html">XML
Guidelines</a></li><li><a
href="tutorial/index.html">Tutorial</a></li><li><a
href="xmlreader.html">The Reader
Interface</a></li><li><a
href="ChangeLog.html">ChangeLog</a></li><li><a
href="XSLT.html">XSLT</a></li><li><a
href="python.html">Python and
bindings</a></li><li><a
href="architecture.html">libxml2
architecture</a></li><li><a href="tree.html">The
tree output</a></li><li><a
href="interface.html">The SAX
interface</a></li><li><a href="xmlmem.html">Memory
Management</a></li><li><a href="xmlio.html">I/O
Interfaces</a></li><li><a href="library.html">The
parser interfaces</a></li><li><a
href="entities.html">Entities or no
entities</a></li><li><a
href="namespaces.html">Namespaces</a></li><li><a
href="upgrade.html">Upgrading 1.x
code</a></li><li><a href="threads.html">Thread
safety</a></li><li><a href="DOM.html">DOM
Principles</a></li><li><a href="example.html">A
real example</a></li><li><a href="xml.html">flat
page</a>, <a
href="site.xsl">stylesheet</a></li></ul></td></tr></table><table
width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td
colspan="1" bgcolor="#eecfa1" align="center"><center>API
Indexes</center></td></tr><tr><td
bgcolor="#fffacd"><ul><li><a
href="APIchunk0.html">Alphabetic</a></li><li><a
href="APIconstructors.html">Constructors</a></li><li><a
href="APIfunctions.html">Functions/Types</a></li><li><a
href="APIfiles.html">Modules</a></li><li><a
href="APIsymbols.html">Symbols</a></li></ul></td></tr></table><table
width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td
colspan="1" bgcolor="#eecfa1"
align="center"><center>Related
links</center></td></tr><tr><td
bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail
archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT
libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM
gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig
xmlsec</a></li><li><a href=“FTP
<dt><code>xmlDocPtr xmlParseMemory(char *buffer, int size);</code></dt> <dd><p>Parse a document held in a memory buffer of <code>size</code> bytes.</p> </dd>
</dl><dl>
<dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt> <dd><p>Parse an XML document contained in a (possibly compressed) file.</p> </dd>
</dl><p>The parser returns a pointer to the document structure (or NULL in case of failure).</p><h3 id="Invoking1">Invoking the parser: the push method</h3><p>So that the application keeps control while the document is being fetched (which is common for GUI-based programs), libxml2 has also provided a push interface since version 1.8.3. Here are the interface functions:</p><pre>xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);</pre><p>and here is a simple example showing how to use the interface:</p><pre>FILE *f;

f = fopen(filename, "r");
if (f != NULL) {
    int res, size = 1024;
    char chars[1024];
    xmlParserCtxtPtr ctxt;

    /* Read the first 4 bytes so the parser can detect the encoding. */
    res = fread(chars, 1, 4, f);
    if (res > 0) {
        ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                       chars, res, filename);
        /* Feed the rest of the file chunk by chunk. */
        while ((res = fread(chars, 1, size, f)) > 0) {
            xmlParseChunk(ctxt, chars, res, 0);
        }
        /* An empty chunk with terminate set to 1 ends the parse. */
        xmlParseChunk(ctxt, chars, 0, 1);
        doc = ctxt->myDoc;
        xmlFreeParserCtxt(ctxt);
    }
    fclose(f);
}</pre><p>The HTML parser embedded into libxml2 also has a push interface; the
functions are just prefixed by "html" rather than "xml".</p><h3
id="Invoking2">Invoking the parser: the SAX
interface</h3><p>The tree-building interface makes the parser
memory-hungry, first loading the document in memory and then building the
tree itself. Reading a document without building the tree is possible using
the SAX interfaces (see SAX.h and <a href="http://www.daa.com.au/~james/gnome/xml-sax/xml-sax.html">James
Henstridge's documentation</a>). Note also that the push
interface can be limited to SAX: just use the first two arguments of
<code>xmlCreatePushParserCtxt()</code>.</p><h3><a
name="Building" id="Building">Building a tree from
scratch</a></h3><p>The other way to get an XML tree in
memory is by building it. Basically there is a set of functions dedicated
to building new elements. (These are also described in
&lt;libxml/tree.h&gt;.) For example, here is a piece of code that
produces the XML document used in the previous
examples:</p><pre>#include &lt;libxml/tree.h&gt;

xmlDocPtr doc;
xmlNodePtr tree, subtree;

doc = xmlNewDoc("1.0");
doc->children = xmlNewDocNode(doc, NULL, "EXAMPLE", NULL);
xmlSetProp(doc->children, "prop1", "gnome is great");
xmlSetProp(doc->children, "prop2", "&amp; linux too");
tree = xmlNewChild(doc->children, NULL, "head", NULL);
subtree = xmlNewChild(tree, NULL, "title", "Welcome to Gnome");
tree = xmlNewChild(doc->children, NULL, "chapter", NULL);
subtree = xmlNewChild(tree, NULL, "title", "The Linux adventure");
subtree = xmlNewChild(tree, NULL, "p", "bla bla bla ...");
subtree = xmlNewChild(tree, NULL, "image", NULL);
xmlSetProp(subtree, "href", "linus.gif");</pre><p>Not really rocket science ...</p><h3><a name="Traversing" id="Traversing">Traversing the tree</a></h3><p>Basically by <a href="html/libxml-tree.html">including "tree.h"</a> your
code has access to the internal structure of all the elements of the tree.
The field names are intended to be simple, like
<strong>parent</strong>, <strong>children</strong>,
<strong>next</strong>, <strong>prev</strong>,
<strong>properties</strong>, etc. For example, still with the
previous example:</p><pre>doc->children->children->children</pre><p>points
to the title element,</p><pre>doc->children->children->next->children->children</pre><p>points
to the text node containing the chapter title "The Linux
adventure".</p><p><strong>NOTE</strong>: XML allows
PIs and comments to be present before the document root,
so <code>doc->children</code> may point to a node that is not
the document root element; the function <code>xmlDocGetRootElement()</code>
was added for this purpose.</p><h3><a name="Modifying"
id="Modifying">Modifying the tree</a></h3><p>Functions
are provided for reading and writing the document content. Here is an
excerpt from the <a href="html/libxml-tree.html">tree
API</a>:</p><dl>
<dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);</code></dt> <dd><p>This sets (or changes) an attribute carried by an ELEMENT node. The value can be NULL.</p> </dd>
</dl><dl>
<dt><code>xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar *name);</code></dt> <dd><p>This function returns a pointer to a new copy of the property content, or NULL if the attribute is not set. Note that the user must deallocate the result with <code>xmlFree()</code>.</p> </dd>
</dl><p>Two functions are provided for reading and writing the text associated with elements:</p><dl>
<dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar *value);</code></dt> <dd><p>This function takes an "external" string and converts it to one text node or possibly to a list of entity and text nodes. All non-predefined entity references like &amp;Gnome; will be stored internally as entity nodes, hence the result of the function may not be a single node.</p> </dd>
</dl><dl>
<dt><code>xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int inLine);</code></dt> <dd><p>This function is the inverse of <code>xmlStringGetNodeList()</code>. It generates a new string containing the content of the text and entity nodes. Note the extra argument inLine. If this argument is set to 1, the function will expand entity references. For example, instead of returning the &amp;Gnome; XML encoding in the string, it will substitute it with its value (say, "GNU Network Object Model Environment").</p> </dd>
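</dl><h3><a name="Saving" id="Saving">Saving a tree</a></h3><p>Three options are available:</p><dl>
<dt><code>void xmlDocDumpMemory(xmlDocPtr cur, xmlChar**mem, int *size);</code></dt> <dd><p>Serializes the document into a newly allocated buffer: <code>*mem</code> receives the buffer and <code>*size</code> its length in bytes. The caller must release the buffer with <code>xmlFree()</code>.</p> </dd>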
</dl><dl>
<dt><code>extern void xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt> <dd><p>Dumps a document to an open <code>FILE *</code> stream.</p> </dd>
</dl><dl>
<dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt> <dd><p>Saves the document to a file. In this case, the compression interface is triggered if it has been turned on.</p> </dd>
</dl><h3><a name="Compressio" id="Compressio">Compression</a></h3><p>The library transparently handles compression when doing file-based accesses. The level of compression on saves can be set either globally or individually for one file:</p><dl>
<dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt> <dd><p>Gets the document compression level (0-9, where 0 means no compression).</p> </dd>
</dl><dl>
<dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt> <dd><p>Sets the document compression level.</p> </dd>
</dl><dl>
<dt><code>int xmlGetCompressMode(void);</code></dt> <dd><p>Gets the default compression level.</p> </dd>
</dl><dl>
<dt><code>void xmlSetCompressMode(int mode);</code></dt> <dd><p>Sets the default compression level.</p> </dd>
</dl><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html>