<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><link
rel="SHORTCUT ICON" href="/favicon.ico" /><style type="text/css">
TD {font-family: Verdana,Arial,Helvetica} BODY {font-family:
Verdana,Arial,Helvetica; margin-top: 2em; margin-left: 0em; margin-right:
0em} H1 {font-family: Verdana,Arial,Helvetica} H2 {font-family:
Verdana,Arial,Helvetica} H3 {font-family: Verdana,Arial,Helvetica} A:link,
A:visited, A:active { text-decoration: underline }
</style><title>Catalog
support</title></head><body bgcolor="#8b7765" text="#000000"
link="#a06060" vlink="#000000"><table border="0" width="100%"
cellpadding="5" cellspacing="0" align="center"><tr><td><table border="0"
width="90%" cellpadding="2" cellspacing="0" align="center"
bgcolor="#000000"><tr><td><table width="100%" border="0"
cellspacing="1" cellpadding="3" bgcolor="#fffacd"><tr><td
align="center"><h1>The XML C parser and toolkit of
Gnome</h1><h2>Catalog
support</h2></td></tr></table></td></tr></table></td></tr></table><table
border="0" cellpadding="4" cellspacing="0" width="100%"
align="center"><tr><td bgcolor="#8b7765"><table
border="0" cellspacing="0" cellpadding="2" width="100%"><tr><td
valign="top" width="200" bgcolor="#8b7765"><table border="0"
cellspacing="0" cellpadding="1" width="100%"
bgcolor="#000000"><tr><td><table
width="100%" border="0" cellspacing="1" cellpadding="3"><tr><td
colspan="1" bgcolor="#eecfa1"
align="center"><center>Related
links</center></td></tr><tr><td
bgcolor="#fffacd"><ul><li><a href="http://mail.gnome.org/archives/xml/">Mail
archive</a></li><li><a href="http://xmlsoft.org/XSLT/">XSLT
libxslt</a></li><li><a href="http://phd.cs.unibo.it/gdome2/">DOM
gdome2</a></li><li><a href="http://www.aleksey.com/xmlsec/">XML-DSig
xmlsec</a></li><li><a href=“FTP
<ol>
<li><a href="#General2">General overview</a></li>
<li><a href="#definition">The definitions</a></li>
<li><a href="#Simple">Using catalogs</a></li>
<li><a href="#Some">Some examples</a></li>
<li><a href="#reference">How to tune catalog usage</a></li>
<li><a href="#validate">How to debug catalog processing</a></li>
<li><a href="#Declaring">How to create and maintain catalogs</a></li>
<li><a href="#implemento">The implementor corner: quick review of the API</a></li>
<li><a href="#Other">Other resources</a></li>
</ol><h3><a name="General2" id="General2">General overview</a></h3><p>What is a catalog? It is a lookup mechanism used when an entity (a file or a remote resource) references another entity. The catalog lookup happens between the moment the software (an XML parser, a stylesheet processor, or even a renderer including referenced images) recognizes the reference and the moment it actually starts loading that resource.</p><p>It is used for three things:</p><ul>
<li>mapping from "logical" names, the public identifiers, to a more concrete name usable for download (a URI). For example it can associate the logical name <p>"-//OASIS//DTD DocBook XML V4.1.2//EN"</p> <p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be downloaded</p> <p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</p> </li>
<li>remapping from a given URL to another one, like an HTTP redirection saying that <p>"http://www.oasis-open.org/committes/tr.xsl"</p> <p>should really be looked up at</p> <p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"</p> </li>
<li>providing a local cache mechanism that allows loading the entities associated with public identifiers or remote resources from local storage. This is a really important feature for any significant deployment of XML or SGML, since it avoids the hazards and delays associated with fetching remote resources.</li>
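<p>As a rough illustration of the first two uses, the following Python sketch (not libxml2 code) loads a tiny XML Catalog with the standard library and resolves a public identifier and a URL by exact match. The <code>file://</code> path and the helper names <code>load_entries</code> and <code>resolve</code> are hypothetical; a real resolver also normalizes identifiers and supports many more entry types.</p>

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: the local file:// path is made up for illustration.
CATALOG_XML = """<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
          uri="file:///usr/share/xml/docbook/docbookx.dtd"/>
  <uri name="http://www.oasis-open.org/committes/tr.xsl"
       uri="http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"/>
</catalog>
"""


def load_entries(catalog_text):
    """Collect the <public> and <uri> entries of a catalog into two dicts."""
    root = ET.fromstring(catalog_text)
    public, uris = {}, {}
    for elem in root:
        if elem.tag == "{%s}public" % CATALOG_NS:
            public[elem.get("publicId")] = elem.get("uri")
        elif elem.tag == "{%s}uri" % CATALOG_NS:
            uris[elem.get("name")] = elem.get("uri")
    return public, uris


def resolve(public, uris, public_id=None, url=None):
    """Exact-match lookup: public identifier first, then URL remapping.

    Falls back to the original URL (or None) when nothing matches.
    """
    if public_id is not None and public_id in public:
        return public[public_id]
    if url is not None and url in uris:
        return uris[url]
    return url
```

<p>Mapping the public identifier to a local file is what gives the cache effect described in the third item: the document keeps its canonical identifier while the resource is read from disk.</p>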
</ul><h3><a name="definition" id="definition">The definitions</a></h3><p>Libxml, as of 2.4.3, implements two kinds of catalogs:</p><ul>
<li>the older SGML catalogs. The official specification is SGML Open Technical Resolution TR9401:1997, but the format is better understood by reading <a href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from James Clark. This format is relatively old and is not the preferred mode of operation of libxml.</li>
<li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML Catalogs</a>, which are far more flexible, more recent, use an XML syntax and should scale much better. This is the default option of libxml.</li>
</ul><h3><a name="Simple" id="Simple">Using catalogs</a></h3><p>In a normal environment libxml2 will by default check for the presence of a catalog in /etc/xml/catalog and, assuming it has been correctly populated, the processing is completely transparent to the document user. To take a concrete example, suppose you are authoring a DocBook document which starts with the following DOCTYPE definition:</p><pre><?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"></pre><p>When validating the document with libxml, the catalog will be
automatically consulted to look up the public identifier "-//Norman
Walsh//DTD DocBk XML V3.1.4//EN" and the system identifier "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd",
and if these entities have been installed on your system and the catalogs
actually point to them, libxml will fetch them from the local
disk.</p><p style="font-size: 10pt"><strong>Note</strong>: do not actually use this
DOCTYPE; it refers to a really old version, but it is fine as an
example.</p><p>Libxml2 will check the catalog each time it
is requested to load an entity; this includes DTDs, external parsed
entities, stylesheets, etc. If your system is correctly configured, the whole
authoring phase and processing should use only local files, while your
document stays portable because it uses the canonical public and system ID,
referencing the remote document.</p><h3><a name="Some"
id="Some">Some examples:</a></h3><p>Here are a couple
of fragments from XML Catalogs used in libxml2 early regression tests in
test/catalogs:</p><pre><?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC
   "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
   uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
...</pre><p>This is the beginning of a catalog for DocBook 4.1.2.
XML Catalogs are written in XML, and there is a specific namespace for catalog
elements: "urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in
this catalog is a <code>public</code> mapping, which associates a
Public Identifier with a URI.</p><pre>...
<rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/" rewritePrefix="file:///usr/share/xml/docbook/"/>
...</pre><p>A <code>rewriteSystem</code>
is a very powerful
instruction: it says that any URI starting with a given prefix should be
looked up at another URI constructed by replacing the prefix with a new one.
In effect this acts like a cache system for a full area of the Web. In
practice it is extremely useful with a file prefix if you have installed a
copy of those resources on your local system.</p><pre>...
<delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/>
<delegateURI uriStartString="http://www.oasis-open.org/docbook/"
                catalog="file:///usr/share/xml/docbook.xml"/>
...</pre><p>Delegation is the core feature which allows building
a tree of catalogs, which is easier to maintain than a single catalog. Based on
Public Identifier, System Identifier or URI prefixes, it instructs the
catalog software to look up entries in another resource. This feature allows
building hierarchies of catalogs. The set of entries presented should be
sufficient to redirect the resolution of all DocBook references to the
specific catalog in /usr/share/xml/docbook.xml;
this one in
turn could delegate all references for DocBook 4.2.1 to a specific catalog
installed at the same time as the DocBook resources on the local
machine.</p><h3><a name="reference" id="reference">How to
tune catalog usage:</a></h3><p>The user can change the
default catalog behaviour by redirecting queries to their own set of
catalogs. This can be done by setting the <code>XML_CATALOG_FILES</code>
environment variable to a list of catalogs; an empty one should deactivate
loading the default <code>/etc/xml/catalog</code>
catalog.</p><h3><a name="validate" id="validate">How to
debug catalog processing:</a></h3><p>Setting the
<code>XML_DEBUG_CATALOG</code>
environment variable will make libxml2
output debugging information for each catalog operation, for
example:</p><pre>orchis:~/XML -> xmllint --memory --noout test/ent2
warning: failed to load external entity "title.xml"
orchis:~/XML -> export XML_DEBUG_CATALOG=
orchis:~/XML -> xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -> </pre><p>The test/ent2 document references an entity. Running the parser
from memory makes the base URI unavailable, so the "title.xml" entity
cannot be loaded. Setting the debug environment variable reveals
that an attempt is made to load <code>/etc/xml/catalog</code>,
but since it is not present the resolution fails.</p><p>But
the most advanced way to debug XML catalog processing is to use the
<strong>xmlcatalog</strong> command shipped with libxml2. It
allows loading catalogs and making resolution queries to see what is going
on. This is also used for the regression
tests:</p><pre>orchis:~/XML -> ./xmlcatalog test/catalogs/docbook.xml \
    "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -> </pre><p>To see what is going on, adding one -v flag increases the verbosity level to indicate the processing done (adding a second flag also indicates which elements are recognized at parsing):</p><pre>orchis:~/XML -> ./xmlcatalog -v test/catalogs/docbook.xml \
    "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -> </pre><p>A shell interface is also available to debug and process multiple queries (and for regression tests):</p><pre>orchis:~/XML -> ./xmlcatalog -shell test/catalogs/docbook.xml \
    "-//OASIS//DTD DocBook XML V4.1.2//EN"
> help
Commands available:
public PublicID: make a PUBLIC identifier lookup
system SystemID: make a SYSTEM identifier lookup
resolve PublicID SystemID: do a full resolver lookup
add 'type' 'orig' 'replace' : add an entry
del 'values' : remove values
dump: print the current catalog state
debug: increase the verbosity level
quiet: decrease the verbosity level
exit: quit the shell
> public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
> quit
orchis:~/XML -> </pre><p>This should be sufficient for most debugging purposes; it was actually used heavily to debug the XML Catalog implementation itself.</p><h3><a name="Declaring" id="Declaring">How to create and maintain</a> catalogs:</h3><p>XML Catalogs are XML files, so you can either use XML tools to manage them or use <strong>xmlcatalog</strong> for this. The basic step is to create a catalog; the <code>--create</code> option provides this facility:</p><pre>orchis:~/XML -> ./xmlcatalog --create tst.xml
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
"http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
orchis:~/XML -> </pre><p>By default xmlcatalog does not overwrite the
original catalog and saves the result on the standard output; this can be
overridden using the <code>--noout</code> option. The <code>--add</code>
command allows adding
entries to the catalog:</p><pre>orchis:~/XML -> ./xmlcatalog --noout --create --add "public" \
    "-//OASIS//DTD DocBook XML V4.1.2//EN" \
    http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
orchis:~/XML -> cat tst.xml
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
    "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
          uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/>
</catalog>
orchis:~/XML -> </pre><p>The
<code>--add</code>
option always takes 3 parameters, even though some of the
XML Catalog constructs (like nextCatalog) have only a single argument;
in that case just pass an empty third string, it will be
ignored.</p><p>Similarly the <code>--del</code>
option removes
matching entries from the catalog:</p><pre>orchis:~/XML -> ./xmlcatalog --del \
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
    "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/>
orchis:~/XML -> </pre><p>The catalog is now empty. Note that the
matching of <code>--del</code>
is exact and would have worked in a similar
fashion with the Public ID string.</p><p>This is rudimentary
but should be sufficient to manage a not too complex catalog tree of
resources.</p><h3><a name="implemento"
id="implemento">The implementor corner: quick review of the
API:</a></h3><p>As for every other module of
libxml, there is an automatically generated <a
href="html/libxml-catalog.html">API page for catalog
support</a>.</p><p>The header for the catalog interfaces
should be included as:</p><pre>#include <libxml/catalog.h></pre><p>The API is deliberately
kept very simple. First, it is not obvious that applications really need
access to it, since catalog resolution is the default behaviour of libxml2 (Note: it is
possible to completely override the libxml2 default catalog by using <a
href="html/libxml-parser.html">xmlSetExternalEntityLoader</a> to
plug in an application-specific resolver).</p><p>libxml2
supports two catalog lists:</p><ul>
<li>the default one, global and shared by the whole application</li>
<li>a per-document catalog, which is built if the document uses the <code>oasis-xml-catalog</code> PIs to specify its own catalog list. It is associated with the parser context and destroyed when the parser context is destroyed.</li>
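<p>The precedence between these two lists can be sketched in Python (an illustration only, not libxml2 code): each list is reduced to a plain dictionary, and the per-document one, when it exists, is consulted before the global one. The function name and the sample paths are hypothetical.</p>

```python
def resolve_in_lists(identifier, document_catalog, global_catalog):
    """Return the mapped URI, trying the per-document list before the global one.

    Each catalog is modelled as a dict (or None when it does not exist).
    """
    for catalog in (document_catalog, global_catalog):
        if catalog is not None and identifier in catalog:
            return catalog[identifier]
    return None


# Hypothetical entries, for illustration only.
PUBLIC_ID = "-//OASIS//DTD DocBook XML V4.1.2//EN"
GLOBAL = {PUBLIC_ID: "file:///usr/share/xml/docbook/docbookx.dtd"}
DOCUMENT = {PUBLIC_ID: "file:///home/user/dtd/docbookx.dtd"}
```

<p>With both lists present the document entry wins; with no document list the global entry is returned.</p>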
</ul><p>The document one will be used first if it exists.</p><h4>Initialization routines:</h4><p>xmlInitializeCatalog(), xmlLoadCatalog() and xmlLoadCatalogs() should be used at startup to initialize the catalog. If the catalog should be initialized with specific values, xmlLoadCatalog() or xmlLoadCatalogs() should be called before xmlInitializeCatalog(), which would otherwise do a default initialization first.</p><p>The xmlCatalogAddLocal() call is used by the parser to grow the document's own catalog list if needed.</p>
<h4>Preferences setup:</h4><p>The XML Catalog spec requires the possibility to select default preferences between public and system delegation; xmlCatalogSetDefaultPrefer() allows this. xmlCatalogSetDefaults() and xmlCatalogGetDefaults() control whether XML Catalog resolution is forbidden, allowed for the global catalog, allowed for the document catalog, or allowed for both; the default is to allow both.</p><p>And of course xmlCatalogSetDebug() enables debug messages (through the xmlGenericError() mechanism).</p>
<h4>Querying routines:</h4><p>xmlCatalogResolve(), xmlCatalogResolveSystem(), xmlCatalogResolvePublic() and xmlCatalogResolveURI() are relatively explicit if you read the XML Catalog specification: they correspond to the section 7 algorithms. They should also work, with simplified semantics, if you have loaded an SGML catalog.</p><p>xmlCatalogLocalResolve() and xmlCatalogLocalResolveURI() are the same but operate on the document catalog list.</p>
<h4>Cleanup and Miscellaneous:</h4><p>xmlCatalogCleanup() frees the global catalog; xmlCatalogFreeLocal() is the per-document equivalent.</p><p>xmlCatalogAdd() and xmlCatalogRemove() are used to dynamically modify the first catalog in the global list, and xmlCatalogDump() dumps a catalog state. Those routines are primarily designed for xmlcatalog; I'm not sure that exposing more complex interfaces (like navigation ones) would be really useful.</p><p>xmlParseCatalogFile() is a function used to load XML Catalog files. It is similar to xmlParseFile() except that it bypasses all catalog lookups; it is provided because this functionality may be useful for client tools.</p>
<h4>Threaded environments:</h4><p>Since the catalog tree is built progressively, some care has been taken to try to avoid trouble in multithreaded environments. The code is now thread safe, assuming that the libxml2 library has been compiled with thread support.</p><h3><a name="Other" id="Other">Other resources</a></h3><p>The XML Catalog specification is relatively recent, so there isn't much literature to point at:</p><ul>
<li>You can find a good rant from Norm Walsh about <a href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the need for catalogs</a>; it provides a lot of context information even if I don't agree with everything presented. Norm also wrote a more recent article, <a href="http://wwws.sun.com/software/xml/developers/resolver/article/">XML entities and URI resolvers</a>, describing them.</li>
<li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML catalog proposal</a> from John Cowan.</li>
<li>The <a href="http://www.rddl.org/">Resource Directory Description Language</a> (RDDL), another catalog system but more oriented toward providing metadata for XML namespaces.</li>
<li>The page from the OASIS Technical <a href="http://www.oasis-open.org/committees/entity/">Committee on Entity Resolution</a>, which maintains XML Catalog; you will find pointers to the specification updates, some background and pointers to other tools providing XML Catalog support.</li>
<li>There is a <a href="buildDocBookCatalog">shell script</a> to generate XML Catalogs for DocBook 4.1.2. If it can write to the /etc/xml/ directory, it will set up /etc/xml/catalog and /etc/xml/docbook based on the resources found on the system. Otherwise it will just create ~/xmlcatalog and ~/dbkxmlcatalog, and running: <p><code>export XML_CATALOG_FILES=$HOME/xmlcatalog</code></p> <p>should allow processing DocBook documents without requiring network access for the DTD or stylesheets.</p> </li>
<li>I have uploaded <a href="ftp://xmlsoft.org/libxml2/test/dbk412catalog.tar.gz">a small tarball</a> containing XML Catalogs for DocBook 4.1.2, which seems to work fine for me too.</li>
<li>The <a href="http://www.xmlsoft.org/xmlcatalog_man.html">xmlcatalog manual page</a></li>
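<p>The effect of the <code>XML_CATALOG_FILES</code> setting used in the list above can be sketched in Python (an illustration of the described behaviour, not libxml2 code). The sketch assumes the variable holds a whitespace-separated list, which this page does not spell out, and the function name <code>catalog_files</code> is hypothetical. An unset variable falls back to the default catalog, while an empty one yields no catalog at all.</p>

```python
DEFAULT_CATALOG = "/etc/xml/catalog"


def catalog_files(environ):
    """Return the list of catalog files to load for a given environment mapping.

    Assumption: XML_CATALOG_FILES is a whitespace-separated list of catalogs.
    """
    value = environ.get("XML_CATALOG_FILES")
    if value is None:
        # Variable not set: only the default catalog is consulted.
        return [DEFAULT_CATALOG]
    # Variable set: it replaces the default; an empty value disables it.
    return value.split()
```

<p>In a real program the mapping passed in would typically be <code>os.environ</code>; a plain dictionary is enough to experiment with the three cases.</p>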
</ul><p>If you have suggestions for corrections or additions, simply contact me:</p><p><a href="bugs.html">Daniel Veillard</a></p></td></tr></table></td></tr></table></td></tr></table></td></tr></table></td></tr></table></body></html>