Set - 4

Question 16 :

How can I construct an XML document that is valid and conforms to the XML namespaces recommendation?

Answer :

In answering this question, it is important to remember that:
* Validity is a concept defined in XML 1.0,
* XML namespaces are layered on top of XML 1.0, and
* The XML namespaces recommendation does not redefine validity (for example, in terms of universal names).
Thus, validity is the same for a document that uses XML namespaces and one that doesn't. In particular, with respect to validity:
* xmlns attributes are treated as attributes, not XML namespace declarations.
* Qualified names are treated like other names. For example, in the name google:A, google is not treated as a namespace prefix, the colon is not treated as separating a prefix from a local name, and A is not treated as a local name. The name google:A is treated simply as the name google:A.
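The difference between a qualified name and a universal name can be seen with Python's standard library. This is only an illustration and involves no validation. `xml.dom.minidom` keeps the name as written in `tagName`, which is the kind of name DTD validation compares. `xml.etree.ElementTree` reports only the universal name, in its `{URI}local` form. The two sample documents and the helper names `qualified_name` and `universal_name` are hypothetical.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Two hypothetical documents: one uses a prefix, one uses the default namespace.
PREFIXED = '<google:A xmlns:google="http://www.google.org/"/>'
DEFAULTED = '<A xmlns="http://www.google.org/"/>'


def qualified_name(text):
    """Return the root element's name as written (what DTD validation compares)."""
    return xml.dom.minidom.parseString(text).documentElement.tagName


def universal_name(text):
    """Return the root element's universal name in ElementTree's {URI}local form."""
    return ET.fromstring(text).tag


for doc in (PREFIXED, DEFAULTED):
    print(qualified_name(doc), "->", universal_name(doc))
# google:A -> {http://www.google.org/}A
# A -> {http://www.google.org/}A
```

The qualified names differ, so a DTD must declare them separately. The universal names are identical, which is all a namespace-aware application sees.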
Because of this, XML documents that you might expect to be valid are not. For example, the following document is not valid because the element type name A is not declared in the DTD, in spite of the fact that both google:A and A share the universal name {http://www.google.org/}A:

<?xml version="1.0" ?>
<!DOCTYPE google:A [
<!ELEMENT google:A EMPTY>
<!ATTLIST google:A xmlns:google CDATA #FIXED "http://www.google.org/" xmlns CDATA #FIXED "http://www.google.org/">
]>
<A xmlns="http://www.google.org/" />

Similarly, the following is not valid because the xmlns attribute is not declared in the DTD:

<?xml version="1.0" ?>
<!DOCTYPE A [
<!ELEMENT A EMPTY>
]>
<A xmlns="http://www.google.org/" />

Furthermore, documents that you might expect to be invalid are valid. For example, the following document is valid but contains two definitions of the element type with the universal name {http://www.google.org/}A:

<?xml version="1.0" ?>
<!DOCTYPE google:A [
<!ELEMENT google:A (bar:A)>
<!ATTLIST google:A
xmlns:google CDATA #FIXED "http://www.google.org/">
<!ELEMENT bar:A (#PCDATA)>
<!ATTLIST bar:A
xmlns:bar CDATA #FIXED "http://www.google.org/">
]>
<google:A>
<bar:A>abcd</bar:A>
</google:A>

Finally, validity has nothing to do with correct usage of XML namespaces. For example, the following document is valid but does not conform to the XML namespaces recommendation because the google prefix is never declared: 

<?xml version="1.0" ?>
<!DOCTYPE google:A [
<!ELEMENT google:A EMPTY>
]>
<google:A />
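Python's standard library has no validating parser. The sketch below therefore only contrasts XML 1.0 well-formedness with namespace processing on the document above. A parser with namespace processing turned off accepts google:A as an ordinary name. A namespace-aware parser rejects the undeclared prefix. The function names `is_well_formed` and `conforms_to_namespaces` are hypothetical. The second function is only a rough check: it catches errors such as unbound prefixes but does not test every constraint in the recommendation.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

# The document above: the google prefix is used but never declared.
UNDECLARED_PREFIX = """<?xml version="1.0" ?>
<!DOCTYPE google:A [
<!ELEMENT google:A EMPTY>
]>
<google:A />"""


def is_well_formed(text):
    """XML 1.0 check only: expat without namespace processing treats google:A as a plain name."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


def conforms_to_namespaces(text):
    """ElementTree parses with namespace processing, so an unbound prefix is an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(UNDECLARED_PREFIX))          # True
print(conforms_to_namespaces(UNDECLARED_PREFIX))  # False
```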

Therefore, when constructing an XML document that uses XML namespaces, you need to do both of the following if you want the document to be valid:
* Declare xmlns attributes in the DTD.
* Use the same qualified names in the DTD and the body of the document.
For example:

<?xml version="1.0" ?>
<!DOCTYPE google:A [
<!ELEMENT google:A (google:B)>
<!ATTLIST google:A xmlns:google CDATA #FIXED "http://www.google.org/">
<!ELEMENT google:B EMPTY>
]>
<google:A>
<google:B />
</google:A>

There is no requirement that the same prefix always be used for the same XML namespace. For example, the following is also valid:

<?xml version="1.0" ?>
<!DOCTYPE google:A [
<!ELEMENT google:A (bar:B)>
<!ATTLIST google:A
xmlns:google CDATA #FIXED "http://www.google.org/">
<!ELEMENT bar:B EMPTY>
<!ATTLIST bar:B
xmlns:bar CDATA #FIXED "http://www.google.org/">
]>
<google:A>
<bar:B />
</google:A>

However, documents that use multiple prefixes for the same XML namespace or the same prefix for multiple XML namespaces are confusing to read and thus prone to error. They also allow abuses such as defining an element type or attribute with a given universal name more than once, as was seen earlier. Therefore, a better set of guidelines for writing documents that are both valid and conform to the XML namespaces recommendation is: 
* Declare all xmlns attributes in the DTD.
* Use the same qualified names in the DTD and the body of the document.
* Use one prefix per XML namespace.
* Do not use the same prefix for more than one XML namespace.
* Use at most one default XML namespace.
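The last three guidelines can be checked mechanically. The following is a minimal sketch under two assumptions. First, every namespace declaration is written explicitly in the document. Whether declarations defaulted from a DTD are reported depends on the parser. Second, the hypothetical `prefix_problems` helper uses ElementTree's `iterparse` "start-ns" events to collect each (prefix, URI) pair it declares. It does not check the first two guidelines, which concern the DTD.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def prefix_problems(text):
    """Report namespaces with several prefixes and prefixes bound to several namespaces.

    Only declarations reported as start-ns events are seen; this sketch
    assumes they are written explicitly in the document.
    """
    prefixes_for_uri = defaultdict(set)
    uris_for_prefix = defaultdict(set)
    events = ET.iterparse(io.BytesIO(text.encode("utf-8")), events=("start-ns",))
    for _event, (prefix, uri) in events:
        prefixes_for_uri[uri].add(prefix)  # prefix is "" for xmlns="..."
        uris_for_prefix[prefix].add(uri)

    problems = []
    for uri, prefixes in sorted(prefixes_for_uri.items()):
        if len(prefixes) > 1:
            problems.append(f"{uri} uses prefixes {sorted(prefixes)}")
    for prefix, uris in sorted(uris_for_prefix.items()):
        if len(uris) > 1:
            # The "" entry covers the "at most one default namespace" guideline.
            label = prefix or "(default)"
            problems.append(f"prefix {label!r} is bound to {sorted(uris)}")
    return problems


mixed = ('<google:A xmlns:google="http://www.google.org/">'
         '<bar:B xmlns:bar="http://www.google.org/"/></google:A>')
print(prefix_problems(mixed))
# ["http://www.google.org/ uses prefixes ['bar', 'google']"]
```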

The latter three guidelines guarantee that prefixes are unique. This means that prefixes fulfill the role normally played by namespace names (URIs) -- uniquely identifying an XML namespace -- and that qualified names are equivalent to universal names, so a given universal name is always represented by the same qualified name. Unfortunately, this is contrary to the spirit of prefixes, which were designed to be flexible. The next question describes a slightly better solution.


Question 17 :

How can I allow the prefixes in my document to be different from the prefixes in my DTD?

Answer :

One of the problems with the solution proposed in the previous question is that it requires the prefixes in the document to match those in the DTD. Fortunately, there is a workaround for this problem, although it does require that a single prefix be used for a particular namespace URI throughout the document. (This is a good practice anyway, so it's not too much of a restriction.) The solution assumes that you are using a DTD that is external to the document, which is common practice. 
To use different prefixes in the external DTD and XML documents, you declare the prefix with a pair of parameter entities in the DTD. You can then override these entities with declarations in the internal DTD in a given XML document. This works because the internal DTD is read before the external DTD and the first definition of a particular entity is the one that is used. The following paragraphs describe how to use a single namespace in your DTD. You will need to modify them somewhat to use multiple namespaces. 
To start with, declare three parameter entities in your DTD:

<!ENTITY % p "" >
<!ENTITY % s "" >
<!ENTITY % nsdecl "xmlns%s;" >

The p entity ("p" is short for "prefix") is used in place of the actual prefix in element type and attribute names. The s entity ("s" is short for "suffix") is used in place of the actual prefix in namespace declarations. The nsdecl entity ("nsdecl" is short for "namespace declaration") is used in place of the name of the xmlns attribute in declarations of that attribute. 
Now use the p entity to define parameter entities for each of the names in your namespace. For example, suppose element type names A, B, and C and attribute name D are in your namespace.

<!ENTITY % A "%p;A">
<!ENTITY % B "%p;B">
<!ENTITY % C "%p;C">
<!ENTITY % D "%p;D">

Next, declare your element types and attributes using the "name" entities, not the actual names. For example:

<!ELEMENT %A; ((%B;)*, %C;)>
<!ATTLIST %A;
%nsdecl; CDATA "http://www.google.org/">
<!ELEMENT %B; EMPTY>
<!ATTLIST %B;
%D; NMTOKEN #REQUIRED
E CDATA #REQUIRED>
<!ELEMENT %C; (#PCDATA)>

There are several things to notice here.
* Attribute D is in a namespace, so it is declared with a "name" entity. Attribute E is not in a namespace, so no entity is used.
* The nsdecl entity is used to declare the xmlns attribute. (xmlns attributes must be declared on every element type on which they can occur.) Note that a default value is given for the xmlns attribute.
* The reference to element type B in the content model of A is placed inside parentheses. The reason for this is that a modifier -- * in this case -- is applied to it. Using parentheses is necessary because, when a parameter entity is referenced inside a markup declaration, its replacement text is padded with one leading and one trailing space; directly applying the modifier to the parameter entity reference would result in illegal syntax in the content model.
For example, suppose the value of the A entity is "google:A", the value of the B entity is "google:B", and the value of the C entity is "google:C". The declaration:

<!ELEMENT %A; (%B;*, %C;)>

would resolve to:

<!ELEMENT google:A ( google:B *, google:C )>

This is illegal because the * modifier must directly follow the reference to the google:B element type. By placing the reference to the B entity in parentheses, the declaration resolves to:

<!ELEMENT google:A (( google:B )*, google:C )>

This is legal because the * modifier directly follows the closing parenthesis.
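The padding rule can be modelled in a few lines of Python. This is a toy sketch, not a DTD parser. The hypothetical `expand_in_declaration` function handles only simple `%name;` references and assumes the "name" entities already hold their final values. The padding spaces double up next to spaces that are already in the declaration, so the output collapses runs of whitespace before comparing it with the resolved declarations shown above.

```python
import re

# Assumed final values of the "name" entities from the example above.
ENTITIES = {"A": "google:A", "B": "google:B", "C": "google:C"}

PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")


def expand_in_declaration(decl, entities):
    """Replace %name; references in a markup declaration.

    XML 1.0 pads the replacement text of a parameter entity referenced in a
    declaration with one leading and one trailing space.
    """
    return PE_REF.sub(lambda m: " " + entities[m.group(1)] + " ", decl)


def squeeze(text):
    """Collapse runs of whitespace so results are easy to compare."""
    return " ".join(text.split())


# Without parentheses the * is separated from google:B, which is illegal.
print(squeeze(expand_in_declaration("<!ELEMENT %A; (%B;*, %C;)>", ENTITIES)))
# <!ELEMENT google:A ( google:B *, google:C )>

# With parentheses the * directly follows ")", which is legal.
print(squeeze(expand_in_declaration("<!ELEMENT %A; ((%B;)*, %C;)>", ENTITIES)))
# <!ELEMENT google:A (( google:B )*, google:C )>
```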

Now let's see how this all works. Suppose our XML document won't use prefixes, but instead wants the default namespace to be the http://www.google.org/ namespace. In this case, no entity declarations are needed in the document. For example, our document might be:

<!DOCTYPE A SYSTEM "http://www.google.org/google.dtd">
<A>
<B D="bar" E="baz buz" />
<B D="boo" E="biz bez" />
<C>bizbuz</C>
</A>

This document is valid because the declarations for p, s, and nsdecl in the DTD set p and s to "" and nsdecl to "xmlns". That is, after replacing the p, s, and nsdecl parameter entities, the DTD is as follows. Notice that both the DTD and document use the element type names A, B, and C and the attribute names D and E.

<!ELEMENT A (( B )*, C )>
<!ATTLIST A 
xmlns CDATA "http://www.google.org/">
<!ELEMENT B EMPTY>
<!ATTLIST B 
D NMTOKEN #REQUIRED
E CDATA #REQUIRED>
<!ELEMENT C (#PCDATA)>

But what if the document wants to use a different prefix, such as google? In this case, the document must override the declarations of the p and s entities in its internal DTD. That is, it must declare these entities so that they use google as a prefix (followed by a colon) and a suffix (preceded by a colon). For example:

<!DOCTYPE google:A SYSTEM "http://www.google.org/google.dtd" [
<!ENTITY % p "google:">
<!ENTITY % s ":google">
]>
<google:A>
<google:B google:D="bar" E="baz buz" />
<google:B google:D="boo" E="biz bez" />
<google:C>bizbuz</google:C>
</google:A>

In this case, the internal DTD is read before the external DTD, so the values of the p and s entities from the document are used. Thus, after replacing the p, s, and nsdecl parameter entities, the DTD is as follows. Notice that both the DTD and document use the element type names google:A, google:B, and google:C and the attribute names google:D and E.

<!ELEMENT google:A (( google:B )*, google:C )>
<!ATTLIST google:A 
xmlns:google CDATA "http://www.google.org/">
<!ELEMENT google:B EMPTY>
<!ATTLIST google:B 
google:D NMTOKEN #REQUIRED
E CDATA #REQUIRED>
<!ELEMENT google:C (#PCDATA)>
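The "first declaration wins" rule that makes this override work can also be sketched in Python. This is a toy model with stated simplifications. The hypothetical `declare` function treats each parameter entity declaration as a name plus a literal value. It expands `%name;` references inside the literal immediately and without padding, as XML 1.0 does for references in entity values. It ignores any later declaration of a name that is already bound. The hypothetical `resolve` function processes the internal subset before the external DTD.

```python
import re

PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")

# Parameter entity declarations from the external DTD above, in document order.
EXTERNAL_DTD = [
    ("p", ""),
    ("s", ""),
    ("nsdecl", "xmlns%s;"),
    ("A", "%p;A"),
    ("B", "%p;B"),
    ("C", "%p;C"),
    ("D", "%p;D"),
]


def declare(entities, name, literal):
    """Bind a parameter entity unless it is already bound (first one wins)."""
    if name not in entities:
        # References inside an entity value are expanded now, with no padding.
        entities[name] = PE_REF.sub(lambda m: entities[m.group(1)], literal)


def resolve(internal_subset):
    """Process the internal subset first, then the external DTD."""
    entities = {}
    for name, literal in list(internal_subset) + EXTERNAL_DTD:
        declare(entities, name, literal)
    return entities


default_ns = resolve([])
prefixed = resolve([("p", "google:"), ("s", ":google")])
print(default_ns["A"], default_ns["D"], default_ns["nsdecl"])  # A D xmlns
print(prefixed["A"], prefixed["D"], prefixed["nsdecl"])        # google:A google:D xmlns:google
```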


Question 18 :

How can I validate an XML document that uses XML namespaces?

Answer :

When people ask this question, they usually assume that validity is different for documents that use XML namespaces and documents that don't. In fact, it isn't -- it's the same for both. Thus, there is no difference between validating a document that uses XML namespaces and validating one that doesn't. In either case, you simply use a validating parser or other software that performs validation.


Question 19 :

If I start using XML namespaces, do I need to change my existing DTDs?

Answer :

Probably. If you want your XML documents to be valid and also conform to the XML namespaces recommendation, you need to declare any xmlns attributes in the DTD and use the same qualified names in the DTD as in the body of the document.
If your DTD contains element type and attribute names from a single XML namespace, the easiest thing to do is to use your XML namespace as the default XML namespace. To do this, declare the attribute xmlns (no prefix) for each possible root element type. If you can guarantee that the DTD is always read, set the default value in each xmlns attribute declaration to the URI used as your namespace name. Otherwise, declare your XML namespace as the default XML namespace on the root element of each instance document.
If your DTD contains element type and attribute names from multiple XML namespaces, you need to choose a single prefix for each XML namespace and use these consistently in qualified names in both the DTD and the body of each document. You also need to declare your xmlns attributes in the DTD and declare your XML namespaces. As in the single XML namespace case, the easiest way to do this is to add xmlns attributes to each possible root element type and use default values if possible.
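A default value for xmlns only takes effect if the parser reads the DTD that declares it. As an illustration, Python's `xml.etree.ElementTree` does not load external DTDs. In the sketch below, the root element therefore ends up in the namespace only when the declaration is written on the instance document itself. The DTD URL is the example one from the previous question and is never fetched. The constants `DTD_ONLY` and `EXPLICIT` are hypothetical sample documents.

```python
import xml.etree.ElementTree as ET

# Namespace supplied only as a default in the external DTD, which is not read here.
DTD_ONLY = '<!DOCTYPE A SYSTEM "http://www.google.org/google.dtd"><A/>'

# Namespace declared explicitly on the root element of the instance document.
EXPLICIT = ('<!DOCTYPE A SYSTEM "http://www.google.org/google.dtd">'
            '<A xmlns="http://www.google.org/"/>')

print(ET.fromstring(DTD_ONLY).tag)   # A
print(ET.fromstring(EXPLICIT).tag)   # {http://www.google.org/}A
```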


Question 20 :

How do I create documents that use XML namespaces?

Answer :

The same as you create documents that don't use XML namespaces. If you're currently using Notepad on Windows or emacs on Linux, you can continue using Notepad or emacs. If you're using an XML editor that is not namespace-aware, you can also continue to use that, as qualified names are legal names in XML documents and xmlns attributes are legal attributes. And if you're using an XML editor that is namespace-aware, it will probably provide features such as automatically declaring XML namespaces and keeping track of prefixes and the default XML namespace for you.
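Programmatic tools can do the same bookkeeping. As one illustration (not the only option), Python's `xml.etree.ElementTree` lets you build elements from universal names in `{URI}local` form. When it serializes them, it declares the namespace on the root element and uses the prefix registered with `ET.register_namespace`. The prefix google and the element names A and B are borrowed from the earlier examples. Note that `register_namespace` updates a registry shared by the whole process.

```python
import xml.etree.ElementTree as ET

NS = "http://www.google.org/"

# Tell the serializer which prefix to use for this namespace name.
ET.register_namespace("google", NS)

# Elements are created from universal names; no prefix appears here.
root = ET.Element(f"{{{NS}}}A")
ET.SubElement(root, f"{{{NS}}}B")

text = ET.tostring(root, encoding="unicode")
print(text)
# <google:A xmlns:google="http://www.google.org/"><google:B /></google:A>
```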