Question 1 :
Using XSLT, how would you extract a specific attribute from an element in an XML document?
Successful candidates should recognize this as one of the most basic applications of XSLT. If they are not able to construct a reply similar to the example below, they should at least be able to identify the components necessary for this operation: xsl:template to match the appropriate XML element, xsl:value-of to select the attribute value, and the optional xsl:apply-templates to continue processing the document.
Extract Attributes from XML Data
<xsl:value-of select="@attribute"/> <xsl:apply-templates/> </xsl:template>
Question 2 :
What is the difference between XML and C or C++ or Java ?
C and C++ (and other languages like FORTRAN, or Pascal, or Visual Basic, or Java or hundreds more) are programming languages with which you specify calculations, actions, and decisions to be carried out in order:
mod curconfig[if left(date,6) = "01-Apr", t.put "April googlel!", f.put days('31102005','DDMMYYYY') - days(sdate,'DDMMYYYY') " more shopping days to Samhain"];
XML is a markup specification language with which you can design ways of describing information (text or data), usually for storage, transmission, or processing by a program. It says nothing about what you should do with the data (although your choice of element names may hint at what they are for):
<part num="DA42" models="LS AR DF HG KJ" update="2001-11-22"> <name>Camshaft end bearing retention circlip</name> <image drawing="RR98-dh37" type="SVG" x="476" y="226"/> <maker id="RQ778">Ringtown Fasteners Ltd</maker> <notes>Angle-nosed insertion tool <tool id="GH25"/> is required for the removal and replacement of this part.</notes> </part>
On its own, an SGML or XML file (including HTML) doesn't do anything. It's a data format which just sits there until you run a program which does something with it.
Question 3 :
What does an XML document actually look like (inside)?
The basic structure of XML is similar to other applications of SGML, including HTML. The basic components can be seen in the following examples. An XML document starts with a Prolog:
1. The XML Declaration
which specifies that this is an XML document;
2. Optionally a Document Type Declaration
which identifies the type of document and says where the Document Type Description (DTD) is stored;
The Prolog is followed by the document instance:
1. A root element, which is the outermost (top level) element (start-tag plus end-tag) which encloses everything else: in the examples below the root elements are conversation and titlepage;
2. A structured mix of descriptive or prescriptive elements enclosing the character data content (text), and optionally any attributes ('name=value' pairs) inside some start-tags.
XML documents can be very simple, with straightforward nested markup of your own design:
<?xml version="1.0" standalone="yes"?> <conversation><br> <greeting>Hello, world!</greeting> <response>Stop the planet, I want to get off!</response> </conversation>
Or they can be more complicated, with a Schema or question C.11, Document Type Description (DTD) or internal subset (local DTD changes in [square brackets]), and an arbitrarily complex nested structure:
<?xml version="1.0" encoding="iso-8859-1"?> <!DOCTYPE titlepage SYSTEM "http://www.google.bar/dtds/typo.dtd" [<!ENTITY % active.links "INCLUDE">]><titlepage id="BG12273624"> <white-space type="vertical" amount="36"/> <title font="Baskerville" alignment="centered"size="24/30">Hello, world!</title> <white-space type="vertical" amount="12"/> <!-- In some copies the following decoration is hand-colored, presumably by the author --> <image location="http://www.google.bar/fleuron.eps" type="URI" alignment="centered"/> <white-space type="vertical" amount="24"/> <author font="Baskerville" size="18/22" style="italic">Vitam capias</author> <white-space type="vertical" role="filler"/> </titlepage>
Or they can be anywhere between: a lot will depend on how you want to define your document type (or whose you use) and what it will be used for. Database-generated or program-generated XML documents used in e-commerce is usually unformatted (not for human reading) and may use very long names or values, with multiple redundancy and sometimes no character data content at all, just values in attributes:
<?xml version="1.0"?> <ORDER-UPDATE AUTHMD5="4baf7d7cff5faa3ce67acf66ccda8248" ORDER-UPDATE-ISSUE="193E22C2-EAF3-11D9-9736-CAFC705A30B3" ORDER-UPDATE-DATE="2005-07-01T15:34:22.46" ORDER-UPDATE-DESTINATION="6B197E02-EAF3-11D9-85D5-997710D9978F" ORDER-UPDATE-ORDERNO="8316ADEA-EAF3-11D9-9955-D289ECBC99F3"> <ORDER-UPDATE-DELTA-MODIFICATION-DETAIL ORDER-UPDATE-ID="BAC352437484"> <ORDER-UPDATE-DELTA-MODIFICATION-VALUE ORDER-UPDATE-ITEM="56" ORDER-UPDATE-QUANTITY="2000"/> </ORDER-UPDATE-DELTA-MODIFICATION-DETAIL> </ORDER-UPDATE>
Question 4 :
How does XML handle white-space in my documents?
All white-space, including linebreaks, TAB characters, and normal spaces, even between 'structural' elements where no text can ever appear, is passed by the parser unchanged to the application (browser, formatter, viewer, converter, etc), identifying the context in which the white-space was found (element content, data content, or mixed content, if this information is available to the parser, eg from a DTD or Schema). This means it is the application's responsibility to decide what to do with such space, not the parser's:
* insignificant white-space between structural elements (space which occurs where only element content is allowed, ie between other elements, where text data never occurs) will get passed to the application (in SGML this white-space gets suppressed, which is why you can put all that extra space in HTML documents and not worry about it)
* significant white-space (space which occurs within elements which can contain text and markup mixed together, usually mixed content or PCDATA) will still get passed to the application exactly as under SGML. It is the application's responsibility to handle it correctly.
The parser must inform the application that white-space has occurred in element content, if it can detect it. (Users of SGML will recognize that this information is not in the ESIS, but it is in the Grove.)
<chapter> <title> My title for Chapter 1. </title> <para> text </para> </chapter>
In the example above, the application will receive all the pretty-printing linebreaks, TABs, and spaces between the elements as well as those embedded in the chapter title. It is the function of the application, not the parser, to decide which type of white-space to discard and which to retain. Many XML applications have configurable options to allow programmers or users to control how such white-space is handled.
Question 5 :
Which parts of an XML document are case-sensitive?
All of it, both markup and text. This is significantly different from HTML and most other SGML applications. It was done to allow markup in non-Latin-alphabet languages, and to obviate problems with case-folding in writing systems which are caseless.
* Element type names are case-sensitive: you must follow whatever combination of upper- or lower-case you use to define them (either by first usage or in a DTD or Schema). So you can't say <BODY>…</body>: upper- and lower-case must match; thus <Img/>, <IMG/>, and <img/> are three different element types;
* For well-formed XML documents with no DTD, the first occurrence of an element type name defines the casing;
* Attribute names are also case-sensitive, for example the two width attributes in <PIC width="7in"/> and <PIC WIDTH="6in"/> (if they occurred in the same file) are separate attributes, because of the different case of width and WIDTH;
* Attribute values are also case-sensitive. CDATA values (eg Url="MyFile.SGML") always have been, but NAME types (ID and IDREF attributes, and token list attributes) are now case-sensitive as well;
* All general and parameter entity names (eg Á), and your data content (text), are case-sensitive as always.