A Simple Introduction To XML

[Attached image: xml_tree[1].JPG, 100.63 KB]

XML: XML stands for ‘Extensible Markup Language’. It is used to define data
elements on web pages and in business-to-business documents. XML uses a tag
structure similar to that of HTML; however, whereas HTML defines how elements are displayed,
XML defines what those elements contain. While HTML uses predefined tags, XML allows
tags to be defined by the author of the document.


XML is a flexible way to create common information formats and share both the format and
the data on the World Wide Web, intranets, and elsewhere.
E.g.: Computer makers might agree on a standard or common way to describe the
information about a computer product (processor speed, memory size, and so forth)
and then describe the product information format with XML.


In general, HTML describes the content of a web page (mainly text and graphic images)
only in terms of how it is to be displayed and interacted with. XML describes the content in
terms of what data is being described.


To summarize: XML is a markup language much like HTML, but it was designed to carry data, not to display it. XML tags are not predefined; you must define your own tags. XML is designed to be self-descriptive and to transport and store data, with the focus on what the data is. XML is a W3C Recommendation.

What type of data should be stored in XML?

1. Data that is naturally described in a hierarchical format.
2. Data whose schema is constantly changing or evolving.
3. Data in which many attributes are empty or unknown.
4. Data with a complex structure. Storing such data in relational tables leads to a complicated relational schema with many tables. Managing these tables adds overhead, and the SQL query to access the data requires joining many of them. If this data has to be processed together with other data, the SQL query becomes even more complicated. In such cases the data is better stored in XML.

XML is not really a new language; it is a metalanguage, used to define other languages.

<book>
        <book_title> XML Programming </book_title>
        <author> Mark Wilson </author>
        <publisher> Manning Publications </publisher>
        <copydata> 2000 </copydata>
        <isbn> 188477872 </isbn>
</book>

Entities, Elements and Attributes:
Elements look like this:

<book>…<title> XML Programming for VB and ASP developers </title>…</book>

Attributes are properties of elements, written inside the start tag:
<a href="demo.asp">

Attributes provide additional information about elements:
<file type="gif"> computer.gif </file>

Any file or web resource that can be included in an XML file is an entity. The term entity also covers special-character representations and substitutions of text strings.

An example of using an entity to substitute a text string is:

<!ENTITY BookName "XML Programming for VB and ASP developers">

Now, wherever you write the entity reference &BookName; in a document, the entire string "XML Programming for VB and ASP developers" will be substituted. (An entity name cannot contain spaces, hence BookName.) In VB this is similar to using a constant.
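
The substitution can be observed with a small Python sketch. It assumes the entity is declared in an internal DTD subset and named BookName (XML names cannot contain spaces); the surrounding book and title elements are made up for illustration. Python's standard-library parser expands internal entities like this one while parsing.

```python
import xml.etree.ElementTree as ET

# The entity is declared once in the internal DTD subset and referenced as &BookName;.
# The <book>/<title> wrapper elements are hypothetical.
DOC = """<!DOCTYPE book [
  <!ENTITY BookName "XML Programming for VB and ASP developers">
]>
<book><title>&BookName;</title></book>"""

root = ET.fromstring(DOC)
title = root.find("title").text  # the parser has already substituted the entity text
print(title)
```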

XML Tree:
XML documents form a tree structure that starts at “the root” and branches to “leaves”. XML documents use a self-describing and simple syntax.
A reference diagram of the tree is attached to this article.
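
As a rough illustration of the root-to-leaves structure, the following Python sketch parses a small document and walks it from the root down. The sample document reuses the element names from the book example earlier in this article; the walk function is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Sample document for illustration; the root is <book>, the leaves hold text.
BOOK_XML = """
<book>
  <book_title>XML Programming</book_title>
  <author>Mark Wilson</author>
  <publisher>Manning Publications</publisher>
</book>
"""

def walk(element, depth=0):
    """Return (depth, tag, text) for an element and all its descendants, in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(walk(child, depth + 1))
    return rows

root = ET.fromstring(BOOK_XML)
tree_rows = walk(root)
for depth, tag, text in tree_rows:
    # Indent each tag by its depth so the tree shape is visible.
    print("  " * depth + tag + (": " + text if text else ""))
```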

(HTML style sheets) HTML uses predefined tags, and the meaning of each tag is well understood.
The table element in HTML defines a table, and a browser knows how to display it.
Adding style to HTML elements is simple: telling a browser to display an element in a special font or color is easy with CSS.

(XML style sheets) XML has no predefined tags, so displaying an XML document requires a mechanism that describes how the document should be displayed. One such mechanism is Cascading Style Sheets (CSS), but XSL (Extensible Stylesheet Language) is the style sheet language designed for XML, and it is more sophisticated than the CSS used with HTML.

What can XSL do?:

XSL ensures that XML documents are formatted the same way no matter which application or platform they appear on. XSL consists of three parts:
1. XSLT: a language for transforming XML documents into other formats such as HTML.
2. XSL-FO: a language for formatting XML documents.
3. XPath: a language for navigating in XML documents.
In general terms, XSL is a language that can transform XML into HTML, filter and sort XML data, and format XML data based on the data value, such as displaying negative numbers in red.

XSL can be used to define how an XML file should be displayed by transforming the XML file into a format that a browser recognizes. One such format is HTML. Normally XSL does this by transforming each XML element into an HTML element.

XSL can also add completely new elements into the output file, or remove elements. It can rearrange and sort the elements, test and make decisions about which elements to display and a lot more.

The purpose of a DTD (Document Type Definition) is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be declared inline in your XML document, or as an external reference.

Example of an internal DTD:

<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note    (to,from,heading,body)>
  <!ELEMENT to      (#PCDATA)>
  <!ELEMENT from    (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body    (#PCDATA)>
]>
<note>
  <to>...</to> <from>...</from> <heading>...</heading>
  <body>Don't forget me this weekend!</body>
</note>

The DTD is interpreted like this:
!DOCTYPE note (in line 2) defines that the root element of the document is "note".
!ELEMENT note (in line 3) defines the element "note" as having four child elements: "to,from,heading,body", in that order.
!ELEMENT to (in line 4) defines the "to" element to be of the type "#PCDATA".
!ELEMENT from (in line 5) defines the "from" element to be of the type "#PCDATA",
and so on.
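
To make the meaning of the content model (to,from,heading,body) concrete, here is a hand-rolled Python sketch that checks only the order of child elements. It is not a DTD validator (Python's standard library does not include one); the function name and the sample notes are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Child order required by <!ELEMENT note (to,from,heading,body)>
NOTE_MODEL = ["to", "from", "heading", "body"]

def matches_sequence(element, model):
    """True if the element's children are exactly the tags in `model`, in that order."""
    return [child.tag for child in element] == model

# Hypothetical notes: the first follows the content model, the second does not.
valid_note = ET.fromstring(
    "<note><to>A</to><from>B</from><heading>H</heading><body>Text</body></note>"
)
invalid_note = ET.fromstring(
    "<note><from>B</from><to>A</to><body>Text</body></note>"
)

valid_result = matches_sequence(valid_note, NOTE_MODEL)
invalid_result = matches_sequence(invalid_note, NOTE_MODEL)
print(valid_result, invalid_result)
```

A validating parser performs this kind of check, and many more, automatically from the DTD.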

Example of an External DTD:

The same XML document with an external DTD is:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>...</to> <from>...</from> <heading>...</heading>
  <body>Don't forget me this weekend!</body>
</note>

The note.dtd file contains the same element declarations:

<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

XML documents (and HTML documents) are made up of building blocks such as
elements, tags, attributes, entities, PCDATA, and CDATA.

Because a DTD gives a standard format for information related to a specific subject it can be used to simplify the exchange of information between different sources. Many kinds of applications have or will have standard DTDs. This means that systems can use these common DTDs to exchange information with each other, regardless of their internal format. The main applications of this will probably be the exchange of data between companies in the same industry or researchers within an academic field, although many other applications for ordinary users are imaginable.

XML Schema is an XML-based alternative to DTD. An XML schema defines the structure of an XML document. The XML Schema language is also referred to as XML Schema Definition (XSD).

An XML Schema (XSD):
• defines elements that can appear in a document
• defines attributes that can appear in a document
• defines which elements are child elements
• defines the order of child elements
• defines the number of child elements
• defines whether an element is empty or can include text
• defines data types for elements and attributes
• defines default and fixed values for elements and attributes
XML Schemas are the successors of DTDs:
• XML Schemas are extensible to future additions
• XML Schemas are richer and more powerful than DTDs
• XML Schemas are written in XML
• XML Schemas support data types
• XML Schemas support namespaces
E.g.: An XSD for the note document, very similar to the DTD shown above.
XML schema file note.xsd:

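<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com" xmlns="http://www.w3schools.com" elementFormDefault="qualified">
<xs:element name="note">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="to" type="xs:string"/>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="heading" type="xs:string"/>
      <xs:element name="body" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
</xs:schema>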

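A reference to an XML Schema (XSD) from an XML document:

<?xml version="1.0"?>
<note xmlns="http://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3schools.com note.xsd">
  <to>...</to> <from>...</from> <heading>...</heading>
  <body>Don't forget me this weekend!</body>
</note>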

XSD has several advantages over earlier XML schema languages, such as document type definition (DTD) or Schema for Object-Oriented XML (SOX). For example, it is more direct: XSD, in contrast to the earlier languages, is written in XML, which means that a schema can be read and processed by the same XML parsers and tools as the documents it describes. Other benefits include self-documentation, automatic schema creation, and the ability to be queried through XSL Transformations (XSLT). Despite the advantages of XSD, it has some detractors who claim, for example, that the language is unnecessarily complex.
Examples of simple elements and their XML are below:

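Sample XSD

<xs:element name="Customer_dob" type="..."/>
<xs:element name="Customer_address" type="xs:string"/>
<xs:element name="OrderID" type="..."/>
<xs:element name="Body" type="xs:string"/>

Sample XML

<Customer_dob>...</Customer_dob>
<Customer_address>99 London Road</Customer_address>
<OrderID>...</OrderID>
<Body/>
   (a type can be defined as a string but not have any content; this is not true of all data types, however).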

In terms of validation functionality, XSD can define all the constraints that a DTD can define, and many more. To take a simple example, XSD can say that a particular attribute must be a valid date, or a number, or a list of URIs, or a string that is exactly 8 characters long. To take another example, XSD can define much richer constraints on uniqueness of values within a document.
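
The kinds of constraints mentioned above can be imitated by hand in Python to show what an XSD processor checks for you. This is only a sketch: both functions are hypothetical stand-ins, and the date check is deliberately rougher than the real xs:date type (for example, it ignores optional time zones).

```python
import datetime
import re

def is_valid_date(value):
    """Rough stand-in for xs:date: accept only real YYYY-MM-DD calendar dates."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.date.fromisoformat(value)
        return True
    except ValueError:
        return False

def has_exact_length(value, length):
    """Stand-in for a string type restricted to an exact length."""
    return len(value) == length

print(is_valid_date("2002-06-06"))        # a real calendar date
print(is_valid_date("2002-13-40"))        # month and day out of range
print(has_exact_length("ABCD1234", 8))    # exactly 8 characters
```

With XSD, such rules are declared in the schema rather than coded in each application.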

XPath is a language for finding information in an XML document. XPath is used to navigate through elements and attributes in an XML document.

XPath is a fourth-generation declarative language for locating nodes in XML documents. An XPath location path says which nodes from the document you want. It says nothing about what algorithm is used to find these nodes. You simply pass an XPath statement to a method, and the XPath engine is responsible for figuring out how to find all the nodes satisfying that expression. This is much more robust than writing the detailed search and navigation code yourself using DOM, SAX, or JDOM. XPath searches often succeed even when the document format is not quite what you expected. For example, a comment in the middle of a paragraph of text may break DOM code that expects to see contiguous text. XPath wouldn’t be fazed by this. Many XPath expressions are resistant even to much more significant alterations such as changing the names or namespaces of ancestor elements, reordering the children of an element, or even adding or subtracting entire levels from the tree hierarchy.
XPath can be thought of as a query language like SQL. However, rather than extracting information from a database, it extracts information from an XML document.
Weather data in XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<weather time="2002-06-06T15:35:00-05:00">
  <report latitude="41.2° N" longitude="71.6° W">
    <locality>Block Island</locality>
    <temperature units="°C">16</temperature>
    <dewpoint units="°C">14</dewpoint>
    <wind>
      <direction>...</direction>
      <speed units="km/h">16.1</speed>
      <gust units="km/h">31</gust>
    </wind>
    <pressure units="hPa">1014</pressure>
    <visibility>13 km</visibility>
  </report>
  <report latitude="34.1° N" longitude="118.4° W">
    <locality>Santa Monica</locality>
    <temperature units="°C">19</temperature>
    <dewpoint units="°C">16</dewpoint>
    <wind>
      <direction>...</direction>
      <speed units="km/h">14.5</speed>
    </wind>
    <pressure units="hPa">1010</pressure>
    <visibility>5 km</visibility>
  </report>
</weather>

Here are some XPath expressions that identify particular parts of this document:
• /weather/report is an XPath expression that selects the two report elements.
• /weather/report[1] is an XPath expression that selects the first report element.
• /weather/report/temperature is an XPath expression that selects the two temperature elements.
• /weather/report[locality="Santa Monica"] is an XPath expression that selects the second report element.
• //report[locality="Block Island"]/attribute::longitude is an XPath expression that selects the longitude attribute of the first report element.
• /child::weather/child::report/child::wind/child::* is an XPath expression that selects all the direction, speed, and gust elements.
• 9 * number(/weather/report[locality="Block Island"]/temperature) div 5 + 32 is an XPath expression that returns the temperature on Block Island in degrees Fahrenheit.
• /descendant::* is an XPath expression that selects all the elements in the document.
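
Several of these expressions can be tried with Python's standard library, which supports a limited subset of XPath. The sketch below uses a trimmed copy of the weather data; note that the paths are relative to the root element (weather), and that the arithmetic is done in Python because this subset has no XPath functions or operators.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the weather document: only the elements used below.
WEATHER = """<weather time="2002-06-06T15:35:00-05:00">
  <report latitude="41.2° N" longitude="71.6° W">
    <locality>Block Island</locality>
    <temperature units="°C">16</temperature>
  </report>
  <report latitude="34.1° N" longitude="118.4° W">
    <locality>Santa Monica</locality>
    <temperature units="°C">19</temperature>
  </report>
</weather>"""

root = ET.fromstring(WEATHER)  # root is the <weather> element

reports = root.findall("report")                              # /weather/report
first = root.find("report[1]")                                # /weather/report[1]
temperatures = root.findall("report/temperature")             # /weather/report/temperature
santa_monica = root.find("report[locality='Santa Monica']")   # predicate on child text
block_island = root.find("report[locality='Block Island']")
longitude = block_island.get("longitude")                     # attribute::longitude

# Celsius to Fahrenheit, computed in Python instead of inside the XPath expression.
celsius = float(block_island.findtext("temperature"))
fahrenheit = 9 * celsius / 5 + 32
print(len(reports), longitude, round(fahrenheit, 1))
```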

[With XSLT you can transform an XML document into HTML]
XSLT is a language for transforming XML documents into other XML or “human-readable” documents. The original document is not changed; rather, a new document is created based on the content of the existing one. The new document may be serialized (output) by the processor in standard XML syntax or in another format, such as HTML or plain text.
XSLT is most often used to convert data between XML schemas, to convert XML data into HTML or XHTML documents for web pages (creating a dynamic web page), or to produce an intermediate XML format that can be converted to PDF documents.
In general, XSLT is a language for specifying transformations of XML documents: it takes an XML document and transforms it into another XML document. The HTML conversion is simply a special case of XML transformation.
E.g.: Transforming XML to XHTML using XSLT:
The root element that declares the document to be an XSL style sheet is either

<xsl:stylesheet>

or

<xsl:transform>

The two are synonymous.

1. An XSL style sheet is declared in one of these two ways:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

2. Sample of the raw XML document (cdcatalog.xml)

<?xml version="1.0" encoding="ISO-8859-1" ?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    . . .
  </cd>
</catalog>

3. Style sheet (cdcatalog.xsl)

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html><body>
    <h2>My CD Collection</h2>
    <table border="1">
    <tr bgcolor="#9acd32">
      <th align="left">Title</th>
      <th align="left">Artist</th>
    </tr>
    <xsl:for-each select="catalog/cd">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="artist"/></td>
    </tr>
    </xsl:for-each>
    </table>
  </body></html>
</xsl:template>
</xsl:stylesheet>

4. Linking the XSL style sheet to the XML document (cdcatalog1.xml)

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
  </cd>
</catalog>
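
To show what the style sheet's for-each and value-of instructions amount to, here is a hand-written Python imitation of the same transformation. It is not XSLT and does not run the style sheet (Python's standard library has no XSLT processor); the function name and the one-CD sample catalog are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Minimal catalog with a single CD, taken from the sample data above.
CATALOG = """<catalog>
  <cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>
</catalog>"""

def catalog_to_html(xml_text):
    """Build the kind of HTML table the style sheet describes, one row per cd element."""
    catalog = ET.fromstring(xml_text)
    table = ET.Element("table", border="1")
    header = ET.SubElement(table, "tr", bgcolor="#9acd32")
    for label in ("Title", "Artist"):
        ET.SubElement(header, "th", align="left").text = label
    # Plays the role of <xsl:for-each select="catalog/cd">
    for cd in catalog.findall("cd"):
        row = ET.SubElement(table, "tr")
        # Plays the role of <xsl:value-of select="title"/> and select="artist"
        ET.SubElement(row, "td").text = cd.findtext("title")
        ET.SubElement(row, "td").text = cd.findtext("artist")
    return ET.tostring(table, encoding="unicode")

html = catalog_to_html(CATALOG)
print(html)
```

The XSLT version expresses the same loop declaratively, and the source XML document is left unchanged in both cases.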

Linking in XML is divided into two parts: XLink and XPointer. XLink defines a standard way of creating hyperlinks in XML documents. XPointer allows the hyperlinks to point to more specific parts (fragments) of the XML document.
XPointer is a language for locating data within an Extensible Markup Language (XML) document based on properties such as location within the document, character content, and attribute values. An XPointer consists of a description that comes after the # symbol in a Uniform Resource Locator (URL). XPointer can be used alone or together with XPath for locating data within an XML document.
In Hypertext Markup Language (HTML), the # symbol enables linking to a specific marked point within an HTML page. XPointer allows linking to a point based on content as well. In this way, a reader can, for example, be enabled to link to the next instance of a certain word, phrase, or sequence of characters within an XML document.
XML Pointer Language (XPointer) allows addressing the internal structures of XML documents. It allows for examination of a hierarchical document structure and choice of its internal parts based on various properties, such as element types, attribute values, character content, and relative position.

What is XPointer?:
• A tool to identify resources within an XML document
• Has a string-based syntax.
• Can point to things that don't have IDs
• Technically, an extension of XPath
• Syntax for fragment identifiers in XML

XPointer Examples:
• #xml
• #/1/2/5
• #xpointer(//prod[@num='22'])
• #xpointer(//body/para[1]/citetitle[1])
• #xpointer(string-range(//text(), 'prolog'))
• #xpointer(//prod[@num='1']/range-to(//prod[@num='22']))
The expression: #xpointer(id("Rottweiler")) refers to the element in the target document, with the id value of "Rottweiler".
So the xlink:href attribute would look like this: xlink:href="http://dog.com/dogbreeds.xml#xpointer(id('Rottweiler'))"
However, XPointer allows a shorthand form when linking to an element with an id. You can use the value of the id directly, like this: xlink:href="http://dog.com/dogbreeds.xml#Rottweiler"
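
The shorthand form can be sketched in Python as follows. This is not an XPointer implementation: it simply takes the fragment after the # and looks for an element whose id attribute matches. The target document, its element names, and the assumption that the attribute is literally called id are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical target document; element names and id values are assumed for illustration.
DOGBREEDS = """<dogbreeds>
  <dogbreed id="Rottweiler"><name>Rottweiler</name></dogbreed>
  <dogbreed id="Poodle"><name>Poodle</name></dogbreed>
</dogbreeds>"""

def resolve_shorthand(href, document_text):
    """Return the element whose id attribute equals the fragment of href, or None."""
    _, fragment = urldefrag(href)          # split off the part after '#'
    root = ET.fromstring(document_text)
    for element in root.iter():            # visit every element in document order
        if element.get("id") == fragment:
            return element
    return None

target = resolve_shorthand("http://dog.com/dogbreeds.xml#Rottweiler", DOGBREEDS)
print(target.tag, target.findtext("name"))
```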

<?xml version="1.0" encoding="ISO-8859-1"?>
<mydogs xmlns:xlink="http://www.w3.org/1999/xlink">
<mydog xlink:type="simple" xlink:href="...">
  <description xlink:type="simple" xlink:href="...">
  Anton is my favorite dog. He has won a lot of.....
  </description>
</mydog>
<mydog xlink:type="simple" xlink:href="...">
  <description xlink:type="simple" xlink:href="...">
  Pluto is the sweetest dog on earth......
  </description>
</mydog>
</mydogs>

XQuery (XML Query) is a query language (with some programming language features) that is designed to query collections of XML data. It is semantically similar to SQL: XQuery is to XML what SQL is to relational databases.

Suppose sample data in a file resolution.xml looks as follows:

<resolution dms-id="42" public-private="public">
<committee-name>Committee on International Relations</committee-name>
<official-title>Welcoming the accession of Bulgaria, Estonia, Latvia, Lithuania, Romania, Slovakia, and Slovenia to the North Atlantic Treaty Organization (NATO), and for other purposes.</official-title>
<paragraph>welcomes with enthusiasm the accession of Bulgaria, Estonia, Latvia, Lithuania, Romania, Slovakia, and Slovenia to the North Atlantic Treaty Organization (NATO);</paragraph>
<paragraph>reaffirms that the process of NATO enlargement enhances the security of the United States and the entire North Atlantic area;</paragraph>
<paragraph>agrees that the process of NATO enlargement should be open to potential membership by any interested European democracy that meets the criteria for NATO membership as set forth in the 1995 Study on NATO Enlargement and whose admission would further the principles of the Washington Treaty of 1949 and would enhance security in the North Atlantic area; and</paragraph>
<paragraph>recommends that NATO heads of state and government should review the enlargement process, including the applications of Albania, Croatia, and Macedonia, at a summit meeting to be held no later than 2007.</paragraph>
</resolution>

XQuery is a complete language for querying XML and encompasses XPath. The core expression in XQuery is the FLWOR (pronounced "flower") expression. FLWOR stands for "for ... let ... where ... order by ... return", which is the general shape of a FLWOR expression.

Simple XQuery :

for $r in doc("/public/oow04/resolution.xml")/resolution
let $a := $r/action
where $a/action-date="20040311"
order by $r/legis-num ascending
return <all-sponsors>{ ... }</all-sponsors>

Results of XQuery:


Simple XQuery says:

• for – iterate over each resolution in the document "/public/oow04/resolution.xml". Note this may be in the file system, or on the internet, or in a database.
• let – bind $r/action to the variable $a. This gives XQuery a convenient way to "bookmark" a point in the XML structure, to be easily referred to later in the query.
• where – for each resolution specified by the "for" clause, select only those resolutions where resolution/action/action-date equals "20040311".
• order by – produce results ordered by resolution/legis-num
• return – return the result. Note that you can construct XML elements - such as the new "all-sponsors" element - explicitly in the return clause. If you do, you need to delimit XQuery expressions inside constructed elements with curly braces {}.
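
For readers more familiar with general-purpose languages, the five clauses can be mirrored step by step in Python. This is only an analogy, not XQuery: the data below is invented for illustration (the legis-num values and the resolutions wrapper element are hypothetical), and the sketch assumes at most one action per resolution.

```python
import xml.etree.ElementTree as ET

# Invented sample data; only the element names come from the query above.
RESOLUTIONS = """<resolutions>
  <resolution><legis-num>B-2</legis-num>
    <action><action-date>20040311</action-date></action></resolution>
  <resolution><legis-num>C-3</legis-num>
    <action><action-date>20040312</action-date></action></resolution>
  <resolution><legis-num>A-1</legis-num>
    <action><action-date>20040311</action-date></action></resolution>
</resolutions>"""

root = ET.fromstring(RESOLUTIONS)
matches = []
for r in root.findall("resolution"):                      # for $r in .../resolution
    a = r.find("action")                                  # let $a := $r/action
    if a is not None and a.findtext("action-date") == "20040311":   # where
        matches.append(r)
matches.sort(key=lambda r: r.findtext("legis-num"))       # order by $r/legis-num ascending
result = [r.findtext("legis-num") for r in matches]       # return
print(result)
```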

Simple XMLTABLE in SQL/XML:
The following is a simple SQL/XML query using the XMLTable function. XMLTable takes an XPath or XQuery string and returns the result as a table. This result can be included in the FROM clause of an SQL query, or it can be used to create an SQL view.


select * from
xmltable('for $r in doc("/public/oow04/resolution.xml")/resolution
let $a := $r/action
where $a/action-date="20040311"
order by $r/legis-num ascending
return <all-sponsors>{ ... }</all-sponsors>')



Comparison of CSS and XSL:

Cascading Style Sheets (CSS) allow us to specify the style of our page elements (spacing, margins, etc.) separately from the structure of our document (section headers, body text, links, etc.).
Extensible Stylesheet Language (XSL) is used to format XML documents. For the purposes of this comparison it has two parts. The first part, XSLT (XSL Transformations), transforms an XML document from one form to another. The second part, XSL Formatting Objects (XSL-FO), provides an alternative to CSS for formatting and styling an XML document.
According to the W3C: "Use CSS when you can, use XSL when you must!" In particular cases XSL might be necessary to achieve a very sophisticated layout, but for the majority of documents printed or streamed over the Web, CSS is the superior choice.
CSS can be used with both HTML and XML, and it is not a transformation language. XSL cannot be used with HTML but can be used with XML, and it is a transformation language.

Comparing DTD and XSD:

DTD defines the elements that may be included in your document, what attributes these elements have, and the ordering and nesting of the elements.
A DTD is declared in a DOCTYPE declaration placed beneath the XML declaration of an XML document.

XSD (XML Schema) provides a much more powerful means of defining your XML document structure and its limitations. XML Schemas are themselves XML documents, and they reference the XML Schema namespace. XML Schemas provide an object-oriented approach to defining the format of an XML document, together with a set of basic types. These types are much wider ranging than the basic PCDATA and CDATA of DTDs. They include most basic programming types such as integer, byte, string, and floating-point numbers, but they also expand into Internet data types such as ISO country and language codes.

Comparison between DOM and SAX and the various parsers:
(DOM) The Document Object Model is a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure, and style of documents. The document can be further processed, and the results of the processing can be incorporated back into the presented page. The XML DOM defines a standard way of accessing and manipulating XML documents. The DOM presents an XML document as a tree structure, with elements, attributes, and text as nodes. (A tree-based approach to navigating an XML document.)

(SAX) The Simple API for XML was originally a Java-only API. SAX was the first widely adopted API for XML in Java and is a “de facto” standard. SAX is a serial-access parser API for XML: it provides a mechanism for reading data from an XML document and is a popular alternative to the Document Object Model (DOM). The SAX specification defines an event-based approach whereby parsers scan through XML data, calling handler functions whenever certain parts of the document (e.g., text nodes or processing instructions) are found.

Parser: Most browsers have a built-in XML parser to read and manipulate XML. The parser converts XML into a DOM object that JavaScript can access.

DOM parser: A DOM parser processes XML data and creates an object-oriented hierarchical representation of the document that you can navigate at run time.

SAX parser: A SAX parser doesn’t create any internal representation of the document. Instead, the parser calls handler functions when certain events (defined by the SAX specification) take place. These events include the start and end of the document, finding a text node, finding child elements, and hitting a malformed element.

Comparison between DOM and SAX:

DOM (tree-based approach):
• The tree approach is useful for small documents in which the program needs to process a large portion of the document.
• SAX parsers generally require you to write a bit more code than the DOM interface.
• Unless you build a DOM-style tree for your application’s internal representation of the data, you can’t as easily write the XML file back to disk.
• DOM parses an XML document and returns an instance of org.w3c.dom.Document. This document object’s tree must then be “walked” in order to process the different elements.
• Stores the entire XML document in memory before processing.
• Occupies more memory.
• We can insert or delete nodes.
• Can traverse in any direction.

SAX (event-based approach):
• The event-driven approach is useful for large documents in which the program only needs to process a small portion of the document.
• If you would use DOM only to construct the tree, extract the data, and throw away the tree, then SAX might be more efficient.
• The DOM tree is not constructed, so there is potentially less memory allocation.
• If you convert the data in the DOM tree to another format, the SAX API may help remove the intermediate step.
• If you do not need all the XML data in memory, the SAX API allows you to process the data as it is parsed.
• SAX uses an event callback mechanism, requiring you to code methods to handle events raised by the parser as it encounters different constructs within the XML document.
• Parses node by node.
• Doesn’t store the XML in memory.
• We can’t insert or delete a node.
• Top-to-bottom traversal only.
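
The difference between the two approaches can be seen side by side in a short Python sketch using the standard library's xml.dom.minidom and xml.sax modules. The two-CD document and the TitleHandler class are made up for illustration; both halves extract the same list of titles.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document with two titles.
DOC = ("<catalog><cd><title>Empire Burlesque</title></cd>"
       "<cd><title>Second Album</title></cd></catalog>")

# DOM: the whole document is parsed into a tree first, then the tree is queried.
dom = xml.dom.minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]

# SAX: no tree is built; the handler reacts to events as the parser reads the input.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is collected until the end tag.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles)
print(sax_titles)
```

The SAX half needs more code and keeps its own state, but it never holds the whole document in memory, which matches the trade-offs listed above.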