The D Programming Language

Warning: This module is considered out-dated and not up to Phobos' current standards. It will remain until we have a suitable replacement, but be aware that it will not remain long term.

Classes and functions for creating and parsing XML

The basic architecture of this module is that there are standalone functions, classes for constructing an XML document from scratch (Tag, Element and Document), and also classes for parsing a pre-existing XML file (ElementParser and DocumentParser). The parsing classes may be used to build a Document, but that is not their primary purpose. The handling capabilities of DocumentParser and ElementParser are sufficiently customizable that you can make them do pretty much whatever you want.

Example:
This example creates a DOM (Document Object Model) tree from an XML file.
import std.xml;
import std.stdio;
import std.string;
import std.file;

// books.xml is used in various samples throughout the Microsoft XML Core
// Services (MSXML) SDK.
//
// See http://msdn2.microsoft.com/en-us/library/ms762271(VS.85).aspx

void main()
{
    string s = cast(string)std.file.read("books.xml");

    // Check for well-formedness
    check(s);

    // Make a DOM tree
    auto doc = new Document(s);

    // Plain-print it
    writeln(doc);
}
Example:
This example does much the same thing, except that the file is deconstructed and reconstructed by hand. This is more work, but the techniques involved offer vastly more power.
import std.xml;
import std.stdio;
import std.string;
import std.file;

struct Book
{
    string id;
    string author;
    string title;
    string genre;
    string price;
    string pubDate;
    string description;
}

void main()
{
    string s = cast(string)std.file.read("books.xml");

    // Check for well-formedness
    check(s);

    // Take it apart
    Book[] books;

    auto xml = new DocumentParser(s);
    xml.onStartTag["book"] = (ElementParser xml)
    {
        Book book;
        book.id = xml.tag.attr["id"];

        xml.onEndTag["author"]       = (in Element e) { book.author      = e.text(); };
        xml.onEndTag["title"]        = (in Element e) { book.title       = e.text(); };
        xml.onEndTag["genre"]        = (in Element e) { book.genre       = e.text(); };
        xml.onEndTag["price"]        = (in Element e) { book.price       = e.text(); };
        xml.onEndTag["publish-date"] = (in Element e) { book.pubDate     = e.text(); };
        xml.onEndTag["description"]  = (in Element e) { book.description = e.text(); };

        xml.parse();

        books ~= book;
    };
    xml.parse();

    // Put it back together again
    auto doc = new Document(new Tag("catalog"));
    foreach(book;books)
    {
        auto element = new Element("book");
        element.tag.attr["id"] = book.id;

        element ~= new Element("author",      book.author);
        element ~= new Element("title",       book.title);
        element ~= new Element("genre",       book.genre);
        element ~= new Element("price",       book.price);
        element ~= new Element("publish-date",book.pubDate);
        element ~= new Element("description", book.description);

        doc ~= element;
    }

    // Pretty-print it (writeln, so a '%' in the output is not read as a format specifier)
    writeln(join(doc.pretty(3),"\n"));
}
License
Boost License 1.0.
Authors
Janice Caron
Source:
std/xml.d

bool  isChar(dchar c);

Returns true if the character is a character according to the XML standard

Standards
XML 1.0
Parameters
dchar c the character to be tested

bool  isSpace(dchar c);

Returns true if the character is whitespace according to the XML standard

Only the following characters are considered whitespace in XML - space, tab, carriage return and linefeed

Standards
XML 1.0
Parameters
dchar c the character to be tested
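
As a quick illustration of the rule above, the following sketch checks the four whitespace characters, plus two characters that are often treated as whitespace elsewhere but are not XML whitespace. It assumes a Phobos release that still ships std.xml.

```d
import std.xml;

void main()
{
    // The four XML 1.0 whitespace characters
    assert(isSpace(' '));
    assert(isSpace('\t'));
    assert(isSpace('\r'));
    assert(isSpace('\n'));

    // Form feed and no-break space are not XML whitespace
    assert(!isSpace('\f'));
    assert(!isSpace('\u00A0'));
}
```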

bool  isDigit(dchar c);

Returns true if the character is a digit according to the XML standard

Standards
XML 1.0
Parameters
dchar c the character to be tested

bool  isLetter(dchar c);

Returns true if the character is a letter according to the XML standard

Standards
XML 1.0
Parameters
dchar c the character to be tested

bool  isIdeographic(dchar c);

Returns true if the character is an ideographic character according to the XML standard

Standards
XML 1.0
Parameters
dchar c the character to be tested

bool  isBaseChar(dchar c);

Returns true if the character is a base character according to the XML standard

Standards
XML 1.0
Parameters
dchar c the character to be tested

bool  isCombiningChar(dchar c);

Returns true if the character is a combining character according to the XML standard

Standards
XML 1.0
Parameters
dchar c the character to be tested

bool  isExtender(dchar c);

Returns true if the character is an extender according to the XML standard

Standards
XML 1.0
Parameters
dchar c the character to be tested

S  encode(S)(S s);

Encodes a string by replacing all characters which need to be escaped with appropriate predefined XML entities.

 encode() escapes certain characters (ampersand, quote, apostrophe, less-than and greater-than), and similarly, decode() unescapes them. These functions are provided for convenience only. You do not need to use them when using the std.xml classes, because then all the encoding and decoding will be done for you automatically.

If the string is not modified, the original will be returned.

Standards
XML 1.0
Parameters
S s The string to be encoded
Returns
The encoded string
Examples
writeln(encode("a > b")); // writes "a &gt; b"
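
A slightly fuller sketch, covering all five escaped characters and the "original is returned" case. The input strings are made up for illustration, and the block assumes a Phobos release that still ships std.xml.

```d
import std.xml;

void main()
{
    // Each of the five special characters maps to its predefined entity
    assert(encode("a > b") == "a &gt; b");
    assert(encode(`<a href="x">Tom & Jerry's</a>`)
        == "&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;");

    // A string with nothing to escape comes back as the very same slice
    string plain = "hello";
    assert(encode(plain) is plain);
}
```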


enum  DecodeMode: int;

Mode to use for decoding.

NONE
Do not decode
LOOSE
Decode, but ignore errors
STRICT
Decode, and throw exception on error


string  decode(string s, DecodeMode mode = DecodeMode.LOOSE);

Decodes a string by unescaping all predefined XML entities.

encode() escapes certain characters (ampersand, quote, apostrophe, less-than and greater-than), and similarly,  decode() unescapes them. These functions are provided for convenience only. You do not need to use them when using the std.xml classes, because then all the encoding and decoding will be done for you automatically.

This function decodes the entities &amp;amp;, &amp;quot;, &amp;apos;, &amp;lt; and &amp;gt;, as well as decimal and hexadecimal entities such as &amp;#x20AC;

If the string does not contain an ampersand, the original will be returned.

Note that the "mode" parameter can be one of DecodeMode.NONE (do not decode), DecodeMode.LOOSE (decode, but ignore errors), or DecodeMode.STRICT (decode, and throw a DecodeException in the event of an error).

Standards
XML 1.0
Parameters
string s The string to be decoded
DecodeMode mode (optional) Mode to use for decoding. (Defaults to LOOSE).
Throws
DecodeException if mode == DecodeMode.STRICT and  decode fails
Returns
The decoded string
Examples
writeln(decode("a &gt; b")); // writes "a > b"
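
The three modes can be compared side by side. This is a minimal sketch with made-up inputs, assuming a Phobos release that still ships std.xml. The bare ampersand in "cat & dog" stands in for any malformed entity.

```d
import std.exception : assertThrown;
import std.xml;

void main()
{
    // Predefined and numeric entities are unescaped
    assert(decode("a &gt; b") == "a > b");
    assert(decode("&#x20AC;") == "\u20AC");
    assert(decode("&#8364;") == "\u20AC");

    // A bare ampersand is an error: LOOSE keeps it, STRICT throws
    assert(decode("cat & dog", DecodeMode.LOOSE) == "cat & dog");
    assertThrown!DecodeException(decode("cat & dog", DecodeMode.STRICT));

    // NONE returns the input untouched
    assert(decode("a &gt; b", DecodeMode.NONE) == "a &gt; b");
}
```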


class  Document: std.xml.Element;

Class representing an XML document.

Standards
XML 1.0

string  prolog;

Contains all text which occurs before the root element. Defaults to <?xml version="1.0"?>


string  epilog;

Contains all text which occurs after the root element. Defaults to the empty string
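
A short sketch of how prolog and epilog frame the serialized document, for a Document built from a Tag. The replacement prolog and the trailing comment are illustrative values only, and the block assumes a Phobos release that still ships std.xml.

```d
import std.xml;

void main()
{
    auto doc = new Document(new Tag("catalog"));

    // Defaults described above
    assert(doc.prolog == `<?xml version="1.0"?>`);
    assert(doc.epilog == "");

    // Both fields are plain strings and can be replaced before serializing
    doc.prolog = `<?xml version="1.0" encoding="UTF-8"?>`;
    doc.epilog = "<!-- end of catalog -->";

    // toString() places the root element between prolog and epilog
    string text = doc.toString();
    assert(text[0 .. doc.prolog.length] == doc.prolog);
    assert(text[$ - doc.epilog.length .. $] == doc.epilog);
}
```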


this(string s);

Constructs a Document by parsing XML text.

This function creates a complete DOM (Document Object Model) tree.

The input to this function MUST be valid XML. This is enforced by DocumentParser's in contract.

Parameters
string s the complete XML text.

this(const(Tag) tag);

Constructs a Document from a Tag.

Parameters
const(Tag) tag the start tag of the document.

const bool  opEquals(Object o);

Compares two Documents for equality

Examples
Document d1,d2;
if (d1 == d2) { }

const int  opCmp(Object o);

Compares two Documents

You should rarely need to call this function. It exists so that Documents can be used as associative array keys.

Examples
Document d1,d2;
if (d1 < d2) { }

const nothrow @trusted size_t  toHash();

Returns the hash of a Document

You should rarely need to call this function. It exists so that Documents can be used as associative array keys.


const string  toString();

Returns the string representation of a Document. (That is, the complete XML of a document).


class  Element: std.xml.Item;

Class representing an XML element.

Standards
XML 1.0

Tag  tag;

The start  tag of the element


Item[]  items;

The element's  items


Text[]  texts;

The element's text items


CData[]  cdatas;

The element's CData items


Comment[]  comments;

The element's  comments


ProcessingInstruction[]  pis;

The element's processing instructions


Element[]  elements;

The element's child  elements


this(string name, string interior = null);

Constructs an Element given a name and a string to be used as a Text interior.

Parameters
string name the name of the element.
string interior (optional) the string interior.
Examples
auto element = new Element("title","Serenity");
    // constructs the element <title>Serenity</title>


this(const(Tag) tag_);

Constructs an Element from a Tag.

Parameters
const(Tag) tag_ the start or empty tag of the element.

void  opCatAssign(Text item);

Append a text item to the interior of this element

Parameters
Text item the item you wish to append.
Examples
Element element;
element ~= new Text("hello");

void  opCatAssign(CData item);

Append a CData item to the interior of this element

Parameters
CData item the item you wish to append.
Examples
Element element;
element ~= new CData("hello");

void  opCatAssign(Comment item);

Append a comment to the interior of this element

Parameters
Comment item the item you wish to append.
Examples
Element element;
element ~= new Comment("hello");

void  opCatAssign(ProcessingInstruction item);

Append a processing instruction to the interior of this element

Parameters
ProcessingInstruction item the item you wish to append.
Examples
Element element;
element ~= new ProcessingInstruction("hello");

void  opCatAssign(Element item);

Append a complete element to the interior of this element

Parameters
Element item the item you wish to append.
Examples
Element element;
Element other = new Element("br");
element ~= other;
   // appends element representing <br />


bool  opEquals(Object o);

Compares two Elements for equality

Examples
Element e1,e2;
if (e1 == e2) { }

int  opCmp(Object o);

Compares two Elements

You should rarely need to call this function. It exists so that Elements can be used as associative array keys.

Examples
Element e1,e2;
if (e1 < e2) { }

const nothrow @safe size_t  toHash();

Returns the hash of an Element

You should rarely need to call this function. It exists so that Elements can be used as associative array keys.


const string  text(DecodeMode mode = DecodeMode.LOOSE);

Returns the decoded interior of an element.

The element is assumed to contain text only. So, for example, given XML such as "<title>Good &amp; Bad</title>", this function will return "Good & Bad".

Parameters
DecodeMode mode (optional) Mode to use for decoding. (Defaults to LOOSE).
Throws
DecodeException if decode fails
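
The round trip between the encoded and decoded forms can be sketched as follows, assuming a Phobos release that still ships std.xml. The element name and content are taken from the description above.

```d
import std.xml;

void main()
{
    // The constructor encodes the interior, so a raw ampersand is safe here
    auto title = new Element("title", "Good & Bad");

    // The serialized form carries the entity...
    assert(title.toString() == "<title>Good &amp; Bad</title>");

    // ...while text() returns the decoded interior
    assert(title.text() == "Good & Bad");
}
```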

const string[]  pretty(uint indent = 2);

Returns an indented string representation of this item

Parameters
uint indent (optional) number of spaces by which to indent this element. Defaults to 2.
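
Because pretty() returns one array entry per output line, the result is usually joined before printing. A minimal sketch with a made-up two-level element, assuming a Phobos release that still ships std.xml; the child line is stripped before comparison so the check does not depend on the exact indentation.

```d
import std.string : join, strip;
import std.xml;

void main()
{
    auto book = new Element("book");
    book ~= new Element("title", "Serenity");

    // One array entry per output line
    string[] lines = book.pretty(4);
    assert(lines.length == 3);
    assert(lines[0] == "<book>");
    assert(lines[1].strip == "<title>Serenity</title>");
    assert(lines[2] == "</book>");

    // Join with newlines to get printable text
    string printable = lines.join("\n");
    assert(printable.length > 0);
}
```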

const string  toString();

Returns the string representation of an Element

Examples
auto element = new Element("br");
writefln(element.toString()); // writes "<br />"


enum  TagType: int;

Tag types.

START
Used for start tags
END
Used for end tags
EMPTY
Used for empty tags


class  Tag;

Class representing an XML tag.

Standards
XML 1.0

The class invariant guarantees
  • that type is a valid enum TagType value
  • that name consists of valid characters
  • that each attribute name consists of valid characters

TagType  type;

Type of tag


string  name;

Tag  name


string[string]  attr;

Associative array of attributes


this(string name, TagType type = TagType.START);

Constructs an instance of Tag with a specified name and type

The constructor does not initialize the attributes. To initialize the attributes, you access the attr member variable.

Parameters
string name the Tag's name
TagType type (optional) the Tag's type. If omitted, defaults to TagType.START.
Examples
auto tag = new Tag("img",TagType.EMPTY);
tag.attr["src"] = "http://example.com/example.jpg";

const bool  opEquals(Object o);

Compares two Tags for equality

You should rarely need to call this function. It exists so that Tags can be used as associative array keys.

Examples
Tag tag1,tag2;
if (tag1 == tag2) { }

const int  opCmp(Object o);

Compares two Tags

Examples
Tag tag1,tag2;
if (tag1 < tag2) { }

const nothrow @safe size_t  toHash();

Returns the hash of a Tag

You should rarely need to call this function. It exists so that Tags can be used as associative array keys.


const string  toString();

Returns the string representation of a Tag

Examples
auto tag = new Tag("book",TagType.START);
writefln(tag.toString()); // writes "<book>"


const @property bool  isStart();

Returns true if the Tag is a start tag

Examples
if (tag.isStart) { }

const @property bool  isEnd();

Returns true if the Tag is an end tag

Examples
if (tag.isEnd) { }

const @property bool  isEmpty();

Returns true if the Tag is an empty tag

Examples
if (tag.isEmpty) { }
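
The three tag types and their string forms can be seen together in one sketch. It assumes a Phobos release that still ships std.xml. The tag names and the URL are the illustrative ones used elsewhere in this section.

```d
import std.xml;

void main()
{
    auto start = new Tag("book");                 // type defaults to TagType.START
    auto end   = new Tag("book", TagType.END);
    auto img   = new Tag("img", TagType.EMPTY);
    img.attr["src"] = "http://example.com/example.jpg";

    // Exactly one predicate holds for each tag
    assert(start.isStart && !start.isEnd && !start.isEmpty);
    assert(end.isEnd);
    assert(img.isEmpty);

    // toString() renders according to the tag type
    assert(start.toString() == "<book>");
    assert(end.toString() == "</book>");
    assert(img.toString() == `<img src="http://example.com/example.jpg" />`);
}
```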

class  Comment: std.xml.Item;

Class representing a comment


this(string content);

Construct a comment

Parameters
string content the body of the comment
Throws
CommentException if the comment body is illegal (contains "--" or exactly equals "-")
Examples
auto item = new Comment("This is a comment");
   // constructs <!--This is a comment-->
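
The rejection rules can be exercised directly. A minimal sketch, assuming a Phobos release that still ships std.xml; the comment bodies are made up.

```d
import std.exception : assertThrown;
import std.xml;

void main()
{
    auto ok = new Comment("This is a comment");
    assert(ok.toString() == "<!--This is a comment-->");

    // "--" may not appear inside a comment body, and "-" alone is rejected
    assertThrown!CommentException(new Comment("a -- b"));
    assertThrown!CommentException(new Comment("-"));
}
```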


bool  opEquals(Object o);

Compares two comments for equality

Examples
Comment item1,item2;
if (item1 == item2) { }

int  opCmp(Object o);

Compares two comments

You should rarely need to call this function. It exists so that Comments can be used as associative array keys.

Examples
Comment item1,item2;
if (item1 < item2) { }

const nothrow @safe size_t  toHash();

Returns the hash of a Comment

You should rarely need to call this function. It exists so that Comments can be used as associative array keys.


const string  toString();

Returns a string representation of this comment


const @property bool  isEmptyXML();

Returns false always


class  CData: std.xml.Item;

Class representing a Character Data section


this(string content);

Construct a character data section

Parameters
string content the body of the character data segment
Throws
CDataException if the segment body is illegal (contains "]]>")
Examples
auto item = new CData("<b>hello</b>");
   // constructs <![CDATA[<b>hello</b>]]>


bool  opEquals(Object o);

Compares two CDatas for equality

Examples
CData item1,item2;
if (item1 == item2) { }

int  opCmp(Object o);

Compares two CDatas

You should rarely need to call this function. It exists so that CDatas can be used as associative array keys.

Examples
CData item1,item2;
if (item1 < item2) { }

const nothrow @safe size_t  toHash();

Returns the hash of a CData

You should rarely need to call this function. It exists so that CDatas can be used as associative array keys.


const string  toString();

Returns a string representation of this CData section


const @property bool  isEmptyXML();

Returns false always


class  Text: std.xml.Item;

Class representing a text (aka Parsed Character Data) section


this(string content);

Construct a text (aka PCData) section

Parameters
string content the text. This function encodes the text before insertion, so it is safe to insert any text
Examples
auto item = new Text("a < b");
   // constructs a &lt; b
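
A small sketch of the encoding behaviour described above, together with isEmptyXML, assuming a Phobos release that still ships std.xml:

```d
import std.xml;

void main()
{
    // The text is encoded on construction and emitted in encoded form
    auto item = new Text("a < b");
    assert(item.toString() == "a &lt; b");
    assert(!item.isEmptyXML);

    // Only an empty string counts as empty XML
    auto empty = new Text("");
    assert(empty.isEmptyXML);
}
```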


bool  opEquals(Object o);

Compares two text sections for equality

Examples
Text item1,item2;
if (item1 == item2) { }

int  opCmp(Object o);

Compares two text sections

You should rarely need to call this function. It exists so that Texts can be used as associative array keys.

Examples
Text item1,item2;
if (item1 < item2) { }

const nothrow @safe size_t  toHash();

Returns the hash of a text section

You should rarely need to call this function. It exists so that Texts can be used as associative array keys.


const string  toString();

Returns a string representation of this Text section


const @property bool  isEmptyXML();

Returns true if the content is the empty string


class  XMLInstruction: std.xml.Item;

Class representing an XML Instruction section


this(string content);

Construct an XML Instruction section

Parameters
string content the body of the instruction segment
Throws
XIException if the segment body is illegal (contains ">")
Examples
auto item = new XMLInstruction("ATTLIST");
   // constructs <!ATTLIST>


bool  opEquals(Object o);

Compares two XML instructions for equality

Examples
XMLInstruction item1,item2;
if (item1 == item2) { }

int  opCmp(Object o);

Compares two XML instructions

You should rarely need to call this function. It exists so that XMLInstructions can be used as associative array keys.

Examples
XMLInstruction item1,item2;
if (item1 < item2) { }

const nothrow @safe size_t  toHash();

Returns the hash of an XMLInstruction

You should rarely need to call this function. It exists so that XMLInstructions can be used as associative array keys.


const string  toString();

Returns a string representation of this XMLInstruction


const @property bool  isEmptyXML();

Returns false always


class  ProcessingInstruction: std.xml.Item;

Class representing a Processing Instruction section


this(string content);

Construct a Processing Instruction section

Parameters
string content the body of the instruction segment
Throws
PIException if the segment body is illegal (contains "?>")
Examples
auto item = new ProcessingInstruction("php");
   // constructs <?php?>


bool  opEquals(Object o);

Compares two processing instructions for equality

Examples
ProcessingInstruction item1,item2;
if (item1 == item2) { }

int  opCmp(Object o);

Compares two processing instructions

You should rarely need to call this function. It exists so that ProcessingInstructions can be used as associative array keys.

Examples
ProcessingInstruction item1,item2;
if (item1 < item2) { }

const nothrow @safe size_t  toHash();

Returns the hash of a ProcessingInstruction

You should rarely need to call this function. It exists so that ProcessingInstructions can be used as associative array keys.


const string  toString();

Returns a string representation of this ProcessingInstruction


const @property bool  isEmptyXML();

Returns false always


abstract class  Item;

Abstract base class for XML items


abstract bool  opEquals(Object o);

Compares with another Item of same type for equality


abstract int  opCmp(Object o);

Compares with another Item of same type


abstract const nothrow @safe size_t  toHash();

Returns the hash of this item


abstract const string  toString();

Returns a string representation of this item


const string[]  pretty(uint indent);

Returns an indented string representation of this item

Parameters
uint indent number of spaces by which to indent child elements

abstract const @property bool  isEmptyXML();

Returns true if the item represents empty XML text


class  DocumentParser: std.xml.ElementParser;

Class for parsing an XML Document.

This is a subclass of ElementParser. Most of the useful functions are documented there.

Standards
XML 1.0
Known Bugs
Currently only supports UTF documents.

If there is an encoding attribute in the prolog, it is ignored.

this(string xmlText_);

Constructs a DocumentParser.

The input to this function MUST be valid XML. This is enforced by the function's in contract.

Parameters
string xmlText_ the entire XML document as text

class  ElementParser;

Class for parsing an XML element.

Standards
XML 1.0

Note that you cannot construct instances of this class directly. You can construct a DocumentParser (which is a subclass of ElementParser), but otherwise, instances of ElementParser will be created for you by the library, and passed your way via onStartTag handlers.

const @property const(Tag)  tag();

The Tag at the start of the element being parsed. You can read this to determine the  tag's name and attributes.


ParserHandler[string]  onStartTag;

Register a handler which will be called whenever a start tag is encountered which matches the specified name. You can also pass null as the name, in which case the handler will be called for any unmatched start tag.

Examples
// Call this function whenever a <podcast> start tag is encountered
onStartTag["podcast"] = (ElementParser xml)
{
    // Your code here
    //
    // This is a closure, so code here may reference
    // variables which are outside of this scope
};

// call myEpisodeStartHandler (defined elsewhere) whenever an <episode>
// start tag is encountered
onStartTag["episode"] = &myEpisodeStartHandler;

// call delegate dg for all other start tags
onStartTag[null] = dg;


This library will supply your function with a new instance of ElementParser, which may be used to parse inside the element whose start tag was just found, or to identify the tag attributes of the element, etc.

Note that your function will be called for both start tags and empty tags. That is, we make no distinction between <br></br> and <br/>.
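
Putting the pieces together, here is a minimal, self-contained sketch that parses an in-memory document. The XML text and the variable names (doc, book, ids, titles) are invented for illustration, and the block assumes a Phobos release that still ships std.xml.

```d
import std.xml;

void main()
{
    string s = `<catalog>`
        ~ `<book id="bk101"><title>XML Guide</title></book>`
        ~ `<book id="bk102"><title>Midnight Rain</title></book>`
        ~ `</catalog>`;

    string[] ids;
    string[] titles;

    auto doc = new DocumentParser(s);
    doc.onStartTag["book"] = (ElementParser book)
    {
        // Attributes of the start tag that triggered this handler
        ids ~= book.tag.attr["id"];

        // Handlers registered on the nested parser apply inside this <book>
        book.onEndTag["title"] = (in Element e) { titles ~= e.text(); };

        // Consume the rest of this <book> element
        book.parse();
    };
    doc.parse();

    assert(ids == ["bk101", "bk102"]);
    assert(titles == ["XML Guide", "Midnight Rain"]);
}
```

Registering the inner handler on the ElementParser passed to the delegate, rather than on the DocumentParser, keeps the title handler scoped to the book currently being parsed.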

ElementHandler[string]  onEndTag;

Register a handler which will be called whenever an end tag is encountered which matches the specified name. You can also pass null as the name, in which case the handler will be called for any unmatched end tag.

Examples
// Call this function whenever a </podcast> end tag is encountered
onEndTag["podcast"] = (in Element e)
{
    // Your code here
    //
    // This is a closure, so code here may reference
    // variables which are outside of this scope
};

// call myEpisodeEndHandler (defined elsewhere) whenever an </episode>
// end tag is encountered
onEndTag["episode"] = &myEpisodeEndHandler;

// call delegate dg for all other end tags
onEndTag[null] = dg;


Note that your function will be called for both end tags and empty tags. That is, we make no distinction between <br></br> and <br/>.

@property void  onText(Handler handler);

Register a handler which will be called whenever text is encountered.

Examples
// Call this function whenever text is encountered
onText = (string s)
{
    // Your code here

    // The passed parameter s will have been decoded by the time you see
    // it, and so may contain any character.
    //
    // This is a closure, so code here may reference
    // variables which are outside of this scope
};

void  onTextRaw(Handler handler);

Register an alternative handler which will be called whenever text is encountered. This differs from onText in that onText will decode the text, whereas onTextRaw will not. This allows you to make design choices, since onText will be more accurate, but slower, while onTextRaw will be faster, but less accurate. Of course, you can still call decode() within your handler, if you want, but you'd probably want to use onTextRaw only in circumstances where you know that decoding is unnecessary.

Examples
// Call this function whenever text is encountered
onTextRaw = (string s)
{
    // Your code here

    // The passed parameter s will NOT have been decoded.
    //
    // This is a closure, so code here may reference
    // variables which are outside of this scope
};
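
To make the difference between the two text handlers concrete, this sketch runs the same made-up document through two parsers. It assumes a Phobos release that still ships std.xml. The names a, b, decoded and raw are illustrative.

```d
import std.xml;

void main()
{
    string s = "<greeting>Fish &amp; chips</greeting>";

    // onText receives decoded text
    string decoded;
    auto a = new DocumentParser(s);
    a.onText = (string t) { decoded ~= t; };
    a.parse();

    // onTextRaw receives the text exactly as it appears in the source
    string raw;
    auto b = new DocumentParser(s);
    b.onTextRaw((string t) { raw ~= t; });
    b.parse();

    assert(decoded == "Fish & chips");
    assert(raw == "Fish &amp; chips");
}
```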

@property void  onCData(Handler handler);

Register a handler which will be called whenever a character data segment is encountered.

Examples
// Call this function whenever a CData section is encountered
onCData = (string s)
{
    // Your code here

    // The passed parameter s does not include the opening <![CDATA[
    // nor closing ]]>
    //
    // This is a closure, so code here may reference
    // variables which are outside of this scope
};

@property void  onComment(Handler handler);

Register a handler which will be called whenever a comment is encountered.

Examples
// Call this function whenever a comment is encountered
onComment = (string s)
{
    // Your code here

    // The passed parameter s does not include the opening <!-- nor
    // closing -->
    //
    // This is a closure, so code here may reference
    // variables which are outside of this scope
};

@property void  onPI(Handler handler);

Register a handler which will be called whenever a processing instruction is encountered.

Examples
// Call this function whenever a processing instruction is encountered
onPI = (string s)
{
    // Your code here

    // The passed parameter s does not include the opening <? nor
    // closing ?>
    //
    // This is a closure, so code here may reference
    // variables which are outside of this scope
};

@property void  onXI(Handler handler);

Register a handler which will be called whenever an XML instruction is encountered.

Examples
// Call this function whenever an XML instruction is encountered
// (Note: XML instructions may only occur preceding the root tag of a
// document).
onXI = (string s)
{
    // Your code here

    // The passed parameter s does not include the opening <! nor
    // closing >
    //
    // This is a closure, so code here may reference
    // variables which are outside of this scope
};

void  parse();

Parse an XML element.

Parsing will continue until the end of the current element. Any items encountered for which a handler has been registered will invoke that handler.

Throws
various kinds of XMLException

const string  toString();

Returns that part of the element which has already been parsed


void  check(string s);

Check an entire XML document for well-formedness

Parameters
string s the document to be checked, passed as a string
Throws
CheckException if the document is not well formed

CheckException's toString() method will yield the complete hierarchy of parse failure (the XML equivalent of a stack trace), giving the line and column number of every failure at every level.
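
A minimal sketch of validating untrusted input before handing it to Document or DocumentParser, assuming a Phobos release that still ships std.xml. Both XML snippets are made up. The exact wording of the printed report is not shown here because it depends on the library version.

```d
import std.exception : assertNotThrown;
import std.stdio;
import std.xml;

void main()
{
    // Well-formed input passes silently
    assertNotThrown!CheckException(check("<a><b>text</b></a>"));

    // Mismatched tags: <b> is never closed
    string bad = "<a><b>text</a>";
    try
    {
        check(bad);
        assert(false, "expected a CheckException");
    }
    catch (CheckException e)
    {
        // toString() lists the failure at each level of the parse
        writeln(e);
    }
}
```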

class  XMLException: object.Exception;

The base class for exceptions thrown by this module


class  CommentException: std.xml.XMLException;

Thrown during Comment constructor


class  CDataException: std.xml.XMLException;

Thrown during CData constructor


class  XIException: std.xml.XMLException;

Thrown during XMLInstruction constructor


class  PIException: std.xml.XMLException;

Thrown during ProcessingInstruction constructor


class  TextException: std.xml.XMLException;

Thrown during Text constructor


class  DecodeException: std.xml.XMLException;

Thrown during decode()


class  InvalidTypeException: std.xml.XMLException;

Thrown if comparing with wrong type


class  TagException: std.xml.XMLException;

Thrown when parsing for Tags


class  CheckException: std.xml.XMLException;

Thrown during check()


CheckException  err;

Parent in hierarchy


string  msg;

Name of production rule which failed to parse, or specific error message


size_t  line;

Line number at which parse failure occurred


size_t  column;

Column number at which parse failure occurred