pugixml 0.9 manual | Overview | Installation | Document: Object model · Loading · Accessing · Modifying · Saving | XPath | API Reference | Table of Contents

Accessing document data

Basic traversal functions
Getting node data
Getting attribute data
Contents-based traversal functions
Traversing node/attribute lists via iterators
Recursive traversal with xml_tree_walker
Searching for nodes/attributes with predicates
Miscellaneous functions

pugixml features an extensive interface for getting various types of data from the document and for traversing the document. This section documents all such functions that do not modify the tree, except for XPath-related functions; see XPath for the XPath reference. As discussed in C++ interface, there are two types of handles to tree data - xml_node and xml_attribute. The handles have special null (empty) values which propagate through various functions and thus are useful for writing more concise code; see this description for details. The documentation in this section explicitly states the result of every function for null inputs.

Basic traversal functions

The internal representation of the document is a tree, where each node has a list of child nodes (the order of children corresponds to their order in the XML representation), and additionally element nodes have a list of attributes, which is also ordered. Several functions are provided to let you get from one node in the tree to another. These functions roughly correspond to the internal representation, and thus are usually building blocks for other methods of traversing (e.g. XPath traversals are based on these functions).

xml_node xml_node::parent() const;
xml_node xml_node::first_child() const;
xml_node xml_node::last_child() const;
xml_node xml_node::next_sibling() const;
xml_node xml_node::previous_sibling() const;

xml_attribute xml_node::first_attribute() const;
xml_attribute xml_node::last_attribute() const;
xml_attribute xml_attribute::next_attribute() const;
xml_attribute xml_attribute::previous_attribute() const;

The parent function returns the node's parent; all nodes except the document have a non-null parent. first_child and last_child return the first and last child of the node, respectively; note that only document nodes and element nodes can have a non-empty child node list. If the node has no children, both functions return null nodes. next_sibling and previous_sibling return the node that's immediately to the right/left of this node in the children list, respectively - for example, in <a/><b/><c/>, calling next_sibling for a handle that points to <b/> results in a handle pointing to <c/>, and calling previous_sibling results in a handle pointing to <a/>. If the node does not have a next/previous sibling (this happens if it is the last/first node in the list, respectively), the functions return null nodes. The first_attribute, last_attribute, next_attribute and previous_attribute functions behave the same way as the corresponding child node functions and allow you to iterate through the attribute list in the same way.

[Note] Note

To reduce memory consumption, attributes do not have a link to their parent nodes. Thus there is no xml_attribute::parent() function.

Calling any of the functions above on a null handle results in a null handle - e.g. node.first_child().next_sibling() returns the second child of node, or a null handle if there are no children at all or if there is only one.
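The propagation rule can be illustrated with a toy handle type. The sketch below is not pugixml code: toy_node, toy_handle and null_propagation_demo are hypothetical names, and the pointer-based layout is an assumption made purely for illustration. It only shows why a chain of traversal calls is safe to write without intermediate checks.

```cpp
#include <cassert>

// Hypothetical storage for the sketch; pugixml's real layout may differ.
struct toy_node
{
    toy_node* first_child;
    toy_node* next_sibling;
};

// A handle is a thin wrapper around a possibly-null pointer.
class toy_handle
{
public:
    explicit toy_handle(toy_node* node = nullptr): node_(node) {}

    // True for non-null handles.
    bool valid() const { return node_ != nullptr; }

    // Each accessor checks for null first, so a null handle yields a null handle.
    toy_handle first_child() const { return toy_handle(node_ ? node_->first_child : nullptr); }
    toy_handle next_sibling() const { return toy_handle(node_ ? node_->next_sibling : nullptr); }

    bool operator==(const toy_handle& rhs) const { return node_ == rhs.node_; }

private:
    toy_node* node_;
};

// Usage sketch: a parent with exactly two children and no grandchildren.
inline void null_propagation_demo()
{
    toy_node second = { nullptr, nullptr };
    toy_node first = { nullptr, &second };
    toy_node parent = { &first, nullptr };

    toy_handle h(&parent);
    assert(h.first_child().next_sibling() == toy_handle(&second)); // the second child
    assert(!h.first_child().first_child().next_sibling().valid()); // no grandchildren: null
    assert(!toy_handle().first_child().next_sibling().valid());    // null in, null out
}
```

Because every accessor tolerates a null handle, only the final result of a chain needs to be tested.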

With these functions, you can iterate through all child nodes and display all attributes like this (samples/traverse_base.cpp):

for (pugi::xml_node tool = tools.first_child(); tool; tool = tool.next_sibling())
{
    std::cout << "Tool:";

    for (pugi::xml_attribute attr = tool.first_attribute(); attr; attr = attr.next_attribute())
    {
        std::cout << " " << attr.name() << "=" << attr.value();
    }

    std::cout << std::endl;
}

Getting node data

Apart from structural information (parent, child nodes, attributes), nodes can have name and value, both of which are strings. Depending on node type, name or value may be absent. node_document nodes do not have name or value, node_element and node_declaration nodes always have a name but never have a value, node_pcdata, node_cdata and node_comment nodes never have a name but always have a value (it may be empty though), node_pi nodes always have a name and a value (again, value may be empty). In order to get node's name or value, you can use the following functions:

const char_t* xml_node::name() const;
const char_t* xml_node::value() const;

In case node does not have a name or value or if the node handle is null, both functions return empty strings - they never return null pointers.

It is common to store data as text contents of some node - e.g. <node><description>This is a node</description></node>. In this case, the <description> node does not have a value, but instead has a child of type node_pcdata with value "This is a node". pugixml provides two helper functions to read such data:

const char_t* xml_node::child_value() const;
const char_t* xml_node::child_value(const char_t* name) const;

child_value() returns the value of the first child with type node_pcdata or node_cdata; child_value(name) is a simple wrapper for child(name).child_value(). For the above example, calling node.child_value("description") and description.child_value() will both produce string "This is a node". If there is no child with relevant type, or if the handle is null, child_value functions return empty string.

There is an example of using some of these functions at the end of the next section.

Getting attribute data

All attributes have a name and a value, both of which are strings (the value may be empty). There are two corresponding accessors, as for xml_node:

const char_t* xml_attribute::name() const;
const char_t* xml_attribute::value() const;

In case attribute handle is null, both functions return empty strings - they never return null pointers.

In many cases attribute values have types that are not strings - e.g. an attribute may always contain values that should be treated as integers, despite the fact that they are represented as strings in XML. pugixml provides several accessors that convert the attribute value to some other type. The accessors are as follows:

int xml_attribute::as_int() const;
unsigned int xml_attribute::as_uint() const;
double xml_attribute::as_double() const;
float xml_attribute::as_float() const;
bool xml_attribute::as_bool() const;

as_int, as_uint, as_double and as_float convert attribute values to numbers. If the attribute handle is null or the attribute value is empty, 0 is returned. Otherwise, all leading whitespace characters are skipped, and the remaining string is parsed as a decimal number (as_int or as_uint) or as a floating point number in either decimal or scientific form (as_double or as_float). Any extra trailing characters are silently discarded, e.g. as_int will return 1 for string "1abc".

In case the input string contains a number that is out of the target numeric range, the result is undefined.
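The parsing rules described above resemble those of the standard strtol and strtod functions, which makes them easy to sketch. The helpers below are an illustration only: parse_int_sketch and parse_double_sketch are hypothetical names, and the use of std::strtol and std::strtod is an assumption, not a statement about how pugixml implements as_int or as_double.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper mirroring the described as_int rules (not pugixml code).
inline int parse_int_sketch(const char* text)
{
    assert(text != nullptr); // callers pass a valid string, e.g. the result of value()

    // Base 10. strtol skips leading whitespace, stops at the first character
    // that cannot be part of the number, and returns 0 if no digits were found.
    return static_cast<int>(std::strtol(text, nullptr, 10));
}

// Hypothetical helper mirroring the described as_double rules (not pugixml code).
inline double parse_double_sketch(const char* text)
{
    assert(text != nullptr);

    // strtod accepts both decimal ("150.0") and scientific ("1.5e2") forms.
    return std::strtod(text, nullptr);
}
```

With these helpers, "1abc" and "  1" both convert to 1, and an empty string converts to 0. Like the functions they illustrate, they make no attempt to detect out-of-range input.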

[Caution] Caution

Number conversion functions depend on the current C locale as set with setlocale, so they may return unexpected results if the locale is different from "C".

as_bool converts the attribute value to boolean as follows: if the attribute handle is null or the attribute value is empty, false is returned. Otherwise, true is returned if the first character is one of '1', 't', 'T', 'y', 'Y', and false is returned for any other first character. This means that strings like "true" and "yes" are recognized as true, while strings like "false" and "no" are recognized as false. For more complex matching you'll have to write your own function.
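The first-character rule is simple enough to restate as code. The function below is a sketch of the rule as described, not pugixml's implementation; looks_true is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper restating the described as_bool rule (not pugixml code).
inline bool looks_true(const char* text)
{
    assert(text != nullptr); // callers pass a valid string, e.g. the result of value()

    const char first = text[0]; // '\0' for an empty string, which yields false
    return first == '1' || first == 't' || first == 'T' || first == 'y' || first == 'Y';
}
```

Under this rule a value such as "on" yields false and a value such as "tomorrow" yields true, which is the kind of case where a stricter custom function is worth writing.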

[Note] Note

There are no portable 64-bit integer types in C++ prior to C++11, so there is no corresponding conversion function. If your platform has a 64-bit integer, you can easily write a conversion function yourself.
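One way to write such a conversion function is sketched below. It assumes a compiler that provides long long and std::strtoll (standard since C++11); as_int64_sketch is a hypothetical name and is not part of pugixml. It takes the string returned by xml_attribute::value() rather than the attribute itself, so it has no dependency on the library.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper; pass it the string returned by xml_attribute::value().
// Assumes long long and std::strtoll are available (C++11 or later).
inline long long as_int64_sketch(const char* text)
{
    assert(text != nullptr); // value() never returns a null pointer

    // Same conventions as the other numeric accessors: leading whitespace is
    // skipped, trailing characters are ignored, and an empty string gives 0.
    return std::strtoll(text, nullptr, 10);
}
```

A call would then look like as_int64_sketch(attr.value()), where attr is any attribute handle, including a null one.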

This is an example of using these functions, along with node data retrieval ones (samples/traverse_base.cpp):

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
{
    std::cout << "Tool " << tool.attribute("Filename").value();
    std::cout << ": AllowRemote " << tool.attribute("AllowRemote").as_bool();
    std::cout << ", Timeout " << tool.attribute("Timeout").as_int();
    std::cout << ", Description '" << tool.child_value("Description") << "'\n";
}

Contents-based traversal functions

Since a lot of document traversal consists of finding the node/attribute with the correct name, there are special functions for that purpose:

xml_node xml_node::child(const char_t* name) const;
xml_attribute xml_node::attribute(const char_t* name) const;
xml_node xml_node::next_sibling(const char_t* name) const;
xml_node xml_node::previous_sibling(const char_t* name) const;

child and attribute return the first child/attribute with the specified name; next_sibling and previous_sibling return the first sibling in the corresponding direction with the specified name. All string comparisons are case-sensitive. In case the node handle is null or there is no node/attribute with the specified name, null handle is returned.

child and next_sibling functions can be used together to loop through all child nodes with the desired name like this:

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))

Occasionally the needed node is specified not by a unique name but instead by the value of some attribute; for example, it is common to have node collections with each node having a unique id: <group><item id="1"/> <item id="2"/></group>. There are two functions for finding child nodes based on attribute values:

xml_node xml_node::find_child_by_attribute(const char_t* name, const char_t* attr_name, const char_t* attr_value) const;
xml_node xml_node::find_child_by_attribute(const char_t* attr_name, const char_t* attr_value) const;

The three-argument function returns the first child node with the specified name which has an attribute with the specified name/value; the two-argument function skips the name test for the node, which can be useful for searching in heterogeneous collections. If the node handle is null or if no node is found, null handle is returned. All string comparisons are case-sensitive.

In all of the above functions, all arguments have to be valid strings; passing null pointers results in undefined behavior.

This is an example of using these functions (samples/traverse_base.cpp):

std::cout << "Tool for *.dae generation: " << tools.find_child_by_attribute("Tool", "OutputFileMasks", "*.dae").attribute("Filename").value() << "\n";

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
{
    std::cout << "Tool " << tool.attribute("Filename").value() << "\n";
}

Traversing node/attribute lists via iterators

Child node lists and attribute lists are simply doubly-linked lists; while you can use previous_sibling/next_sibling and other such functions for iteration, pugixml additionally provides node and attribute iterators, so that you can treat nodes as containers of other nodes or attributes:

class xml_node_iterator;
class xml_attribute_iterator;

typedef xml_node_iterator xml_node::iterator;
iterator xml_node::begin() const;
iterator xml_node::end() const;

typedef xml_attribute_iterator xml_node::attribute_iterator;
attribute_iterator xml_node::attributes_begin() const;
attribute_iterator xml_node::attributes_end() const;

begin and attributes_begin return iterators that point to the first node/attribute, respectively; end and attributes_end return past-the-end iterator for node/attribute list, respectively - this iterator can't be dereferenced, but decrementing it results in an iterator pointing to the last element in the list (except for empty lists, where decrementing past-the-end iterator is not defined). Past-the-end iterator is commonly used as a termination value for iteration loops (see sample below). If you want to get an iterator that points to an existing handle, you can construct the iterator with the handle as a single constructor argument, like so: xml_node_iterator(node). For xml_attribute_iterator, you'll have to provide both an attribute and its parent node.

begin and end return equal iterators if called on null node; such iterators can't be dereferenced. attributes_begin and attributes_end behave the same way. For correct iterator usage this means that child node/attribute collections of null nodes appear to be empty.

Both types of iterators have bidirectional iterator semantics (i.e. they can be incremented and decremented, but efficient random access is not supported) and support all usual iterator operations - comparison, dereference, etc. The iterators are invalidated if the node/attribute objects they're pointing to are removed from the tree; adding nodes/attributes does not invalidate any iterators.

Here is an example of using iterators for document traversal (samples/traverse_iter.cpp):

for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
{
    std::cout << "Tool:";

    for (pugi::xml_attribute_iterator ait = it->attributes_begin(); ait != it->attributes_end(); ++ait)
    {
        std::cout << " " << ait->name() << "=" << ait->value();
    }

    std::cout << std::endl;
}

[Caution] Caution

Node and attribute iterators are somewhere in the middle between const and non-const iterators. While the dereference operation yields a non-constant reference to the object, so that you can use it for tree modification operations, modifying this reference by assignment - e.g. by passing iterators to a function like std::sort - will not give the expected results, as assignment modifies the local handle that's stored in the iterator.

Recursive traversal with xml_tree_walker

The methods described above allow traversal of the immediate children of some node; if you want to do a deep tree traversal, you'll have to do it via a recursive function or some equivalent method. However, pugixml provides a helper for depth-first traversal of a subtree. In order to use it, you have to implement the xml_tree_walker interface and call the traverse function:

class xml_tree_walker
{
public:
    virtual bool begin(xml_node& node);
    virtual bool for_each(xml_node& node) = 0;
    virtual bool end(xml_node& node);

    int depth() const;
};

bool xml_node::traverse(xml_tree_walker& walker);

The traversal is launched by calling traverse function on traversal root and proceeds as follows:

  • First, begin function is called with traversal root as its argument.
  • Then, for_each function is called for all nodes in the traversal subtree in depth first order, excluding the traversal root. Node is passed as an argument.
  • Finally, end function is called with traversal root as its argument.

If begin, end or any of the for_each calls return false, the traversal is terminated and false is returned as the traversal result; otherwise, the traversal results in true. Note that you don't have to override begin or end functions; their default implementations return true.

You can get the node's depth relative to the traversal root at any point by calling depth function. It returns -1 if called from begin/end, and returns 0-based depth if called from for_each - depth is 0 for all children of the traversal root, 1 for all grandchildren and so on.
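The call order and depth values described above can be modelled with a small self-contained sketch. Everything in it is hypothetical (walk_node, visit_children, trace_traversal and the string log); it mirrors the described sequence on a toy tree rather than reproducing pugixml's implementation, and it leaves out early termination for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy tree for the sketch (not a pugixml type).
struct walk_node
{
    std::string name;
    std::vector<walk_node> children;
};

// Visits descendants in depth-first order, recording "depth:name" for each.
inline void visit_children(const walk_node& node, int depth, std::vector<std::string>& log)
{
    for (const walk_node& child : node.children)
    {
        log.push_back(std::to_string(depth) + ":" + child.name); // the for_each step
        visit_children(child, depth + 1, log);                   // descendants are one level deeper
    }
}

// Mirrors the described sequence: begin(root), for_each(descendants), end(root).
inline std::vector<std::string> trace_traversal(const walk_node& root)
{
    std::vector<std::string> log;
    log.push_back("begin:" + root.name); // depth() would report -1 here
    visit_children(root, 0, log);        // children of the root have depth 0
    log.push_back("end:" + root.name);   // depth() would report -1 here
    return log;
}

// Usage sketch for <root><a><b/></a><c/></root>.
inline void traversal_demo()
{
    walk_node b{"b", {}};
    walk_node a{"a", {b}};
    walk_node c{"c", {}};
    walk_node root{"root", {a, c}};

    const std::vector<std::string> expected = {"begin:root", "0:a", "1:b", "0:c", "end:root"};
    assert(trace_traversal(root) == expected);
}
```

The root appears only in the begin and end entries, and <b/> is visited before <c/> because the traversal finishes the subtree of <a/> first.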

This is an example of traversing tree hierarchy with xml_tree_walker (samples/traverse_walker.cpp):

struct simple_walker: pugi::xml_tree_walker
{
    virtual bool for_each(pugi::xml_node& node)
    {
        for (int i = 0; i < depth(); ++i) std::cout << "  "; // indentation

        // node_types is a lookup table of node type names, defined elsewhere in the sample
        std::cout << node_types[node.type()] << ": name='" << node.name() << "', value='" << node.value() << "'\n";

        return true; // continue traversal
    }
};

simple_walker walker;
doc.traverse(walker);

Searching for nodes/attributes with predicates

While there are existing functions for getting a node/attribute with known contents, they are often not sufficient for simple queries. As an alternative to iterating manually through nodes/attributes until the needed one is found, you can make a predicate and call one of find_ functions:

template <typename Predicate> xml_attribute xml_node::find_attribute(Predicate pred) const;
template <typename Predicate> xml_node xml_node::find_child(Predicate pred) const;
template <typename Predicate> xml_node xml_node::find_node(Predicate pred) const;

The predicate should be either a plain function or a function object which accepts one argument of type xml_attribute (for find_attribute) or xml_node (for find_child and find_node), and returns bool. The predicate is never called with null handle as an argument.

find_attribute function iterates through all attributes of the specified node, and returns the first attribute for which predicate returned true. If predicate returned false for all attributes or if there were no attributes (including the case where the node is null), null attribute is returned.

find_child function iterates through all child nodes of the specified node, and returns the first node for which predicate returned true. If predicate returned false for all nodes or if there were no child nodes (including the case where the node is null), null node is returned.

find_node function performs a depth-first traversal through the subtree of the specified node (excluding the node itself), and returns the first node for which predicate returned true. If predicate returned false for all nodes or if subtree was empty, null node is returned.
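The difference between searching direct children and searching the whole subtree can be shown on a toy tree. The sketch below is not pugixml code: search_node, find_node_sketch and find_node_demo are hypothetical names, and testing each node before descending into its children (pre-order) is an assumption about the visiting order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy tree for the sketch (not a pugixml type).
struct search_node
{
    std::string name;
    std::vector<search_node> children;
};

// Depth-first search of the subtree below root, excluding root itself.
// Returns the first node accepted by the predicate, or nullptr if none is.
template <typename Predicate>
const search_node* find_node_sketch(const search_node& root, Predicate pred)
{
    for (const search_node& child : root.children)
    {
        if (pred(child)) return &child; // test the node before its subtree (assumed order)

        if (const search_node* found = find_node_sketch(child, pred)) return found;
    }

    return nullptr;
}

// Usage sketch for <root><a><target/></a><target/></root>.
inline void find_node_demo()
{
    search_node deep{"target", {}};
    search_node a{"a", {deep}};
    search_node shallow{"target", {}};
    search_node root{"root", {a, shallow}};

    auto is_target = [](const search_node& n) { return n.name == "target"; };

    // The nested node is found first: the search descends into <a/> before visiting its sibling.
    assert(find_node_sketch(root, is_target) == &root.children[0].children[0]);

    // The starting node itself is never tested.
    assert(find_node_sketch(root, [](const search_node& n) { return n.name == "root"; }) == nullptr);
}
```

A search limited to direct children would instead return the second <target/> here, since the first one is a grandchild of the starting node.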

This is an example of using predicate-based functions (samples/traverse_predicate.cpp):

bool small_timeout(pugi::xml_node node)
{
    return node.attribute("Timeout").as_int() < 20;
}

struct allow_remote_predicate
{
    bool operator()(pugi::xml_attribute attr) const
    {
        return strcmp(attr.name(), "AllowRemote") == 0;
    }

    bool operator()(pugi::xml_node node) const
    {
        return node.attribute("AllowRemote").as_bool();
    }
};

// Find child via predicate (looks for direct children only)
std::cout << tools.find_child(allow_remote_predicate()).attribute("Filename").value() << std::endl;

// Find node via predicate (looks for all descendants in depth-first order)
std::cout << doc.find_node(allow_remote_predicate()).attribute("Filename").value() << std::endl;

// Find attribute via predicate
std::cout << tools.last_child().find_attribute(allow_remote_predicate()).value() << std::endl;

// We can use simple functions instead of function objects
std::cout << tools.find_child(small_timeout).attribute("Filename").value() << std::endl;

Miscellaneous functions

If you need to get the document root of some node, you can use the following function:

xml_node xml_node::root() const;

This function returns the node with type node_document, which is the root node of the document the node belongs to (unless the node is null, in which case null node is returned).

While pugixml supports complex XPath expressions, sometimes a simple path handling facility is needed. There are two functions, for getting node path and for converting path to a node:

string_t xml_node::path(char_t delimiter = '/') const;
xml_node xml_node::first_element_by_path(const char_t* path, char_t delimiter = '/') const;

Node paths consist of node names, separated with a delimiter (which is / by default); paths can also contain self (.) and parent (..) pseudo-names, so that this is a valid path: "../../foo/./bar". path returns the path to the node from the document root; first_element_by_path looks for a node represented by a given path. A path can be absolute (absolute paths start with the delimiter), in which case the rest of the path is treated as relative to the document root, or relative, in which case it is resolved starting from the given node. For example, in the following document: <a><b><c/></b></a>, node <c/> has path "a/b/c"; calling first_element_by_path for the document with path "a/b" results in node <b/>; calling first_element_by_path for node <a/> with path "../a/./b/../." results in node <a/>; calling first_element_by_path with path "/a" results in node <a/> for any node.

In case path component is ambiguous (if there are two nodes with given name), the first one is selected; paths are not guaranteed to uniquely identify nodes in a document. If any component of a path is not found, the result of first_element_by_path is null node; also first_element_by_path returns null node for null nodes, in which case the path does not matter. path returns an empty string for null nodes.
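The resolution rules can be sketched on a toy tree. This is a simplified model under stated assumptions: path_node, resolve_path and resolve_path_demo are hypothetical names, nodes are linked with raw pointers, the document root is modelled as a node without a parent, and empty path components are simply skipped. pugixml's own implementation may differ in such details.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy node; the document root is the node whose parent is nullptr.
struct path_node
{
    std::string name;
    path_node* parent;
    std::vector<path_node*> children;
};

// Resolves a path relative to start, or relative to the root if the path
// begins with the delimiter. Returns nullptr when a component is not found.
inline path_node* resolve_path(path_node* start, const std::string& path, char delimiter = '/')
{
    path_node* current = start;
    if (!current) return nullptr; // null node: the path does not matter

    std::size_t pos = 0;

    if (!path.empty() && path[0] == delimiter)
    {
        while (current->parent) current = current->parent; // absolute path: start at the root
        pos = 1;
    }

    while (current && pos < path.size())
    {
        std::size_t end = path.find(delimiter, pos);
        if (end == std::string::npos) end = path.size();

        const std::string component = path.substr(pos, end - pos);
        pos = end + 1;

        if (component.empty() || component == ".") continue; // self: stay in place

        if (component == "..")
        {
            current = current->parent; // becomes nullptr above the root
            continue;
        }

        path_node* next = nullptr;
        for (path_node* child : current->children)
        {
            if (child->name == component) { next = child; break; } // the first match wins
        }
        current = next;
    }

    return current;
}

// Usage sketch: models <a><b><c/></b></a> under an unnamed document root.
inline void resolve_path_demo()
{
    path_node doc{"", nullptr, {}};
    path_node a{"a", &doc, {}};
    path_node b{"b", &a, {}};
    path_node c{"c", &b, {}};
    doc.children.push_back(&a);
    a.children.push_back(&b);
    b.children.push_back(&c);

    assert(resolve_path(&doc, "a/b") == &b);
    assert(resolve_path(&a, "../a/./b/../.") == &a);
    assert(resolve_path(&c, "/a") == &a);      // absolute paths ignore the starting node
    assert(resolve_path(&doc, "a/x") == nullptr); // missing component
}
```

The demo reproduces the three lookups from the example document above, plus one lookup that fails because a component is missing.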

[Note] Note

path function returns the result as STL string, and thus is not available if PUGIXML_NO_STL is defined.

pugixml does not record row/column information for nodes upon parsing for efficiency reasons. However, if the node has not changed in a significant way since parsing (the name/value are not changed, and the node itself is the original one, i.e. it was not deleted from the tree and re-added later), it is possible to get the offset from the beginning of XML buffer:

ptrdiff_t xml_node::offset_debug() const;

If the offset is not available (this happens if the node is null, was not originally parsed from a stream, or has changed in a significant way), the function returns -1. Otherwise it returns the offset to node's data from the beginning of XML buffer in pugi::char_t units. For more information on parsing offsets, see parsing error handling documentation.
