Accessing document data


Basic traversal functions
Getting node data
Getting attribute data
Contents-based traversal functions
Traversing node/attribute lists via iterators
Recursive traversal with xml_tree_walker
Searching for nodes/attributes with predicates
Miscellaneous functions

pugixml features an extensive interface for getting various types of data from the document and for traversing it. This section documents all functions that do not modify the tree, except for XPath-related functions; see XPath for the XPath reference. As discussed in C++ interface, there are two types of handles to tree data: xml_node and xml_attribute. The handles have special null (empty) values that propagate through various functions and are therefore useful for writing more concise code; see this description for details. The documentation in this section explicitly states the result of every function when it is given null inputs.

Basic traversal functions

The internal representation of the document is a tree, where each node has a list of child nodes (the order of children corresponds to their order in the XML representation), and element nodes additionally have a list of attributes, which is also ordered. Several functions are provided to let you get from one node in the tree to another. These functions roughly correspond to the internal representation, and thus are usually building blocks for other methods of traversing (for example, XPath traversals are based on these functions).

xml_node xml_node::parent() const;
xml_node xml_node::first_child() const;
xml_node xml_node::last_child() const;
xml_node xml_node::next_sibling() const;
xml_node xml_node::previous_sibling() const;

xml_attribute xml_node::first_attribute() const;
xml_attribute xml_node::last_attribute() const;
xml_attribute xml_attribute::next_attribute() const;
xml_attribute xml_attribute::previous_attribute() const;

The parent function returns the node's parent; all nodes except the document have a non-null parent. first_child and last_child return the first and last child of the node, respectively; note that only document nodes and element nodes can have a non-empty child node list. If the node has no children, both functions return null nodes. next_sibling and previous_sibling return the node immediately to the right/left of this node in the children list, respectively. For example, in <a/><b/><c/>, calling next_sibling on a handle that points to <b/> results in a handle pointing to <c/>, and calling previous_sibling results in a handle pointing to <a/>. If the node does not have a next/previous sibling (this happens if it is the last/first node in the list, respectively), the functions return null nodes. The first_attribute, last_attribute, next_attribute and previous_attribute functions behave the same way as the corresponding child node functions and let you iterate through the attribute list in the same way.

	Note
	For memory consumption reasons, attributes do not have a link to their parent nodes. Thus there is no `xml_attribute::parent()` function.

Calling any of the functions above on a null handle results in a null handle. For example, node.first_child().next_sibling() returns the second child of node, or a null handle if node has fewer than two children.

With these functions, you can iterate through all child nodes and display all attributes like this (samples/traverse_base.cpp):

for (pugi::xml_node tool = tools.first_child(); tool; tool = tool.next_sibling())
{
    std::cout << "Tool:";

    for (pugi::xml_attribute attr = tool.first_attribute(); attr; attr = attr.next_attribute())
    {
        std::cout << " " << attr.name() << "=" << attr.value();
    }

    std::cout << std::endl;
}

Getting node data

Apart from structural information (parent, child nodes, attributes), nodes can have a name and a value, both of which are strings. Depending on the node type, the name or the value may be absent. node_document nodes have neither a name nor a value; node_element and node_declaration nodes always have a name but never a value; node_pcdata, node_cdata and node_comment nodes never have a name but always have a value (which may be empty); node_pi nodes always have a name and a value (again, the value may be empty). To get a node's name or value, you can use the following functions:

const char_t* xml_node::name() const;
const char_t* xml_node::value() const;

If the node does not have a name or value, or if the node handle is null, both functions return empty strings; they never return null pointers.

It is common to store data as the text contents of some node, for example <node><description>This is a node</description></node>. In this case, the <description> node does not have a value, but instead has a child of type node_pcdata with value "This is a node". pugixml provides two helper functions to read such data:

const char_t* xml_node::child_value() const;
const char_t* xml_node::child_value(const char_t* name) const;

child_value() returns the value of the first child of type node_pcdata or node_cdata; child_value(name) is a simple wrapper for child(name).child_value(). For the above example, calling node.child_value("description") and description.child_value() will both produce the string "This is a node". If there is no child of a relevant type, or if the handle is null, the child_value functions return an empty string.

There is an example of using some of these functions at the end of the next section.

Getting attribute data

All attributes have a name and a value, both of which are strings (the value may be empty). There are two corresponding accessors, like for xml_node:

const char_t* xml_attribute::name() const;
const char_t* xml_attribute::value() const;

If the attribute handle is null, both functions return empty strings; they never return null pointers.

In many cases attribute values have types that are not strings. For example, an attribute may always contain values that should be treated as integers, despite the fact that they are represented as strings in XML. pugixml provides several accessors that convert the attribute value to some other type. The accessors are as follows:

int xml_attribute::as_int() const;
unsigned int xml_attribute::as_uint() const;
double xml_attribute::as_double() const;
float xml_attribute::as_float() const;
bool xml_attribute::as_bool() const;

as_int, as_uint, as_double and as_float convert attribute values to numbers. If the attribute handle is null or the attribute value is empty, 0 is returned. Otherwise, all leading whitespace characters are skipped, and the remaining string is parsed as a decimal number (as_int or as_uint) or as a floating point number in either decimal or scientific form (as_double or as_float). Any extra trailing characters are silently discarded; for example, as_int will return 1 for the string "1abc".

If the input string contains a number that is outside the target numeric range, the result is undefined.

	Caution
	Number conversion functions depend on the current C locale as set with `setlocale`, so they may return unexpected results if the locale is different from `"C"`.
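One possible way to sidestep the locale dependence is to take the raw string from value() and parse it yourself. The sketch below is not part of pugixml; the helper name parse_double_classic and its fallback parameter are hypothetical, and it relies only on the standard library by imbuing a string stream with the classic "C" locale:

```cpp
#include <locale>
#include <sstream>

// Hypothetical helper (not part of pugixml): parses a floating point number
// using the classic "C" locale, regardless of what setlocale() has selected.
// Returns `fallback` if the text does not start with a number.
double parse_double_classic(const char* text, double fallback = 0.0)
{
    std::istringstream stream(text);
    stream.imbue(std::locale::classic()); // '.' is always the decimal separator

    double value = fallback;

    if (!(stream >> value)) return fallback;

    return value;
}
```

A call could look like parse_double_classic(attr.value()), where attr is an xml_attribute handle in the default (char) mode.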

as_bool converts the attribute value to a boolean as follows: if the attribute handle is null or the attribute value is empty, false is returned. Otherwise, true is returned if the first character is one of '1', 't', 'T', 'y', 'Y'. This means that strings like "true" and "yes" are recognized as true, while strings like "false" and "no" are recognized as false. For more complex matching you'll have to write your own function.
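As an illustration of such a custom function, here is a minimal sketch of a stricter matcher. It is not part of pugixml: the name parse_bool_strict, the accepted spellings and the fallback parameter are all assumptions made for this example.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of pugixml): accepts only a fixed set of
// spellings, compared case-insensitively, and returns `fallback` otherwise.
bool parse_bool_strict(const char* text, bool fallback)
{
    std::string lowered;

    for (const char* p = text; *p; ++p)
        lowered += static_cast<char>(std::tolower(static_cast<unsigned char>(*p)));

    if (lowered == "1" || lowered == "true" || lowered == "yes" || lowered == "on") return true;
    if (lowered == "0" || lowered == "false" || lowered == "no" || lowered == "off") return false;

    return fallback; // unknown spelling, e.g. "table" or an empty string
}
```

Unlike the first-character rule described above, this sketch does not treat a value such as "table" as true; it would be called as parse_bool_strict(attr.value(), false).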

	Note
	There are no portable 64-bit types in C++, so there is no corresponding conversion function. If your platform has a 64-bit integer, you can easily write a conversion function yourself.
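A conversion function of this kind might look like the following sketch. It assumes a compiler that provides long long and std::strtoll (C++11 or later); the name parse_int64 and the fallback parameter are hypothetical and not part of pugixml.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not part of pugixml): converts a decimal string to a
// 64-bit integer. Leading whitespace is skipped and trailing characters are
// ignored, mirroring the behavior described for as_int. Unlike as_int, an
// out-of-range or non-numeric input yields `fallback` instead of an
// undefined result.
long long parse_int64(const char* text, long long fallback = 0)
{
    errno = 0;
    char* end = 0;
    long long value = std::strtoll(text, &end, 10);

    if (end == text) return fallback;     // no digits were consumed
    if (errno == ERANGE) return fallback; // value does not fit in long long

    return value;
}
```

It would be used as parse_int64(attr.value()) on an attribute handle in the default (char) mode.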

This is an example of using these functions, along with the node data retrieval ones (samples/traverse_base.cpp):

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
{
    std::cout << "Tool " << tool.attribute("Filename").value();
    std::cout << ": AllowRemote " << tool.attribute("AllowRemote").as_bool();
    std::cout << ", Timeout " << tool.attribute("Timeout").as_int();
    std::cout << ", Description '" << tool.child_value("Description") << "'\n";
}

Contents-based traversal functions

Since a lot of document traversal consists of finding the node/attribute with the correct name, there are special functions for that purpose:

xml_node xml_node::child(const char_t* name) const;
xml_attribute xml_node::attribute(const char_t* name) const;
xml_node xml_node::next_sibling(const char_t* name) const;
xml_node xml_node::previous_sibling(const char_t* name) const;

child and attribute return the first child/attribute with the specified name; next_sibling and previous_sibling return the first sibling in the corresponding direction with the specified name. All string comparisons are case-sensitive. If the node handle is null or there is no node/attribute with the specified name, a null handle is returned.

The child and next_sibling functions can be used together to loop through all child nodes with the desired name like this:

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))

Occasionally the needed node is identified not by a unique name but by the value of some attribute; for example, it is common to have node collections in which each node has a unique id: <group><item id="1"/> <item id="2"/></group>. There are two functions for finding child nodes based on attribute values:

xml_node xml_node::find_child_by_attribute(const char_t* name, const char_t* attr_name, const char_t* attr_value) const;
xml_node xml_node::find_child_by_attribute(const char_t* attr_name, const char_t* attr_value) const;

The three-argument function returns the first child node with the specified name that has an attribute with the specified name/value; the two-argument function skips the name test for the node, which can be useful for searching in heterogeneous collections. If the node handle is null or if no node is found, a null handle is returned. All string comparisons are case-sensitive.

In all of the above functions, all arguments have to be valid strings; passing null pointers results in undefined behavior.

This is an example of using these functions (samples/traverse_base.cpp):

std::cout << "Tool for *.dae generation: " << tools.find_child_by_attribute("Tool", "OutputFileMasks", "*.dae").attribute("Filename").value() << "\n";

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
{
    std::cout << "Tool " << tool.attribute("Filename").value() << "\n";
}

Traversing node/attribute lists via iterators

Child node lists and attribute lists are simply doubly linked lists; while you can use previous_sibling/next_sibling and other such functions for iteration, pugixml additionally provides node and attribute iterators, so that you can treat nodes as containers of other nodes or attributes:

class xml_node_iterator;
class xml_attribute_iterator;

typedef xml_node_iterator xml_node::iterator;
iterator xml_node::begin() const;
iterator xml_node::end() const;

typedef xml_attribute_iterator xml_node::attribute_iterator;
attribute_iterator xml_node::attributes_begin() const;
attribute_iterator xml_node::attributes_end() const;

begin and attributes_begin return iterators that point to the first node/attribute, respectively; end and attributes_end return the past-the-end iterator for the node/attribute list, respectively. This iterator can't be dereferenced, but decrementing it results in an iterator pointing to the last element in the list (except for empty lists, where decrementing the past-the-end iterator is not defined). The past-the-end iterator is commonly used as a termination value for iteration loops (see the sample below). If you want to get an iterator that points to an existing handle, you can construct the iterator with the handle as a single constructor argument, like so: xml_node_iterator(node). For xml_attribute_iterator, you'll have to provide both an attribute and its parent node.

begin and end return equal iterators if called on a null node; such iterators can't be dereferenced. attributes_begin and attributes_end behave the same way. For correct iterator usage this means that the child node/attribute collections of null nodes appear to be empty.

Both types of iterators have bidirectional iterator semantics (i.e. they can be incremented and decremented, but efficient random access is not supported) and support all the usual iterator operations: comparison, dereference, etc. The iterators are invalidated if the node/attribute objects they point to are removed from the tree; adding nodes/attributes does not invalidate any iterators.

Here is an example of using iterators for document traversal (samples/traverse_iter.cpp):

for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
{
    std::cout << "Tool:";

    for (pugi::xml_attribute_iterator ait = it->attributes_begin(); ait != it->attributes_end(); ++ait)
    {
        std::cout << " " << ait->name() << "=" << ait->value();
    }

    std::cout << std::endl;
}

Caution

Node and attribute iterators are somewhere in the middle between const and non-const iterators. While the dereference operation yields a non-constant reference to the object, so that you can use it for tree modification operations, modifying this reference by assignment (for example, by passing iterators to a function like std::sort) will not give the expected results, because assignment modifies the local handle that is stored in the iterator rather than the tree.

Recursive traversal with xml_tree_walker

The methods described above allow traversal of the immediate children of a node; if you want to do a deep tree traversal, you'll have to do it via a recursive function or some equivalent method. However, pugixml provides a helper for depth-first traversal of a subtree. To use it, you have to implement the xml_tree_walker interface and call the traverse function:

class xml_tree_walker
{
public:
    virtual bool begin(xml_node& node);
    virtual bool for_each(xml_node& node) = 0;
    virtual bool end(xml_node& node);

    int depth() const;
};

bool xml_node::traverse(xml_tree_walker& walker);

The traversal is launched by calling the traverse function on the traversal root and proceeds as follows:

First, the begin function is called with the traversal root as its argument.
Then, the for_each function is called for all nodes in the traversal subtree in depth-first order, excluding the traversal root. The node is passed as an argument.
Finally, the end function is called with the traversal root as its argument.

If begin, end or any of the for_each calls return false, the traversal is terminated and false is returned as the traversal result; otherwise, the traversal results in true. Note that you don't have to override the begin or end functions; their default implementations return true.

You can get the node's depth relative to the traversal root at any point by calling the depth function. It returns -1 if called from begin/end, and returns the 0-based depth if called from for_each: the depth is 0 for all children of the traversal root, 1 for all grandchildren and so on.

This is an example of traversing the tree hierarchy with xml_tree_walker (samples/traverse_walker.cpp):

struct simple_walker: pugi::xml_tree_walker
{
    virtual bool for_each(pugi::xml_node& node)
    {
        for (int i = 0; i < depth(); ++i) std::cout << "  "; // indentation

        std::cout << node_types[node.type()] << ": name='" << node.name() << "', value='" << node.value() << "'\n";

        return true; // continue traversal
    }
};

simple_walker walker;
doc.traverse(walker);

Searching for nodes/attributes with predicates

While there are existing functions for getting a node/attribute with known contents, they are often not sufficient even for simple queries. As an alternative to iterating manually through nodes/attributes until the needed one is found, you can write a predicate and call one of the find_ functions:

template <typename Predicate> xml_attribute xml_node::find_attribute(Predicate pred) const;
template <typename Predicate> xml_node xml_node::find_child(Predicate pred) const;
template <typename Predicate> xml_node xml_node::find_node(Predicate pred) const;

The predicate should be either a plain function or a function object that accepts one argument of type xml_attribute (for find_attribute) or xml_node (for find_child and find_node), and returns bool. The predicate is never called with a null handle as an argument.

The find_attribute function iterates through all attributes of the specified node, and returns the first attribute for which the predicate returned true. If the predicate returned false for all attributes or if there were no attributes (including the case where the node is null), a null attribute is returned.

The find_child function iterates through all child nodes of the specified node, and returns the first node for which the predicate returned true. If the predicate returned false for all nodes or if there were no child nodes (including the case where the node is null), a null node is returned.

The find_node function performs a depth-first traversal through the subtree of the specified node (excluding the node itself), and returns the first node for which the predicate returned true. If the predicate returned false for all nodes or if the subtree was empty, a null node is returned.

This is an example of using predicate-based functions (samples/traverse_predicate.cpp):

bool small_timeout(pugi::xml_node node)
{
    return node.attribute("Timeout").as_int() < 20;
}

struct allow_remote_predicate
{
    bool operator()(pugi::xml_attribute attr) const
    {
        return strcmp(attr.name(), "AllowRemote") == 0;
    }

    bool operator()(pugi::xml_node node) const
    {
        return node.attribute("AllowRemote").as_bool();
    }
};

// Find child via predicate (looks for direct children only)
std::cout << tools.find_child(allow_remote_predicate()).attribute("Filename").value() << std::endl;

// Find node via predicate (looks for all descendants in depth-first order)
std::cout << doc.find_node(allow_remote_predicate()).attribute("Filename").value() << std::endl;

// Find attribute via predicate
std::cout << tools.last_child().find_attribute(allow_remote_predicate()).value() << std::endl;

// We can use simple functions instead of function objects
std::cout << tools.find_child(small_timeout).attribute("Filename").value() << std::endl;

Miscellaneous functions

If you need to get the document root of some node, you can use the following function:

xml_node xml_node::root() const;

This function returns the node of type node_document, which is the root node of the document the node belongs to (unless the node is null, in which case a null node is returned). Currently the function simply walks up to the ancestor that has no parent, so its cost is proportional to the depth of the node (logarithmic in the document size for reasonably balanced trees).

While pugixml supports complex XPath expressions, sometimes a simple path handling facility is needed. There are two functions, for getting a node path and for converting a path to a node:

string_t xml_node::path(char_t delimiter = '/') const;
xml_node xml_node::first_element_by_path(const char_t* path, char_t delimiter = '/') const;

Node paths consist of node names separated with a delimiter (which is / by default); paths can also contain self (.) and parent (..) pseudo-names, so this is a valid path: "../../foo/./bar". path returns the path to the node from the document root; first_element_by_path looks for the node represented by a given path. A path can be absolute (absolute paths start with the delimiter), in which case the rest of the path is resolved relative to the document root, or relative, in which case it is resolved relative to the given node. For example, in the following document: <a><b><c/></b></a>, node <c/> has path "a/b/c"; calling first_element_by_path for the document with path "a/b" results in node <b/>; calling first_element_by_path for node <a/> with path "../a/./b/../." results in node <a/>; calling first_element_by_path with path "/a" results in node <a/> for any node.

If a path component is ambiguous (there are two nodes with the given name), the first one is selected; paths are not guaranteed to uniquely identify nodes in a document. If any component of a path is not found, the result of first_element_by_path is a null node; first_element_by_path also returns a null node for null nodes, in which case the path does not matter. path returns an empty string for null nodes.

	Note
	The `path` function returns the result as an STL string, and thus is not available if `PUGIXML_NO_STL` is defined.

pugixml does not record row/column information for nodes upon parsing for efficiency reasons. However, if the node has not changed in a significant way since parsing (the name/value are not changed, and the node itself is the original one, i.e. it was not deleted from the tree and re-added later), it is possible to get the offset from the beginning of the XML buffer:

ptrdiff_t xml_node::offset_debug() const;

If the offset is not available (this happens if the node is null, was not originally parsed from a stream, or has changed in a significant way), the function returns -1. Otherwise it returns the offset of the node's data from the beginning of the XML buffer in pugi::char_t units. For more information on parsing offsets, see the parsing error handling documentation.
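If row/column information is needed, it can be recovered from such an offset as long as the original buffer is still around. The sketch below is not part of pugixml: the name offset_to_line_column is hypothetical, and it assumes the default (char) mode and that buffer holds exactly the text that was parsed.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not part of pugixml): converts an offset into a
// 1-based (line, column) pair by counting newlines in the original buffer.
// Columns are counted in char units, so for UTF-8 text they are byte
// columns. Returns (-1, -1) for negative or out-of-range offsets.
std::pair<int, int> offset_to_line_column(const std::string& buffer, std::ptrdiff_t offset)
{
    if (offset < 0 || static_cast<std::size_t>(offset) > buffer.size())
        return std::make_pair(-1, -1);

    int line = 1;
    int column = 1;

    for (std::ptrdiff_t i = 0; i < offset; ++i)
    {
        if (buffer[static_cast<std::size_t>(i)] == '\n')
        {
            ++line;
            column = 1;
        }
        else
        {
            ++column;
        }
    }

    return std::make_pair(line, column);
}
```

The function would typically be fed the value returned by offset_debug() together with the string that was originally handed to the parser.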