pugixml 0.9 manual | Overview | Installation | Document: Object model · Loading · Accessing · Modifying · Saving | XPath | API Reference | Table of Contents |
pugixml features an extensive interface for getting various types of data from the document and for traversing the document. This section documents all such functions that do not modify the tree, except for the XPath-related functions; see XPath for the XPath reference. As discussed in C++ interface, there are two types of handles to tree data: xml_node and xml_attribute. The handles have special null (empty) values which propagate through various functions and are thus useful for writing more concise code; see this description for details. The documentation in this section explicitly states the results of all functions in case of null inputs.
The internal representation of the document is a tree, where each node has a list of child nodes (the order of children corresponds to their order in the XML representation), and element nodes additionally have a list of attributes, which is also ordered. Several functions are provided to let you get from one node in the tree to another. These functions roughly correspond to the internal representation, and are thus usually the building blocks for other methods of traversal (e.g. XPath traversals are based on these functions).
xml_node xml_node::parent() const;
xml_node xml_node::first_child() const;
xml_node xml_node::last_child() const;
xml_node xml_node::next_sibling() const;
xml_node xml_node::previous_sibling() const;

xml_attribute xml_node::first_attribute() const;
xml_attribute xml_node::last_attribute() const;
xml_attribute xml_attribute::next_attribute() const;
xml_attribute xml_attribute::previous_attribute() const;
The parent function returns the node's parent; all nodes except the document have a non-null parent. first_child and last_child return the first and last child of the node, respectively; note that only document nodes and element nodes can have a non-empty child node list. If the node has no children, both functions return null nodes. next_sibling and previous_sibling return the node that's immediately to the right/left of this node in the children list, respectively. For example, in <a/><b/><c/>, calling next_sibling for a handle that points to <b/> results in a handle pointing to <c/>, and calling previous_sibling results in a handle pointing to <a/>. If the node does not have a next/previous sibling (this happens if it is the last/first node in the list, respectively), the functions return null nodes. The first_attribute, last_attribute, next_attribute and previous_attribute functions behave the same way as the corresponding child node functions and allow you to iterate through the attribute list in the same way.
Note: Because of memory consumption reasons, attributes do not have a link to their parent nodes, so there is no way to get the parent node from an attribute handle alone.
Calling any of the functions above on a null handle results in a null handle. For example, node.first_child().next_sibling() returns the second child of node, or a null handle if there are no children at all or only one.
With these functions, you can iterate through all child nodes and display all attributes like this (samples/traverse_base.cpp):
for (pugi::xml_node tool = tools.first_child(); tool; tool = tool.next_sibling())
{
    std::cout << "Tool:";

    for (pugi::xml_attribute attr = tool.first_attribute(); attr; attr = attr.next_attribute())
    {
        std::cout << " " << attr.name() << "=" << attr.value();
    }

    std::cout << std::endl;
}
Apart from structural information (parent, child nodes, attributes), nodes can have a name and a value, both of which are strings. Depending on the node type, the name or value may be absent. node_document nodes have neither a name nor a value; node_element and node_declaration nodes always have a name but never have a value; node_pcdata, node_cdata and node_comment nodes never have a name but always have a value (it may be empty though); node_pi nodes always have a name and a value (again, the value may be empty). To get a node's name or value, you can use the following functions:
const char_t* xml_node::name() const;
const char_t* xml_node::value() const;
If the node does not have a name or value, or if the node handle is null, both functions return empty strings; they never return null pointers.
It is common to store data as the text contents of some node, e.g. <node><description>This is a node</description></node>. In this case, the <description> node does not have a value, but instead has a child of type node_pcdata with value "This is a node". pugixml provides two helper functions to retrieve such data:
const char_t* xml_node::child_value() const;
const char_t* xml_node::child_value(const char_t* name) const;
child_value() returns the value of the first child with type node_pcdata or node_cdata; child_value(name) is a simple wrapper for child(name).child_value(). For the above example, calling node.child_value("description") and description.child_value() will both produce the string "This is a node". If there is no child with a relevant type, or if the handle is null, the child_value functions return an empty string.
There is an example of using some of these functions at the end of the next section.
All attributes have a name and a value, both of which are strings (the value may be empty). There are two corresponding accessors, like for xml_node:
const char_t* xml_attribute::name() const;
const char_t* xml_attribute::value() const;
If the attribute handle is null, both functions return empty strings; they never return null pointers.
In many cases attribute values have types that are not strings, e.g. an attribute may always contain values that should be treated as integers, despite the fact that they are represented as strings in XML. pugixml provides several accessors that convert an attribute value to some other type:
int xml_attribute::as_int() const;
unsigned int xml_attribute::as_uint() const;
double xml_attribute::as_double() const;
float xml_attribute::as_float() const;
bool xml_attribute::as_bool() const;
as_int, as_uint, as_double and as_float convert attribute values to numbers. If the attribute handle is null or the attribute value is empty, 0 is returned. Otherwise, all leading whitespace characters are skipped, and the remaining string is parsed as a decimal number (as_int or as_uint) or as a floating point number in either decimal or scientific form (as_double or as_float). Any extra characters are silently discarded, e.g. as_int will return 1 for the string "1abc".
In case the input string contains a number that is out of the target numeric range, the result is undefined.
Caution: Number conversion functions depend on the current C locale.
as_bool converts the attribute value to a boolean as follows: if the attribute handle is null or the attribute value is empty, false is returned. Otherwise, true is returned if the first character is one of '1', 't', 'T', 'y', 'Y', and false is returned otherwise. This means that strings like "true" and "yes" are recognized as true, while strings like "false" and "no" are recognized as false. For more complex matching you'll have to write your own function.
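One possible "own function" is a stricter matcher that accepts only a fixed set of spellings and reports anything else, unlike as_bool, which treats "y2k" as true because of its first character. This is a hypothetical sketch, independent of pugixml; you would pass it attr.value().

```cpp
#include <cctype>
#include <string>

// Case-insensitive match against a fixed set of boolean spellings.
// Sets `ok` to false (and returns false) for unrecognized input,
// including an empty string or a null pointer.
bool parse_bool_strict(const char* text, bool& ok) {
    std::string s;
    for (const char* p = text ? text : ""; *p; ++p)
        s += static_cast<char>(std::tolower(static_cast<unsigned char>(*p)));

    ok = true;
    if (s == "1" || s == "true" || s == "yes" || s == "on") return true;
    if (s == "0" || s == "false" || s == "no" || s == "off") return false;

    ok = false;  // e.g. "tru", "y2k", ""
    return false;
}
```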
Note: There are no portable 64-bit types in standard C++ (C++98/03), so there is no corresponding conversion function. If your platform has a 64-bit integer type, you can easily write a conversion function yourself.
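A minimal sketch of such a conversion function, assuming a C++11 compiler (where long long is guaranteed to be at least 64 bits and std::strtoll is available); as_int64 is a hypothetical name that mirrors the as_int rules described above. Later pugixml releases add as_llong and as_ullong when long long support is available, which may make this unnecessary.

```cpp
#include <cstdlib>

// Mirrors the as_int() rules: null or empty input gives 0, leading
// whitespace is skipped and trailing characters are ignored (strtoll stops
// at the first non-digit). Unlike the library's "undefined" out-of-range
// result, strtoll clamps to LLONG_MIN/LLONG_MAX.
long long as_int64(const char* text) {
    if (!text || !*text) return 0;
    return std::strtoll(text, nullptr, 10);
}
```

Usage with an attribute is simply as_int64(attr.value()).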
This is an example of using these functions, along with node data retrieval ones (samples/traverse_base.cpp):
for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
{
    std::cout << "Tool " << tool.attribute("Filename").value();
    std::cout << ": AllowRemote " << tool.attribute("AllowRemote").as_bool();
    std::cout << ", Timeout " << tool.attribute("Timeout").as_int();
    std::cout << ", Description '" << tool.child_value("Description") << "'\n";
}
Since a lot of document traversal consists of finding the node/attribute with the correct name, there are special functions for that purpose:
xml_node xml_node::child(const char_t* name) const;
xml_attribute xml_node::attribute(const char_t* name) const;
xml_node xml_node::next_sibling(const char_t* name) const;
xml_node xml_node::previous_sibling(const char_t* name) const;
child and attribute return the first child/attribute with the specified name; next_sibling and previous_sibling return the first sibling in the corresponding direction with the specified name. All string comparisons are case-sensitive. If the node handle is null or there is no node/attribute with the specified name, a null handle is returned.

The child and next_sibling functions can be used together to loop through all child nodes with the desired name like this:
for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
{
    // process each <Tool> child of tools
}
Occasionally the needed node is specified not by a unique name but by the value of some attribute; for example, it is common to have node collections with each node having a unique id: <group><item id="1"/> <item id="2"/></group>. There are two functions for finding child nodes based on attribute values:
xml_node xml_node::find_child_by_attribute(const char_t* name, const char_t* attr_name, const char_t* attr_value) const;
xml_node xml_node::find_child_by_attribute(const char_t* attr_name, const char_t* attr_value) const;
The three-argument function returns the first child node with the specified name which has an attribute with the specified name/value; the two-argument function skips the name test for the node, which can be useful for searching in heterogeneous collections. If the node handle is null or if no node is found, a null handle is returned. All string comparisons are case-sensitive.
In all of the above functions, all arguments have to be valid strings; passing null pointers results in undefined behavior.
This is an example of using these functions (samples/traverse_base.cpp):
std::cout << "Tool for *.dae generation: " << tools.find_child_by_attribute("Tool", "OutputFileMasks", "*.dae").attribute("Filename").value() << "\n";

for (pugi::xml_node tool = tools.child("Tool"); tool; tool = tool.next_sibling("Tool"))
{
    std::cout << "Tool " << tool.attribute("Filename").value() << "\n";
}
Child node lists and attribute lists are simply doubly linked lists; while you can use previous_sibling/next_sibling and other such functions for iteration, pugixml additionally provides node and attribute iterators, so that you can treat nodes as containers of other nodes or attributes:
class xml_node_iterator;
class xml_attribute_iterator;

typedef xml_node_iterator xml_node::iterator;
iterator xml_node::begin() const;
iterator xml_node::end() const;

typedef xml_attribute_iterator xml_node::attribute_iterator;
attribute_iterator xml_node::attributes_begin() const;
attribute_iterator xml_node::attributes_end() const;
begin and attributes_begin return iterators that point to the first node/attribute, respectively; end and attributes_end return the past-the-end iterator for the node/attribute list, respectively. This iterator can't be dereferenced, but decrementing it results in an iterator pointing to the last element in the list (except for empty lists, where decrementing the past-the-end iterator is not defined). The past-the-end iterator is commonly used as a termination value for iteration loops (see the sample below). If you want to get an iterator that points to an existing handle, you can construct the iterator with the handle as a single constructor argument, like so: xml_node_iterator(node). For xml_attribute_iterator, you'll have to provide both an attribute and its parent node.
begin and end return equal iterators if called on a null node; such iterators can't be dereferenced. attributes_begin and attributes_end behave the same way. For correct iterator usage, this means that the child node/attribute collections of null nodes appear to be empty.
Both types of iterators have bidirectional iterator semantics (i.e. they can be incremented and decremented, but efficient random access is not supported) and support all usual iterator operations - comparison, dereference, etc. The iterators are invalidated if the node/attribute objects they're pointing to are removed from the tree; adding nodes/attributes does not invalidate any iterators.
Here is an example of using iterators for document traversal (samples/traverse_iter.cpp):
for (pugi::xml_node_iterator it = tools.begin(); it != tools.end(); ++it)
{
    std::cout << "Tool:";

    for (pugi::xml_attribute_iterator ait = it->attributes_begin(); ait != it->attributes_end(); ++ait)
    {
        std::cout << " " << ait->name() << "=" << ait->value();
    }

    std::cout << std::endl;
}
Caution: Node and attribute iterators are somewhere in the middle between const and non-const iterators. While the dereference operation yields a non-constant reference to the object, so that you can use it for tree modification operations, modifying this reference by assignment - i.e. passing iterators to a function like
The methods described above allow traversal of the immediate children of some node; if you want to do a deep tree traversal, you'll have to do it via a recursive function or some equivalent method. However, pugixml provides a helper for depth-first traversal of a subtree. To use it, you have to implement the xml_tree_walker interface and call the traverse function:
class xml_tree_walker
{
public:
    virtual bool begin(xml_node& node);
    virtual bool for_each(xml_node& node) = 0;
    virtual bool end(xml_node& node);

    int depth() const;
};

bool xml_node::traverse(xml_tree_walker& walker);
The traversal is launched by calling the traverse function on the traversal root and proceeds as follows:

1. The begin function is called with the traversal root as its argument.
2. The for_each function is called for all nodes in the traversal subtree in depth-first order, excluding the traversal root. The node is passed as an argument.
3. The end function is called with the traversal root as its argument.
If begin, end or any of the for_each calls returns false, the traversal is terminated and false is returned as the traversal result; otherwise, the traversal results in true. Note that you don't have to override the begin or end functions; their default implementations return true.
You can get the node's depth relative to the traversal root at any point by calling the depth function. It returns -1 if called from begin/end, and returns a 0-based depth if called from for_each: the depth is 0 for all children of the traversal root, 1 for all grandchildren and so on.
This is an example of traversing tree hierarchy with xml_tree_walker (samples/traverse_walker.cpp):
struct simple_walker: pugi::xml_tree_walker
{
    virtual bool for_each(pugi::xml_node& node)
    {
        for (int i = 0; i < depth(); ++i) std::cout << " "; // indentation

        // node_types: array of node type names, assumed to be defined elsewhere in the sample
        std::cout << node_types[node.type()] << ": name='" << node.name() << "', value='" << node.value() << "'\n";

        return true; // continue traversal
    }
};

simple_walker walker;
doc.traverse(walker);
While there are existing functions for getting a node/attribute with known contents, they are often not sufficient for simple queries. As an alternative to iterating manually through nodes/attributes until the needed one is found, you can make a predicate and call one of the find_ functions:
template <typename Predicate> xml_attribute xml_node::find_attribute(Predicate pred) const;
template <typename Predicate> xml_node xml_node::find_child(Predicate pred) const;
template <typename Predicate> xml_node xml_node::find_node(Predicate pred) const;
The predicate should be either a plain function or a function object which accepts one argument of type xml_attribute (for find_attribute) or xml_node (for find_child and find_node), and returns bool. The predicate is never called with a null handle as an argument.
The find_attribute function iterates through all attributes of the specified node, and returns the first attribute for which the predicate returned true. If the predicate returned false for all attributes or if there were no attributes (including the case where the node is null), a null attribute is returned.
The find_child function iterates through all child nodes of the specified node, and returns the first node for which the predicate returned true. If the predicate returned false for all nodes or if there were no child nodes (including the case where the node is null), a null node is returned.
The find_node function performs a depth-first traversal through the subtree of the specified node (excluding the node itself), and returns the first node for which the predicate returned true. If the predicate returned false for all nodes or if the subtree was empty, a null node is returned.
This is an example of using predicate-based functions (samples/traverse_predicate.cpp):
bool small_timeout(pugi::xml_node node)
{
    return node.attribute("Timeout").as_int() < 20;
}

struct allow_remote_predicate
{
    bool operator()(pugi::xml_attribute attr) const
    {
        return strcmp(attr.name(), "AllowRemote") == 0;
    }

    bool operator()(pugi::xml_node node) const
    {
        return node.attribute("AllowRemote").as_bool();
    }
};

// Find child via predicate (looks for direct children only)
std::cout << tools.find_child(allow_remote_predicate()).attribute("Filename").value() << std::endl;

// Find node via predicate (looks for all descendants in depth-first order)
std::cout << doc.find_node(allow_remote_predicate()).attribute("Filename").value() << std::endl;

// Find attribute via predicate
std::cout << tools.last_child().find_attribute(allow_remote_predicate()).value() << std::endl;

// We can use simple functions instead of function objects
std::cout << tools.find_child(small_timeout).attribute("Filename").value() << std::endl;
If you need to get the document root of some node, you can use the following function:
xml_node xml_node::root() const;
This function returns the node with type node_document, which is the root node of the document the node belongs to (unless the node is null, in which case a null node is returned).
While pugixml supports complex XPath expressions, sometimes a simple path handling facility is needed. There are two functions, for getting node path and for converting path to a node:
string_t xml_node::path(char_t delimiter = '/') const;
xml_node xml_node::first_element_by_path(const char_t* path, char_t delimiter = '/') const;
Node paths consist of node names, separated with a delimiter (which is / by default); paths can also contain self (.) and parent (..) pseudo-names, so that this is a valid path: "../../foo/./bar".
path returns the path to the node from the document root; first_element_by_path looks for a node represented by a given path. A path can be absolute (absolute paths start with the delimiter), in which case the rest of the path is treated as relative to the document root, or relative, in which case it is resolved relative to the given node. For example, in the following document: <a><b><c/></b></a>, node <c/> has path "/a/b/c"; calling first_element_by_path for the document with path "a/b" results in node <b/>; calling first_element_by_path for node <a/> with path "../a/./b/../." results in node <a/>; calling first_element_by_path with path "/a" results in node <a/> for any node.
If a path component is ambiguous (if there are two nodes with the given name), the first one is selected; paths are not guaranteed to uniquely identify nodes in a document. If any component of a path is not found, the result of first_element_by_path is a null node; first_element_by_path also returns a null node for null nodes, in which case the path does not matter. path returns an empty string for null nodes.
pugixml does not record row/column information for nodes upon parsing for efficiency reasons. However, if the node has not changed in a significant way since parsing (the name/value are not changed, and the node itself is the original one, i.e. it was not deleted from the tree and re-added later), it is possible to get the offset from the beginning of XML buffer:
ptrdiff_t xml_node::offset_debug() const;
If the offset is not available (this happens if the node is null, was not originally parsed from a stream, or has changed in a significant way), the function returns -1. Otherwise it returns the offset to the node's data from the beginning of the XML buffer in pugi::char_t units. For more information on parsing offsets, see the parsing error handling documentation.