diff options
author | Arseny Kapoulkine <arseny.kapoulkine@gmail.com> | 2014-02-11 06:45:27 +0000 |
---|---|---|
committer | Arseny Kapoulkine <arseny.kapoulkine@gmail.com> | 2014-02-11 06:45:27 +0000 |
commit | 47c15ad949eb6589ee14d208444b4e759a611143 (patch) | |
tree | 35822cba8d2d3c6e5384c960ff8ea503bf3cf235 /src/pugixml.hpp | |
parent | 5fa25a878aa472530cfa981d374d6e9fe4e12c7c (diff) |
Implement document fragment parsing.
Introduce a notable behavior change in default parsing mode: documents without a
document element node are now considered invalid. This is technically a breaking change,
however the amount of documents it affects is very small, all parsed data still persists,
and lack of this check results in very confusing behavior in a number of cases.
In order to be able to parse documents without an element node, a fragment parsing flag is
introduced.
Parsing a buffer in fragment mode treats the buffer as a fragment of a valid XML.
As a consequence, top-level PCDATA is added to the tree; additionally, there are no
restrictions on the number of nodes -- so documents without a document element are considered
valid.
Due to the way parsing works internally, load_buffer_inplace occasionally can not preserve
the document contents if it's parsed in a fragment mode. While unfortunate, this problem is
fundamental; since the use case is relatively obscure, hopefully documenting this shortcoming
will be enough.
git-svn-id: https://pugixml.googlecode.com/svn/trunk@980 99668b35-9821-0410-8761-19e4c4f06640
Diffstat (limited to 'src/pugixml.hpp')
-rw-r--r-- | src/pugixml.hpp | 8 |
1 files changed, 7 insertions, 1 deletions
diff --git a/src/pugixml.hpp b/src/pugixml.hpp index e19a4a3..e5009fe 100644 --- a/src/pugixml.hpp +++ b/src/pugixml.hpp @@ -151,6 +151,10 @@ namespace pugi // This flag is off by default; turning it on may result in slower parsing and more memory consumption. const unsigned int parse_ws_pcdata_single = 0x0400; + // This flag determines if plain character data that does not have a parent node is added to the DOM tree, and if an empty document + // is a valid document. This flag is off by default. + const unsigned int parse_fragment = 0x0800; + // The default parsing mode. // Elements, PCDATA and CDATA sections are added to the DOM tree, character/reference entities are expanded, // End-of-Line characters are normalized, attribute values are normalized using CDATA normalization rules. @@ -880,7 +884,9 @@ namespace pugi status_bad_end_element, // Parsing error occurred while parsing end element tag status_end_element_mismatch,// There was a mismatch of start-end tags (closing tag had incorrect name, some tag was not closed or there was an excessive closing tag) - status_append_invalid_root // Unable to append nodes since root type is not node_element or node_document (exclusive to xml_node::append_buffer) + status_append_invalid_root, // Unable to append nodes since root type is not node_element or node_document (exclusive to xml_node::append_buffer) + + status_no_document_element // Parsing resulted in a document without element nodes }; // Parsing result |